System Design: Designing a Global Distributed Rate Limiter

How to prevent service abuse at scale. A deep dive into Rate Limiting algorithms (Token Bucket, Sliding Window), Redis Lua scripting, and capacity math.

Case Study: Design a Rate Limiter (Token Bucket / Leaky Bucket)

Mental Model

Connecting isolated components into a resilient, scalable, and observable distributed web.

Rate limiting is a critical component for protecting your services from abuse, intentional attacks (DDoS), or unintentional spikes in traffic (the "noisy neighbor" problem).

1. Why Rate Limiting?

  • Prevent Starvation: Stop one user from using all resources.
  • Cost Control: Many metered APIs (OpenAI, for example) bill by usage, so capping calls also caps spend.
  • Security: Mitigate brute-force and DDoS attacks.

2. Common Algorithms

  • Token Bucket: Constant refill rate, allows bursts. (Best for general use).
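  • Leaky Bucket: Constant output rate, smooths traffic. (Best for consistent throughput).
  • Fixed Window: Simple, but suffers from "edge bursts": a client can send up to twice the limit in a short span that straddles a window boundary.
  • Sliding Window Log/Counter: The log variant is precise but memory-heavy (one timestamp per request); the counter variant approximates it with two counters per key.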
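As an illustration of the first algorithm in the list, here is a minimal single-process token bucket sketch. The class name, parameter names, and the injectable `clock` argument are assumptions made for this example; it is not shared across hosts and is not thread-safe.

```python
import time


class TokenBucket:
    """In-process token bucket (illustrative sketch, not thread-safe)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)        # maximum burst size
        self.refill_rate = float(refill_rate)  # tokens added per second
        self._clock = clock                    # injectable for testing
        self._tokens = float(capacity)         # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily based on elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Refilling lazily on each call avoids a background timer: the bucket only needs the timestamp of the last call and the current token count.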

3. Distributed Rate Limiting (Redis)

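In a cluster, every node must see the same counts, so they need a shared store. Redis is a common choice. Use Lua scripting to perform "check-and-increment" atomically; with a separate read and write, two nodes can read the same count and both admit a request that should have been rejected.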
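The atomic check-and-increment could be sketched as a fixed-window counter in Lua, called from Python. The script, the function name `allow_request`, and the key layout are assumptions for illustration. The `client` argument is assumed to be a redis-py `redis.Redis` instance, whose `eval(script, numkeys, *keys_and_args)` method runs the script on the server as one atomic unit.

```python
# Lua runs atomically inside Redis, so no other command can interleave
# between the INCR and the limit comparison.
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  -- First request in this window: start the expiry clock.
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
  return 0
end
return 1
"""


def allow_request(client, key, limit, window_seconds):
    """Return True if this request fits within `limit` per window.

    `client` is assumed to expose redis-py's eval(script, numkeys, *args).
    """
    result = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds, limit)
    return result == 1
```

This sketch uses the fixed-window algorithm because it is the shortest to show; it inherits the edge-burst weakness noted above. A token bucket can be scripted the same way by storing the token count and last-refill timestamp under the key.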

4. Implementation Checklist

  • Return HTTP 429 (Too Many Requests).
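  • Include a Retry-After header (optionally alongside the conventional X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers).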
  • Choose the right bucket key (IP, UserID, or API Key).
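A framework-neutral sketch of the checklist follows. The helper names (`bucket_key`, `too_many_requests`), the key prefixes, and the JSON body are hypothetical. The precedence shown (API key, then user ID, then IP) is one reasonable choice, not a rule.

```python
import math
from http import HTTPStatus


def bucket_key(api_key=None, user_id=None, ip=None):
    """Pick the most specific identifier available for the limiter key."""
    if api_key:
        return f"rl:key:{api_key}"
    if user_id:
        return f"rl:user:{user_id}"
    if ip:
        return f"rl:ip:{ip}"
    raise ValueError("no identifier available to rate-limit on")


def too_many_requests(retry_after_seconds):
    """Build a (status, headers, body) triple for a rejected request."""
    headers = {
        # Retry-After takes whole seconds; round up so clients never retry early.
        "Retry-After": str(math.ceil(retry_after_seconds)),
        "Content-Type": "application/json",
    }
    body = '{"error": "rate_limited"}'
    return int(HTTPStatus.TOO_MANY_REQUESTS), headers, body
```

Keying on IP alone can penalize many users behind one NAT or proxy, which is why the sketch prefers an authenticated identifier when one exists.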

Final Takeaway

Rate limiting is the "Shield" of your architecture. Without it, your system is vulnerable to the chaos of the open internet.

Technical Trade-offs: Database Choice

Model                 Consistency   Latency     Complexity   Best Use Case
Relational (ACID)     Strong        High        Medium       Financial Ledgers, Transactions
NoSQL (Wide-Column)   Eventual      Low         High         Large-Scale Analytics, High Write Load
In-Memory             Variable      Ultra-Low   Low          Caching, Real-time Sessions

Verbal Interview Script

Interviewer: "How would you ensure high availability and fault tolerance for this specific architecture?"

Candidate: "To achieve 'Five Nines' (99.999%) availability, we must eliminate all Single Points of Failure (SPOF). I would deploy the API Gateway and stateless microservices across multiple Availability Zones (AZs) behind an active-active load balancer. For the data layer, I would use asynchronous replication to a read-replica in a different region for disaster recovery. Furthermore, it's not enough to just deploy redundantly; we must protect the system from cascading failures. I would implement strict timeouts, retry mechanisms with exponential backoff and jitter, and Circuit Breakers (using a library like Resilience4j) on all synchronous network calls between microservices."
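The "exponential backoff and jitter" part of the answer can be illustrated with a short sketch. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function name, defaults, and injectable `rng` are assumptions for this example. The same schedule suits clients retrying after a 429.

```python
import random


def backoff_delays(max_retries, base=0.1, cap=10.0, rng=random.random):
    """Yield one sleep duration (seconds) per retry attempt.

    The ceiling doubles each attempt, is clamped to `cap`, and the actual
    delay is a random fraction of it so clients do not retry in lockstep.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

A caller would sleep for each yielded delay between attempts and give up once the generator is exhausted.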
