Case Study: Design a Rate Limiter (Token Bucket / Leaky Bucket)
Mental Model
Connecting isolated components into a resilient, scalable, and observable distributed web.
Rate limiting is a critical component for protecting your services from abuse, intentional attacks (DDoS), or unintentional spikes in traffic (the "noisy neighbor" problem).
1. Why Rate Limiting?
- Prevent Starvation: Stop one user from using all resources.
- Cost Control: Many third-party APIs bill by usage (OpenAI, for example, charges per token), so uncontrolled traffic translates directly into cost.
- Security: Mitigate brute-force and DDoS attacks.
2. Common Algorithms
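- Token Bucket: Tokens refill at a constant rate up to a fixed capacity, and each request spends one token, so bursts up to the bucket capacity are allowed. (Best for general use.)
- Leaky Bucket: Requests queue and drain at a constant output rate, which smooths traffic. (Best for consistent throughput.)
- Fixed Window: Simple, but suffers from "edge bursts": a client can send up to twice the limit by clustering requests around a window boundary.
- Sliding Window Log/Counter: The log variant is precise but memory-heavy because it stores a timestamp per request; the counter variant trades some precision for a small, fixed amount of memory per key.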
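To make the token bucket concrete, here is a minimal single-process sketch. The class name, the `cost` parameter, and the injectable `clock` argument are assumptions made for illustration (the clock is injectable only so the behaviour can be checked without sleeping); this is not a production implementation and it is not thread-safe.

```python
import time


class TokenBucket:
    """Refills at a constant rate; a full bucket permits a burst of `capacity`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock              # hypothetical hook: swap in a fake clock for tests
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        earned = (now - self.last) * self.refill_per_sec
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=3` and `refill_per_sec=1.0`, three back-to-back calls to `allow()` succeed and the fourth is rejected; after two seconds of idle time, two more requests are admitted. Capacity controls the burst size, while the refill rate controls the sustained throughput.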
3. Distributed Rate Limiting (Redis)
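In a cluster, each node sees only part of the traffic, so you need a central place to store counts. Redis is a common choice. Use Lua scripting to perform the "check-and-increment" atomically; a separate read followed by a write would let two concurrent requests both pass the check, which is a race condition.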
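The atomic check-and-increment can be sketched as a short Lua script sent through a Redis client. This example uses a fixed-window counter because it is the shortest script to read; it is one possible choice, not necessarily the algorithm you would ship. The function name `allow_request` and the key format are hypothetical, and `client` is assumed to expose `eval(script, numkeys, *keys_and_args)` as the redis-py client does.

```python
# Redis runs a Lua script atomically, so no other client can interleave
# between the INCR and the comparison against the limit.
CHECK_AND_INCREMENT = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    -- First request in this window: start the expiry timer.
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if count > tonumber(ARGV[1]) then
    return 0
end
return 1
"""


def allow_request(client, key, limit, window_seconds):
    """Return True if the request counted under `key` is within `limit`.

    `client` is assumed to be a Redis client with an
    eval(script, numkeys, *keys_and_args) method.
    """
    result = client.eval(CHECK_AND_INCREMENT, 1, key, limit, window_seconds)
    return result == 1
```

Assuming a Redis server is reachable, a call might look like `allow_request(redis.Redis(), "rl:user:42", 100, 60)` for a limit of 100 requests per 60 seconds. Because the counter lives in Redis, every application node enforces the same shared limit.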
4. Implementation Checklist
- Return HTTP 429 (Too Many Requests).
- Include X-Ratelimit-Retry-After headers.
- Choose the right bucket key (IP, UserID, or API Key).
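The checklist items can be sketched in a framework-agnostic way as two small helpers. The function names, the `rl:` key prefix, the priority order for choosing a key, and the JSON body are all assumptions for illustration. The sketch also sets the standard HTTP `Retry-After` header alongside the `X-Ratelimit-Retry-After` header named above, since generic clients are more likely to understand the standard one.

```python
import math


def bucket_key(api_key=None, user_id=None, ip=None):
    """Pick the most specific identifier available (an assumed priority order)."""
    if api_key:
        return f"rl:key:{api_key}"
    if user_id:
        return f"rl:user:{user_id}"
    return f"rl:ip:{ip}"


def too_many_requests(retry_after_seconds):
    """Build a (status, headers, body) triple for a rejected request."""
    # Round up so the client never retries before the limit resets.
    seconds = str(max(1, math.ceil(retry_after_seconds)))
    headers = {
        "Retry-After": seconds,              # standard HTTP header
        "X-Ratelimit-Retry-After": seconds,  # header named in the checklist
        "Content-Type": "application/json",
    }
    return 429, headers, '{"error": "rate limit exceeded"}'
```

Keying on an API key or user ID is usually fairer than keying on IP alone, because many users can share one IP address behind a NAT or proxy.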
Final Takeaway
Rate limiting is the "Shield" of your architecture. Without it, your system is vulnerable to the chaos of the open internet.
Technical Trade-offs: Database Choice
| Model | Consistency | Latency | Complexity | Best Use Case |
|---|---|---|---|---|
| Relational (ACID) | Strong | High | Medium | Financial Ledgers, Transactions |
| NoSQL (Wide-Column) | Eventual | Low | High | Large-Scale Analytics, High Write Load |
| In-Memory | Variable | Ultra-Low | Low | Caching, Real-time Sessions |
Verbal Interview Script
Interviewer: "How would you ensure high availability and fault tolerance for this specific architecture?"
Candidate: "To achieve 'Five Nines' (99.999%) availability, we must eliminate all Single Points of Failure (SPOF). I would deploy the API Gateway and stateless microservices across multiple Availability Zones (AZs) behind an active-active load balancer. For the data layer, I would use asynchronous replication to a read-replica in a different region for disaster recovery. Furthermore, it's not enough to just deploy redundantly; we must protect the system from cascading failures. I would implement strict timeouts, retry mechanisms with exponential backoff and jitter, and Circuit Breakers (using a library like Resilience4j) on all synchronous network calls between microservices."
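The retry policy mentioned in the answer above (exponential backoff with jitter) can be sketched as follows. The answer names Resilience4j, a Java library; this Python version is a hand-rolled illustration of the same idea, not that library's API. The function name, the default values, and the injectable `sleep` and `rng` parameters are assumptions.

```python
import random
import time


def call_with_retries(fn, max_attempts=5, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on failure, wait a random time under a doubling bound, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The bound doubles on each attempt but never exceeds `cap`.
            bound = min(cap, base * 2 ** attempt)
            # "Full jitter": a uniform random delay in [0, bound), so many
            # clients failing together do not all retry at the same instant.
            sleep(rng() * bound)
```

In a real service you would typically catch only errors that are safe to retry (timeouts, connection resets) and combine this with a per-call timeout and a circuit breaker, as the answer describes.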