Introduction to Distributed Locking
Mental Model
A distributed lock gives many independent processes a single point of agreement: at most one of them may touch a shared resource at a time, even when nodes pause, crash, or lose connectivity.
```mermaid
graph TD
A[Client A] -->|Acquire lock| L[Lock Service Leader]
B[Client B] -->|Acquire lock: denied| L
L -->|Replicate state via consensus| F1[(Follower)]
L -->|Replicate state via consensus| F2[(Follower)]
A -->|Write with fencing token| R[(Shared Resource)]
```
Within a single process, threads coordinate through mutexes, synchronized blocks, or other in-memory locks. But in a distributed system with hundreds of nodes there is no shared memory, so how do you ensure that only one process performs a specific task at any given time?
Whether it's preventing double-billing, ensuring only one worker is crawling a specific site, or managing exclusive access to a shared resource, you need a Distributed Lock Service.
Designing one is a classic FAANG interview question because it tests your understanding of Consistency, Liveness, and Clock Drift.
The Problem: Why is this hard?
A distributed lock isn't just a "key in Redis." In a distributed environment, several things can go wrong:
- Network Partition: The process holding the lock is separated from the lock service.
- Process Pauses: A JVM Garbage Collection (GC) pause or an OS context switch happens after the process acquires the lock but before it performs the action.
- Clock Drift: The clocks on the lock service and the client are never perfectly synchronized.
The Requirements
Functional
- Acquire(LockName, Timeout): Returns a LockID or fails.
- Release(LockID): Frees the lock.
- Renew(LockID, ExtensionTime): Extends the lease.
Non-Functional
- Consistency (Safety): Only one process can hold the lock at a time. This is non-negotiable.
- Liveness: If a process crashes while holding a lock, the lock must eventually be released (Lease-based).
- High Availability: The lock service itself must not be a single point of failure.
High-Level Architecture
A production-grade lock service (like Google's Chubby or Apache Zookeeper) is usually built on a Consensus Algorithm (Paxos or Raft).
1. The Storage Layer (Replicated State Machine)
The lock state is replicated across 3 or 5 nodes. A client request is only successful if a majority (quorum) of nodes agree. This ensures that even if one node fails, the lock state is preserved and consistent.
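The quorum arithmetic is simple but worth internalizing; a toy sketch in plain Java (no real consensus library involved):

```java
// Toy quorum arithmetic: a lock-state change commits only once a
// majority of replicas have acknowledged it.
public final class Quorum {

    // Majority of a cluster: 2 of 3, 3 of 5, etc.
    public static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // True once enough replicas have acked to survive minority failures.
    public static boolean committed(int acks, int clusterSize) {
        return acks >= majority(clusterSize);
    }

    public static void main(String[] args) {
        System.out.println(committed(2, 3)); // true: 2 of 3 is a majority
        System.out.println(committed(2, 5)); // false: 5 nodes need 3 acks
    }
}
```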
2. The Lease Mechanism
We never grant a lock indefinitely. Every lock has a Time-To-Live (TTL).
- If the client doesn't renew the lock before the TTL expires, the service assumes the client has crashed and releases the lock.
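On the client side this usually means a background heartbeat that renews well before the TTL runs out. A sketch reusing the hypothetical LockClient interface from earlier:

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Background heartbeat that renews the lease at one third of the TTL,
// so a couple of missed beats still don't lose the lock.
public class LeaseRenewer {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void keepAlive(LockClient client, String lockId, Duration ttl) {
        long periodMs = ttl.toMillis() / 3;
        scheduler.scheduleAtFixedRate(() -> {
            if (!client.renew(lockId, ttl)) {
                // Lease already expired: stop heartbeating and let the
                // owner re-acquire instead of pretending it holds the lock.
                scheduler.shutdown();
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }
}
```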
The "GC Pause" Death Trap
Imagine this scenario:
- Process A acquires a lock with a 10s TTL.
- Process A hits a "Stop the World" GC pause that lasts 15s.
- The lock expires at 10s. The Lock Service grants the lock to Process B.
- Process A wakes up at 15s. It thinks it still has the lock and writes to the database.
- Process B is also writing to the database.
- Data Corruption occurs.
The Fix: Fencing Tokens
To solve this, the lock service must return a monotonically increasing version number (a token) with every lock acquisition.
- When the process writes to the database, it includes the token.
- The database (or the storage layer) checks if the token is the highest it has seen. If Process B has already written with a higher token, Process A's write is rejected.
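A minimal storage-side sketch, assuming a single in-process store standing in for the database:

```java
// Storage-side fencing guard: writes carrying a token lower than the
// highest token seen so far are rejected, so a paused, stale lock holder
// (Process A in the scenario above) can no longer corrupt data.
public class FencedStore {

    private long highestTokenSeen = -1;

    // Synchronized so the token check and the write commit atomically.
    public synchronized boolean write(long fencingToken, String key, String value) {
        if (fencingToken < highestTokenSeen) {
            return false; // stale token: a newer lock holder already wrote
        }
        highestTokenSeen = fencingToken;
        apply(key, value);
        return true;
    }

    private void apply(String key, String value) { /* persist the write */ }
}
```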
Redis Redlock vs. Zookeeper
In interviews, you must know the trade-offs between a CP system (Zookeeper) and an AP system (Redis).
Zookeeper (The Safe Choice)
Zookeeper implements locks with ephemeral sequential nodes: if the client loses its session, its node is automatically deleted. It is Strongly Consistent.
- Pros: Safety guarantees are built-in.
- Cons: Higher latency for writes due to consensus.
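In practice you rarely hand-roll the recipe. A sketch using the Apache Curator recipes library (the connection string and lock path are placeholders):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;
import java.util.concurrent.TimeUnit;

public class ZkLockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",      // placeholder ensemble
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessMutex lock = new InterProcessMutex(client, "/locks/invoice-42");
        if (lock.acquire(5, TimeUnit.SECONDS)) {   // bounded wait, never forever
            try {
                // critical section: exactly one session holds /locks/invoice-42
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}
```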
Redis Redlock (The AP Choice)
Redlock tries to achieve safety by acquiring the lock on a majority of several independent Redis instances.
- The Controversy: Distributed systems expert Martin Kleppmann famously argued that Redlock is not safe because it relies on synchronized system clocks, which can drift or jump.
- Senior Verdict: Use Redis for performance/caching where occasional "double-processing" is acceptable. Use Zookeeper/Etcd for critical resources where safety is paramount.
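For reference, Redlock repeats a single-instance handshake across N independent nodes. That building block, shown here with the Jedis client (key name and TTL are illustrative), is an atomic SET NX PX plus a check-then-delete release:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;
import java.util.Collections;
import java.util.UUID;

public class RedisLockExample {
    // Release only if we still own the lock: compare the stored token
    // and delete in one atomic Lua step.
    private static final String RELEASE_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] " +
            "then return redis.call('del', KEYS[1]) else return 0 end";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String token = UUID.randomUUID().toString();
            // NX = only set if absent; PX = 30s TTL so a crashed holder
            // cannot deadlock everyone else.
            String ok = jedis.set("lock:invoice-42", token,
                    SetParams.setParams().nx().px(30_000));
            if ("OK".equals(ok)) {
                try {
                    // critical section
                } finally {
                    jedis.eval(RELEASE_SCRIPT,
                            Collections.singletonList("lock:invoice-42"),
                            Collections.singletonList(token));
                }
            }
        }
    }
}
```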
Scalability and Performance
To handle millions of lock requests:
- Hierarchical Locks: Store locks in a tree structure (like Chubby).
- Client-Side Caching: Chubby allows clients to cache the "Lock Not Held" state to reduce load on the server.
- Proxy Layer: Use a proxy to batch lock requests from multiple clients on the same machine.
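As a toy illustration of the hierarchical idea (the path scheme is entirely hypothetical): naming locks as slash-delimited paths lets a client take one coarse lock over a subtree instead of thousands of fine-grained ones.

```java
// Toy illustration of a Chubby-style hierarchical namespace: lock names
// are slash-delimited paths, so one coarse lock can stand in for many
// fine-grained ones under the same subtree.
public class LockPaths {

    // True if holding `coarse` already covers the resource named by `fine`.
    static boolean covers(String coarse, String fine) {
        return fine.equals(coarse) || fine.startsWith(coarse + "/");
    }

    public static void main(String[] args) {
        String subtree = "/ls/billing";            // one lease for the whole subtree
        String leaf    = "/ls/billing/invoice-42"; // per-resource lock

        System.out.println(covers(subtree, leaf)); // true: no second acquire needed
    }
}
```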
Interview Script
"To design a Distributed Lock Service, I would prioritize Safety and Liveness. I'd use a consensus-based approach like Raft to ensure the lock state is consistent across a quorum of nodes. I'd implement a lease-based mechanism to prevent deadlocks if a client crashes. Crucially, to handle process pauses like GC, I would implement Fencing Tokens, where every lock comes with a monotonically increasing ID that the storage layer validates before any write operation. While Redis Redlock is an option for high-throughput, low-criticality tasks, for a FAANG-scale production system, I'd lean towards a CP system like Zookeeper for its stronger safety guarantees."
Technical Trade-offs: Database Choice
| Model | Consistency | Latency | Complexity | Best Use Case |
|---|---|---|---|---|
| Relational (ACID) | Strong | High | Medium | Financial Ledgers, Transactions |
| NoSQL (Wide-Column) | Eventual | Low | High | Large-Scale Analytics, High Write Load |
| In-Memory | Variable | Ultra-Low | Low | Caching, Real-time Sessions |
Key Takeaways
- Safety is non-negotiable: at most one holder at a time, enforced by a quorum of consensus nodes.
- Leases (TTLs) provide liveness: a crashed client's lock expires instead of deadlocking the system.
- Fencing tokens are the real defense against GC pauses and stale holders; the storage layer must validate them.
- Prefer Zookeeper/Etcd where safety is paramount; use Redis where occasional double-processing is acceptable.
Production Readiness Checklist
Before deploying this architecture to a production environment, ensure the following Staff-level criteria are met:
- High Availability: Have we eliminated single points of failure across all layers?
- Observability: Are we exporting structured JSON logs, custom Prometheus metrics, and OpenTelemetry traces?
- Circuit Breaking: Do all synchronous service-to-service calls have timeouts and fallbacks (e.g., via Resilience4j)?
- Idempotency: Can our APIs handle retries safely without causing duplicate side effects?
- Backpressure: Does the system gracefully degrade or return HTTP 429 when resources are saturated?