Mental Model
Connecting isolated components into a resilient, scalable, and observable distributed web.
Distributed locking is a common pattern used to coordinate exclusive access to shared resources. However, in distributed systems, client nodes can experience unpredictable pauses—such as JVM Stop-the-World Garbage Collection loops, VM migrations, or network cuts. These pauses break lock leases, allowing multiple clients to access a shared storage resource simultaneously, leading to silent data corruption. Preventing this requires Fencing Tokens—validating lock state directly at the storage boundary.
1. Functional & Non-Functional Requirements
To establish a bulletproof locking system, we define these requirements:
Functional Requirements
- Exclusive Access: At most one client can write to a specific storage resource at any time.
- Lease Expiry: If a client crashes while holding a lock, the lock must eventually release automatically via a time-to-live (TTL) lease.
- Monotonic Validation: The storage layer must reject writes carrying outdated fencing keys, preventing out-of-order execution.
Non-Functional Requirements
- High Lock Throughput: The lock coordinator must support up to 5,000 lock acquisitions per second with sub-5ms latency.
- Storage Validation Overhead: The fencing token check at the database layer must add less than 1ms of overhead to write paths.
- Network Partition Safety: If the lock coordinator is partitioned, the storage layer must preserve consistency at the cost of write availability.
2. Interface Design & APIs
To coordinate locks, the lock manager and storage API exchange explicit lease payloads. Below is a structured JSON API payload returned by a lock coordinator (like ZooKeeper or Redis) upon lock acquisition, followed by the storage API endpoint payload used to process a validated write:
Lock Acquisition API Response (Lock Manager to Client)
{
"resource_id": "storage:customer-orders-bucket",
"lock_acquired": true,
"lock_token": "lock-lease-uuid-998811",
"fencing_token": 34,
"lease_duration_ms": 10000,
"acquired_at": "2026-05-23T10:00:00.123Z"
}
Storage Write API Payload (Client to Storage Service)
{
"resource_id": "storage:customer-orders-bucket",
"fencing_token": 34,
"write_payload": {
"file_path": "/uploads/orders-2026-05.csv",
"mutation_type": "APPEND",
"bytes": "T1JERVJfSUQsQU1PVU5U\n"
}
}
3. High-Level Design & Topology
The core vulnerability in distributed locking is the JVM GC Pause Lock Hazard (detailed by Martin Kleppmann).
1. The GC Pause Lock Hazard (No Fencing)
If Client 1 acquires a lock lease for 10 seconds and experiences a 15-second JVM Stop-the-World GC pause immediately after, its lease expires. Client 2 acquires the lock and writes to the shared storage. When Client 1 resumes, it continues its write under the false assumption that it still holds the lock, overwriting and corrupting Client 2's data.
sequenceDiagram
autonumber
participant Client1 as Client 1 (JVM)
participant LockMgr as Lock Manager
participant Client2 as Client 2
participant Storage as Shared Storage
Client1->>LockMgr: Acquire Lock (Lease=10s)
LockMgr-->>Client1: Lock Granted (fencing_token=33)
rect rgb(255, 235, 235)
Note over Client1: Client 1 hits JVM GC Pause! (Stop-the-World, 15s)
end
Note over LockMgr: Lease expires after 10s
Client2->>LockMgr: Acquire Lock
LockMgr-->>Client2: Lock Granted (fencing_token=34)
Client2->>Storage: Write File (fencing_token=34)
Storage-->>Client2: Write Succeeded
rect rgb(240, 248, 255)
Note over Client1: Client 1 resumes after GC!
end
Client1->>Storage: Write File (Outdated token 33, no check)
Note over Storage: Silent Corruption!<br/>Client 1 overwrites Client 2's changes.
2. Fencing Token Mitigation
With fencing tokens, every lock acquisition returns a monotonic counter. The storage layer stores the last processed token. When Client 1 resumes and attempts to write with token 33, the storage layer rejects the write because it has already processed token 34.
sequenceDiagram
autonumber
participant Client1 as Client 1
participant LockMgr as Lock Manager
participant Client2 as Client 2
participant Storage as Shared Storage (Fenced)
Client1->>LockMgr: Acquire Lock
LockMgr-->>Client1: Lock Granted (fencing_token=33)
Note over Client1: Client 1 enters GC Pause
Client2->>LockMgr: Acquire Lock
LockMgr-->>Client2: Lock Granted (fencing_token=34)
Client2->>Storage: Write (fencing_token=34)
Note over Storage: Storage stores last_token = 34
Storage-->>Client2: Write Succeeded
Note over Client1: Client 1 resumes after GC
Client1->>Storage: Write (fencing_token=33)
rect rgb(255, 235, 235)
Storage--xClient1: REJECT WRITE! (33 is less than last_token 34)
end
4. Low-Level Design & Data Models
To enforce fencing tokens, the storage layer must perform atomic checks. Below is a structured SQL DDL and a compilable Java class modeling a fenced storage manager. It validates incoming operations against the active database token using transactional SQL bounds:
Database Fencing Schema DDL
CREATE TABLE resource_records (
resource_id VARCHAR(100) PRIMARY KEY,
resource_data TEXT NOT NULL,
last_fencing_token BIGINT NOT NULL DEFAULT 0,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
Fenced Storage Manager
package com.codesprintpro.locking;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
public class FencedStorageManager {
private final Connection connection;
public FencedStorageManager(Connection connection) {
this.connection = connection;
}
/**
* Attempts to write to a shared resource. Rejects mutations carrying
* a fencing token lower than the last successfully committed token.
*/
public boolean writeFenced(String resourceId, String data, long fencingToken) throws SQLException {
this.connection.setAutoCommit(false);
try {
// 1. SELECT FOR UPDATE to serialize checks on the resource record
String selectSql = "SELECT last_fencing_token FROM resource_records WHERE resource_id = ? FOR UPDATE";
long lastToken = -1;
try (PreparedStatement selectStmt = this.connection.prepareStatement(selectSql)) {
selectStmt.setString(1, resourceId);
try (ResultSet rs = selectStmt.executeQuery()) {
if (rs.next()) {
lastToken = rs.getLong("last_fencing_token");
}
}
}
// 2. Reject write if incoming fencing token is outdated
if (fencingToken <= lastToken) {
this.connection.rollback();
return false; // Write rejected (outdated token)
}
// 3. Perform write mutation and update the last fencing token record
String updateSql = "UPDATE resource_records SET resource_data = ?, last_fencing_token = ?, updated_at = NOW() WHERE resource_id = ?";
try (PreparedStatement updateStmt = this.connection.prepareStatement(updateSql)) {
updateStmt.setString(1, data);
updateStmt.setLong(2, fencingToken);
updateStmt.setString(3, resourceId);
updateStmt.executeUpdate();
}
this.connection.commit();
return true; // Write succeeded
} catch (SQLException e) {
this.connection.rollback();
throw e;
} finally {
this.connection.setAutoCommit(true);
}
}
}
5. Scaling Bottlenecks & Mitigations
Enforcing fencing tokens shifts operational limits directly onto the storage systems:
1. Database Serialization Saturation
Because the storage layer must perform a read-before-write check (FOR UPDATE) on the fencing token, every write operation turns into a serialized database transaction, capping maximum update throughput.
- Mitigation: Implement Optimistic Concurrency Control (OCC) instead of pessimistic locks. Incorporate the fencing token directly into the SQL
WHEREclause:UPDATE resource_records SET data = ?, last_token = ? WHERE id = ? AND ? > last_token, bypassing the need for separate select locks.
2. Lock Manager Network Hop Overhead
Querying the lock coordinator for every single write adds extra RTT hops, degrading application response times.
- Mitigation: Implement Lease Buffering. Clients cache the lock lease locally for a fraction of the duration, ensuring they only refresh the lease in background loops before it expires, reducing network latency.
6. Strategic Trade-offs & Alternatives
Different coordination strategies present distinct strategic costs:
| Strategy | Write Performance | Correctness Bounds | Coordination Cost | Implementation Complexity |
|---|---|---|---|---|
| No Fencing (Standard Redlock) | High (Client-side time only) | Low (Vulnerable to GC pauses) | None | Low |
| Monotonic Fencing Tokens | Medium (Validates at storage) | Absolute (Correctness guaranteed) | High (Requires database coordination) | Medium |
| ZooKeeper Ephemeral Nodes | High (High consensus speeds) | High | Medium (Consensus heartbeat) | High (Session management is complex) |
| DynamoDB Conditional Writes | Extremely High | Absolute | Low (Native OCC support) | Low |
7. Failure Scenarios & Resiliency
Fencing locks must be designed to survive catastrophic infrastructure breaks:
Scenario A: Network Partition in Lock Coordinator
If the lock manager is partitioned, it might grant the same lock to Client 1 and Client 2 on different sides of the network slice.
- Resiliency Mitigation: The storage layer acts as the absolute arbiter of truth. Even if the partitioned lock coordinator makes a mistake and issues duplicate locks, the storage layer's monotonic fencing check will reject the lower token write, preserving consistency.
Scenario B: Clock Skew on Lock TTLs
If the lock manager uses physical system clocks to expire leases (e.g. System.currentTimeMillis() + TTL), a clock drift on the coordinator will expire leases early or late, causing overlapping lock leases.
- Resiliency Mitigation: Utilize Monotonic Logical Timers (like
System.nanoTime()) on the coordinator for calculating lease expirations, which are immune to physical system clock drift.
8. Staff Engineer Perspective
9. Mock Interview Dialogue
Verbal Interview Script
Interviewer: "Why is distributed locking with simple timeouts/leases vulnerable to data corruption, and how do fencing tokens resolve this?"
Candidate: "A simple lease has a time-to-live. The client acquires the lock, assumes it has say 10 seconds of safe processing time, and starts writing to storage. However, if that client experiences a JVM Stop-the-World garbage collection pause, its execution thread stops completely. If the pause lasts 12 seconds, the lock lease expires on the coordinator. Another client can then acquire the lock and write to storage. When the paused client resumes, it has no native way of knowing its lease expired and continues its write, corrupting the other client's updates. Fencing tokens mitigate this by returning a monotonically increasing number with every lock lease. The storage layer stores the token of the last successful write and rejects any subsequent write with a lower token."
Interviewer: "Excellent. Does this mean we do not need the distributed lock at all, and can just rely on the storage layer conditional writes?"
Candidate: "If the storage layer natively supports conditional writes, the lock manager can indeed be bypassed, which simplifies our architecture. However, the lock manager is still highly valuable. It acts as an optimization layer—it shields the database from massive concurrent write collisions. If 1,000 workers attempt to write to the same database row simultaneously,conditional writes will reject 999 of them, wasting CPU and DB resources. The lock manager acts as a gatekeeper, ensuring only one client executes the expensive write path at a time."
