System Design: Designing Idempotent APIs
In a perfect world, network connections would never drop, databases would never deadlock, and clients would receive every API response instantly. In reality, distributed systems are highly unreliable. A client makes an API call to charge a credit card, the server successfully processes the payment, but the network connection drops before the HTTP response can be sent back.
Left in the dark, the client has no choice but to retry the request. Without an idempotency layer, this simple retry results in a catastrophic duplicate payment.
An operation is idempotent if it can be executed multiple times without changing the state of the system beyond the initial application. Designing APIs with built-in idempotency ensures that clients can safely retry failed operations, recovering gracefully from network partitions, hardware crashes, and timeouts.
Requirements and System Goals
To implement a resilient, industrial-grade idempotency layer, we must establish rigorous functional and operational boundaries:
Functional Requirements
- Exactly-Once Execution Semantics: Ensure that even if a client retries a transaction 10 times, the business transaction (e.g., payment, seat reservation, order creation) is executed exactly once.
- Original Response Cache: When a retried request with a matching idempotency key is received, the server must return the exact HTTP status code and response payload of the original successful execution, without executing any backend business logic.
- Payload Verification (Zero-Trust): If a retry request carries an existing idempotency key but the request payload has changed, the system must reject it immediately to prevent request tampering or key collision.
- Automatic Key Expiration: Idempotency keys must carry a configurable Time-to-Live (TTL) so the storage layer can clean up expired keys and maintain resource efficiency.
Non-Functional Requirements
- Sub-Millisecond Lookup Latency: The check-and-set idempotency lookup path must introduce negligible latency (less than 1.0 millisecond p99), utilizing in-memory cache solutions (e.g., Redis) to avoid hammering transactional databases.
- Atomic Concurrency Control: The checking of key existence and the acquisition of the execution lock must be strictly atomic to prevent race conditions from high-frequency concurrent retries (the "double-spend" vulnerability).
- High Read/Write Capacity: The system must gracefully handle up to 50 million operations per day, scaling the storage layer horizontally to prevent bottlenecks during high-traffic sales events.
- Decoupled Failure Propagation: A failure of the idempotency caching layer must degrade gracefully rather than taking down core transaction services, failing open or closed based on configurable security profiles.
API Interfaces and Service Contracts
An elegant idempotency protocol is driven by client-specified request headers and standard HTTP response status codes.
graph TD
Client[Client App] -->|1. POST /v1/payments with Header: Idempotency-Key| Gateway[API Gateway / Interceptor]
Gateway -->|2. Check Key & Payload Match| IdemDB[(Idempotency Cache)]
Gateway -->|3. Lock & Process Payment| CorePay[Payment Processing Service]
The Idempotency Protocol
Clients signal their desire for idempotent processing by supplying a unique key in the HTTP request header:
Idempotency-Key: <UUIDv4>
The server responds using standard HTTP status codes to communicate the state of the idempotent process:
| HTTP Status Code | Meaning | Action for Client |
|---|---|---|
| 200 OK / 201 Created | Transaction completed successfully. Response payload is returned (either fresh or cached). | Success. Do not retry. |
| 409 Conflict | The request is currently being processed by another worker thread. A concurrent retry is disallowed. | Retry after a delay (e.g., using exponential backoff). |
| 422 Unprocessable Entity | A request with the same Idempotency-Key was received, but the request payload or HTTP parameters do not match the original. | Do not retry. Generate a new key and update the payload. |
| 400 Bad Request | The idempotency key format is invalid or missing when strictly required. | Do not retry. Fix the request headers. |
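
The table above can be read as a small client-side decision function. The following is a minimal sketch in Python, not a definitive client: the names `client_action` and `backoff_delay`, the base delay of 0.2 seconds, and the 10-second cap are all illustrative assumptions. The backoff uses full jitter, which is one common variant of exponential backoff.

```python
import random

# Hypothetical action labels; they mirror the "Action for Client" column.
DONE = "done"                # 200 / 201: success, do not retry
RETRY = "retry"              # 409: another worker holds the key, retry later
NEW_KEY = "new_key"          # 422: payload mismatch, generate a new key
FIX_REQUEST = "fix_request"  # 400: missing or malformed key header


def client_action(status: int) -> str:
    """Map a response status code to the client action from the table."""
    if status in (200, 201):
        return DONE
    if status == 409:
        return RETRY
    if status == 422:
        return NEW_KEY
    if status == 400:
        return FIX_REQUEST
    # Codes outside the table are not covered by this sketch.
    raise ValueError(f"status {status} is not part of the idempotency protocol")


def backoff_delay(attempt: int, base: float = 0.2, cap: float = 10.0,
                  rng=random.random) -> float:
    """Exponential backoff with full jitter, in seconds.

    The delay is drawn uniformly from [0, min(cap, base * 2**attempt)).
    `rng` is injectable so the function can be tested deterministically.
    """
    return rng() * min(cap, base * (2 ** attempt))
```

A client would call `backoff_delay(attempt)` only when `client_action` returns `RETRY`, and would reuse the same Idempotency-Key on that retry.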
HTTP Request Example
POST /v1/payments HTTP/1.1
Host: api.codesprintpro.com
Content-Type: application/json
Idempotency-Key: e3b0c442-98fc-1c14-9af1-000000000042
{
"amount_minor": 9999,
"currency": "USD",
"source_account_id": "acc_payment_01",
"destination_account_id": "acc_merchant_88"
}
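
On the client side, the important detail is that the key is generated once per logical operation and then reused unchanged for every retry. A minimal sketch of that in Python follows; `new_idempotency_key` and `build_payment_request` are hypothetical helper names, and no network call is made.

```python
import json
import uuid
from typing import Dict, Tuple


def new_idempotency_key() -> str:
    """Generate a fresh UUIDv4 key for one logical operation."""
    return str(uuid.uuid4())


def build_payment_request(payload: dict, idempotency_key: str) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for POST /v1/payments without sending anything."""
    headers = {
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key,
    }
    body = json.dumps(payload).encode("utf-8")
    return headers, body


payload = {
    "amount_minor": 9999,
    "currency": "USD",
    "source_account_id": "acc_payment_01",
    "destination_account_id": "acc_merchant_88",
}

# The key is created once, outside the retry loop.
key = new_idempotency_key()
first_attempt = build_payment_request(payload, key)
retry_attempt = build_payment_request(payload, key)  # identical headers and body
```

Generating the key inside the retry loop would defeat the protocol, because each attempt would then look like a new operation to the server.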
HTTP Response Example (Cached Result)
When a cached response is returned, the server includes a custom header to inform the client that the transaction was not re-executed:
HTTP/1.1 201 Created
Content-Type: application/json
X-Cache-Idempotency: HIT
X-Original-Request-Date: 2026-06-01T11:45:00Z
{
"transaction_id": "tx_abc123xyz",
"status": "COMPLETED",
"amount_minor": 9999,
"currency": "USD",
"processed_at": "2026-06-01T11:45:01Z"
}
High-Level Design and Visualizations
An idempotency engine operates as a high-performance interceptor sitting upstream of your core business services. Below is the end-to-end routing flow for handling incoming requests:
Idempotent Request Interceptor Sequence
sequenceDiagram
autonumber
participant Client
participant Interceptor as Idempotency Interceptor
participant Redis as Redis Cache (Distributed Lock & Storage)
participant DB as Postgres Transactional Database
Client->>Interceptor: POST /v1/payments (Idempotency-Key: e3b0c442)
Note over Interceptor: Calculate cryptographic hash of request payload
Interceptor->>Redis: ATOMIC Lock & Lookup (e3b0c442)
alt State: KEY EXISTS (HIT)
Redis-->>Interceptor: Status: COMPLETED, Response: {amount: 9999, tx_id: tx_abc}
Note over Interceptor: Verify incoming payload hash matches cached hash
Interceptor-->>Client: HTTP 201 Created (X-Cache-Idempotency: HIT)
else State: CONCURRENT RUNNING (LOCKED)
Redis-->>Interceptor: Status: PROCESSING
Interceptor-->>Client: HTTP 409 Conflict (Currently executing)
else State: KEY NOT EXISTS (MISS)
Redis-->>Interceptor: Acquired lock, Set State: PROCESSING (TTL: 60s)
Interceptor->>DB: Begin Transaction & Process Business Logic
DB-->>Interceptor: Transaction successfully committed! (tx_id: tx_abc)
Interceptor->>Redis: Update State: COMPLETED, Store Response, Set TTL: 86400s (24h)
Interceptor-->>Client: HTTP 201 Created (X-Cache-Idempotency: MISS)
end
Client Retry Flow
If the TCP connection drops during step 11 above (the final response to the client), the client will retry using the exact same Idempotency-Key. The request traverses the sequence again, hitting the "KEY EXISTS (HIT)" condition in step 3, returning the cached payload without touching the Postgres database or triggering a duplicate payment.
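
The three branches of the sequence (HIT, LOCKED, MISS) can be exercised with a small in-memory model. This is a single-threaded sketch with a plain dict standing in for Redis and no TTLs, so it only illustrates the state transitions, not the atomicity or expiry behaviour. The names `InMemoryIdempotencyStore`, `handle`, and `charge` are hypothetical. The replay returns the stored status code, following the Original Response Cache requirement.

```python
import hashlib
import json


class InMemoryIdempotencyStore:
    """Dict-backed stand-in for the Redis cache (no TTL, not thread-safe)."""

    def __init__(self):
        self.records = {}


def payload_hash(payload: dict) -> str:
    """SHA-256 over a key-sorted JSON encoding of the payload."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def handle(store, key, payload, process):
    """Return (http_status, body, replayed) for one incoming request."""
    digest = payload_hash(payload)
    record = store.records.get(key)
    if record is not None:
        if record["hash"] != digest:
            return 422, {"error": "payload_mismatch"}, False
        if record["status"] == "PROCESSING":
            return 409, {"error": "in_progress"}, False
        # COMPLETED: replay the original response, skip business logic.
        return record["response_status"], record["response_payload"], True
    # MISS: mark as PROCESSING, run the business logic, then store the result.
    # If process() raises, the record stays PROCESSING; a real deployment
    # would rely on the lock TTL to release it.
    store.records[key] = {"status": "PROCESSING", "hash": digest}
    status, body = process(payload)
    store.records[key] = {
        "status": "COMPLETED",
        "hash": digest,
        "response_status": status,
        "response_payload": body,
    }
    return status, body, False


charges = []


def charge(payload):
    """Hypothetical payment call that records every execution."""
    charges.append(payload["amount_minor"])
    return 201, {"transaction_id": "tx_%d" % len(charges), "status": "COMPLETED"}


store = InMemoryIdempotencyStore()
payment = {"amount_minor": 9999, "currency": "USD"}
first = handle(store, "e3b0c442", payment, charge)
retry = handle(store, "e3b0c442", payment, charge)
tampered = handle(store, "e3b0c442", {"amount_minor": 1, "currency": "USD"}, charge)
```

After these three calls, `charges` holds a single entry: the retry is served from the stored record and the tampered request is rejected before any business logic runs.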
Low-Level Design and Schema Strategies
To avoid race conditions where two concurrent requests with the same key attempt to write simultaneously, we must combine an in-memory high-speed cache (Redis) with a relational persistent store (Postgres).
Persistent Relational Schema (Postgres)
For high-value financial transactions, we maintain a permanent audit log of idempotency executions:
CREATE TABLE idempotency_records (
    tenant_id        VARCHAR(64)   NOT NULL,
    idempotency_key  VARCHAR(255)  NOT NULL,
    request_path     VARCHAR(2048) NOT NULL,
    request_hash     VARCHAR(64)   NOT NULL,  -- Hex-encoded SHA-256 hash of payload
    status           VARCHAR(32)   NOT NULL,  -- 'PROCESSING', 'COMPLETED', 'FAILED'
    response_status  INT,                     -- HTTP status code (e.g., 201, 400); NULL while PROCESSING
    response_payload TEXT,                    -- Serialized JSON response; NULL while PROCESSING
    created_at       TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
    updated_at       TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
    -- Keys are scoped per tenant, so uniqueness is enforced on the pair
    PRIMARY KEY (tenant_id, idempotency_key)
);
-- Index by tenant and date for regular operations and cleanup routines
CREATE INDEX idx_idempotency_tenant_date ON idempotency_records(tenant_id, created_at);
Atomic Check-and-Set Redis Lua Script
To achieve atomic lock acquisition and cache lookup in a single round-trip without race conditions, we execute a Lua script on the Redis cluster.
-- KEYS[1] = "idempotency:key:" .. key
-- ARGV[1] = request_hash (SHA-256)
-- ARGV[2] = lock_ttl_ms (e.g., 60000 for a 1-minute lock)
local record = redis.call('GET', KEYS[1])
if record then
    -- Key exists. Parse the stored JSON structure
    local data = cjson.decode(record)
    -- Verify payload hash to prevent key hijack collisions
    if data.hash ~= ARGV[1] then
        return { "PAYLOAD_MISMATCH" }
    end
    if data.status == "PROCESSING" then
        return { "PROCESSING" }
    else
        return { "COMPLETED", data.response_status, data.response_payload }
    end
else
    -- Key does not exist. Atomically acquire lock and set state to PROCESSING
    local lock_data = {
        status = "PROCESSING",
        hash = ARGV[1],
        created_at = redis.call('TIME')[1]
    }
    redis.call('SET', KEYS[1], cjson.encode(lock_data), 'PX', ARGV[2])
    return { "ACQUIRED" }
end
Completing the Transaction in Redis
Once the backend transaction succeeds, the worker thread updates the Redis record with a long-term TTL (e.g., 24 hours):
# Save response and set TTL to 86,400 seconds (24 hours)
SET idempotency:key:e3b0c442 '{"status":"COMPLETED","hash":"sha256hash...","response_status":201,"response_payload":"{...}"}' EX 86400
Scaling and Operational Challenges
Managing idempotency at a scale of 50 million requests per day requires careful capacity planning and architectural foresight.
Storage Capacity Back-of-the-Envelope Calculations
Let us estimate the hardware storage requirements for holding idempotency keys in Redis for 50 million requests/day over a 7-day retention period.
Let:
- $R$ = Total daily requests = $50,000,000$ requests/day.
- $T$ = Retention time = $7$ days.
- $S$ = Average size of an idempotency record in Redis.
  - Unique key string: 64 bytes (idempotency:key:uuidv4).
  - Stored value (status, payload hash, HTTP status code, compressed HTTP response body): 1.5 KB.
  - Total memory footprint per key in Redis (including hash overhead): $1.8 \text{ KB}$.
The total active keys stored concurrently $K_{\text{total}}$ is:
$$K_{\text{total}} = 50,000,000 \times 7 = 350,000,000 \text{ keys}$$
The total RAM required to hold this active working set in Redis is:
$$\text{RAM}_{\text{total}} = 350,000,000 \times 1.8 \text{ KB} = 630,000,000 \text{ KB} \approx 630 \text{ Gigabytes}$$
Accounting for a standard Redis cluster memory overhead safety margin of 30%:
$$\text{RAM}_{\text{provisioned}} = 630 \text{ GB} \times 1.3 = 819 \text{ Gigabytes of memory}$$
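
The same estimate can be reproduced as a few lines of arithmetic, which makes it easy to vary the retention period. This is a sketch using decimal units (1 GB = 1,000,000 KB), matching the calculation above; the function name `provisioned_gb` is illustrative.

```python
def provisioned_gb(daily_requests: int, retention_days: int,
                   kb_per_key: float = 1.8, safety_margin: float = 1.3) -> float:
    """Provisioned Redis memory in GB for a given request rate and retention."""
    total_keys = daily_requests * retention_days
    ram_kb = total_keys * kb_per_key
    ram_gb = ram_kb / 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    return ram_gb * safety_margin


seven_day = provisioned_gb(50_000_000, 7)  # about 819 GB, as derived above
one_day = provisioned_gb(50_000_000, 1)    # about 117 GB
```

Retention dominates the result: with the 24-hour TTL used for completed records elsewhere in this article, the same traffic would need roughly 117 GB rather than 819 GB.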
Scaling Redis Through Sharding
To comfortably host 819 GB of RAM and process a peak of 15,000 write IOPS during high-traffic intervals:
- We deploy a Redis Cluster with 16 shards, where each master node holds approximately 51 GB of RAM (819 GB / 16).
- Redis Cluster assigns every key to one of 16,384 hash slots using CRC16 of the key, so keys such as idempotency:key:e3b0c442 are distributed uniformly across shards, avoiding hotspots on a single node.
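
Redis Cluster places a key by computing CRC16 of the key modulo 16384 and mapping the resulting slot to a shard. When the key contains a non-empty `{...}` hash tag, only the tag content is hashed. A sketch of that slot calculation in Python is below; it assumes `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant Redis uses, which can be checked against the reference value 0x31C3 for the input "123456789". The function name `key_slot` is illustrative.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


slot = key_slot("idempotency:key:e3b0c442")
```

Because UUID-based keys are effectively random, their slots spread evenly without any hash tag. Wrapping the entire key in braces has no effect on placement, since the tag content is then the whole key.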
Trade-offs and Architectural Alternatives
When designing the storage and locking layer for idempotency, engineers must evaluate critical trade-offs:
| Dimension | Database Unique Constraint (Postgres) | Distributed Locking Layer (Redis/Memcached) |
|---|---|---|
| Lookup Latency | High (Requires a relational index lookup, which can take 5ms to 15ms under load). | Ultra-Low (Redis hash lookup executes in less than 1.0ms). |
| Strict Correctness | Perfect (Provides absolute ACID transaction isolation directly inside the business database). | Medium (Subject to edge-case lock expiration leaks if the backend process hangs too long). |
| Write Amplification | High (Every key is written to relational disk blocks, which consumes disk IOPS quickly). | Low (Highly efficient memory-based keys are pruned using native TTL expiration). |
| Implementation Complexity | Very Low (Utilizes UNIQUE(tenant_id, idempotency_key) constraints). | High (Requires managing distributed state, connection pools, and Lua scripting pipelines). |
Failure Modes and Fault Tolerance Strategies
1. The Slow Processing Hang (Lock Expiration)
If a worker takes 65 seconds to process a payment, but the Redis lock TTL is set to 60 seconds, the lock will expire in Redis. A concurrent client retry at second 61 will acquire the lock again, spawning a duplicate transaction!
- Resolution Strategy (The Heartbeat Sentinel): During long-running executions, the background worker thread must run a dynamic "lock renewal daemon." Every 15 seconds, it updates the Redis lock's TTL (e.g., executing EXPIRE key 60), ensuring the lock remains held until the thread exits.
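
The timing in this failure mode can be checked with a deterministic model that uses an explicit clock instead of real sleeps. The sketch below is an in-memory stand-in, not Redis code. `FakeLockStore` and `duplicate_possible` are hypothetical names, and the owner token checked on renewal is an added assumption (a plain EXPIRE does not verify ownership).

```python
class FakeLockStore:
    """In-memory model of key expiry, driven by an explicit clock in seconds."""

    def __init__(self):
        self._locks = {}  # key -> (owner_token, expires_at)

    def acquire(self, key, token, ttl, now):
        current = self._locks.get(key)
        if current is not None and current[1] > now:
            return False  # lock is still held
        self._locks[key] = (token, now + ttl)
        return True

    def renew(self, key, token, ttl, now):
        current = self._locks.get(key)
        # Renew only if the lock is unexpired and still owned by this worker.
        if current is None or current[1] <= now or current[0] != token:
            return False
        self._locks[key] = (token, now + ttl)
        return True


def duplicate_possible(job_seconds, lock_ttl, heartbeat_every, retry_at):
    """True if a retry arriving at `retry_at` can grab the lock mid-job."""
    store = FakeLockStore()
    store.acquire("k", "worker-1", lock_ttl, now=0)
    if heartbeat_every:
        t = heartbeat_every
        # Fire every heartbeat that happens before the retry arrives.
        while t < min(job_seconds, retry_at):
            store.renew("k", "worker-1", lock_ttl, now=t)
            t += heartbeat_every
    return store.acquire("k", "retry", lock_ttl, now=retry_at)


without_heartbeat = duplicate_possible(65, 60, None, 61)  # lock expired at 60s
with_heartbeat = duplicate_possible(65, 60, 15, 61)       # renewed at 15/30/45/60s
```

With the numbers from the scenario above, the retry at second 61 acquires the lock when there is no heartbeat and is refused when the worker renews every 15 seconds.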
2. Redis Cluster Total Outage
If the Redis cluster experiences a total crash, the API interceptor is blocked from checking keys.
- Resolution Strategy:
- Fail Open (Availability Focused): Bypass the cache, execute the request directly against the database, and rely on Postgres unique constraints.
  - Fail Closed (Security Focused): Reject the request with an HTTP 503 Service Unavailable, forcing the client to retry later once the cache recovers. For financial ledgers, failing closed is the non-negotiable choice.
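
The two strategies differ only in what happens when the cache lookup itself fails, so they fit naturally behind a single configurable policy. A minimal sketch follows; `FailurePolicy`, `CacheUnavailable`, and `guarded_execute` are hypothetical names, and the callables stand in for the real cache lookup and database path.

```python
from enum import Enum


class FailurePolicy(Enum):
    FAIL_OPEN = "fail_open"      # availability focused
    FAIL_CLOSED = "fail_closed"  # security focused


class CacheUnavailable(Exception):
    """Raised by the cache lookup when the idempotency cache is unreachable."""


def guarded_execute(check_cache, execute_against_db, policy):
    """Return (http_status, body), applying the policy if the cache is down."""
    try:
        cached = check_cache()
    except CacheUnavailable:
        if policy is FailurePolicy.FAIL_CLOSED:
            return 503, {"error": "idempotency_cache_unavailable"}
        # Fail open: skip the cache and rely on database unique constraints.
        return execute_against_db()
    if cached is not None:
        return cached
    return execute_against_db()


def cache_down():
    raise CacheUnavailable("cache unreachable")


def db_path():
    return 201, {"status": "COMPLETED"}


closed = guarded_execute(cache_down, db_path, FailurePolicy.FAIL_CLOSED)
opened = guarded_execute(cache_down, db_path, FailurePolicy.FAIL_OPEN)
```

Failing open is only safe when `execute_against_db` is itself protected by a uniqueness constraint on the key; otherwise an outage reopens the duplicate-execution window.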
3. Key Collisions / Request Tampering
An attacker uses an existing Idempotency-Key from a previous successful payment, but swaps out the amount or destination account.
- Resolution Strategy: Calculate a SHA-256 hash of the complete incoming JSON payload. Store this hash alongside the key. If the incoming payload hash does not match the cached hash, reject the request with 422 Unprocessable Entity immediately.
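
One practical detail when hashing JSON is that two semantically identical payloads can differ in key order or whitespace. The sketch below canonicalizes the payload before hashing so that a legitimate retry is not rejected for cosmetic differences. Canonicalization is an assumption added here, not something the strategy above prescribes, and `request_hash` and `matches_original` are illustrative names.

```python
import hashlib
import hmac
import json


def request_hash(payload: dict) -> str:
    """Hex SHA-256 over a canonical JSON encoding (sorted keys, no spaces)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_original(stored_hash: str, payload: dict) -> bool:
    """Compare the stored hash with the incoming payload's hash."""
    return hmac.compare_digest(stored_hash, request_hash(payload))


original = {"amount_minor": 9999, "currency": "USD",
            "destination_account_id": "acc_merchant_88"}
reordered = {"destination_account_id": "acc_merchant_88",
             "currency": "USD", "amount_minor": 9999}
tampered = dict(original, destination_account_id="acc_attacker_01")

stored = request_hash(original)
```

Here `matches_original(stored, reordered)` holds while `matches_original(stored, tampered)` does not, which is the condition that maps to the 422 response.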
Staff Engineer Perspective
The "Double Spend" Window
A common mistake, even among experienced engineers, is to treat "checking if a key exists" and "creating the record" as two separate steps. Consider this typical Node.js-style interceptor implementation:
// WARNING: race condition. The check and the write are separate steps.
const record = await db.getIdempotencyKey(key);
if (record) {
    return record.response;
} else {
    // A concurrent request can reach this line before the key is saved!
    await paymentService.charge();
    await db.saveIdempotencyKey(key);
}
Under heavy network jitter, a client sends Request A and Request B concurrently. Both threads execute getIdempotencyKey(key) at the exact same millisecond, read null, bypass the block, and double-charge the client.
The fix is to enforce atomicity: the lock acquisition must be tied to the check operation itself. In Redis, this is accomplished via the atomic SETNX (Set if Not Exists) command or, when the lock also needs a TTL, via SET with the NX and PX options in a single call. In SQL databases, it is executed via INSERT INTO ... ON CONFLICT DO NOTHING, where the affected row count tells the caller whether it won the claim.
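
The SQL variant of the atomic claim can be tried locally with Python's built-in sqlite3 module. This is a sketch under stated assumptions: it uses SQLite's `INSERT OR IGNORE` as a stand-in for the Postgres `ON CONFLICT DO NOTHING` clause, a reduced three-column table, and the hypothetical helper names `open_store` and `try_claim`.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table with a uniqueness constraint on the key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency_records ("
        " tenant_id TEXT NOT NULL,"
        " idempotency_key TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " PRIMARY KEY (tenant_id, idempotency_key))"
    )
    return conn


def try_claim(conn: sqlite3.Connection, tenant_id: str, key: str) -> bool:
    """Insert a PROCESSING row; return True only if this call created it."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO idempotency_records"
        " (tenant_id, idempotency_key, status) VALUES (?, ?, 'PROCESSING')",
        (tenant_id, key),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cur.rowcount == 1


conn = open_store()
first_claim = try_claim(conn, "tenant_a", "e3b0c442")   # wins the claim
second_claim = try_claim(conn, "tenant_a", "e3b0c442")  # loses: key exists
other_tenant = try_claim(conn, "tenant_b", "e3b0c442")  # separate namespace
```

Only the caller that sees `True` proceeds to charge; everyone else reads the existing row and either waits or replays the stored response. The check and the write are one statement, so there is no window between them.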
Managing Caching Payload Sizes
Caching response bodies is simple for small transactions, but it becomes a major memory drain if your API returns large JSON records (e.g., catalog outputs of 100 KB).
- Optimization: Calculate the size of the response payload. If it exceeds 10 KB, compress it using gzip or Brotli before saving it to Redis. This keeps memory footprint low and drastically reduces network bandwidth between the application nodes and the Redis cluster.
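
A size-gated compression step might look like the following sketch. It uses gzip from the Python standard library. The one-byte marker that records whether a value was compressed is an assumption introduced for illustration, as are the names `encode_for_cache` and `decode_from_cache`.

```python
import gzip

COMPRESS_THRESHOLD_BYTES = 10 * 1024  # the 10 KB threshold from the text
RAW = b"\x00"      # marker: body stored as-is
GZIPPED = b"\x01"  # marker: body stored gzip-compressed


def encode_for_cache(body: bytes) -> bytes:
    """Compress bodies above the threshold; prefix a marker byte either way."""
    if len(body) > COMPRESS_THRESHOLD_BYTES:
        return GZIPPED + gzip.compress(body)
    return RAW + body


def decode_from_cache(stored: bytes) -> bytes:
    """Reverse encode_for_cache using the marker byte."""
    marker, data = stored[:1], stored[1:]
    if marker == GZIPPED:
        return gzip.decompress(data)
    return data


small_body = b'{"transaction_id": "tx_abc123xyz", "status": "COMPLETED"}'
large_body = b'{"sku": "abc", "price_minor": 9999}' * 5000  # well above 10 KB

small_stored = encode_for_cache(small_body)
large_stored = encode_for_cache(large_body)
```

Small bodies skip compression because the CPU cost and gzip header overhead outweigh the savings. The marker byte lets the reader decode either form without a separate metadata lookup.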
Verbal Script
Interviewer: "How would you design a highly reliable idempotency layer for a payment gateway, and how do you handle concurrency race conditions?"
Candidate:
"To design a robust, industrial-grade idempotency layer, I would implement a custom API Gateway interceptor backed by a high-speed Redis cluster for distributed locking and short-term caching, coupled with Postgres unique database constraints for permanent transaction audit logging.
When an incoming POST request arrives carrying an Idempotency-Key header, the interceptor calculates a SHA-256 hash of the request payload to ensure data integrity and prevent key hijack collisions.
To handle concurrency race conditions and avoid the double-spend vulnerability, the check-and-set operation must be absolutely atomic. I would execute a custom Redis Lua script that checks if the key exists. If the key exists:
- It verifies that the incoming request hash matches the cached hash.
- If it matches, it checks the status. If the status is PROCESSING, it returns an HTTP 409 Conflict, blocking concurrent parallel retries. If the status is COMPLETED, it returns the cached HTTP response directly from Redis.
If the key does not exist, the Lua script atomically sets the status to PROCESSING with a 60-second TTL lock, allowing the worker thread to safely process the transaction.
Inside the database transaction, we write to a persistent idempotency_records audit table. If the process crashes mid-execution, the Redis lock naturally expires, allowing the client to safely retry. Once the transaction completes successfully, we update the Redis cache state to COMPLETED and cache the HTTP status code and response payload with a 24-hour TTL, ensuring fast response playback on subsequent client retries."