System Design: Designing Idempotent APIs for Reliable Services


In a perfect world, network connections would never drop, databases would never deadlock, and clients would receive every API response instantly. In reality, distributed systems are highly unreliable. A client makes an API call to charge a credit card, the server successfully processes the payment, but the network connection drops before the HTTP response can be sent back.

Left in the dark, the client has no choice but to retry the request. Without an idempotency layer, this simple retry results in a catastrophic duplicate payment.

An operation is idempotent if it can be executed multiple times without changing the state of the system beyond the initial application. Designing APIs with built-in idempotency ensures that clients can safely retry failed operations, recovering gracefully from network partitions, hardware crashes, and timeouts.
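
As a small illustration of the definition (the account dictionary and both function names are hypothetical, chosen only for this sketch), compare an absolute write, which is naturally idempotent, with a relative update, which is not:

```python
def set_balance(account: dict, value: int) -> None:
    """Absolute write: applying it twice leaves the same state as applying it once."""
    account["balance"] = value


def add_to_balance(account: dict, delta: int) -> None:
    """Relative update: every application changes the state again."""
    account["balance"] += delta


# Idempotent: the retry does not change the outcome.
account = {"balance": 100}
set_balance(account, 150)
set_balance(account, 150)  # simulated client retry
idempotent_result = account["balance"]

# Not idempotent: the retry credits the account a second time.
account = {"balance": 100}
add_to_balance(account, 50)
add_to_balance(account, 50)  # simulated client retry
non_idempotent_result = account["balance"]

print(idempotent_result, non_idempotent_result)
```

Payments and order creation behave like the second function, which is why they need an explicit idempotency key rather than relying on the shape of the operation.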

Requirements and System Goals

To implement a resilient, industrial-grade idempotency layer, we must establish rigorous functional and operational boundaries:

Functional Requirements

Exactly-Once Execution Semantics: Ensure that even if a client retries a transaction 10 times, the business transaction (e.g., payment, seat reservation, order creation) is executed exactly once.
Original Response Cache: When a retried request with a matching idempotency key is received, the server must return the exact HTTP status code and response payload of the original successful execution, without executing any backend business logic.
Payload Verification (Zero-Trust): If a retry request carries an existing idempotency key but the request payload has changed, the system must reject it immediately to prevent request tampering or key collision.
Automatic Key Expiration: Idempotency keys must carry a configurable Time-to-Live (TTL) so the storage layer can evict expired keys and keep memory usage bounded.

Non-Functional Requirements

Sub-Millisecond Lookup Latency: The check-and-set idempotency lookup path must introduce negligible latency (less than 1.0 millisecond p99), utilizing in-memory cache solutions (e.g., Redis) to avoid hammering transactional databases.
Atomic Concurrency Control: The checking of key existence and the acquisition of the execution lock must be strictly atomic to prevent race conditions from high-frequency concurrent retries (the "double-spend" vulnerability).
High Read/Write Capacity: The system must gracefully handle up to 50 million operations per day, scaling the storage layer horizontally to prevent bottlenecks during high-traffic sales events.
Decoupled Failure Propagation: A failure of the idempotency caching layer must not take down core transaction services. The interceptor fails open or closed based on configurable security profiles.

API Interfaces and Service Contracts

An elegant idempotency protocol is driven by client-specified request headers and standard HTTP response status codes.

graph TD
    Client[Client App] -->|1. POST /v1/payments with Header: Idempotency-Key| Gateway[API Gateway / Interceptor]
    Gateway -->|2. Check Key & Payload Match| IdemDB[(Idempotency Cache)]
    Gateway -->|3. Lock & Process Payment| CorePay[Payment Processing Service]

The Idempotency Protocol

Clients signal their desire for idempotent processing by supplying a unique key in the HTTP request header:

Idempotency-Key: <UUIDv4>

The server responds using standard HTTP status codes to communicate the state of the idempotent process:

HTTP Status Code	Meaning	Action for Client
200 OK / 201 Created	Transaction completed successfully. Response payload is returned (either fresh or cached).	Success. Do not retry.
409 Conflict	The request is currently being processed by another worker thread. A concurrent retry is disallowed.	Retry after a delay (e.g., using exponential backoff).
422 Unprocessable Entity	A request with the same `Idempotency-Key` was received, but the request payload or HTTP parameters do not match the original.	Do not retry. Generate a new key and update the payload.
400 Bad Request	The idempotency key format is invalid or missing when strictly required.	Do not retry. Fix the request headers.
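
The table above can be sketched as a small mapping function. This is an illustrative sketch, not a prescribed implementation: the outcome names mirror the values returned by the Lua script later in this article, while INVALID_KEY and the function name http_status_for are assumptions made for this example.

```python
from typing import Optional


def http_status_for(outcome: str, original_status: Optional[int] = None) -> Optional[int]:
    """Map an idempotency-store outcome to the HTTP status the interceptor returns.

    Returns None for ACQUIRED, meaning "no short-circuit: run the business logic".
    """
    if outcome == "ACQUIRED":
        return None
    if outcome == "COMPLETED":
        # Replay the status code of the original execution (e.g., 200 or 201).
        if original_status is None:
            raise ValueError("a completed record must carry its original status")
        return original_status
    if outcome == "PROCESSING":
        return 409  # Conflict: another worker holds the lock
    if outcome == "PAYLOAD_MISMATCH":
        return 422  # Unprocessable Entity: same key, different payload
    if outcome == "INVALID_KEY":
        return 400  # Bad Request: malformed or missing key
    raise ValueError(f"unknown outcome: {outcome}")


print(http_status_for("COMPLETED", 201))
print(http_status_for("PROCESSING"))
```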

HTTP Request Example

POST /v1/payments HTTP/1.1
Host: api.codesprintpro.com
Content-Type: application/json
Idempotency-Key: e3b0c442-98fc-4c14-9af1-000000000042

{
  "amount_minor": 9999,
  "currency": "USD",
  "source_account_id": "acc_payment_01",
  "destination_account_id": "acc_merchant_88"
}

HTTP Response Example (Cached Result)

When a cached response is returned, the server includes a custom header to inform the client that the transaction was not re-executed:

HTTP/1.1 201 Created
Content-Type: application/json
X-Cache-Idempotency: HIT
X-Original-Request-Date: 2026-06-01T11:45:00Z

{
  "transaction_id": "tx_abc123xyz",
  "status": "COMPLETED",
  "amount_minor": 9999,
  "currency": "USD",
  "processed_at": "2026-06-01T11:45:01Z"
}

High-Level Design and Visualizations

An idempotency engine operates as a high-performance interceptor sitting upstream of your core business services. Below is the end-to-end routing flow for handling incoming requests:

Idempotent Request Interceptor Sequence

sequenceDiagram
    autonumber
    participant Client
    participant Interceptor as Idempotency Interceptor
    participant Redis as Redis Cache (Distributed Lock & Storage)
    participant DB as Postgres Transactional Database

    Client->>Interceptor: POST /v1/payments (Idempotency-Key: e3b0c442)
    Note over Interceptor: Calculate cryptographic hash of request payload
    Interceptor->>Redis: ATOMIC Lock & Lookup (e3b0c442)
    
    alt State: KEY EXISTS (HIT)
        Redis-->>Interceptor: Status: COMPLETED, Response: {amount: 9999, tx_id: tx_abc}
        Note over Interceptor: Verify incoming payload hash matches cached hash
        Interceptor-->>Client: HTTP 201 Created, replayed from cache (X-Cache-Idempotency: HIT)
    else State: CONCURRENT RUNNING (LOCKED)
        Redis-->>Interceptor: Status: PROCESSING
        Interceptor-->>Client: HTTP 409 Conflict (Currently executing)
    else State: KEY NOT EXISTS (MISS)
        Redis-->>Interceptor: Acquired lock, Set State: PROCESSING (TTL: 60s)
        Interceptor->>DB: Begin Transaction & Process Business Logic
        DB-->>Interceptor: Transaction successfully committed! (tx_id: tx_abc)
        Interceptor->>Redis: Update State: COMPLETED, Store Response, Set TTL: 86400s (24h)
        Interceptor-->>Client: HTTP 201 Created (X-Cache-Idempotency: MISS)
    end

Client Retry Flow

If the TCP connection drops during step 11 above (the final response to the client), the client will retry using the exact same Idempotency-Key. The request traverses the sequence again, hitting the "KEY EXISTS (HIT)" condition in step 3, and the server returns the cached payload without touching the Postgres database or triggering a duplicate payment.
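
A minimal client-side sketch of this retry flow follows. The send callable is a hypothetical stand-in for a real HTTP client, and the choice to retry on 409, 503, and connection errors is an assumption based on the status table above. The important detail is that the key is generated once and reused on every attempt.

```python
import time
import uuid
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]
RETRYABLE_STATUSES = {409, 503}  # assumption: in-progress or temporarily unavailable


def post_with_retries(
    send: Callable[[Dict[str, str], dict], Response],
    payload: dict,
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Send one logical request, reusing a single Idempotency-Key across retries."""
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # generated once per logical request
    for attempt in range(max_attempts):
        try:
            status, body = send(headers, payload)
            if status not in RETRYABLE_STATUSES:
                return status, body
        except ConnectionError:
            pass  # outcome unknown: safe to retry because the key is unchanged
        if attempt < max_attempts - 1:
            sleep(base_delay_s * (2 ** attempt))  # exponential backoff
    raise TimeoutError(f"no definitive response after {max_attempts} attempts")
```

Generating a fresh key inside the loop would defeat the mechanism, because the server would treat every retry as a new transaction.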

Low-Level Design and Schema Strategies

To avoid race conditions where two concurrent requests with the same key attempt to write simultaneously, we must combine an in-memory high-speed cache (Redis) with a relational persistent store (Postgres).

Persistent Relational Schema (Postgres)

For high-value financial transactions, we maintain a permanent audit log of idempotency executions:

CREATE TABLE idempotency_records (
    tenant_id VARCHAR(64) NOT NULL,
    idempotency_key VARCHAR(255) NOT NULL,
    request_path VARCHAR(2048) NOT NULL,
    request_hash VARCHAR(64) NOT NULL, -- Hex-encoded SHA-256 hash of payload
    status VARCHAR(32) NOT NULL,       -- 'PROCESSING', 'COMPLETED', 'FAILED'
    response_status INT,               -- HTTP status code (e.g., 201, 400); NULL while PROCESSING
    response_payload TEXT,             -- Serialized JSON response; NULL while PROCESSING
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (tenant_id, idempotency_key) -- Keys only need to be unique per tenant
);

-- Index by tenant and date for regular operations and cleanup routines
CREATE INDEX idx_idempotency_tenant_date ON idempotency_records(tenant_id, created_at);

Atomic Check-and-Set Redis Lua Script

To achieve atomic lock acquisition and cache lookup in a single round-trip without race conditions, we execute a Lua script on the Redis cluster.

-- KEYS[1] = "idempotency:key:" .. key
-- ARGV[1] = request_hash (SHA-256)
-- ARGV[2] = lock_ttl_ms (e.g., 60000 for a 1-minute lock)

local record = redis.call('GET', KEYS[1])

if record then
    -- Key exists. Parse the stored JSON structure
    local data = cjson.decode(record)
    
    -- Verify payload hash to prevent key hijack collisions
    if data.hash ~= ARGV[1] then
        return { "PAYLOAD_MISMATCH" }
    end
    
    if data.status == "PROCESSING" then
        return { "PROCESSING" }
    else
        return { "COMPLETED", data.response_status, data.response_payload }
    end
else
    -- Key does not exist. Atomically acquire lock and set state to PROCESSING
    local lock_data = {
        status = "PROCESSING",
        hash = ARGV[1],
        created_at = redis.call('TIME')[1]
    }
    redis.call('SET', KEYS[1], cjson.encode(lock_data), 'PX', ARGV[2])
    return { "ACQUIRED" }
end
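
To unit-test interceptor logic without a Redis instance, the script's semantics can be modelled in memory. The sketch below is an assumption-laden stand-in, not a Redis client: the class name, the complete method, and the injected clock are all hypothetical, and a threading.Lock plays the role of Redis's single-threaded script execution.

```python
import json
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryIdempotencyStore:
    """In-memory model of the check-and-set script, for tests only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._mutex = threading.Lock()  # stands in for Redis script atomicity
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (json, expires_at)

    def _get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        record, expires_at = entry
        if self._clock() >= expires_at:  # emulate TTL expiry
            del self._data[key]
            return None
        return record

    def check_and_set(self, key: str, request_hash: str, lock_ttl_ms: int) -> list:
        with self._mutex:
            record = self._get(key)
            if record is not None:
                data = json.loads(record)
                if data["hash"] != request_hash:
                    return ["PAYLOAD_MISMATCH"]
                if data["status"] == "PROCESSING":
                    return ["PROCESSING"]
                return ["COMPLETED", data["response_status"], data["response_payload"]]
            lock = {"status": "PROCESSING", "hash": request_hash}
            self._data[key] = (json.dumps(lock), self._clock() + lock_ttl_ms / 1000)
            return ["ACQUIRED"]

    def complete(self, key: str, request_hash: str, response_status: int,
                 response_payload: str, ttl_s: int) -> None:
        """Store the final response, replacing the PROCESSING lock."""
        done = {"status": "COMPLETED", "hash": request_hash,
                "response_status": response_status,
                "response_payload": response_payload}
        with self._mutex:
            self._data[key] = (json.dumps(done), self._clock() + ttl_s)
```

Injecting the clock lets a test advance time past the TTL and confirm that an expired key can be acquired again.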

Completing the Transaction in Redis

Once the backend transaction succeeds, the worker thread updates the Redis record with a long-term TTL (e.g., 24 hours):

# Save response and set TTL to 86,400 seconds (24 hours)
SET idempotency:key:e3b0c442 '{"status":"COMPLETED","hash":"sha256hash...","response_status":201,"response_payload":"{...}"}' EX 86400

Scaling and Operational Challenges

Managing idempotency at a scale of 50 million requests per day requires careful capacity planning and architectural foresight.

Storage Capacity Back-of-the-Envelope Calculations

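Let us estimate the memory required to hold idempotency keys in Redis for 50 million requests/day over a 7-day retention period. This is a deliberately conservative window: the 24-hour TTL used in the examples above would need one seventh of this.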

Let:

$R$ = Total daily requests = $50,000,000$ requests/day.
$T$ = Retention time = $7$ days.
$S$ = Average size of an idempotency record in Redis.
- Unique key string: 64 bytes (idempotency:key:uuidv4).
- Stored value (Status, payload hash, HTTP status code, compressed HTTP response body): 1.5 KB.
- Total memory footprint per key in Redis (including hash overhead): $1.8 \text{ KB}$.

The total active keys stored concurrently $K_{\text{total}}$ is:

$$K_{\text{total}} = 50,000,000 \times 7 = 350,000,000 \text{ keys}$$

The total RAM required to hold this active working set in Redis is:

$$\text{RAM}_{\text{total}} = 350,000,000 \times 1.8 \text{ KB} = 630,000,000 \text{ KB} \approx 630 \text{ Gigabytes}$$

Accounting for a standard Redis cluster memory overhead safety margin of 30%:

$$\text{RAM}_{\text{provisioned}} = 630 \text{ GB} \times 1.3 = 819 \text{ Gigabytes of memory}$$
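
The same arithmetic can be wrapped in a small helper to see how the estimate moves with the retention window. The function and parameter names are hypothetical, and the sketch uses decimal units (1 GB = 1,000,000 KB) to match the figures above.

```python
def provisioned_ram_gb(
    daily_requests: int,
    retention_days: float,
    kb_per_key: float = 1.8,
    safety_margin: float = 0.30,
) -> float:
    """Estimate provisioned Redis memory in decimal gigabytes."""
    total_keys = daily_requests * retention_days
    ram_total_gb = total_keys * kb_per_key / 1_000_000  # KB -> GB
    return ram_total_gb * (1 + safety_margin)


seven_day = provisioned_ram_gb(50_000_000, 7)
one_day = provisioned_ram_gb(50_000_000, 1)
print(f"7-day retention: {seven_day:.0f} GB")
print(f"1-day retention: {one_day:.0f} GB")
```

Retention dominates the result: dropping from 7 days to 1 day cuts the footprint by a factor of seven, so the TTL is the first knob to revisit if memory cost becomes a concern.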

Scaling Redis Through Sharding

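To comfortably host 819 GB of RAM and absorb a peak of 15,000 writes per second during high-traffic intervals:

We deploy a Redis Cluster with 16 shards, where each master node holds approximately 52 GB of RAM.
Redis Cluster assigns every key to one of 16,384 hash slots using a CRC16 of the key, modulo 16,384. Because each key (e.g., idempotency:key:e3b0c442) embeds a random UUID, keys spread uniformly across shards and avoid hotspots on a single node. Do not wrap a shared prefix in {braces}: Redis treats that as a hash tag and would route every such key to the same slot.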
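
For intuition about key placement, here is a sketch of how Redis Cluster maps a key to a slot: CRC16 (the XMODEM variant) of the key modulo 16,384, with hash-tag handling for keys containing {...}. The shard_for helper assumes 16 masters owning equal contiguous slot ranges, which is an assumption of this example; real clusters can assign slots arbitrarily.

```python
import uuid
from collections import Counter

NUM_SLOTS = 16384
NUM_SHARDS = 16  # assumption: equal contiguous slot ranges per master


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot for a key; only a non-empty {hash tag} is hashed if present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % NUM_SLOTS


def shard_for(key: str) -> int:
    return key_slot(key) * NUM_SHARDS // NUM_SLOTS


# Random UUID keys should land on every shard in roughly equal numbers.
counts = Counter(shard_for(f"idempotency:key:{uuid.uuid4()}") for _ in range(16_000))
print(sorted(counts.items()))
```

Keys that share a hash tag, such as {user1}:a and {user1}:b, always map to the same slot, which is useful for multi-key operations but harmful if applied to every idempotency key.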

Trade-offs and Architectural Alternatives

When designing the storage and locking layer for idempotency, engineers must evaluate critical trade-offs:

Dimension	Database Unique Constraint (Postgres)	Distributed Locking Layer (Redis/Memcached)
Lookup Latency	High (Requires a relational index lookup, which can take 5ms to 15ms under load).	Ultra-Low (Redis hash lookup executes in less than 1.0ms).
Strict Correctness	Perfect (Provides absolute ACID transaction isolation directly inside the business database).	Medium (Subject to edge-case lock expiration leaks if the backend process hangs too long).
Write Amplification	High (Every key is a durable row write plus index maintenance, which consumes disk IOPS quickly).	Low (Memory-resident keys are pruned using native TTL expiration).
Implementation Complexity	Very Low (Utilizes `UNIQUE(tenant_id, idempotency_key)` constraints).	High (Requires managing distributed state, connection pools, and Lua scripting pipelines).

Failure Modes and Fault Tolerance Strategies

1. The Slow Processing Hang (Lock Expiration)

If a worker takes 65 seconds to process a payment, but the Redis lock TTL is set to 60 seconds, the lock will expire in Redis. A concurrent client retry at second 61 will acquire the lock again, spawning a duplicate transaction!

Resolution Strategy (The Heartbeat Sentinel): During long-running executions, the background worker thread must run a dynamic "lock renewal daemon." Every 15 seconds, it updates the Redis lock's TTL (e.g., executing EXPIRE key 60), ensuring the lock remains held until the thread exits.
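
One way to sketch such a renewal loop is a small context manager that calls a renew function on a fixed interval. The class name and the renew callable are hypothetical. In a real deployment renew would wrap a Redis EXPIRE or PEXPIRE call, ideally guarded by a check that the worker still owns the lock, and return False when the key no longer exists.

```python
import threading
from typing import Callable


class LockHeartbeat:
    """Calls `renew` every `interval_s` seconds until the with-block exits."""

    def __init__(self, renew: Callable[[], bool], interval_s: float):
        self._renew = renew
        self._interval_s = interval_s
        self._stop = threading.Event()
        self.lost = threading.Event()  # set if a renewal reports the lock is gone
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop is requested.
        while not self._stop.wait(self._interval_s):
            if not self._renew():
                self.lost.set()
                return

    def __enter__(self) -> "LockHeartbeat":
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._stop.set()
        self._thread.join()
        return False  # never swallow exceptions from the protected block
```

Usage would look like `with LockHeartbeat(renew, interval_s=15) as hb: process_payment()`, where process_payment is a placeholder for the business logic. Checking hb.lost before committing gives the worker a chance to abort if the lock was lost mid-flight.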

2. Redis Cluster Total Outage

If the Redis cluster experiences a total crash, the API interceptor is blocked from checking keys.

Resolution Strategy:
- Fail Open (Availability Focused): Bypass the cache, execute the request directly against the database, and rely on Postgres unique constraints.
- Fail Closed (Security Focused): Reject the request with an HTTP 503 Service Unavailable, forcing the client to retry later once the cache recovers. For financial ledgers, failing closed is the non-negotiable choice.
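
The two policies can be sketched as a single branch in the interceptor. Everything named here (CacheUnavailable, handle_request, and the check and execute callables) is hypothetical, and outcomes other than ACQUIRED are collapsed into one response for brevity.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]


class CacheUnavailable(Exception):
    """Raised by the lookup when the idempotency store cannot be reached."""


def handle_request(
    check: Callable[[], str],
    execute: Callable[[], Response],
    fail_open: bool,
) -> Response:
    try:
        outcome = check()
    except CacheUnavailable:
        if fail_open:
            # Availability first: the database unique constraint is now the
            # only remaining guard against duplicates.
            return execute()
        # Safety first: refuse to run without the idempotency check.
        return 503, "idempotency store unavailable"
    if outcome == "ACQUIRED":
        return execute()
    # Replay and mismatch handling omitted in this sketch.
    return 409, "request already in progress"
```

Making fail_open a per-route setting lets a payment endpoint fail closed while a low-risk endpoint on the same gateway fails open.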

3. Key Collisions / Request Tampering

An attacker uses an existing Idempotency-Key from a previous successful payment, but swaps out the amount or destination account.

Resolution Strategy: Calculate a SHA-256 hash of the complete incoming JSON payload. Store this hash alongside the key. If the incoming payload hash does not match the cached hash, reject the request with 422 Unprocessable Entity immediately.
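
A minimal hashing sketch follows. Serializing with sorted keys and fixed separators is an assumption added here so that semantically identical JSON bodies hash the same regardless of key order. Including the HTTP method and path in the hashed material is also an assumption, motivated by the 422 rule covering "payload or HTTP parameters".

```python
import hashlib
import json


def request_hash(method: str, path: str, payload: dict) -> str:
    """Return a hex SHA-256 digest over a canonical form of the request."""
    canonical_body = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    material = "\n".join([method.upper(), path, canonical_body]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


original = {"amount_minor": 9999, "currency": "USD"}
reordered = {"currency": "USD", "amount_minor": 9999}
tampered = {"amount_minor": 1, "currency": "USD"}

same = request_hash("POST", "/v1/payments", original) == request_hash("POST", "/v1/payments", reordered)
changed = request_hash("POST", "/v1/payments", original) != request_hash("POST", "/v1/payments", tampered)
print(same, changed)
```

Hashing the raw request bytes instead is simpler and stricter, at the cost of rejecting retries whose bodies differ only in whitespace or key order.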

Staff Engineer Perspective

The "Double Spend" Window

A common mistake, even among senior engineers, is to treat "checking if a key exists" and "creating the record" as two separate steps. Consider this typical Node.js interceptor implementation:

// WARNING: race-prone check-then-act. Do not use in production.
const record = await db.getIdempotencyKey(key);
if (record) {
    return record.response;
} else {
    // A concurrent request can execute this line in parallel!
    await paymentService.charge();
    await db.saveIdempotencyKey(key);
}

Under heavy network jitter, a client sends Request A and Request B concurrently. Both threads execute getIdempotencyKey(key) at the exact same millisecond, read null, bypass the block, and double-charge the client.

To operate at a principal engineer standard, you must enforce atomicity: the lock acquisition must be tied to the check operation itself. In Redis, use SET key value NX PX <ttl-ms>, which creates the key only if it does not exist and attaches the expiry in the same command (the older SETNX command cannot set a TTL atomically). In SQL databases, use INSERT INTO ... ON CONFLICT DO NOTHING and check whether a row was actually inserted.
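
The SQL variant can be demonstrated with Python's built-in sqlite3 module. This sketch uses SQLite's INSERT OR IGNORE as a stand-in for the Postgres ON CONFLICT DO NOTHING form, and the trimmed-down table and try_claim helper are assumptions made for the example. The point is that the insert itself is the check: exactly one caller sees a row count of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE idempotency_records (
        tenant_id TEXT NOT NULL,
        idempotency_key TEXT NOT NULL,
        request_hash TEXT NOT NULL,
        status TEXT NOT NULL,
        PRIMARY KEY (tenant_id, idempotency_key)
    )
    """
)


def try_claim(db: sqlite3.Connection, tenant_id: str, key: str, request_hash: str) -> bool:
    """Atomically claim a key; True only for the caller whose insert succeeded."""
    with db:  # commits on success, rolls back on exception
        cursor = db.execute(
            "INSERT OR IGNORE INTO idempotency_records "
            "(tenant_id, idempotency_key, request_hash, status) "
            "VALUES (?, ?, ?, 'PROCESSING')",
            (tenant_id, key, request_hash),
        )
        return cursor.rowcount == 1  # 0 means the key already existed


first = try_claim(conn, "tenant_a", "e3b0c442", "hash1")
second = try_claim(conn, "tenant_a", "e3b0c442", "hash1")
print(first, second)
```

Only the caller that receives True should go on to charge the customer. Every other caller reads the existing row and returns 409 or the stored response.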

Managing Caching Payload Sizes

Caching response bodies is simple for small transactions, but it becomes a major memory drain if your API returns large JSON records (e.g., catalog outputs of 100 KB).

Optimization: Calculate the size of the response payload. If it exceeds 10 KB, compress it using gzip or Brotli before saving it to Redis. This keeps memory footprint low and drastically reduces network bandwidth between the application nodes and the Redis cluster.
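
A sketch of this threshold-based compression using gzip from the standard library is shown below. The one-byte marker prefix is an assumption introduced here so the reader can tell compressed entries from raw ones. The function names are hypothetical.

```python
import gzip

COMPRESSION_THRESHOLD_BYTES = 10 * 1024  # 10 KB, as suggested above
MARKER_RAW = b"\x00"
MARKER_GZIP = b"\x01"


def encode_for_cache(body: bytes) -> bytes:
    """Compress only bodies above the threshold; prefix a marker byte."""
    if len(body) > COMPRESSION_THRESHOLD_BYTES:
        return MARKER_GZIP + gzip.compress(body)
    return MARKER_RAW + body


def decode_from_cache(blob: bytes) -> bytes:
    """Reverse encode_for_cache based on the marker byte."""
    marker, data = blob[:1], blob[1:]
    if marker == MARKER_GZIP:
        return gzip.decompress(data)
    if marker == MARKER_RAW:
        return data
    raise ValueError("unknown cache encoding marker")


small = b'{"status":"COMPLETED"}'
large = b'{"items":[' + b'{"sku":"abc","qty":1},' * 5000 + b'{}]}'
print(len(encode_for_cache(small)), len(large), len(encode_for_cache(large)))
```

Skipping compression for small bodies avoids paying CPU cost where gzip's header overhead could make the stored value larger than the original.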

Verbal Script

Interviewer: "How would you design a highly reliable idempotency layer for a payment gateway, and how do you handle concurrency race conditions?"

Candidate:

"To design a robust, industrial-grade idempotency layer, I would implement a custom API Gateway interceptor backed by a high-speed Redis cluster for distributed locking and short-term caching, coupled with Postgres unique database constraints for permanent transaction audit logging.

When an incoming POST request arrives carrying an Idempotency-Key header, the interceptor calculates a SHA-256 hash of the request payload to ensure data integrity and prevent key hijack collisions.

To handle concurrency race conditions and avoid the double-spend vulnerability, the check-and-set operation must be absolutely atomic. I would execute a custom Redis Lua script that checks if the key exists. If the key exists:

It verifies that the incoming request hash matches the cached hash.
If it matches, it checks the status. If the status is PROCESSING, it returns an HTTP 409 Conflict, blocking concurrent parallel retries. If the status is COMPLETED, it returns the cached HTTP response directly from Redis.

If the key does not exist, the Lua script atomically sets the status to PROCESSING with a 60-second TTL lock, allowing the worker thread to safely process the transaction.

Inside the database transaction, we write to a persistent idempotency_records audit table. If the process crashes mid-execution, the Redis lock naturally expires, allowing the client to safely retry. Once the transaction completes successfully, we update the Redis cache state to COMPLETED and cache the HTTP status code and response payload with a 24-hour TTL, ensuring fast response playback on subsequent client retries."
