Idempotency Keys in APIs: Retries, Duplicate Requests, and Exactly-Once Illusions

Any API that changes state will eventually receive the same request more than once. Mobile clients retry on flaky networks. Load balancers retry after connection resets. Users double-click submit buttons. Background workers crash after executing the work but before acknowledging the job. Payment providers call your webhooks repeatedly because they never received a 200 OK response.

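Exactly-once delivery is not achievable over an unreliable network: a sender that sees no acknowledgement cannot tell whether the request was lost or only the response was. What a system can offer is at-least-once delivery combined with idempotent processing. Therefore, your application tier must make repeated processing of identical requests safe.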

This playbook provides a comprehensive engineering guide to designing and implementing production-grade API idempotency.

System Requirements and Goals

To build a zero-downtime, fault-tolerant idempotency engine, we must define the scaling bounds and functional constraints.

Functional Requirements

Uniqueness Enforcement: Block the processing of duplicate transactions within a defined retention window.
Response Replay: Retrieve and return the exact HTTP status code, headers, and body of the first execution for all subsequent retry requests.
Payload Verification: Prevent key-hijacking by verifying that retried request bodies match the initial request payload.
Processing Synchronization: Block concurrent overlapping requests from executing simultaneously (prevent double-spending race conditions).

Non-Functional Requirements

Low Latency Overhead: The idempotency verification check must add less than 2ms to overall API execution times.
High Write Concurrency: Support thousands of concurrent writes to the idempotency storage layer without database deadlock.
Automatic Expiry (TTL): Automatically clean up keys once their retention window ends (24 hours to 7 days, depending on the operation) to keep storage size bounded.
High Availability: If the idempotency datastore goes offline, the system should gracefully degrade rather than crashing.

High-Level Design Architecture

To enforce idempotency safely, we intercept incoming requests using an API Gateway Filter or an Application Middleware layer before they reach transactional business logic handlers.

Here is the high-level request lifecycle showing how retries are caught and replayed:

sequenceDiagram
    autonumber
    participant Client as Client Device
    participant GW as API Gateway / Middleware
    participant Cache as Redis Cache (Idempotency Store)
    participant DB as Core database (PostgreSQL)

    Client->>GW: POST /v1/payments (Idempotency-Key: key_123)
    GW->>Cache: SETNX key_123 (status = PROCESSING)
    alt Key is NEW (Cache returns 1)
        GW->>DB: Execute Payment Transaction (Debit/Credit)
        DB-->>GW: Payment Completed (ID: pay_999)
        GW->>Cache: SET key_123 (status = SUCCEEDED, response = {id: pay_999})
        GW-->>Client: HTTP 201 Created ({id: pay_999})
    else Key ALREADY EXISTS (Cache returns 0)
        GW->>Cache: GET key_123
        Cache-->>GW: Return status = SUCCEEDED, response = {id: pay_999}
        GW-->>Client: HTTP 201 Created (Replayed: true, {id: pay_999})
    end

In this architecture:

The API Gateway / Middleware intercepts requests and calculates the payload's cryptographic hash.
Redis (or a fast Key-Value store) tracks the active state of idempotency keys.
If the request is a duplicate, the middleware directly intercepts and replays the response, preventing downstream transaction databases from performing redundant operations.
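
The claim-then-replay flow in the diagram can be sketched without any infrastructure. This is a minimal illustration, not the gateway's actual code: the Map-backed store and the names InMemoryIdempotencyStore and handleWithIdempotency are assumptions introduced here, standing in for Redis SET with NX or a database unique constraint.

```typescript
// In-memory stand-in for the idempotency store (an assumption so the sketch
// runs without Redis or PostgreSQL).
type StoredResponse = { status: number; body: unknown };
type Entry =
  | { state: 'PROCESSING' }
  | { state: 'SUCCEEDED'; response: StoredResponse };

class InMemoryIdempotencyStore {
  private entries = new Map<string, Entry>();

  // Mirrors SETNX: returns true only for the first caller of a given key.
  claim(key: string): boolean {
    if (this.entries.has(key)) return false;
    this.entries.set(key, { state: 'PROCESSING' });
    return true;
  }

  complete(key: string, response: StoredResponse): void {
    this.entries.set(key, { state: 'SUCCEEDED', response });
  }

  get(key: string): Entry | undefined {
    return this.entries.get(key);
  }
}

type HandleResult = StoredResponse & { replayed: boolean };

// Runs the handler at most once per key and replays the stored response afterwards.
function handleWithIdempotency(
  store: InMemoryIdempotencyStore,
  key: string,
  handler: () => StoredResponse,
): HandleResult {
  if (store.claim(key)) {
    const response = handler();
    store.complete(key, response);
    return { ...response, replayed: false };
  }
  const entry = store.get(key);
  if (entry && entry.state === 'SUCCEEDED') {
    return { ...entry.response, replayed: true };
  }
  // Another request holds the key and has not finished yet.
  return { status: 409, body: { error: 'request_in_progress' }, replayed: false };
}

// Usage: the second call with the same key does not execute the handler again.
const demoStore = new InMemoryIdempotencyStore();
let executions = 0;
const charge = () => {
  executions += 1;
  return { status: 201, body: { id: 'pay_999' } };
};
const first = handleWithIdempotency(demoStore, 'key_123', charge);
const second = handleWithIdempotency(demoStore, 'key_123', charge);
console.log(first.replayed, second.replayed, executions); // false true 1
```

The sketch leaves out error handling: if the handler throws, the key stays in PROCESSING, which is the case the failure-handling sections address.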

API Design and Request Flow

An idempotent API needs a consistent request header and explicit error definitions. Idempotency-Key is a widely used convention rather than a finalized HTTP standard. Below is the request and response contract for a production system.

1. Initial Request (Success Path)

Endpoint: POST /v1/charges
Request Headers:
- Idempotency-Key: idemp_99aa-88bb-77cc
- Content-Type: application/json
Request Body:

{
  "account_id": "acc_user_44",
  "amount": 5000,
  "currency": "USD"
}

Response Headers:
- HTTP/1.1 201 Created
- Content-Type: application/json
Response Body:

{
  "charge_id": "chg_abc123xyz",
  "status": "succeeded",
  "amount": 5000,
  "created_at": "2026-05-22T17:32:00Z"
}

2. Repeated Request (Replayed Success Path)

If the client loses connection after the server finishes processing but before transmitting the response, the client retries with the same header and body:

Request Headers: Same as initial request.
Response Headers:
- HTTP/1.1 201 Created
- X-Cache-Lookup: HIT
- X-Idempotency-Replayed: true
Response Body: Identical to initial response.

3. Key Reuse with Different Payload (Error Path)

If a client attempts to reuse the same key for a different charge:

Request Body:

{
  "account_id": "acc_user_44",
  "amount": 10000,
  "currency": "USD"
}

Response Headers:
- HTTP/1.1 409 Conflict
Response Body:

{
  "error_code": "idempotency_payload_mismatch",
  "error_message": "The idempotency key is already registered for a different request payload."
}
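
Detecting a payload mismatch depends on a stable request fingerprint. The sketch below is one possible approach, assuming the body is already-parsed JSON: it sorts object keys before hashing so that a client that re-serializes the same payload in a different key order still matches. The helper names canonicalize and requestFingerprint are hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Recursively serializes a parsed JSON value with object keys sorted, so that
// {"a":1,"b":2} and {"b":2,"a":1} produce the same string.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return '[' + value.map((item) => canonicalize(item)).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ':' + canonicalize(obj[k]));
    return '{' + parts.join(',') + '}';
  }
  // Primitives; undefined is mapped to null so the result is always a string.
  return JSON.stringify(value ?? null);
}

// SHA-256 hex digest of the canonical form (64 hex characters).
function requestFingerprint(body: unknown): string {
  return createHash('sha256').update(canonicalize(body)).digest('hex');
}

const original = { account_id: 'acc_user_44', amount: 5000, currency: 'USD' };
const reordered = { currency: 'USD', amount: 5000, account_id: 'acc_user_44' };
const changed = { account_id: 'acc_user_44', amount: 10000, currency: 'USD' };

console.log(requestFingerprint(original) === requestFingerprint(reordered)); // true
console.log(requestFingerprint(original) === requestFingerprint(changed)); // false
```

Whether to hash the canonical form or the raw request bytes is a design choice: raw bytes are simpler, but they reject retries from clients whose serializer does not preserve key order.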

Low-Level Design & Database Schema

To prevent race conditions, the storage engine must support atomic operations. If two concurrent threads attempt to insert the same key, only one must succeed.

1. Relational Database Schema (PostgreSQL)

When using a relational database to store idempotency states, the unique constraint implied by the primary key on idempotency_key acts as the system lock:

CREATE TABLE idempotency_records (
    idempotency_key   VARCHAR(255) PRIMARY KEY,
    request_hash      CHAR(64) NOT NULL,          -- SHA-256 fingerprint of payload
    status            VARCHAR(30) NOT NULL,       -- PROCESSING, SUCCEEDED, FAILED
    response_code     SMALLINT,                   -- Replay HTTP status code
    response_headers  JSONB,                      -- Replay custom headers
    response_body     JSONB,                      -- Replay JSON payload
    expires_at        TIMESTAMPTZ NOT NULL,       -- TTL expiration timestamp
    created_at        TIMESTAMPTZ DEFAULT NOW(),
    updated_at        TIMESTAMPTZ DEFAULT NOW()
);

-- Index to clean up expired keys in batches. It is not partial, so it also
-- covers abandoned PROCESSING rows and can serve a plain expires_at < NOW() filter.
CREATE INDEX idx_idempotency_cleanup ON idempotency_records (expires_at);

2. State Machine Transition Flow

Every idempotency record moves through a strict lifecycle to ensure concurrency safety:

stateDiagram-v2
    [*] --> NEW : Client Request Arrives
    NEW --> PROCESSING : ON CONFLICT DO NOTHING (Lock Acquired)
    PROCESSING --> SUCCEEDED : Business Transaction Completes Successfully
    PROCESSING --> RETRYABLE_FAILED : Transient Error (Release / Delete key)
    PROCESSING --> FAILED : Deterministic Error (Save bad response code & body)
    
    SUCCEEDED --> [*] : Return Replayed Response
    FAILED --> [*] : Return Replayed Error Response
    RETRYABLE_FAILED --> [*] : Key Deleted; Allowed to retry
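
The lifecycle above can also be encoded as a transition table so that illegal moves fail loudly. This is a sketch under the assumption that state names match the diagram; the ALLOWED table and the helper functions are hypothetical names.

```typescript
type State = 'NEW' | 'PROCESSING' | 'SUCCEEDED' | 'FAILED' | 'RETRYABLE_FAILED';

// Transitions taken directly from the state diagram; terminal states have none.
const ALLOWED: Record<State, readonly State[]> = {
  NEW: ['PROCESSING'],
  PROCESSING: ['SUCCEEDED', 'FAILED', 'RETRYABLE_FAILED'],
  SUCCEEDED: [],
  FAILED: [],
  RETRYABLE_FAILED: [],
};

function canTransition(from: State, to: State): boolean {
  return ALLOWED[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the diagram does not permit the move.
function transition(from: State, to: State): State {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal idempotency transition: ${from} -> ${to}`);
  }
  return to;
}

// Only SUCCEEDED and FAILED carry a stored response that can be replayed.
function isReplayable(state: State): boolean {
  return state === 'SUCCEEDED' || state === 'FAILED';
}

let current: State = 'NEW';
current = transition(current, 'PROCESSING');
current = transition(current, 'SUCCEEDED');
console.log(current, isReplayable(current)); // SUCCEEDED true
```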

3. Node/TypeScript Middleware Implementation

Below is a TypeScript middleware sketch for Express that implements request hashing, locking, and replay logic on top of the PostgreSQL primary key constraint:

import { Request, Response, NextFunction } from 'express';
import { createHash } from 'crypto';
import { Pool } from 'pg';

const dbPool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function idempotencyFilter(req: Request, res: Response, next: NextFunction) {
  const key = req.headers['idempotency-key'];
  if (!key || typeof key !== 'string') {
    return res.status(400).json({ error: 'Idempotency-Key header is missing or invalid' });
  }

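  // Calculate a SHA-256 fingerprint of the request body to detect key reuse.
  // Note: JSON.stringify is sensitive to key order, so a client that re-serializes
  // the same payload with a different key order will be treated as a mismatch.
  const payloadString = req.body ? JSON.stringify(req.body) : '';
  const requestHash = createHash('sha256').update(payloadString).digest('hex');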

  try {
    // Phase 1: Attempt to acquire the PROCESSING lock atomically
    const insertQuery = `
      INSERT INTO idempotency_records (idempotency_key, request_hash, status, expires_at)
      VALUES ($1, $2, 'PROCESSING', NOW() + INTERVAL '24 hours')
      ON CONFLICT (idempotency_key) DO NOTHING
      RETURNING status;
    `;
    const result = await dbPool.query(insertQuery, [key, requestHash]);
    // rowCount is typed as number | null in pg, so default it before comparing.
    const isNewKey = (result.rowCount ?? 0) > 0;

    if (isNewKey) {
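      // We won the lock. Intercept res.send to save the response when completed.
      // Assumes handlers respond with JSON, since response_body is a JSONB column.
      const originalSend = res.send;
      res.send = function (body: any): Response {
        res.send = originalSend;

        if (res.statusCode >= 500) {
          // Transient failure: release the key so the client can retry immediately.
          dbPool.query(
            `DELETE FROM idempotency_records WHERE idempotency_key = $1`, [key]
          ).catch(console.error);
        } else {
          // 4xx outcomes are deterministic and stored as FAILED; the rest as SUCCEEDED.
          // The write is fire-and-forget: a retry that arrives before it lands sees PROCESSING.
          dbPool.query(`
            UPDATE idempotency_records
            SET status = $1, response_code = $2, response_body = $3, updated_at = NOW()
            WHERE idempotency_key = $4
          `, [res.statusCode >= 400 ? 'FAILED' : 'SUCCEEDED', res.statusCode, body, key])
          .catch(console.error);
        }

        return originalSend.call(this, body);
      };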
      
      return next();
    }

    // Phase 2: Key exists. Retrieve active status.
    const retrieveQuery = `
      SELECT status, request_hash, response_code, response_body 
      FROM idempotency_records 
      WHERE idempotency_key = $1;
    `;
    const recordRes = await dbPool.query(retrieveQuery, [key]);
    if (recordRes.rowCount === 0) {
      return res.status(500).json({ error: 'Lock tracking error occurred' });
    }

    const record = recordRes.rows[0];

    // Reject reuse of the key with a different payload (field names follow the API contract)
    if (record.request_hash !== requestHash) {
      return res.status(409).json({
        error_code: 'idempotency_payload_mismatch',
        error_message: 'The idempotency key is already registered for a different request payload.'
      });
    }

    if (record.status === 'PROCESSING') {
      return res.status(409).json({
        error_code: 'request_in_progress',
        error_message: 'A duplicate request with this key is currently being processed. Please retry later.'
      });
    }

    if (record.status === 'SUCCEEDED' || record.status === 'FAILED') {
      res.setHeader('X-Idempotency-Replayed', 'true');
      return res.status(record.response_code).send(record.response_body);
    }

    // Key was released due to a transient failure. Clear and retry.
    await dbPool.query(`DELETE FROM idempotency_records WHERE idempotency_key = $1`, [key]);
    return res.status(503).json({ error: 'Previous request failed transiently. Please try again.' });

  } catch (err) {
    console.error('Idempotency middleware error', err);
    // Fail-open: let processing proceed if the idempotency layer is down.
    // For payment endpoints, consider failing closed with a 503 instead.
    return next();
  }
}

Scaling Challenges & Distributed Environments

In distributed cloud infrastructures running across multiple geo-regions, implementing idempotency keys introduces specific scaling bottlenecks.

1. The Distributed Lock Problem (Split-Brain)

If the database layer spans multiple read-write clusters (e.g. multi-primary replication) with asynchronous latency, two identical requests hitting separate servers in different regions could check and insert the key concurrently, leading to duplicate side effects. Scaling Strategy:

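Centralized Cache Lock: Use a single authoritative Redis deployment (for example, Redis Cluster or the Redlock algorithm) for fast distributed locking. Because Redis replication is asynchronous, a failover can lose a recently acquired lock, so keep a durable unique constraint as the final guard for financial writes.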
Route Key Affinity: Configure the global API load balancer to hash requests based on the Idempotency-Key header, routing duplicate retries to the exact same datacenter/region where the lock can be local and atomic.
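
Route key affinity boils down to a deterministic mapping from the Idempotency-Key to a region. The following is a minimal sketch of that idea, assuming a fixed region list; regionForKey and the region names are hypothetical, and a real load balancer would express this in its own configuration language.

```typescript
import { createHash } from 'node:crypto';

// Maps a key to one of the given regions. The same key always maps to the same
// region as long as the region list (and its order) does not change.
function regionForKey(idempotencyKey: string, regions: readonly string[]): string {
  if (regions.length === 0) {
    throw new Error('at least one region is required');
  }
  const digest = createHash('sha256').update(idempotencyKey).digest();
  // Use the first 4 bytes of the digest as an unsigned integer bucket source.
  const bucket = digest.readUInt32BE(0) % regions.length;
  return regions[bucket];
}

// Hypothetical region names for illustration only.
const regions = ['us-east', 'eu-west', 'ap-south'];
const firstAttempt = regionForKey('idemp_99aa-88bb-77cc', regions);
const retryAttempt = regionForKey('idemp_99aa-88bb-77cc', regions);
console.log(firstAttempt === retryAttempt); // true
```

Plain modulo hashing remaps most keys when a region is added or removed; consistent hashing reduces that churn and may be a better fit if the region set changes often.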

2. High-Frequency Storage Bloat

At 100M users, processing 10 million transactions a day can bloat the idempotency database rapidly, dragging down index performance. Scaling Strategy:

Automatic TTL Expiry: Set a strict expire time (e.g. 24 hours for minor actions, 7 days for heavy banking ledgers).
Batch Deletions: Run cron-like background scripts to prune records in clean batches using a pagination strategy rather than large locking queries:

-- Batch delete clean pattern
DELETE FROM idempotency_records
WHERE idempotency_key IN (
  SELECT idempotency_key FROM idempotency_records 
  WHERE expires_at < NOW() 
  LIMIT 5000
);
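
A background job would typically run a batch delete like this in a loop until a batch comes back short. The sketch below shows only the loop logic. The deleteBatch callback is a hypothetical stand-in for executing the DELETE statement, and it is kept synchronous so the example runs on its own; a real callback would be asynchronous.

```typescript
// Repeatedly prunes expired rows in bounded batches.
// deleteBatch(limit) is assumed to remove up to `limit` rows and return the count removed.
function pruneExpired(
  deleteBatch: (limit: number) => number,
  batchSize: number,
  maxBatches: number,
): number {
  let total = 0;
  for (let i = 0; i < maxBatches; i++) {
    const removed = deleteBatch(batchSize);
    total += removed;
    // A short batch means there is nothing left to prune in this run.
    if (removed < batchSize) break;
  }
  return total;
}

// Simulated table with 12000 expired rows, pruned 5000 at a time.
let expiredRows = 12000;
let batchCalls = 0;
const fakeDeleteBatch = (limit: number): number => {
  batchCalls += 1;
  const removed = Math.min(limit, expiredRows);
  expiredRows -= removed;
  return removed;
};

const pruned = pruneExpired(fakeDeleteBatch, 5000, 10);
console.log(pruned, expiredRows, batchCalls); // 12000 0 3
```

The maxBatches cap bounds how long one run can hold database resources; leftover rows are picked up by the next scheduled run.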

Technical Trade-offs & Storage Design

Designing the idempotency store requires compromises between latency, durability, and operational cost:

Storage Layer	Read/Write Latency	Durability Guarantees	Operational Cost	Split-Brain Risk
In-Memory Cache (Redis)	Extremely Low (<1ms)	Medium (recent writes can be lost on restart or failover, depending on persistence settings)	Medium	Low on a single node (commands execute atomically); higher across asynchronous replicas
Relational DB (Postgres)	Medium (10ms - 20ms)	High (fully ACID-compliant transactions)	Low (uses existing database)	None on a single primary (unique constraint)
NoSQL KV (DynamoDB)	Low (2ms - 5ms)	High (managed, replicated storage)	High (per-write pricing)	Low (uses conditional writes)

Failure Scenarios and Resilience Strategy

Your system must handle failure vectors within the idempotency layer itself.

1. Managing Transient vs. Deterministic Failures

If a downstream payment service crashes or a database socket times out, the API transaction rolls back. If the middleware records this as a permanent FAILED outcome, the user will be permanently blocked from retrying, even though the charge was never completed. Resilience Strategy:

Classify outcomes by status code. If a request fails with a 5xx Server Error, the business transaction has rolled back, so the idempotency key must be released (deleted) to allow a subsequent attempt to succeed.
Only cache deterministic validation failures (e.g. 400 Bad Request or 422 Unprocessable Entity), where retry would always fail.
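
The two rules above amount to a small classification function. This sketch is one way to write it; the Outcome type and classifyOutcome are hypothetical names, and treating 408 and 429 as transient is an assumption added here that the rules above do not state.

```typescript
type Outcome = 'SUCCEEDED' | 'FAILED' | 'RELEASE_KEY';

// Decides what to do with the idempotency record once the handler has responded.
function classifyOutcome(statusCode: number): Outcome {
  // 5xx: the transaction rolled back, so release the key and allow a retry.
  if (statusCode >= 500) return 'RELEASE_KEY';
  // Assumption: request timeout and rate limiting are transient, not deterministic.
  if (statusCode === 408 || statusCode === 429) return 'RELEASE_KEY';
  // Remaining 4xx: retrying the same payload would fail the same way, so store it.
  if (statusCode >= 400) return 'FAILED';
  return 'SUCCEEDED';
}

console.log(classifyOutcome(201)); // SUCCEEDED
console.log(classifyOutcome(422)); // FAILED
console.log(classifyOutcome(503)); // RELEASE_KEY
```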

2. Idempotency DB Outages (Fail-Open vs. Fail-Closed)

What happens if the Redis/Postgres idempotency storage cluster goes offline?

Fail-Open: The system bypasses idempotency verification, allowing all requests to execute. Risk: Users may be double-charged while the idempotency store is down.
Fail-Closed (Recommended for Finance): Return an HTTP 503 error, instructing the client to wait. Benefit: Protects financial books from ledger drift.
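
The choice between the two modes can be made per route rather than globally. The sketch below illustrates that idea under stated assumptions: the route table, the onStoreUnavailable helper, the POST /v1/exports route, and the 5-second retry hint are all hypothetical.

```typescript
type FailureMode = 'fail-open' | 'fail-closed';

type Decision =
  | { proceed: true; warning: string }
  | { proceed: false; status: 503; retryAfterSeconds: number };

// Hypothetical per-route policy: money movement fails closed, low-risk work fails open.
const routePolicy = new Map<string, FailureMode>([
  ['POST /v1/charges', 'fail-closed'],
  ['POST /v1/exports', 'fail-open'],
]);

// Called when the idempotency store cannot be reached.
function onStoreUnavailable(route: string): Decision {
  // Unknown routes default to the safe side.
  const mode = routePolicy.get(route) ?? 'fail-closed';
  if (mode === 'fail-open') {
    return {
      proceed: true,
      warning: `idempotency store unavailable; ${route} ran unguarded`,
    };
  }
  return { proceed: false, status: 503, retryAfterSeconds: 5 };
}

console.log(onStoreUnavailable('POST /v1/charges').proceed); // false
console.log(onStoreUnavailable('POST /v1/exports').proceed); // true
```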

Staff Engineer Perspective

Verbal Script & Mock Interview

Here is a mock systems design interview script for a Staff Software Engineer position:

Interviewer: "How do you design a robust, distributed idempotency engine for our high-throughput financial gateway?"

Candidate: "To build a highly available, bulletproof idempotency system, I would design a distributed check-and-set mechanism implemented as a Gateway Middleware Filter. This layer intercepts all mutating requests (like POST or PATCH) before they hit our transactional services.

First, the middleware enforces the presence of an Idempotency-Key header, validating it as a high-entropy string (like a UUIDv4). To prevent key hijacking or accidental collisions, I would generate a Request Fingerprint by computing a SHA-256 hash of the request body, excluding unstable fields like timestamp or trace ID.

When a request arrives, the middleware uses an atomic operations engine. For the common path, we execute an INSERT statement against our PostgreSQL metadata database, using a UNIQUE constraint on the idempotency_key column: INSERT ON CONFLICT DO NOTHING RETURNING status. If the insert is successful, we know this is a brand-new transaction. The status is initialized to PROCESSING. We proceed to execute the business logic, and when it succeeds, we update the record to SUCCEEDED, saving the HTTP status code and response body.

If the insert returns no row because the key already exists, we immediately fetch the stored record. We first compare the request fingerprint: if the hashes do not match, we return an HTTP 409 Conflict for key reuse. If they match and the status is PROCESSING, we return an HTTP 409 Conflict indicating that the original execution is still underway, which prevents double-spending. If the status is SUCCEEDED, we replay the stored response.

Finally, to handle failures gracefully, if the business transaction fails with a transient 5xx server error, its own database changes roll back and the middleware deletes the idempotency key, allowing the client to safely retry once the failing dependency recovers."

Interviewer: "What if our Redis cache or database goes down? Should we fail-open or fail-closed?"

Candidate: "This depends entirely on the domain. For a payment gateway processing money transfers, I would mandate a Fail-Closed design. The cost of double-spending or ledger discrepancies is far higher than temporary system unavailability, so we return a structured HTTP 503 during database outages. However, for non-critical resources like generating an export file or updating user preferences, we can safely Fail-Open, logging a warning but allowing the transaction to proceed without locking."

Production Readiness Checklist

Ensure the following criteria are checked off before deploying the idempotency engine to production:

Concurrency Safety: Atomic ON CONFLICT inserts verified under concurrent load testing.
Fingerprint Verification: Payloads validated against SHA-256 request hashes to block key hijacking.
Transient Failure Handling: 5xx and timeout exceptions verified to delete/release keys.
Expiration & TTL: Batched cleanup SQL scripts scheduled to match the configured retention window (24 hours to 7 days).
HTTP Header Compliance: The custom X-Idempotency-Replayed header is returned on replayed responses.