Any API that changes state will eventually receive the same request more than once. Mobile clients retry on flaky networks. Load balancers retry after connection resets. Users double-click submit buttons. Background workers crash after executing the work but before acknowledging the job. Payment providers call your webhooks repeatedly because they never received a 200 OK response.
Exactly-once delivery cannot be guaranteed in a distributed system: when a response is lost, the sender cannot tell whether the request was processed. The network is fundamentally unreliable. Therefore, your application tier must make repeated processing of identical requests safe.
This playbook provides a comprehensive engineering guide to designing and implementing production-grade API idempotency.
System Requirements and Goals
To build a zero-downtime, fault-tolerant idempotency engine, we must define the scaling bounds and functional constraints.
Functional Requirements
- Uniqueness Enforcement: Block duplicate processing of a transaction within a defined retention window.
- Response Replay: Retrieve and return the exact HTTP status code, headers, and body of the first execution for all subsequent retry requests.
- Payload Verification: Prevent key-hijacking by verifying that retried request bodies match the initial request payload.
- Processing Synchronization: Block concurrent overlapping requests from executing simultaneously (prevent double-spending race conditions).
Non-Functional Requirements
- Low Latency Overhead: The idempotency verification check must add less than 2ms to overall API execution times.
- High Write Concurrency: Support thousands of concurrent writes to the idempotency storage layer without database deadlock.
- Automatic Expiry (TTL): Automatically clean up expired keys after 7 days to keep storage size bounded.
- High Availability: If the idempotency datastore goes offline, the system should gracefully degrade rather than crashing.
High-Level Design Architecture
To enforce idempotency safely, we intercept incoming requests using an API Gateway Filter or an Application Middleware layer before they reach transactional business logic handlers.
Here is the high-level request lifecycle showing how retries are caught and replayed:
sequenceDiagram
autonumber
participant Client as Client Device
participant GW as API Gateway / Middleware
participant Cache as Redis Cache (Idempotency Store)
participant DB as Core database (PostgreSQL)
Client->>GW: POST /v1/payments (Idempotency-Key: key_123)
GW->>Cache: SETNX key_123 (status = PROCESSING)
alt Key is NEW (Cache returns 1)
GW->>DB: Execute Payment Transaction (Debit/Credit)
DB-->>GW: Payment Completed (ID: pay_999)
GW->>Cache: UPDATE key_123 (status = SUCCEEDED, response = {id: pay_999})
GW-->>Client: HTTP 201 Created ({id: pay_999})
else Key ALREADY EXISTS (Cache returns 0)
GW->>Cache: GET key_123
Cache-->>GW: Return status = SUCCEEDED, response = {id: pay_999}
GW-->>Client: HTTP 201 Created (Replayed: true, {id: pay_999})
end
In this architecture:
- The API Gateway / Middleware intercepts requests and calculates the payload's cryptographic hash.
- Redis (or a fast Key-Value store) tracks the active state of idempotency keys.
- If the request is a duplicate, the middleware directly intercepts and replays the response, preventing downstream transaction databases from performing redundant operations.
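The claim-then-replay flow above can be sketched in a few lines. This is a minimal in-memory illustration, not the gateway itself: `InMemoryIdempotencyStore`, `claim`, `complete`, and `handleRequest` are hypothetical names, and the `Map` stands in for Redis and its atomic `SETNX`.

```typescript
// In-memory stand-in for the idempotency store. A real deployment would use
// Redis or another store with an atomic set-if-absent operation.
type StoredEntry =
  | { status: 'PROCESSING' }
  | { status: 'SUCCEEDED'; httpStatus: number; body: string };

type Outcome = { httpStatus: number; body: string; replayed: boolean };

class InMemoryIdempotencyStore {
  private entries = new Map<string, StoredEntry>();

  // Mirrors SETNX: returns true only for the first caller to claim the key.
  claim(key: string): boolean {
    if (this.entries.has(key)) return false;
    this.entries.set(key, { status: 'PROCESSING' });
    return true;
  }

  // Stores the final response so later retries can replay it.
  complete(key: string, httpStatus: number, body: string): void {
    this.entries.set(key, { status: 'SUCCEEDED', httpStatus, body });
  }

  get(key: string): StoredEntry | undefined {
    return this.entries.get(key);
  }
}

function handleRequest(
  store: InMemoryIdempotencyStore,
  key: string,
  execute: () => { httpStatus: number; body: string },
): Outcome {
  if (store.claim(key)) {
    // New key: run the business logic once and record its response.
    const result = execute();
    store.complete(key, result.httpStatus, result.body);
    return { ...result, replayed: false };
  }
  const entry = store.get(key);
  if (entry && entry.status === 'SUCCEEDED') {
    // Duplicate: replay the stored response without executing again.
    return { httpStatus: entry.httpStatus, body: entry.body, replayed: true };
  }
  // Key is claimed but the first execution has not finished yet.
  return { httpStatus: 409, body: '{"error":"request_in_progress"}', replayed: false };
}
```

Calling `handleRequest` twice with the same key runs `execute` once; the second call returns the stored response with `replayed: true`. The sketch does not cover what happens when `execute` throws, which the failure-handling sections address.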
API Design and Request Flow
An idempotent API requires standard HTTP headers and explicit error definitions. Below is the request and response contract for a production system.
1. Initial Request (Success Path)
- Endpoint:
POST /v1/charges
- Request Headers:
Idempotency-Key: idemp_99aa-88bb-77cc
Content-Type: application/json
- Request Body:
{
"account_id": "acc_user_44",
"amount": 5000,
"currency": "USD"
}
- Response Headers:
HTTP/1.1 201 Created
Content-Type: application/json
- Response Body:
{
"charge_id": "chg_abc123xyz",
"status": "succeeded",
"amount": 5000,
"created_at": "2026-05-22T17:32:00Z"
}
2. Repeated Request (Replayed Success Path)
If the client loses connection after the server finishes processing but before transmitting the response, the client retries with the same header and body:
- Request Headers: Same as initial request.
- Response Headers:
HTTP/1.1 201 Created
X-Cache-Lookup: HIT
X-Idempotency-Replayed: true
- Response Body: Identical to initial response.
3. Key Reuse with Different Payload (Error Path)
If a client attempts to reuse the same key for a different charge:
- Request Body:
{
"account_id": "acc_user_44",
"amount": 10000,
"currency": "USD"
}
- Response Headers:
HTTP/1.1 409 Conflict
- Response Body:
{
"error_code": "idempotency_payload_mismatch",
"error_message": "The idempotency key is already registered for a different request payload."
}
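On the client side, the contract above only works if every retry of one logical operation reuses the same key and body. The following is a rough sketch under that assumption; `createChargeWithRetry` and the injected `send` transport are hypothetical, and backoff between attempts is omitted for brevity.

```typescript
import { randomUUID } from 'node:crypto';

type ChargeRequest = { account_id: string; amount: number; currency: string };
type HttpResult = { status: number; body: string };

// `send` is a hypothetical transport function, e.g. a thin wrapper around fetch.
type Send = (headers: Record<string, string>, body: string) => Promise<HttpResult>;

async function createChargeWithRetry(
  send: Send,
  charge: ChargeRequest,
  maxAttempts = 3,
): Promise<HttpResult> {
  // Generate the key once so every retry of this logical operation reuses it.
  const headers: Record<string, string> = {
    'Idempotency-Key': randomUUID(),
    'Content-Type': 'application/json',
  };
  const body = JSON.stringify(charge);
  let lastError: unknown = new Error('no attempts were made');
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await send(headers, body);
      // In this sketch, 2xx and 4xx are treated as final answers; only 5xx is retried.
      if (result.status < 500) return result;
      lastError = new Error(`server responded with ${result.status}`);
    } catch (err) {
      // Network failure: the outcome is unknown, so retry with the same key.
      lastError = err;
    }
  }
  throw lastError;
}
```

A production client would likely also wait and retry on the 409 returned while a duplicate is still in progress; that branch is left out here.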
Low-Level Design & Database Schema
To prevent race conditions, the storage engine must support atomic operations. If two concurrent threads attempt to insert the same key, only one must succeed.
1. Relational Database Schema (PostgreSQL)
When using a relational database to store idempotency state, the primary key's unique constraint on idempotency_key acts as the system lock:
CREATE TABLE idempotency_records (
idempotency_key VARCHAR(255) PRIMARY KEY,
request_hash CHAR(64) NOT NULL, -- SHA-256 fingerprint of payload
status VARCHAR(30) NOT NULL, -- PROCESSING, SUCCEEDED, FAILED
response_code SMALLINT, -- Replay HTTP status code
response_headers JSONB, -- Replay custom headers
response_body JSONB, -- Replay JSON payload
expires_at TIMESTAMPTZ NOT NULL, -- TTL expiration timestamp
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Index to clean up expired keys in batches. It is not partial, so a cleanup
-- query that filters on expires_at alone can use it for rows in any status.
CREATE INDEX idx_idempotency_cleanup ON idempotency_records (expires_at);
2. State Machine Transition Flow
Every idempotency record moves through a strict lifecycle to ensure concurrency safety:
stateDiagram-v2
[*] --> NEW : Client Request Arrives
NEW --> PROCESSING : ON CONFLICT DO NOTHING (Lock Acquired)
PROCESSING --> SUCCEEDED : Business Transaction Completes Successfully
PROCESSING --> RETRYABLE_FAILED : Transient Error (Release / Delete key)
PROCESSING --> FAILED : Deterministic Error (Save bad response code & body)
SUCCEEDED --> [*] : Return Replayed Response
FAILED --> [*] : Return Replayed Error Response
RETRYABLE_FAILED --> [*] : Key Deleted; Allowed to retry
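One way to keep application code honest about this lifecycle is to encode the allowed transitions as data and check them before every status update. This is an illustrative sketch; `ALLOWED_TRANSITIONS`, `canTransition`, and `assertTransition` are hypothetical helper names.

```typescript
type RecordStatus = 'NEW' | 'PROCESSING' | 'SUCCEEDED' | 'FAILED' | 'RETRYABLE_FAILED';

// Allowed transitions, mirroring the lifecycle above.
// Terminal states map to an empty list.
const ALLOWED_TRANSITIONS: Record<RecordStatus, readonly RecordStatus[]> = {
  NEW: ['PROCESSING'],
  PROCESSING: ['SUCCEEDED', 'FAILED', 'RETRYABLE_FAILED'],
  SUCCEEDED: [],
  FAILED: [],
  RETRYABLE_FAILED: [],
};

function canTransition(from: RecordStatus, to: RecordStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Throws if a caller attempts a transition the lifecycle does not allow.
function assertTransition(from: RecordStatus, to: RecordStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal idempotency state transition: ${from} -> ${to}`);
  }
}
```

For example, `assertTransition('SUCCEEDED', 'PROCESSING')` throws, which guards against a late writer overwriting a finished record.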
3. Node/TypeScript Middleware Implementation
Below is a TypeScript (Express) middleware implementing request hashing, locking, and replay logic on top of the unique PostgreSQL constraint:
import { Request, Response, NextFunction } from 'express';
import { createHash } from 'crypto';
import { Pool } from 'pg';
const dbPool = new Pool({ connectionString: process.env.DATABASE_URL });
export async function idempotencyFilter(req: Request, res: Response, next: NextFunction) {
const key = req.headers['idempotency-key'];
if (!key || typeof key !== 'string') {
return res.status(400).json({ error: 'Idempotency-Key header is missing or invalid' });
}
// Calculate SHA-256 fingerprint of request body to detect key reuse
const payloadString = req.body ? JSON.stringify(req.body) : '';
const requestHash = createHash('sha256').update(payloadString).digest('hex');
try {
// Phase 1: Attempt to acquire the PROCESSING lock atomically
const insertQuery = `
INSERT INTO idempotency_records (idempotency_key, request_hash, status, expires_at)
VALUES ($1, $2, 'PROCESSING', NOW() + INTERVAL '24 hours')
ON CONFLICT (idempotency_key) DO NOTHING
RETURNING status;
`;
const result = await dbPool.query(insertQuery, [key, requestHash]);
// rowCount is typed as number | null in pg, so default to 0 before comparing
const isNewKey = (result.rowCount ?? 0) > 0;
if (isNewKey) {
// We won the lock. Intercept res.send to save the response when completed.
const originalSend = res.send;
res.send = function (body: any): Response {
res.send = originalSend;
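// Persist the outcome without blocking the response (fire-and-forget).
// 5xx marks the key retryable, 4xx is a deterministic failure, anything else succeeded.
const finalStatus = res.statusCode >= 500 ? 'RETRYABLE_FAILED'
: res.statusCode >= 400 ? 'FAILED' : 'SUCCEEDED';
dbPool.query(`
UPDATE idempotency_records
SET status = $1, response_code = $2, response_body = $3, updated_at = NOW()
WHERE idempotency_key = $4
`, [finalStatus, res.statusCode, body, key])
.catch(console.error);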
return originalSend.call(this, body);
};
return next();
}
// Phase 2: Key exists. Retrieve active status.
const retrieveQuery = `
SELECT status, request_hash, response_code, response_body
FROM idempotency_records
WHERE idempotency_key = $1;
`;
const recordRes = await dbPool.query(retrieveQuery, [key]);
if (recordRes.rowCount === 0) {
return res.status(500).json({ error: 'Lock tracking error occurred' });
}
const record = recordRes.rows[0];
// Verify key hashing
if (record.request_hash !== requestHash) {
return res.status(409).json({
error: 'idempotency_payload_mismatch',
message: 'The idempotency key is already registered for a different request payload.'
});
}
if (record.status === 'PROCESSING') {
return res.status(409).json({
error: 'request_in_progress',
message: 'A duplicate request with this key is currently being processed. Please retry later.'
});
}
if (record.status === 'SUCCEEDED' || record.status === 'FAILED') {
res.setHeader('X-Idempotency-Replayed', 'true');
return res.status(record.response_code).send(record.response_body);
}
// Key was released due to a transient failure. Clear and retry.
await dbPool.query(`DELETE FROM idempotency_records WHERE idempotency_key = $1`, [key]);
return res.status(503).json({ error: 'Previous request failed transiently. Please try again.' });
} catch (err) {
console.error('Idempotency middleware error', err);
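// Fail-open (graceful degradation): proceed without idempotency protection if the store is down.
// For financial endpoints, consider failing closed here by returning a 503 instead.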
return next();
}
}
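One caveat with hashing `JSON.stringify(req.body)` directly is that two payloads with the same fields in a different key order produce different hashes, so a legitimate retry could be rejected as a mismatch. A possible refinement, sketched below under that assumption, is to sort object keys before hashing. `canonicalize`, `requestFingerprint`, and the `ignoredFields` parameter are hypothetical names, not part of the middleware above.

```typescript
import { createHash } from 'node:crypto';

// Recursively rebuild the value with object keys in sorted order so that
// logically equal payloads serialize to the same string.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const k of Object.keys(source).sort()) {
      sorted[k] = canonicalize(source[k]);
    }
    return sorted;
  }
  return value;
}

// `ignoredFields` is a hypothetical list of unstable top-level fields
// (for example a timestamp or trace ID) that are left out of the hash.
function requestFingerprint(body: unknown, ignoredFields: string[] = []): string {
  let payload = body;
  if (body !== null && typeof body === 'object' && !Array.isArray(body)) {
    const copy: Record<string, unknown> = { ...(body as Record<string, unknown>) };
    for (const field of ignoredFields) delete copy[field];
    payload = copy;
  }
  // JSON.stringify returns undefined for an undefined body, so fall back to ''.
  const serialized = JSON.stringify(canonicalize(payload)) ?? '';
  return createHash('sha256').update(serialized).digest('hex');
}
```

The result is a 64-character hex string, so it fits the `request_hash CHAR(64)` column from the schema.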
Scaling Challenges & Distributed Environments
In distributed cloud infrastructures running across multiple geo-regions, implementing idempotency keys introduces specific scaling bottlenecks.
1. The Distributed Lock Problem (Split-Brain)
If the database layer spans multiple read-write clusters (e.g. multi-primary replication) with asynchronous replication lag, two identical requests hitting servers in different regions could each check and insert the key concurrently, leading to duplicate side effects.
Scaling Strategy:
- Centralized Cache Lock: Use Redis (with a Redis Cluster or Redlock algorithm) for rapid, synchronous distributed locking.
- Route Key Affinity: Configure the global API load balancer to hash requests based on the Idempotency-Key header, routing duplicate retries to the same datacenter/region, where the lock can be local and atomic.
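Key affinity can be illustrated with a simple hash-modulo routing function. This is only a sketch of the idea: `pickRegion` and the region names in the usage note are hypothetical, and a real load balancer would typically use its own built-in header hashing.

```typescript
import { createHash } from 'node:crypto';

// Maps an idempotency key to one region deterministically, so every retry
// carrying the same key lands in the same place.
function pickRegion(idempotencyKey: string, regions: readonly string[]): string {
  if (regions.length === 0) throw new Error('At least one region is required');
  const digest = createHash('sha256').update(idempotencyKey).digest();
  // Interpret the first 4 bytes of the digest as an unsigned integer bucket.
  const bucket = digest.readUInt32BE(0) % regions.length;
  return regions[bucket] as string;
}
```

For instance, `pickRegion('idemp_99aa-88bb-77cc', ['us-east', 'eu-west'])` returns the same region on every call. Plain modulo hashing remaps most keys when the region list changes, so a consistent-hashing scheme may be preferable if regions are added or removed often.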
2. High-Frequency Storage Bloat
At 100M users, processing 10 million transactions a day can bloat the idempotency table rapidly, degrading index performance.
Scaling Strategy:
- Automatic TTL Expiry: Set a strict expire time (e.g. 24 hours for minor actions, 7 days for heavy banking ledgers).
- Batch Deletions: Run cron-like background scripts to prune records in clean batches using a pagination strategy rather than large locking queries:
-- Batch delete pattern: remove at most 5000 expired rows per statement
DELETE FROM idempotency_records
WHERE idempotency_key IN (
SELECT idempotency_key FROM idempotency_records
WHERE expires_at < NOW()
LIMIT 5000
);
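The background job that drives this statement can be a small loop that repeats the batch until a short batch signals nothing is left. The sketch below assumes a hypothetical `deleteBatch` function that runs the DELETE above and resolves with the number of rows removed; `pruneExpiredRecords` and `maxBatches` are likewise illustrative.

```typescript
// `deleteBatch` is a hypothetical function that runs the batched DELETE
// and resolves with the number of rows it removed.
type DeleteBatch = (limit: number) => Promise<number>;

async function pruneExpiredRecords(
  deleteBatch: DeleteBatch,
  batchSize = 5000,
  maxBatches = 1000, // safety cap so one run cannot loop indefinitely
): Promise<number> {
  let totalDeleted = 0;
  for (let i = 0; i < maxBatches; i++) {
    const deleted = await deleteBatch(batchSize);
    totalDeleted += deleted;
    // A short batch means no more expired rows remain.
    if (deleted < batchSize) break;
  }
  return totalDeleted;
}
```

Keeping each statement small bounds lock duration and transaction size; a pause between batches could be added if the table is under heavy write load.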
Technical Trade-offs & Storage Design
Designing the idempotency store requires compromises between latency, durability, and operational cost:
| Storage Layer | Read/Write Latency | Durability Guarantees | Operational Cost | Split-Brain Risk |
|---|---|---|---|---|
| In-Memory Cache (Redis) | Extremely Low (<1ms) | Medium (Prone to data loss on node reboot) | Medium | Low (Single-threaded execution) |
| Relational DB (Postgres) | Medium (10ms - 20ms) | High (Fully ACID transaction compliant) | Low (Uses existing database) | Zero (Strong unique constraints) |
| NoSQL KV (DynamoDB) | Low (2ms - 5ms) | High (Decoupled regional auto-scaling) | High (Per-write pricing) | Low (Uses conditional expressions) |
Failure Scenarios and Resilience Strategy
Your system must handle failure vectors within the idempotency layer itself.
1. Managing Transient vs. Deterministic Failures
If a downstream payment service crashes or a database socket times out, the API transaction rolls back. If the middleware records this as a permanent FAILED outcome, the user will be permanently blocked from retrying, even though the charge was never completed.
Resilience Strategy:
- Capture standard status codes. If a request fails with a 5xx Server Error, the downstream transaction rolls back, and the idempotency key must be released (deleted) from the database so that subsequent attempts can succeed.
- Only cache deterministic validation failures (e.g. 400 Bad Request or 422 Unprocessable Entity), where a retry would always fail.
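The rule above can be written as a small classification function. This is a sketch, not a definitive policy: `classifyOutcome` and `shouldReleaseKey` are hypothetical names, and treating 408 and 429 as transient is an assumption added here, since those codes usually describe temporary conditions rather than invalid input.

```typescript
type FinalStatus = 'SUCCEEDED' | 'FAILED' | 'RETRYABLE_FAILED';

// Assumption: these 4xx codes signal temporary conditions, not invalid input.
const TRANSIENT_CLIENT_CODES: ReadonlySet<number> = new Set([408, 429]);

// Maps an HTTP status code to the state the idempotency record should end in.
function classifyOutcome(httpStatus: number): FinalStatus {
  if (httpStatus >= 500 || TRANSIENT_CLIENT_CODES.has(httpStatus)) {
    return 'RETRYABLE_FAILED';
  }
  if (httpStatus >= 400) return 'FAILED';
  return 'SUCCEEDED';
}

// True when the key should be released (deleted) so a retry can execute again.
function shouldReleaseKey(httpStatus: number): boolean {
  return classifyOutcome(httpStatus) === 'RETRYABLE_FAILED';
}
```

With this mapping, a 422 is stored and replayed, while a 503 releases the key so the next attempt runs the business logic again.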
2. Idempotency DB Outages (Fail-Open vs. Fail-Closed)
What happens if the Redis/Postgres idempotency storage cluster goes offline?
- Fail-Open: The system bypasses idempotency verification, allowing all requests to execute. Risk: users may be double-charged during an outage of the idempotency store.
- Fail-Closed (Recommended for Finance): Return an HTTP 503 error, instructing the client to wait. Benefit: Protects financial books from ledger drift.
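The two policies can be made an explicit, per-endpoint setting rather than a side effect of a catch block. The sketch below is illustrative only: `guardWithPolicy`, `FailurePolicy`, and the injected `checkStore` function (assumed to reject when the idempotency store is unreachable) are hypothetical.

```typescript
type FailurePolicy = 'fail-open' | 'fail-closed';

type Decision =
  | { action: 'proceed'; idempotencyChecked: boolean }
  | { action: 'reject'; httpStatus: 503 };

// `checkStore` is a hypothetical function that performs the idempotency lookup
// and rejects (throws) when the store is unreachable.
async function guardWithPolicy(
  checkStore: () => Promise<void>,
  policy: FailurePolicy,
): Promise<Decision> {
  try {
    await checkStore();
    return { action: 'proceed', idempotencyChecked: true };
  } catch {
    if (policy === 'fail-open') {
      // Availability wins: run the request without duplicate protection.
      return { action: 'proceed', idempotencyChecked: false };
    }
    // Correctness wins: refuse the request until the store is back.
    return { action: 'reject', httpStatus: 503 };
  }
}
```

A payments route would pass `'fail-closed'`, while a low-risk route such as updating preferences could pass `'fail-open'` and log that the check was skipped.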
Staff Engineer Perspective
Verbal Script & Mock Interview
Here is a mock systems design interview script for a Staff Software Engineer position:
Interviewer: "How do you design a robust, distributed idempotency engine for our high-throughput financial gateway?"
Candidate: "To build a highly available, bulletproof idempotency system, I would design a distributed check-and-set mechanism implemented as a Gateway Middleware Filter. This layer intercepts all mutating requests (like POST or PATCH) before they hit our transactional services.
First, the middleware enforces the presence of an Idempotency-Key header, validating it as a high-entropy string (like a UUIDv4). To prevent key hijacking or accidental collisions, I would generate a Request Fingerprint by computing a SHA-256 hash of the request body, excluding unstable fields like timestamp or trace ID.
When a request arrives, the middleware uses an atomic operations engine. For the common path, we execute an INSERT statement against our PostgreSQL metadata database, using a UNIQUE constraint on the idempotency_key column: INSERT ON CONFLICT DO NOTHING RETURNING status. If the insert is successful, we know this is a brand-new transaction. The status is initialized to PROCESSING. We proceed to execute the business logic, and when it succeeds, we update the record to SUCCEEDED, saving the HTTP status code and response body.
If the insert returns no row because the key already exists, we immediately fetch the stored state. If the status is PROCESSING, we return an HTTP 409 Conflict indicating a duplicate execution is underway, which prevents double-spending. If the status is SUCCEEDED, we check the request fingerprint. If the hash matches exactly, we replay the cached response. If the hashes mismatch, we return an HTTP 409 Conflict due to key reuse.
Finally, to handle failures gracefully, if the business transaction throws a transient 5xx server error, the middleware rolls back the database state and deletes the idempotency key, allowing the client to safely retry when the database recovers."
Interviewer: "What if our Redis cache or database goes down? Should we fail-open or fail-closed?"
Candidate: "This depends entirely on the domain. For a payment gateway processing money transfers, I would mandate a Fail-Closed design. The cost of double-spending or ledger discrepancies is far higher than temporary system unavailability, so we return a structured HTTP 503 during database outages. However, for non-critical resources like generating an export file or updating user preferences, we can safely Fail-Open, logging a warning but allowing the transaction to proceed without locking."
Production Readiness Checklist
Ensure the following criteria are checked off before deploying the idempotency engine to production:
- Concurrency Safety: Atomic ON CONFLICT database constraints verified under high-concurrency load testing.
- Fingerprint Verification: Payloads validated against SHA-256 request hashes to block key hijacking.
- Transient Failures Handling: 5xx and timeout exceptions verified to delete/release keys.
- Expiration & TTL: Batched cleanup SQL scripts scheduled for a rolling 7-day window.
- HTTP Headers Compliance: Custom X-Idempotency-Replayed headers successfully returned on replayed responses.