Mental Model
Idempotency in a distributed payment system is a correctness requirement: a single duplicate write can charge a customer twice and cause real financial and reputational damage. In distributed cloud environments, networks are unreliable: clients retry on timeouts, load balancers and proxies resend requests they believe failed, and message brokers deliver events at-least-once. The system must therefore assume that duplicate requests will arrive. To prevent double charges, each request carries a unique client-generated Idempotency Key, which is claimed atomically in the database before any downstream gateway call is made.
Requirements and System Goals
When building an enterprise payment system (similar to Stripe or Adyen), we must enforce absolute data durability and transactional guarantees.
1. Functional Requirements
- Duplicate Charge Prevention: Guarantee that under no circumstances can a customer be charged twice for the same logical purchase.
- Payload Mismatch Detection: If a client reuses an idempotency key but submits a different payment amount or currency, the system must reject the request.
- Automatic Status Tracking: Expose the state of each payment: PROCESSING (locked, in flight), COMPLETED (card charged successfully), or FAILED (safe to retry).
2. Non-Functional Requirements & Performance Budgets
- Ultra-Low Latency Overhead: The idempotency checking layer must introduce less than 5ms of overhead on the payment hot path.
- High Transaction Throughput: Support up to 5,000 concurrent payment executions per second globally.
- 100% ACID Transaction Co-location: The idempotency checking state and the local payment ledger record must be updated in a single atomic transaction.
- Extreme Availability & Fault Tolerance: If our fast-path cache (Redis) fails, the system must degrade gracefully and fall through to PostgreSQL without dropping active transaction blocks.
API Interfaces and Service Contracts
Distributed payment engines utilize a strict REST API with a mandatory custom header to pass the idempotency token.
1. Create Payment Request API Contract
Clients generate a unique UUIDv4 string on the frontend before their first call and attach it inside the Idempotency-Key header.
POST /api/v1/payments
HTTP Request Headers:
Content-Type: application/json
Idempotency-Key: idem_uuid_a8b9c2d1-4433-2211-bb00-eeddccbbaa99
Request Payload (amount_cents is in minor units; 9900 is $99.00 USD):
{
"user_id": "usr_9a8b7c6d5e",
"amount_cents": 9900,
"currency": "USD",
"payment_method_token": "tok_visa_4821",
"purchase_ref": "invoice_2026_06_01_abc"
}
Response Payload (200 OK - Successful Payment):
{
"payment_id": "pay_3b29c8e1-5544-3322-1100-eeddccbbaa99",
"idempotency_key": "idem_uuid_a8b9c2d1-4433-2211-bb00-eeddccbbaa99",
"status": "COMPLETED",
"gateway_charge_id": "ch_stripe_5f8a2c1d3e",
"amount_cents": 9900,
"currency": "USD",
"processed_at": "2026-06-01T11:08:00Z"
}
Response Payload (409 Conflict - Payment Already In Progress):
{
"error_code": "PAYMENT_IN_PROGRESS",
"message": "A transaction with this idempotency key is currently being processed. Please retry in 2 seconds.",
"idempotency_key": "idem_uuid_a8b9c2d1-4433-2211-bb00-eeddccbbaa99",
"status": "PROCESSING"
}
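The contract above only works if the client reuses one key across every attempt for the same purchase. The sketch below illustrates that rule in Python. The `send` transport callable, the `create_payment` helper, and the attempt limit are hypothetical names introduced for illustration; the header name, the 409 status, and the `PAYMENT_IN_PROGRESS` error code come from the contract.

```python
import uuid
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: takes (headers, body), returns (http_status, json_payload).
Transport = Callable[[Dict[str, str], dict], Tuple[int, dict]]


def new_idempotency_key() -> str:
    """Generate the key once per logical purchase, before the first attempt."""
    return f"idem_uuid_{uuid.uuid4()}"


def create_payment(send: Transport, body: dict, key: str,
                   max_attempts: int = 3) -> Tuple[int, dict]:
    """POST the payment, retrying timeouts and 409s with the SAME key."""
    headers = {"Content-Type": "application/json", "Idempotency-Key": key}
    last: Optional[Tuple[int, dict]] = None
    for _ in range(max_attempts):
        try:
            last = send(headers, body)
        except TimeoutError:
            # Outcome unknown: the charge may or may not have happened.
            # Retrying with the same key is what makes this safe.
            last = None
            continue
        status, payload = last
        if status == 409 and payload.get("error_code") == "PAYMENT_IN_PROGRESS":
            # First attempt is still in flight. A real client would wait
            # before retrying; the sleep is omitted to keep the sketch short.
            continue
        return last
    if last is None:
        raise TimeoutError("payment outcome unknown after all attempts")
    return last
```

The key is generated outside `create_payment` on purpose: if it were generated inside the retry loop, every attempt would look like a new purchase to the server.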
High-Level Design and Visualizations
To protect the core transactional database from read exhaustion under client retries, we couple a high-speed caching layer with our persistent relational engine.
1. Unified Idempotent Payment Architecture
This diagram displays how incoming requests are intercepted at the API Gateway, verified via Redis (fast path), locked in PostgreSQL, and then safely routed to external payment gateways.
graph TD
subgraph Client Tier
Client[Mobile App / Web Client] -->|1. POST /payments with Header| APIGateway[API Gateway Cluster]
end
subgraph CachingPlane["Caching Plane (Fast-Path Check)"]
APIGateway -->|2. Check Cache| RedisCache[(Redis Cluster)]
end
subgraph Transactional Core Store
APIGateway -->|3. ACID Insert Lock| PostgreSQL[(PostgreSQL Primary)]
end
subgraph External Networks
APIGateway -->|4. Secure Charge Request| Stripe[Stripe / Adyen Gateway]
end
subgraph Async Reconciliation
ReconJob[Reconciliation Service] -->|5. Audit Stuck PROCESSING| PostgreSQL
ReconJob -->|6. Query Charge Status| Stripe
end
2. Parallel Client Retry and Gateway Outage Sequence
The sequence diagram below displays the detailed execution loop when a client times out, retries, and gets a cached result without triggering a double charge.
sequenceDiagram
autonumber
participant Client as Game / Purchase Client
participant Service as Payment Service Core
participant Redis as Redis Cache
participant DB as PostgreSQL Store
participant Stripe as Stripe Gateway
Client->>Service: POST /payments (Header: idem_key_abc)
Service->>Redis: GET idem:idem_key_abc
Redis-->>Service: Return null (Cache Miss)
Service->>DB: INSERT INTO idempotency_keys (idem_key, status='PROCESSING')
Note over DB: Atomic Unique Constraint Lock
DB-->>Service: Insert Success
Service->>Stripe: POST /charges (Pass idem_key_abc to Stripe)
Note over Stripe: Card Charged Successfully!
Stripe-->>Service: Return HTTP 200 (charge_id = ch_123)
Note over Service, Client: Network drops! Client times out at 10s mark
Service->>DB: UPDATE idempotency_keys SET status='COMPLETED', response_body='...'
Service->>Redis: SET idem:idem_key_abc = response_body (TTL 24h)
rect rgb(255, 240, 240)
Note over Client, Stripe: Client Retry Loop (Using Same Idempotency Key)
Client->>Service: POST /payments (Header: idem_key_abc)
Service->>Redis: GET idem:idem_key_abc
Redis-->>Service: Return cached JSON response (Cache Hit!)
Service-->>Client: Return payment status COMPLETED (Double charge prevented!)
end
Low-Level Design and Schema Strategies
To support strict ACID properties, the data layout enforces co-location of idempotency blocks and payment details.
1. The Mechanics of Request Hashing
When a client sends an idempotency key, we must protect against "Idempotency Key Pollution." This occurs when a client reuses an idempotency key for two completely separate transactions—for example, first attempting to pay $99.00 USD, and then using the same key to charge $9.00 USD.
- The Hashing Workflow:
- Upon receiving the request, the Payment Service extracts the raw JSON body.
- It removes dynamic ephemeral fields (such as timestamp or tracking_correlation_id).
- It passes the normalized (for example, key-sorted) JSON byte array through the SHA-256 hash function.
- The resulting 64-character hexadecimal digest is stored in the request_hash column.
- If a subsequent request arrives with an existing key, the service hashes the new request body and compares the digests. If they do not match, the request is rejected with an HTTP 422 Unprocessable Entity error, so a reused key can never be silently applied to a different payload.
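The hashing workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual code: the exact set of ephemeral fields and the choice of key-sorted, whitespace-free JSON as the normal form are assumptions, and the function names are hypothetical.

```python
import hashlib
import json

# Assumed list of ephemeral fields; the text names these two as examples.
EPHEMERAL_FIELDS = frozenset({"timestamp", "tracking_correlation_id"})


def request_hash(body: dict) -> str:
    """Return the 64-character hex SHA-256 digest of the normalized body."""
    stable = {k: v for k, v in body.items() if k not in EPHEMERAL_FIELDS}
    # Sorted keys and fixed separators make the bytes independent of the
    # key order and whitespace the client happened to send.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_same_request(stored_hash: str, body: dict) -> bool:
    """True for a genuine replay; False means the key was reused (HTTP 422)."""
    return request_hash(body) == stored_hash
```

With this normal form, a retry that differs only in `timestamp` or key order hashes to the same value, while a change to `amount_cents` or `currency` does not.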
2. PostgreSQL Transactional Storage Schema
This schema stores the idempotency keys and ties each payment row to exactly one key through a unique foreign-key constraint.
-- Authoritative Idempotency Store
CREATE TABLE idempotency_keys (
idempotency_key VARCHAR(255) PRIMARY KEY,
user_id VARCHAR(64) NOT NULL,
request_hash VARCHAR(64) NOT NULL, -- SHA-256 hash of JSON body to detect payload modifications
status VARCHAR(20) NOT NULL DEFAULT 'PROCESSING', -- PROCESSING, COMPLETED, FAILED
response_status INT, -- HTTP Status Code
response_body JSONB, -- Stored response payload
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
completed_at TIMESTAMP WITH TIME ZONE,
expires_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() + INTERVAL '24 hours'
);
-- Core Financial Payments Table
CREATE TABLE customer_payments (
payment_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
idempotency_key VARCHAR(255) NOT NULL UNIQUE REFERENCES idempotency_keys(idempotency_key),
user_id VARCHAR(64) NOT NULL,
amount_cents BIGINT NOT NULL,
currency CHAR(3) NOT NULL,
payment_method_token VARCHAR(255) NOT NULL,
gateway_charge_id VARCHAR(255), -- External transaction ID from Stripe
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- Indexing for fast historical audit queries
CREATE INDEX idx_idempotency_expiry ON idempotency_keys(expires_at) WHERE status = 'COMPLETED';
CREATE INDEX idx_payments_user ON customer_payments(user_id, created_at DESC);
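To illustrate why co-locating the two tables matters, the sketch below completes a payment by updating the key and inserting the ledger row in one transaction. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, and the tables are trimmed to the columns the example needs. The function names are hypothetical.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a trimmed-down copy of both tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE idempotency_keys (
            idempotency_key TEXT PRIMARY KEY,
            status TEXT NOT NULL DEFAULT 'PROCESSING'
        );
        CREATE TABLE customer_payments (
            payment_id INTEGER PRIMARY KEY,
            idempotency_key TEXT NOT NULL UNIQUE
                REFERENCES idempotency_keys(idempotency_key),
            amount_cents INTEGER NOT NULL,
            gateway_charge_id TEXT
        );
        """
    )
    return conn


def record_completion(conn: sqlite3.Connection, key: str,
                      amount_cents: int, charge_id: str) -> None:
    """Flip the key to COMPLETED and write the ledger row atomically."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "UPDATE idempotency_keys SET status = 'COMPLETED' "
            "WHERE idempotency_key = ?",
            (key,),
        )
        conn.execute(
            "INSERT INTO customer_payments "
            "(idempotency_key, amount_cents, gateway_charge_id) "
            "VALUES (?, ?, ?)",
            (key, amount_cents, charge_id),
        )
```

If the ledger insert fails, the status update is rolled back with it, so the key can never read COMPLETED without a matching payment row.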
3. Dynamic Redis Cache Key Mapping
The caching layer holds serialized responses to completely bypass database execution for repeat calls.
- Cache Key format: idem:keyspace:bucket:<idempotency_key> (e.g. idem:payment:production:idem_uuid_a8b9c2d1).
- Cache Expiry (TTL): 24 hours (matching the PostgreSQL table expiry window, ensuring that repeat attempts within a day are resolved instantly).
- Graceful Degradation State: If a Redis connection times out (e.g., due to network drops), the client service catches the exception, logs a telemetry warning, and queries PostgreSQL directly, guaranteeing transactional correctness.
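The key format and the fall-through behavior can be sketched as follows. To stay runnable without a Redis server, the cache and database are passed in as plain callables. `cache_get` and `db_get` are hypothetical names, and the sketch assumes the cache wrapper translates its client library's errors into Python's built-in `ConnectionError` or `TimeoutError`.

```python
import json
import logging
from typing import Callable, Optional

logger = logging.getLogger("idempotency")


def cache_key(idempotency_key: str, keyspace: str = "payment",
              bucket: str = "production") -> str:
    """Build a key in the idem:keyspace:bucket:<idempotency_key> format."""
    return f"idem:{keyspace}:{bucket}:{idempotency_key}"


def lookup_response(idempotency_key: str,
                    cache_get: Callable[[str], Optional[str]],
                    db_get: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Try the cache first; on a miss or a cache outage, ask the database."""
    try:
        cached = cache_get(cache_key(idempotency_key))
        if cached is not None:
            return json.loads(cached)
    except (ConnectionError, TimeoutError) as exc:
        # Assumption: the cache wrapper raises these built-in exceptions.
        logger.warning("cache unavailable, falling back to database: %s", exc)
    return db_get(idempotency_key)
```

The database stays authoritative in every branch: a cache outage costs latency, not correctness.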
Scaling and Operational Challenges
1. The TOCTOU (Time-of-Check to Time-of-Use) Race Condition Math
A classic design error in payment architecture is executing a read check before writing the lock:
- Step 1: SELECT status FROM idempotency_keys WHERE idempotency_key = 'idem_123';
- Step 2: If the result is null, proceed to call the external gateway.
- Step 3: Save the key as COMPLETED.
- The Race Mathematics:
- Suppose two identical retry packets arrive concurrently at the API Gateway within 100 microseconds of each other.
- Both execute Step 1 in parallel. Both see a null status.
- Both proceed to charge the card via Stripe, resulting in a double charge on the customer's account.
- The Atomic Prevention:
- We enforce a unique constraint at the database layer by inserting the key with status PROCESSING inside a transaction.
- PostgreSQL enforces the primary key through its unique index: a second concurrent INSERT of the same key waits until the first transaction commits or rolls back. Committing the PROCESSING row before calling the gateway keeps that wait short.
- Once the first insert has committed, the second fails with a unique-violation error (SQLSTATE 23505, surfaced for example as Spring's DuplicateKeyException).
- Because the check and the write are a single atomic operation, there is no window between them in which a duplicate request can slip through.
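The insert-as-lock pattern can be shown in a few lines. The sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere. The table is trimmed to three columns, and `open_lock_db` and `try_acquire` are hypothetical names.

```python
import sqlite3


def open_lock_db() -> sqlite3.Connection:
    """In-memory stand-in for the idempotency_keys table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE idempotency_keys (
            idempotency_key TEXT PRIMARY KEY,
            request_hash TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'PROCESSING'
        )
        """
    )
    return conn


def try_acquire(conn: sqlite3.Connection, key: str, request_hash: str) -> bool:
    """Claim the key by inserting it; there is no separate read step.

    Returns True if this caller owns the payment attempt, False if another
    request already inserted the same key.
    """
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO idempotency_keys (idempotency_key, request_hash) "
                "VALUES (?, ?)",
                (key, request_hash),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected the duplicate: someone else holds the lock.
        return False
```

Only the caller that gets `True` goes on to call the gateway. In PostgreSQL the same idea is often written as `INSERT ... ON CONFLICT (idempotency_key) DO NOTHING` followed by a check of the affected row count, which avoids raising an exception on the duplicate path.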
2. Stuck PROCESSING Key Resolution
If a payment microservice instance crashes after inserting a PROCESSING key but before completing the Stripe charge, the idempotency key remains locked in a PROCESSING state indefinitely. Subsequent client retries are blocked with PAYMENT_IN_PROGRESS conflict errors.
- The Resolution Engine:
- We run an automated background Reconciliation Cron Job every 60 seconds.
- The job queries stuck keys: SELECT idempotency_key FROM idempotency_keys WHERE status = 'PROCESSING' AND created_at < NOW() - INTERVAL '2 minutes';
- For every stuck key, the job polls the Stripe Gateway API using the same idempotency key.
- If Stripe reports a successful charge, the job updates PostgreSQL to COMPLETED with the Stripe charge_id.
- If Stripe has no record of the charge, the job updates PostgreSQL to FAILED, releasing the lock and allowing the client to safely retry.
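The decision logic of the reconciliation pass can be sketched independently of any database or gateway client. In the sketch below, the key table is modeled as an in-memory dictionary and the gateway lookup is an injected callable. `KeyRecord`, `reconcile`, and `find_charge` are hypothetical names; the two-minute threshold comes from the query above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional

STUCK_AFTER = timedelta(minutes=2)  # matches the INTERVAL in the query above


@dataclass
class KeyRecord:
    status: str
    created_at: datetime
    gateway_charge_id: Optional[str] = None


def reconcile(records: Dict[str, KeyRecord],
              find_charge: Callable[[str], Optional[str]],
              now: datetime) -> None:
    """Resolve every key that has been PROCESSING for more than two minutes.

    find_charge(key) is an assumed gateway lookup: it returns the charge id
    if the gateway has a successful charge for that key, otherwise None.
    """
    for key, record in records.items():
        if record.status != "PROCESSING":
            continue
        if now - record.created_at <= STUCK_AFTER:
            continue  # still within the normal processing window
        charge_id = find_charge(key)
        if charge_id is not None:
            record.status = "COMPLETED"
            record.gateway_charge_id = charge_id
        else:
            record.status = "FAILED"  # releases the lock for a client retry
```

Passing `now` in as a parameter keeps the function deterministic, which makes the age threshold easy to test.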
Idempotency Storage Trade-offs
Selecting a storage model dictates latency bounds and transaction consistency.
| Operational Dimension | Persisted Relational Store (PostgreSQL) | Fast In-Memory Cache (Redis) | Hybrid (Cache Fast Path, DB Source of Truth) |
|---|---|---|---|
| Lookup Latency | Medium (1ms to 5ms database query) | Ultra-Low (less than 1ms) | Ultra-Low (less than 1ms cache hits) |
| ACID Transaction Support | Excellent (Co-located with payment records) | None (Supports atomic locks but no cross-system ACID) | Excellent (Uses DB for writes, Cache for repeat reads) |
| Data Durability | Absolute (WAL persisted on physical disks) | Medium (Risk of data loss on reboot/cluster drops) | Absolute (DB is authoritative, Cache degrades gracefully) |
| Write Complexity | Low (Standard SQL commands) | Low (Simple key-value operations) | High (Requires handling read-through caching syncs) |
| Best Use Case | Payments under 5,000 requests/sec. | Volatile caching, non-financial APIs. | Enterprise scale financial payment architectures. |
Failure Modes and Fault Tolerance Strategies
1. Gateway Success with Lost Service Response
If the Stripe gateway successfully charges a credit card but the response is lost before reaching our server, the local record is either marked FAILED or left stuck in PROCESSING, even though the customer has been charged.
- The Strategy: We pass our unique client-generated Idempotency-Key directly through to Stripe.
- Stripe's internal engines recognize the key. If we retry the request, Stripe does NOT charge the card again. Instead, it returns the existing charge_id.
- The payment service interceptor updates the local database to COMPLETED, updates the cache, and returns the successful state, aligning both clusters.
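The effect of passing the key through can be demonstrated with a toy model. `FakeGateway` below is a hypothetical in-memory stand-in that deduplicates by idempotency key and can be told to lose its first response. It is not Stripe's API, only an illustration of the behavior the strategy relies on.

```python
import uuid
from typing import Dict


class FakeGateway:
    """Hypothetical gateway that deduplicates charges by idempotency key."""

    def __init__(self, drop_first_response: bool = False) -> None:
        self.charges: Dict[str, str] = {}
        self.cards_charged = 0
        self._drop_next = drop_first_response

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key not in self.charges:
            # First time this key is seen: actually charge the card.
            self.cards_charged += 1
            self.charges[idempotency_key] = f"ch_{uuid.uuid4().hex[:10]}"
        if self._drop_next:
            # Simulate the response being lost AFTER the charge succeeded.
            self._drop_next = False
            raise ConnectionError("response lost after the charge succeeded")
        return self.charges[idempotency_key]


def charge_with_retry(gateway: FakeGateway, key: str, amount_cents: int,
                      attempts: int = 2) -> str:
    """Retry a lost response with the same key and return the charge id."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return gateway.charge(key, amount_cents)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
```

After a lost response, the retry returns the original charge id and `cards_charged` stays at 1, which is the state the local database is then updated to reflect.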
2. API Gateway Retry Storms and Circuit Breaking
During global network outages, client retries can multiply, triggering Retry Storms that saturate payment gateway thread pools.
- The Mitigation Plan:
- We enforce a strict Rate Limiter at the API Gateway.
- If a client retries excessively, the gateway returns HTTP 429 Too Many Requests with a Retry-After header.
- We run Resilience4j circuit breakers on our Stripe integration modules.
- If Stripe's error rate exceeds 20% or latency exceeds 5 seconds, the circuit breaker opens and rejects new payments immediately with a graceful degradation response, while still allowing cache-hit idempotency lookups to resolve.
Staff Engineer Perspective
Production Readiness Checklist
Ensure these checks are satisfied before putting your idempotent payment service into active production:
- DB Unique Constraints Active: Verify that a unique index exists on the idempotency_key column in both the idempotency and payment tables.
- Pass-Through Keys: Confirm that client-generated idempotency keys are passed through to Stripe/external payment APIs.
- Reconciliation Cron Configured: Ensure the background recovery job is active and configured to query Stripe status for stuck keys every 60 seconds.
- BigInt for Currencies: Check that all database amount representations use BIGINT (amount in lowest currency unit, e.g., cents) rather than floats to prevent rounding errors.
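The last checklist item is easy to verify by example. The sketch below converts a decimal string to integer cents without going through a float. `to_cents` is a hypothetical helper, and it assumes a currency with two decimal places such as USD.

```python
from decimal import Decimal


def to_cents(amount: str) -> int:
    """Convert a decimal string such as '99.00' to integer cents.

    Assumes a two-decimal currency; rejects amounts finer than one cent
    instead of rounding them silently.
    """
    cents = Decimal(amount) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"amount {amount!r} has sub-cent precision")
    return int(cents)


# Binary floats cannot represent 0.10 or 0.20 exactly, so sums drift:
float_sum_is_exact = (0.1 + 0.2 == 0.3)

# Integer cents add exactly:
cents_sum_is_exact = (to_cents("0.10") + to_cents("0.20") == to_cents("0.30"))
```

Here `float_sum_is_exact` is `False` and `cents_sum_is_exact` is `True`, which is the rounding problem the BIGINT column avoids.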
Verbal Script
Interviewer: "How would you design a highly reliable and idempotent payment system that guarantees no customer is double-charged, and how do you handle failures on the hot path?"
Candidate: "To design a highly reliable and idempotent payment system, I would build a multi-layered architecture that combines a high-speed Redis caching tier for fast-path check resolutions, an authoritative PostgreSQL database for transactional consistency, and pass-through idempotency key propagation to external payment gateways.
When a client initiates a purchase, it generates a unique UUIDv4 on the frontend and submits it inside the Idempotency-Key header. The request is intercepted by our Payment Service.
First, the service queries our Redis Cluster. If the key exists, it means this is a duplicate request that has already been processed. The service returns the cached JSON response in less than 1ms, completely bypassing the database and the gateway, preventing a double charge.
If it is a cache miss, we fall through to PostgreSQL. To prevent TOCTOU (Time-of-Check to Time-of-Use) race conditions where concurrent retries arrive within microseconds of each other, we do not use naive read-before-write logic. Instead, we perform an atomic INSERT of the key with a status of PROCESSING inside a PostgreSQL transaction. The unique index on the key serializes concurrent inserts: the second insert waits for the first to commit and then fails with a unique-violation error, which surfaces in our service as a DuplicateKeyException, ensuring that only one request can proceed.
Once the database lock is acquired, the service calls the Stripe Gateway, passing our unique idempotency key through to their API. This is our second safety net: if a network failure drops the connection between our server and Stripe after the card is charged, a subsequent retry will be recognized by Stripe's engines, which return the existing charge_id without charging the customer again.
Finally, we handle stuck PROCESSING states—such as when a service crashes mid-transaction—by running a background reconciliation job every 60 seconds. The job queries stuck keys, checks their actual transaction status on Stripe, and updates PostgreSQL to COMPLETED or FAILED accordingly, keeping the system clean, consistent, and self-healing."