Introduction: The Complexity of Stock Allocation
Inventory tracking seems straightforward initially: you keep a counter of items in a database table, decrement it when a purchase occurs, and block requests when the counter hits zero. In physical retail, this model works. However, in high-volume e-commerce environments, this simple approach breaks down.
When millions of users browse a storefront concurrently during a flash sale or product launch, they generate extreme transaction contention for a limited pool of hot items (such as ticketing events or limited-edition apparel drops).
If a system decrements inventory only after payment success, it suffers from Oversells, selling items it does not physically possess because multiple customers checked out at the same time. Conversely, if it decrements stock as soon as an item is placed in a cart, it suffers from Undersells, locking stock in abandoned carts ("ghost reservations") and preventing legitimate buyers from purchasing available items.
To bridge this gap, we must build a transactional Inventory Reservation System that implements short-lived temporary holds, handles concurrent purchases, coordinates payments asynchronously, and automatically recovers failed checkout states.
Requirements and System Goals
Functional Requirements
- Inventory Balance Tracking: Maintain clear records of inventory balances scoped by warehouse location, merchant tenant, and SKU.
- Temporary Reservation Holds: Support creating short-lived, lock-safe inventory holds (e.g., reserving a product for 10 minutes during the checkout phase) to allow users to complete their payment forms.
- Idempotent State Promotions: Confirm and promote temporary reservations into committed allocations once payment succeeds. Replays must return success without double-deducting stock.
- Automated Expiry & Release: Reclaim held stock automatically when reservations expire without payment, or when order cancellations occur.
- Durable Inventory Audits: Log every inventory event (reserves, confirms, releases, warehouse stock adjustments) to a persistent append-only event log for reconciliation.
Non-Functional Requirements
- Zero Oversell Guarantee: The system must guarantee that sellable stock never drops below zero under any concurrency or retry conditions.
- Sub-10ms Reservation Latency: The reservation transaction must complete in less than 10 milliseconds under peak read-write contention to avoid bottlenecking checkout.
- Eventual Storefront Consistency: Storefront inventory availability reads can tolerate minor lag (cache reads), but the reservation path must operate against transactionally consistent data.
- Partition Isolation: Contention on one hot SKU must not block checkout traffic for other unrelated products.
API Interfaces and Service Contracts
We define the core RESTful interfaces for managing the reservation lifecycle.
1. Create Temporary Reservation (Lock Hold)
- Endpoint:
POST /v1/reservations - Headers:
Idempotency-Key: cart_981_laptop_1 - Request Payload (JSON):
{
"tenant_id": "merchant_42",
"sku": "laptop-16gb",
"warehouse_id": "blr-1",
"quantity": 1,
"cart_id": "cart_981",
"customer_id": "cust_77",
"expires_in_seconds": 600
}
- Response Payload (JSON - 201 Created):
{
"reservation_id": "res_890182-90a",
"status": "RESERVED",
"expires_at": "2026-06-06T08:50:00Z"
}
2. Confirm Reservation (Post-Payment Promotion)
Once the payment gateway acknowledges success, the payment worker calls this endpoint:
- Endpoint:
POST /v1/reservations/res_890182-90a/confirm - Request Payload (JSON):
{
"payment_id": "pay_9812a893",
"order_id": "ord_5528a901"
}
- Response:
200 OKwith JSON{ "status": "CONFIRMED", "committed_at": "2026-06-06T08:45:00Z" }
High-Level Design and Visualizations
An inventory reservation system must handle fast checkouts and shield databases from flash-sale lock thrashing.
Storefront Checkout & Reservation Workflow
This diagram traces how inventory reservation interacts with payment processing and order completion:
sequenceDiagram
participant User as Storefront Client
participant GW as API Gateway / Checkout Service
participant Inv as Inventory Reservation Service
participant Pay as Payment Gateway (Stripe)
participant DB as SQL Datastore (Balances & Holds)
User->>GW: Click "Proceed to Checkout"
GW->>Inv: POST /v1/reservations (Qty: 1)
Inv->>DB: Start Transaction: Check & Reserve (Update Balances)
DB-->>Inv: Invariant Checked (Success)
Inv->>DB: Insert Reservation Hold (Status: RESERVED)
Inv-->>GW: Return Reservation ID (res_123, Status: RESERVED)
GW-->>User: Show Payment Page (Starts 10m timer)
User->>GW: Submit Payment Details
GW->>Pay: Process Charge
Pay-->>GW: Payment SUCCESS
GW->>Inv: POST /v1/reservations/res_123/confirm
Inv->>DB: Start Transaction: Move Reserved to Committed
DB-->>Inv: Success
Inv-->>GW: Confirm ACK
GW-->>User: Show Order Confirmation
Hot SKU Write Contention Mitigation Queue
Under high contention (e.g., 50,000 requests for 100 units), direct database updates trigger lock timeout failures. We serialize updates using topic partition queues.
graph TD
User[50,000 Concurrent Requests] -->|POST /v1/reservations| GW[API Gateway Cluster]
GW -->|Hash by SKU / Route to Topic| Kafka[Kafka Event Bus: Topic SKU-Laptop]
subgraph Serialized Execution Worker
Kafka -->|Pull ordered batches| Worker[Single-Threaded Partition Worker]
Worker -->|Execute batch inserts| DB[(Durable Database Row)]
end
style User fill:#f8f9fa,stroke:#343a40
style Kafka fill:#f8d7da,stroke:#dc3545
style Worker fill:#fff3cd,stroke:#ffc107
Low-Level Design and Schema Strategies
To support temporary holds, we must model inventory as distinct state buckets instead of a single integer value.
State Model Formulation
Each inventory item record tracks:
on_hand: Physical stock present in the warehouse.reserved: Stock currently locked in active checkout holds.committed: Stock allocated to paid orders, awaiting warehouse fulfillment.available(derived): Sellable stock capacity.
The system enforces the following core invariant:
$$\text{Available Stock} = \text{On Hand} - \text{Reserved} - \text{Committed}$$
$$\text{Available Stock} \ge 0$$
SQL Schema: Balances and Reservations
CREATE TABLE inventory_balances (
tenant_id VARCHAR(64) NOT NULL,
sku VARCHAR(64) NOT NULL,
warehouse_id VARCHAR(64) NOT NULL,
on_hand INT NOT NULL CHECK (on_hand >= 0),
reserved INT NOT NULL DEFAULT 0 CHECK (reserved >= 0),
committed INT NOT NULL DEFAULT 0 CHECK (committed >= 0),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
version BIGINT NOT NULL DEFAULT 0,
PRIMARY KEY (tenant_id, sku, warehouse_id),
-- Database level invariant safety check
CONSTRAINT check_sellable_stock_limit CHECK (on_hand - reserved - committed >= 0)
);
CREATE TABLE inventory_reservations (
reservation_id UUID PRIMARY KEY,
tenant_id VARCHAR(64) NOT NULL,
sku VARCHAR(64) NOT NULL,
warehouse_id VARCHAR(64) NOT NULL,
quantity INT NOT NULL CHECK (quantity > 0),
status VARCHAR(16) NOT NULL CHECK (status IN ('RESERVED', 'CONFIRMED', 'EXPIRED', 'CANCELLED')),
idempotency_key VARCHAR(256) NOT NULL,
expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
UNIQUE (tenant_id, idempotency_key),
FOREIGN KEY (tenant_id, sku, warehouse_id) REFERENCES inventory_balances(tenant_id, sku, warehouse_id)
);
Pessimistic Row Locking SQL Query (Atomic Reservation)
To execute reservations, we must combine the availability check and the increment update into a single write query. This blocks concurrent processes from reading outdated stock numbers:
-- Acquire row lock and update atomically
UPDATE inventory_balances
SET reserved = reserved + :quantity,
version = version + 1,
updated_at = NOW()
WHERE tenant_id = :tenant_id
AND sku = :sku
AND warehouse_id = :warehouse_id
-- Enforce availability invariant inside the write condition
AND (on_hand - reserved - committed) >= :quantity;
-- The database returns the number of rows updated.
-- If the return value is 0, it indicates insufficient inventory.
Scaling and Operational Challenges: Calculations & Formulations
When flash sales occur, thousands of transaction requests attempt to acquire a write lock on the same database row. Let us calculate this lock saturation threshold.
Back-of-the-Envelope Contention Calculation
Let us define:
- $N_{\text{checkouts}}$: Concurrent checkout requests targeting the last 100 units of a hot product (e.g., 50,000 requests).
- $T_{\text{hold}}$: Average database lock duration for a single update transaction (including connection acquisition, index updates, and WAL commits $\approx$ 1.5ms).
- $C_{\text{pool}}$: Number of parallel connections configured in our database connection pool (e.g., 50 connections).
Step 1: Calculate Total Lock Time Required
If all 50,000 requests attempt to update the same inventory balance row sequentially:
$$\text{Total Lock Time} = N_{\text{checkouts}} \times T_{\text{hold}} = 50,000 \times 1.5 \text{ ms} = 75,000 \text{ ms} = 75 \text{ seconds}$$
Step 2: Connection Pool Saturation
Because every write to the same row must block, parallel connections do not increase throughput; they only increase connection wait queues. If our gateway has a transaction timeout limit of 3,000 ms:
$$\text{Max Successful Requests} = \frac{\text{Timeout Limit}}{T_{\text{hold}}} = \frac{3,000 \text{ ms}}{1.5 \text{ ms}} = 2,000 \text{ requests}$$
The remaining 48,000 requests will time out, resulting in database connection pool starvation and cascading gateway crashes.
Mitigation: Redis In-Memory Pre-Allocation
To protect the SQL database:
- Storefront Pre-Allocation: Maintain available stock counters in Redis memory:
- Key:
inventory:pre_alloc:sku_123
- Key:
- Atomic Inbound Filtering: Use a Redis Lua script to decrement the counter:
local val = tonumber(redis.call('GET', KEYS[1]))
local qty = tonumber(ARGV[1])
if val and val >= qty then
redis.call('DECRBY', KEYS[1], qty)
return 1 -- Allowed to proceed to DB write
else
return 0 -- Reject immediately
end
By filtering out the 49,900 failed requests in Redis memory (adding less than 1.5ms per request), only the 100 successful requests proceed to execute the heavier SQL writes, avoiding database lock saturation entirely.
Trade-offs and Architectural Alternatives
Backend teams must choose their position on the inventory consistency-performance spectrum.
Reservation Architecture Comparison
| Dimension / Choice | Pessimistic DB Locking (SQL) | Redis Pre-Allocation Counter | Message Queue Serialization (FIFO) |
|---|---|---|---|
| Write Path Latency | High (10 - 20ms under lock contention) | Very Low (less than 2ms) | High (Dependent on queue processing lag) |
| Oversell Risk | Absolute zero (Enforced by DB invariants) | Low (Possible if Redis crashes without persistence sync) | Absolute zero |
| Operational Complexity | Low (Uses standard SQL engine) | Medium (Requires managing dual Redis-SQL sync) | High (Requires managing Kafka topic partitions) |
| Contention Performance | Poor (Rows lock, threads saturate) | Excellent (Handles 100K+ QPS easily) | Good (Smooths write peaks) |
| Audit Trails | High (Easy to log in same transaction) | Low (Requires out-of-band sync logic) | High (Durable event log) |
Key Trade-offs
- DB Locking vs. Redis Pre-Allocation:
- DB Locking: Best for low-contention environments or high-value items (e.g., booking plane tickets) where absolute correctness is mandatory.
- Redis Pre-Allocation: Best for high-traffic storefronts (e.g., clothing drops, flash sales) where we must shield the database from collapsing under lock contention, tolerating minor caching sync complexities.
Failure Modes and Fault Tolerance Strategies
Operating inventory locks across microservices exposes the system to consistency gaps.
1. Payment Success, Reservation Expired (The Oversell Gap)
If Client A reserves an item for 10 minutes, but their payment processor takes 12 minutes to acknowledge success, the background sweeper has already expired the reservation and freed the stock. Client B has purchased the unit, leaving the system in an oversell state when Client A's payment finally confirms.
- Mitigation: Implement a Post-Expiry Verification Workflow. When the
confirmRPC arrives, the system validates the status. If the reservation has expired, it attempts to acquire a new lock on on-hand stock. If no stock is available, it routes the order to a manual queue, marks the transaction as "Backordered," and emits a notification event to alert support or trigger an automated refund.
2. Expiry Sweeper Crash
If the background cron worker crashes, expired reservations are never cleaned up, locking available stock and creating ghost reservation leaks.
- Mitigation: Incorporate Lazy Evaluation on Read. When availability checks or reservations are processed, the SQL engine ignores records where
status = 'RESERVED'andexpires_at <= NOW(). This ensures that cleanup lag does not affect storefront correctness.
Verbal Script
Interviewer: "How would you design an inventory reservation system for a flash sale platform, ensuring zero oversell while handling high contention?"
Candidate: "To design a zero-oversell, high-contention inventory reservation system, I would use a hybrid architecture that separates the storefront read path from the transactional write path.
We model inventory using three distinct balance buckets in our database: on_hand, reserved, and committed.
The system enforces the strict invariant: available = on_hand - reserved - committed must be greater than or equal to zero.
During checkout, we create a temporary hold.
To prevent oversells under high contention without freezing our database, we implement Redis Pre-Allocation.
When a flash sale starts, we load the available stock count into Redis.
Every checkout request executes an atomic Redis Lua script that checks the available balance and decrements it.
If Redis indicates the stock is exhausted, we reject the request immediately at the edge.
If Redis allows it, we proceed to write to our PostgreSQL database.
In the database, the reservation transaction updates the inventory_balances table.
We write the check condition directly inside the SQL UPDATE statement, ensuring atomicity at the storage level:
SET reserved = reserved + :qty WHERE sku = :sku AND (on_hand - reserved - committed) >= :qty.
If the database returns 1 row updated, we insert a record into the inventory_reservations table with a status of RESERVED and a TTL of 10 minutes.
Once the payment succeeds, a worker calls the confirm endpoint, shifting the status from RESERVED to CONFIRMED and moving the balance from reserved to committed inside a single SQL transaction.
This hybrid Redis-SQL model protects our database from thread lock saturation while maintaining absolute durability and consistency."
Interviewer: "What happens if a user's payment completes successfully but the webhook takes longer than the 10-minute reservation lease, and the stock is sold to someone else?"
Candidate: "This is a classic asynchronous race condition that leads to an oversell state.
To handle this, when the payment webhook attempts to confirm the reservation, the database transaction locks the reservation row using SELECT FOR UPDATE.
If the transaction status is already EXPIRED because the 10-minute lease passed, the system triggers our compensation workflow.
First, the system searches the balance tables to see if we can acquire a replacement lock for the SKU in the same warehouse, or route the allocation to an alternate nearby warehouse.
If replacement stock is available, we confirm the order and proceed normally.
If the product is entirely sold out, we cannot fulfill the order.
The system generates a 'Backordered' state, stamps the order status for manual reconciliation, and writes a compensation event to our Outbox table.
This event is picked up by a listener that initiates an automated refund call to Stripe and sends an email to the customer explaining the inventory mismatch.
This ensures that we resolve the database inconsistency gracefully without violating warehouse reality or losing customer trust."