Lesson 77 of 105 13 minFlagship

System Design: Building an Inventory Reservation System

Design a production inventory reservation system that prevents oversells across carts, checkout, payments, cancellations, and expirations with idempotency, hold timeouts, and operational recovery.

Reading Mode

Hide the curriculum rail and keep the lesson centered for focused reading.

Key Takeaways

  • **oversell**: you sold inventory you do not actually have.
  • **undersell**: inventory is available but stuck in ghost reservations.
  • **inconsistent truth**: product page, checkout, warehouse, and finance all disagree.
Recommended Prerequisites
System Design Interview Framework

Premium outcome

From vague architecture answers to staff-level trade-off thinking.

Backend engineers preparing for senior, staff, and architecture rounds.

What you unlock

  • A reusable system design answer framework for ambiguous prompts
  • Clear language for consistency, scaling, and reliability trade-offs
  • Case-study depth across feeds, payments, storage, and messaging systems

Introduction: The Complexity of Stock Allocation

Inventory tracking seems straightforward initially: you keep a counter of items in a database table, decrement it when a purchase occurs, and block requests when the counter hits zero. In physical retail, this model works. However, in high-volume e-commerce environments, this simple approach breaks down.

When millions of users browse a storefront concurrently during a flash sale or product launch, they generate extreme transaction contention for a limited pool of hot items (such as ticketing events or limited-edition apparel drops).

If a system decrements inventory only after payment success, it suffers from Oversells, selling items it does not physically possess because multiple customers checked out at the same time. Conversely, if it decrements stock as soon as an item is placed in a cart, it suffers from Undersells, locking stock in abandoned carts ("ghost reservations") and preventing legitimate buyers from purchasing available items.

To bridge this gap, we must build a transactional Inventory Reservation System that implements short-lived temporary holds, handles concurrent purchases, coordinates payments asynchronously, and automatically recovers failed checkout states.


Requirements and System Goals

Functional Requirements

  1. Inventory Balance Tracking: Maintain clear records of inventory balances scoped by warehouse location, merchant tenant, and SKU.
  2. Temporary Reservation Holds: Support creating short-lived, lock-safe inventory holds (e.g., reserving a product for 10 minutes during the checkout phase) to allow users to complete their payment forms.
  3. Idempotent State Promotions: Confirm and promote temporary reservations into committed allocations once payment succeeds. Replays must return success without double-deducting stock.
  4. Automated Expiry & Release: Reclaim held stock automatically when reservations expire without payment, or when order cancellations occur.
  5. Durable Inventory Audits: Log every inventory event (reserves, confirms, releases, warehouse stock adjustments) to a persistent append-only event log for reconciliation.

Non-Functional Requirements

  1. Zero Oversell Guarantee: The system must guarantee that sellable stock never drops below zero under any concurrency or retry conditions.
  2. Sub-10ms Reservation Latency: The reservation transaction must complete in less than 10 milliseconds under peak read-write contention to avoid bottlenecking checkout.
  3. Eventual Storefront Consistency: Storefront inventory availability reads can tolerate minor lag (cache reads), but the reservation path must operate against transactionally consistent data.
  4. Partition Isolation: Contention on one hot SKU must not block checkout traffic for other unrelated products.

API Interfaces and Service Contracts

We define the core RESTful interfaces for managing the reservation lifecycle.

1. Create Temporary Reservation (Lock Hold)

  • Endpoint: POST /v1/reservations
  • Headers: Idempotency-Key: cart_981_laptop_1
  • Request Payload (JSON):
{
  "tenant_id": "merchant_42",
  "sku": "laptop-16gb",
  "warehouse_id": "blr-1",
  "quantity": 1,
  "cart_id": "cart_981",
  "customer_id": "cust_77",
  "expires_in_seconds": 600
}
  • Response Payload (JSON - 201 Created):
{
  "reservation_id": "res_890182-90a",
  "status": "RESERVED",
  "expires_at": "2026-06-06T08:50:00Z"
}

2. Confirm Reservation (Post-Payment Promotion)

Once the payment gateway acknowledges success, the payment worker calls this endpoint:

  • Endpoint: POST /v1/reservations/res_890182-90a/confirm
  • Request Payload (JSON):
{
  "payment_id": "pay_9812a893",
  "order_id": "ord_5528a901"
}
  • Response: 200 OK with JSON { "status": "CONFIRMED", "committed_at": "2026-06-06T08:45:00Z" }

High-Level Design and Visualizations

An inventory reservation system must handle fast checkouts and shield databases from flash-sale lock thrashing.

Storefront Checkout & Reservation Workflow

This diagram traces how inventory reservation interacts with payment processing and order completion:

sequenceDiagram
    participant User as Storefront Client
    participant GW as API Gateway / Checkout Service
    participant Inv as Inventory Reservation Service
    participant Pay as Payment Gateway (Stripe)
    participant DB as SQL Datastore (Balances & Holds)

    User->>GW: Click "Proceed to Checkout"
    GW->>Inv: POST /v1/reservations (Qty: 1)
    
    Inv->>DB: Start Transaction: Check & Reserve (Update Balances)
    DB-->>Inv: Invariant Checked (Success)
    Inv->>DB: Insert Reservation Hold (Status: RESERVED)
    Inv-->>GW: Return Reservation ID (res_123, Status: RESERVED)
    
    GW-->>User: Show Payment Page (Starts 10m timer)
    
    User->>GW: Submit Payment Details
    GW->>Pay: Process Charge
    Pay-->>GW: Payment SUCCESS
    
    GW->>Inv: POST /v1/reservations/res_123/confirm
    Inv->>DB: Start Transaction: Move Reserved to Committed
    DB-->>Inv: Success
    Inv-->>GW: Confirm ACK
    
    GW-->>User: Show Order Confirmation

Hot SKU Write Contention Mitigation Queue

Under high contention (e.g., 50,000 requests for 100 units), direct database updates trigger lock timeout failures. We serialize updates using topic partition queues.

graph TD
    User[50,000 Concurrent Requests] -->|POST /v1/reservations| GW[API Gateway Cluster]
    GW -->|Hash by SKU / Route to Topic| Kafka[Kafka Event Bus: Topic SKU-Laptop]
    
    subgraph Serialized Execution Worker
        Kafka -->|Pull ordered batches| Worker[Single-Threaded Partition Worker]
        Worker -->|Execute batch inserts| DB[(Durable Database Row)]
    end

    style User fill:#f8f9fa,stroke:#343a40
    style Kafka fill:#f8d7da,stroke:#dc3545
    style Worker fill:#fff3cd,stroke:#ffc107

Low-Level Design and Schema Strategies

To support temporary holds, we must model inventory as distinct state buckets instead of a single integer value.

State Model Formulation

Each inventory item record tracks:

  • on_hand: Physical stock present in the warehouse.
  • reserved: Stock currently locked in active checkout holds.
  • committed: Stock allocated to paid orders, awaiting warehouse fulfillment.
  • available (derived): Sellable stock capacity.

The system enforces the following core invariant:

$$\text{Available Stock} = \text{On Hand} - \text{Reserved} - \text{Committed}$$

$$\text{Available Stock} \ge 0$$

SQL Schema: Balances and Reservations

CREATE TABLE inventory_balances (
    tenant_id VARCHAR(64) NOT NULL,
    sku VARCHAR(64) NOT NULL,
    warehouse_id VARCHAR(64) NOT NULL,
    on_hand INT NOT NULL CHECK (on_hand >= 0),
    reserved INT NOT NULL DEFAULT 0 CHECK (reserved >= 0),
    committed INT NOT NULL DEFAULT 0 CHECK (committed >= 0),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    version BIGINT NOT NULL DEFAULT 0,
    PRIMARY KEY (tenant_id, sku, warehouse_id),
    -- Database level invariant safety check
    CONSTRAINT check_sellable_stock_limit CHECK (on_hand - reserved - committed >= 0)
);

CREATE TABLE inventory_reservations (
    reservation_id UUID PRIMARY KEY,
    tenant_id VARCHAR(64) NOT NULL,
    sku VARCHAR(64) NOT NULL,
    warehouse_id VARCHAR(64) NOT NULL,
    quantity INT NOT NULL CHECK (quantity > 0),
    status VARCHAR(16) NOT NULL CHECK (status IN ('RESERVED', 'CONFIRMED', 'EXPIRED', 'CANCELLED')),
    idempotency_key VARCHAR(256) NOT NULL,
    expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (tenant_id, idempotency_key),
    FOREIGN KEY (tenant_id, sku, warehouse_id) REFERENCES inventory_balances(tenant_id, sku, warehouse_id)
);

Pessimistic Row Locking SQL Query (Atomic Reservation)

To execute reservations, we must combine the availability check and the increment update into a single write query. This blocks concurrent processes from reading outdated stock numbers:

-- Acquire row lock and update atomically
UPDATE inventory_balances
SET reserved = reserved + :quantity,
    version = version + 1,
    updated_at = NOW()
WHERE tenant_id = :tenant_id
  AND sku = :sku
  AND warehouse_id = :warehouse_id
  -- Enforce availability invariant inside the write condition
  AND (on_hand - reserved - committed) >= :quantity;

-- The database returns the number of rows updated.
-- If the return value is 0, it indicates insufficient inventory.

Scaling and Operational Challenges: Calculations & Formulations

When flash sales occur, thousands of transaction requests attempt to acquire a write lock on the same database row. Let us calculate this lock saturation threshold.

Back-of-the-Envelope Contention Calculation

Let us define:

  • $N_{\text{checkouts}}$: Concurrent checkout requests targeting the last 100 units of a hot product (e.g., 50,000 requests).
  • $T_{\text{hold}}$: Average database lock duration for a single update transaction (including connection acquisition, index updates, and WAL commits $\approx$ 1.5ms).
  • $C_{\text{pool}}$: Number of parallel connections configured in our database connection pool (e.g., 50 connections).

Step 1: Calculate Total Lock Time Required

If all 50,000 requests attempt to update the same inventory balance row sequentially:

$$\text{Total Lock Time} = N_{\text{checkouts}} \times T_{\text{hold}} = 50,000 \times 1.5 \text{ ms} = 75,000 \text{ ms} = 75 \text{ seconds}$$

Step 2: Connection Pool Saturation

Because every write to the same row must block, parallel connections do not increase throughput; they only increase connection wait queues. If our gateway has a transaction timeout limit of 3,000 ms:

$$\text{Max Successful Requests} = \frac{\text{Timeout Limit}}{T_{\text{hold}}} = \frac{3,000 \text{ ms}}{1.5 \text{ ms}} = 2,000 \text{ requests}$$

The remaining 48,000 requests will time out, resulting in database connection pool starvation and cascading gateway crashes.

Mitigation: Redis In-Memory Pre-Allocation

To protect the SQL database:

  1. Storefront Pre-Allocation: Maintain available stock counters in Redis memory:
    • Key: inventory:pre_alloc:sku_123
  2. Atomic Inbound Filtering: Use a Redis Lua script to decrement the counter:
local val = tonumber(redis.call('GET', KEYS[1]))
local qty = tonumber(ARGV[1])
if val and val >= qty then
    redis.call('DECRBY', KEYS[1], qty)
    return 1 -- Allowed to proceed to DB write
else
    return 0 -- Reject immediately
end

By filtering out the 49,900 failed requests in Redis memory (adding less than 1.5ms per request), only the 100 successful requests proceed to execute the heavier SQL writes, avoiding database lock saturation entirely.


Trade-offs and Architectural Alternatives

Backend teams must choose their position on the inventory consistency-performance spectrum.

Reservation Architecture Comparison

Dimension / Choice Pessimistic DB Locking (SQL) Redis Pre-Allocation Counter Message Queue Serialization (FIFO)
Write Path Latency High (10 - 20ms under lock contention) Very Low (less than 2ms) High (Dependent on queue processing lag)
Oversell Risk Absolute zero (Enforced by DB invariants) Low (Possible if Redis crashes without persistence sync) Absolute zero
Operational Complexity Low (Uses standard SQL engine) Medium (Requires managing dual Redis-SQL sync) High (Requires managing Kafka topic partitions)
Contention Performance Poor (Rows lock, threads saturate) Excellent (Handles 100K+ QPS easily) Good (Smooths write peaks)
Audit Trails High (Easy to log in same transaction) Low (Requires out-of-band sync logic) High (Durable event log)

Key Trade-offs

  1. DB Locking vs. Redis Pre-Allocation:
    • DB Locking: Best for low-contention environments or high-value items (e.g., booking plane tickets) where absolute correctness is mandatory.
    • Redis Pre-Allocation: Best for high-traffic storefronts (e.g., clothing drops, flash sales) where we must shield the database from collapsing under lock contention, tolerating minor caching sync complexities.

Failure Modes and Fault Tolerance Strategies

Operating inventory locks across microservices exposes the system to consistency gaps.

1. Payment Success, Reservation Expired (The Oversell Gap)

If Client A reserves an item for 10 minutes, but their payment processor takes 12 minutes to acknowledge success, the background sweeper has already expired the reservation and freed the stock. Client B has purchased the unit, leaving the system in an oversell state when Client A's payment finally confirms.

  • Mitigation: Implement a Post-Expiry Verification Workflow. When the confirm RPC arrives, the system validates the status. If the reservation has expired, it attempts to acquire a new lock on on-hand stock. If no stock is available, it routes the order to a manual queue, marks the transaction as "Backordered," and emits a notification event to alert support or trigger an automated refund.

2. Expiry Sweeper Crash

If the background cron worker crashes, expired reservations are never cleaned up, locking available stock and creating ghost reservation leaks.

  • Mitigation: Incorporate Lazy Evaluation on Read. When availability checks or reservations are processed, the SQL engine ignores records where status = 'RESERVED' and expires_at <= NOW(). This ensures that cleanup lag does not affect storefront correctness.


Verbal Script

Interviewer: "How would you design an inventory reservation system for a flash sale platform, ensuring zero oversell while handling high contention?"

Candidate: "To design a zero-oversell, high-contention inventory reservation system, I would use a hybrid architecture that separates the storefront read path from the transactional write path.

We model inventory using three distinct balance buckets in our database: on_hand, reserved, and committed.

The system enforces the strict invariant: available = on_hand - reserved - committed must be greater than or equal to zero.

During checkout, we create a temporary hold.

To prevent oversells under high contention without freezing our database, we implement Redis Pre-Allocation.

When a flash sale starts, we load the available stock count into Redis.

Every checkout request executes an atomic Redis Lua script that checks the available balance and decrements it.

If Redis indicates the stock is exhausted, we reject the request immediately at the edge.

If Redis allows it, we proceed to write to our PostgreSQL database.

In the database, the reservation transaction updates the inventory_balances table.

We write the check condition directly inside the SQL UPDATE statement, ensuring atomicity at the storage level: SET reserved = reserved + :qty WHERE sku = :sku AND (on_hand - reserved - committed) >= :qty.

If the database returns 1 row updated, we insert a record into the inventory_reservations table with a status of RESERVED and a TTL of 10 minutes.

Once the payment succeeds, a worker calls the confirm endpoint, shifting the status from RESERVED to CONFIRMED and moving the balance from reserved to committed inside a single SQL transaction.

This hybrid Redis-SQL model protects our database from thread lock saturation while maintaining absolute durability and consistency."

Interviewer: "What happens if a user's payment completes successfully but the webhook takes longer than the 10-minute reservation lease, and the stock is sold to someone else?"

Candidate: "This is a classic asynchronous race condition that leads to an oversell state.

To handle this, when the payment webhook attempts to confirm the reservation, the database transaction locks the reservation row using SELECT FOR UPDATE.

If the transaction status is already EXPIRED because the 10-minute lease passed, the system triggers our compensation workflow.

First, the system searches the balance tables to see if we can acquire a replacement lock for the SKU in the same warehouse, or route the allocation to an alternate nearby warehouse.

If replacement stock is available, we confirm the order and proceed normally.

If the product is entirely sold out, we cannot fulfill the order.

The system generates a 'Backordered' state, stamps the order status for manual reconciliation, and writes a compensation event to our Outbox table.

This event is picked up by a listener that initiates an automated refund call to Stripe and sends an email to the customer explaining the inventory mismatch.

This ensures that we resolve the database inconsistency gracefully without violating warehouse reality or losing customer trust."

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.