System Design: Designing an Online Auction System (eBay Scale)

Designing a high-scale online auction system like eBay or a penny auction site is a classic distributed concurrency challenge. The system must handle thousands of users bidding on the same item in the final seconds of an auction, ensuring that the highest bid is always recorded and no two bids are processed out of order.

The core challenge of an online auction system is the "Last-Second Surge." When thousands of users submit bids in the final milliseconds before an auction closes, we must guarantee that:

Every bid is validated against the current highest bid.
Only the true highest bid is recorded.
The winner is identified with 100% accuracy at the exact closing timestamp.

This guide details the architectural blueprint for designing a resilient, scalable, and audit-compliant online auction system. We will address the structural bottlenecks of high-concurrency database writes, real-time update distribution, and transactional integrity under peak contention.

Requirements and System Goals

To design a high-throughput online auction platform, we must define clear operational boundaries, functional requirements, and strict performance targets.

Functional Requirements

Create Auction: Sellers must be able to list items with a starting price, description, reserve price, and a precise closing timestamp. The reserve price dictates the minimum acceptable amount at which the seller is obligated to sell the item; if the auction closes below this price, the item remains unsold.
Place Bid: Buyers must be able to place bids that are strictly greater than the current highest bid plus a minimum bid increment. The minimum bid increment scales dynamically based on the current price range of the listing (e.g., 50 cents increment for low-value items, 5 dollars increment for high-value items).
Real-Time Bid Updates: All users watching an active auction must see price increases immediately (latency less than 500ms).
Auction Closing & Winner Resolution: The system must close the auction at the exact end time, identify the winner, and trigger the payment claim process.
Bid History: Users must be able to view a chronological, immutable audit log of all validated bids for an auction, providing full transparency and facilitating dispute resolution.
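
The tiered increment rule in the Place Bid requirement can be sketched as a small lookup. This is a minimal illustration, not a definitive pricing policy: only the 50-cent and 5-dollar increments come from the text, while the 100.00 tier boundary and the names `INCREMENT_TIERS`, `min_increment`, `bid_threshold`, and `is_valid_bid` are assumptions made for the example.

```python
from decimal import Decimal

# Hypothetical tiers: (exclusive upper bound of current price, increment).
# The 0.50 and 5.00 increments come from the requirement; the 100.00
# boundary is an assumption for illustration only.
INCREMENT_TIERS = [
    (Decimal("100.00"), Decimal("0.50")),
    (Decimal("Infinity"), Decimal("5.00")),
]


def min_increment(current_price: Decimal) -> Decimal:
    """Return the increment for the tier that contains current_price."""
    for upper_bound, increment in INCREMENT_TIERS:
        if current_price < upper_bound:
            return increment
    return INCREMENT_TIERS[-1][1]


def bid_threshold(current_price: Decimal) -> Decimal:
    """Current price plus the minimum increment for its tier."""
    return current_price + min_increment(current_price)


def is_valid_bid(bid_amount: Decimal, current_price: Decimal) -> bool:
    """A bid must be strictly greater than the threshold, as stated above."""
    return bid_amount > bid_threshold(current_price)


print(bid_threshold(Decimal("20.00")))   # 20.50
print(bid_threshold(Decimal("120.00")))  # 125.00
```

Using `Decimal` rather than floats keeps monetary comparisons exact, which matters when a bid sits right at the threshold.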

Non-Functional Requirements

Strict Concurrency Controls: Prevent race conditions where two bids are accepted at the same price, or a lower bid overrides a higher one.
Low Latency Bid Processing: Under high contention, bid validations and write confirmations must execute in less than 50ms.
Scalability: Handle 1,000,000 active auctions and support surges of up to 50,000 bids per second on hot auction items in their closing seconds.
High Availability & Fault Tolerance: Ensure that bid ingestion and verification continue running even if secondary notification systems experience delays.
Atomicity (ACID): Ensure that bid state transitions and payment settlements are transactional.

API Interfaces and Service Contracts

To separate bidding activity from general catalog browsing, we define separate REST endpoints for administration and WebSockets/gRPC for runtime bid placements.

Create Auction Listing

Endpoint: POST /v1/auctions
Request Payload:

{
  "sellerId": "usr_99812",
  "title": "Vintage Mechanical Keyboard",
  "startingPrice": 50.00,
  "minIncrement": 5.00,
  "endAt": "2026-06-07T12:00:00Z"
}

Response Payload (HTTP 201 Created):

{
  "auctionId": "auc_7781-b21a-4c92",
  "status": "SCHEDULED",
  "createdAt": "2026-06-07T10:25:00Z"
}

Place Bid via REST API

For web clients that do not maintain persistent WebSocket connections, we support bid submissions via HTTP.

Endpoint: POST /v1/auctions/auc_7781-b21a-4c92/bids
Request Payload:

{
  "bidderId": "usr_11029",
  "bidAmount": 120.00,
  "maxProxyBid": 150.00
}

Response Payload (HTTP 200 OK):

{
  "bidId": "bid_992a-881c-4b11",
  "status": "ACCEPTED",
  "currentPrice": 120.00,
  "message": "You are the current highest bidder."
}

WebSocket Ingest and Update Contract

Web clients open a persistent WebSocket connection to receive real-time bid updates and submit bids.

WebSocket URL: ws://bids.codesprintpro.com/v1/auctions/auc_7781-b21a-4c92/stream
Incoming Bid Event (Client to Server):

{
  "action": "place_bid",
  "bidderId": "usr_11029",
  "bidAmount": 120.00
}

Outgoing Broadcast Event (Server to Clients):

{
  "event": "bid_update",
  "auctionId": "auc_7781-b21a-4c92",
  "currentPrice": 120.00,
  "highestBidderId": "usr_11029",
  "endAt": "2026-06-07T12:00:00Z",
  "timestamp": "2026-06-07T10:26:00.124Z"
}
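
Before a frame reaches the validation path, the gateway has to reject anything malformed. The sketch below shows one way to parse the incoming `place_bid` event from the contract above. The `PlaceBid` type and `parse_place_bid` function are hypothetical names, and the specific validation rules (non-empty bidder ID, positive finite amount) are assumptions rather than part of the contract.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PlaceBid:
    bidder_id: str
    bid_amount: Decimal


def parse_place_bid(raw: str) -> PlaceBid:
    """Parse a client frame; raise ValueError on anything malformed."""
    try:
        # Parse numbers as Decimal so money never passes through a float.
        msg = json.loads(raw, parse_float=Decimal, parse_int=Decimal)
    except json.JSONDecodeError as exc:
        raise ValueError("frame is not valid JSON") from exc
    if not isinstance(msg, dict) or msg.get("action") != "place_bid":
        raise ValueError("unsupported action")
    bidder_id = msg.get("bidderId")
    amount = msg.get("bidAmount")
    if not isinstance(bidder_id, str) or not bidder_id:
        raise ValueError("bidderId must be a non-empty string")
    # NaN/Infinity literals and booleans are not Decimal, so they fail here.
    if not isinstance(amount, Decimal) or not amount.is_finite() or amount <= 0:
        raise ValueError("bidAmount must be a positive number")
    return PlaceBid(bidder_id, amount)


frame = '{"action": "place_bid", "bidderId": "usr_11029", "bidAmount": 120.00}'
print(parse_place_bid(frame))
```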

High-Level Design and Visualizations

Our auction architecture separates the ingestion and validation paths from the historical database writes and real-time update channels.

The WebSocket Gateway Pool handles thousands of active client connections. It is decoupled from the business logic by routing raw messages through the Bid Validation Service. Validations occur in-memory using Redis, ensuring sub-millisecond verification. Once validated, bids are published to Kafka, where a consumer daemon writes them asynchronously to PostgreSQL. Simultaneously, updates are fanned out to all interested viewers.

End-to-End Bid Collection and Notification Fan-out

This diagram tracks the flow of a bid submission through validation, queueing, database ingestion, and update broadcasting.

flowchart TD
    Client[Bidding Client / Browser] -->|1. Submit Bid via WebSocket| WSGateway[WebSocket Gateway Pool]
    WSGateway -->|2. Route Bid Event| BidService[Bid Validation Service]
    
    subgraph Validation [Ultra-Low Latency Validation]
        BidService -->|3. Evaluate & Lock via Lua Script| RedisCache[(Redis In-Memory State Store)]
        RedisCache -->|4. Return Validation Result| BidService
    end
    
    BidService -->|5. Push Validated Bid Event| Kafka[Kafka Event Broker]
    BidService -->|6. Return ACK to Gateway| WSGateway
    WSGateway -->|7. Send Confirmation to Bidder| Client
    
    subgraph AsyncProcessing [Asynchronous Processing]
        Kafka -->|8. Consume Validated Bids| DBWriter[Database Writer Worker]
        DBWriter -->|9. Write to Ledger & History| MetaDB[(PostgreSQL Primary DB)]
        
        Kafka -->|10. Consume Broadcast Events| BroadcastService[Broadcast Service]
        BroadcastService -->|11. Push updates to viewers| WSGateway
    end

Scheduled Auction Closing Worker Workflow

This workflow details how the system closes auctions at the exact expiration time, prevents late bids, and initiates the payment process.

flowchart TD
    Cron[Distributed Cron Scheduler] -->|1. Trigger Expiry Event| CloseWorker[Auction Close Worker]
    CloseWorker -->|2. Update Status to CLOSED| RedisCache[(Redis State Store)]
    
    RedisCache -.- LateBidNote["Any incoming bids after this status change<br/>are instantly rejected at the memory layer"]
    
    CloseWorker -->|3. Fetch Winning Bid details| RedisCache
    CloseWorker -->|4. Update Auction Record to CLOSED| MetaDB[(PostgreSQL Primary DB)]
    CloseWorker -->|5. Insert Winning Claim Record| MetaDB
    CloseWorker -->|6. Publish Winner Resolved Event| Kafka[Kafka Event Broker]
    
    Kafka -->|7. Consume Claim| PaymentService[Payment Claim Service]
    PaymentService -->|8. Generate Checkout Link & Send Email| Notify[Notification Service]
    Notify -->|9. Email Winner| Winner[Winning Bidder]
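
The decision the close worker makes in steps 2 through 5 can be sketched as a pure function. This is an illustrative model only: `AuctionState`, `Claim`, `close_auction`, and `CLAIM_WINDOW` are hypothetical names, the state object stands in for the Redis hash, and the 48-hour window follows the "e.g. 48h" comment in the `auction_claims` table. The reserve-price rule is the one stated in the Create Auction requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Optional


@dataclass
class AuctionState:
    auction_id: str
    status: str
    current_price: Decimal
    highest_bidder_id: Optional[str]
    reserve_price: Optional[Decimal]
    end_at: datetime


@dataclass(frozen=True)
class Claim:
    auction_id: str
    winner_id: str
    winning_amount: Decimal
    claim_status: str
    expires_at: datetime


CLAIM_WINDOW = timedelta(hours=48)  # assumed claim time limit


def close_auction(state: AuctionState, now: datetime) -> Optional[Claim]:
    """Mark the auction CLOSED; return a claim, or None if the item did not sell."""
    if now < state.end_at:
        raise ValueError("auction has not reached its end time")
    state.status = "CLOSED"  # from here on, late bids fail the status check
    if state.highest_bidder_id is None:
        return None  # nobody bid
    if state.reserve_price is not None and state.current_price < state.reserve_price:
        return None  # reserve not met, item remains unsold
    return Claim(
        auction_id=state.auction_id,
        winner_id=state.highest_bidder_id,
        winning_amount=state.current_price,
        claim_status="PENDING_PAYMENT",
        expires_at=now + CLAIM_WINDOW,
    )


end = datetime(2026, 6, 7, 12, 0, tzinfo=timezone.utc)
sold = AuctionState("auc_1", "ACTIVE", Decimal("120.00"), "usr_11029", Decimal("100.00"), end)
print(close_auction(sold, end))
```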

Low-Level Design and Schema Strategies

We use a PostgreSQL database to manage auctions and bids. We maintain separate tables for auction listings, bid history, and winning claims.

PostgreSQL Table DDLs

-- Core auction listings table
CREATE TABLE auctions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    seller_id VARCHAR(64) NOT NULL,
    title VARCHAR(256) NOT NULL,
    description TEXT,
    starting_price NUMERIC(18, 2) NOT NULL,
    min_increment NUMERIC(18, 2) NOT NULL,
    reserve_price NUMERIC(18, 2),
    current_price NUMERIC(18, 2) NOT NULL,
    highest_bidder_id VARCHAR(64),
    status VARCHAR(32) NOT NULL,                 -- 'SCHEDULED', 'ACTIVE', 'CLOSED', 'SUSPENDED'
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    started_at TIMESTAMPTZ,
    end_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Partial index over active auctions, ordered by expiration time.
-- The WHERE clause already pins status, so only end_at needs to be indexed.
CREATE INDEX idx_auctions_active_expiry
ON auctions (end_at)
WHERE status = 'ACTIVE';

-- Immutable historical bid ledger
CREATE TABLE bids (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    auction_id UUID NOT NULL REFERENCES auctions(id) ON DELETE RESTRICT,
    bidder_id VARCHAR(64) NOT NULL,
    bid_amount NUMERIC(18, 2) NOT NULL,
    proxy_amount NUMERIC(18, 2),                  -- Support for automatic bidding agents
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(32) NOT NULL                  -- 'ACCEPTED', 'OUTBID', 'REJECTED', 'VOIDED'
);

-- Compound index to quickly scan bid history for an auction
CREATE INDEX idx_bids_auction_amount 
ON bids (auction_id, bid_amount DESC);

-- Track winning claims and payment completions
CREATE TABLE auction_claims (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    auction_id UUID NOT NULL REFERENCES auctions(id) ON DELETE RESTRICT,
    winner_id VARCHAR(64) NOT NULL,
    winning_amount NUMERIC(18, 2) NOT NULL,
    claim_status VARCHAR(32) NOT NULL,           -- 'PENDING_PAYMENT', 'PAID', 'EXPIRED'
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMPTZ NOT NULL,             -- Time limit to claim the item (e.g. 48h)
    payment_reference VARCHAR(256)
);

CREATE INDEX idx_auction_claims_winner 
ON auction_claims (winner_id, claim_status);

Schema Optimization & Indexing Strategies

idx_auctions_active_expiry: A partial index restricted to status = 'ACTIVE'. Since closed or scheduled auctions constitute the majority of historical data, this index stays small enough to remain cached in memory, so the close worker can find expiring auctions with a short index range scan instead of a table scan.
idx_bids_auction_amount: Bid history is read-heavy. This index groups bids by auction_id and stores them in descending order of bid_amount, so a history query can read rows in index order without a separate sort step (a sort that can spill to disk once it exceeds the configured sort memory).
Foreign Key Constraints: We use ON DELETE RESTRICT for referential integrity. An auction that still has bids or claims cannot be deleted, which prevents orphan records in our billing paths.

Scaling and Operational Challenges

To design an auction system that scales to handle high-concurrency bidding wars, we must evaluate lock throughput and broadcast fan-out bandwidth.

Back-of-the-Envelope Capacity Estimations

Let us evaluate the platform during a hot auction close with 100,000 active viewers watching the listing.

Peak Bidding Rate: Assume a surge of 10,000 bids/second in the final second of the auction.
Network Ingress Bandwidth: If each bid request payload is approximately 250 bytes:
$$\text{Ingress volume} = 10{,}000 \times 250\text{ bytes} = 2.5\text{ MB/sec} = 20\text{ Mbps}$$
This is easily handled by modern server network interfaces.
PostgreSQL Row-Lock Limits: If we read and lock the auction row in PostgreSQL on every bid request:
```
SELECT current_price FROM auctions WHERE id = :id FOR UPDATE;
```
PostgreSQL locks the row, forcing every concurrent bid on the same auction to wait until the holder commits. Throughput on a single hot row is therefore bounded by commit latency, which we estimate at less than 1,000 locked updates/second. This is insufficient for a peak rate of 10,000 bids/second on one auction.
WebSocket Update Broadcast Bandwidth: When a bid is accepted, we must broadcast the update to all 100,000 viewers. Assume a broadcast event payload size of 500 bytes.
$$\text{Single broadcast volume} = 100{,}000 \times 500\text{ bytes} = 50\text{ MB}$$
Far fewer bids are accepted than submitted, because most fail validation against the rising price. If the system accepts 5 bids/second during the peak window, the required egress broadcast bandwidth is:
$$\text{Total broadcast egress rate} = 5 \times 50\text{ MB} = 250\text{ MB/sec} = 2\text{ Gbps}$$
To distribute this network load:
- We run a pool of 20 WebSocket servers.
- A Pub/Sub broker (e.g. Redis or Kafka) routes events to the WebSocket nodes, which broadcast the update to their connected clients.
- This reduces host egress requirements to a manageable 100 Mbps per node.
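
The estimates above are easy to re-check, or to re-run with different assumptions, using a few lines of arithmetic. The variable names are illustrative, and every input value is taken from this section.

```python
BITS_PER_BYTE = 8
BYTES_PER_MB = 1_000_000  # decimal megabytes, matching the estimates above

peak_bids_per_sec = 10_000
bid_payload_bytes = 250
viewers = 100_000
broadcast_payload_bytes = 500
accepted_bids_per_sec = 5
ws_nodes = 20

# Ingress: every submitted bid, accepted or not, crosses the network once.
ingress_mbps = peak_bids_per_sec * bid_payload_bytes * BITS_PER_BYTE / 1_000_000

# Egress: each accepted bid is sent to every viewer.
single_broadcast_mb = viewers * broadcast_payload_bytes / BYTES_PER_MB
egress_gbps = accepted_bids_per_sec * single_broadcast_mb * BITS_PER_BYTE / 1_000
per_node_mbps = egress_gbps * 1_000 / ws_nodes

print(f"ingress:  {ingress_mbps:.0f} Mbps")        # ingress:  20 Mbps
print(f"egress:   {egress_gbps:.0f} Gbps total")   # egress:   2 Gbps total
print(f"per node: {per_node_mbps:.0f} Mbps")       # per node: 100 Mbps
```

Egress scales with viewers multiplied by accepted bids, so raising either input quickly dominates the ingress figure.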

Trade-offs and Architectural Alternatives

Designing a large-scale real-time ingestion pipeline requires balancing write latency against data consistency guarantees.

Bidding Concurrency Models: Database Locking vs. Redis Lua Scripts

| Dimension | Database Row Locking | Redis Lua Script Pre-Validation |
| --- | --- | --- |
| Write Throughput | Low (limited by disk I/O and transaction lock wait times). | High (single-threaded execution allows up to 100,000 operations/sec). |
| Data Durability | Strong ACID guarantees; writes are committed to disk before confirmation. | Eventual; writes are flushed asynchronously, introducing a small data loss risk. |
| Operational Complexity | Low; uses standard SQL database features. | High; requires synchronization logic to sync Redis updates to the database. |

We choose a hybrid approach: we route all bids through Redis first. A Lua script, which Redis runs atomically on its single command-execution thread, validates the bid amount and updates the price. This provides high throughput (over 10,000 operations/sec). Validated bids are then written to PostgreSQL asynchronously via a Kafka queue.

Database Architecture: Relational SQL vs. Wide-Column NoSQL

Relational Database (PostgreSQL):
- Pros: Strong ACID compliance, transactional guarantees, relational joins.
- Cons: Sharding is complex; locks limit throughput.
NoSQL Database (Cassandra):
- Pros: High write throughput; horizontal scaling.
- Cons: No general-purpose ACID transactions; duplicate checks must be handled at the application layer.

Failure Modes and Fault Tolerance Strategies

In a high-scale transactional environment, components fail. We must design for resilience.

1. Redis Node Crashes and Split-Brain Partitions

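If the Redis node holding the active auction state crashes, we could lose the most recently accepted bids along with the current price and highest bidder for its auctions.

Mitigation: We run Redis in primary-replica pairs and use Redis Sentinel (or another consensus-based coordinator) to automate failover. To prevent split-brain issues, the replica is promoted only if a quorum of Sentinel nodes confirms the primary node is unreachable. Because Redis replication is asynchronous, the promoted replica may still be missing the last few writes, so recovery cannot rely on the replica alone.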
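
If validated bids are durably recorded in Kafka before they are confirmed, the auction state can be rebuilt after a failover by replaying those events. The following is a minimal sketch of that replay as a fold over an in-memory list. The event shape and the `rebuild_state` function are assumptions for illustration, and a real recovery job would read from a Kafka consumer rather than a Python list.

```python
from decimal import Decimal
from typing import Dict, Iterable


def rebuild_state(events: Iterable[dict]) -> Dict[str, dict]:
    """Fold validated-bid events (in log order) into per-auction state."""
    state: Dict[str, dict] = {}
    for event in events:
        current = state.get(event["auction_id"])
        # Keep the highest amount seen; replaying the same event twice is harmless.
        if current is None or event["bid_amount"] > current["currentPrice"]:
            state[event["auction_id"]] = {
                "currentPrice": event["bid_amount"],
                "highestBidderId": event["bidder_id"],
            }
    return state


log = [
    {"auction_id": "auc_1", "bidder_id": "usr_1", "bid_amount": Decimal("100.00")},
    {"auction_id": "auc_2", "bidder_id": "usr_9", "bid_amount": Decimal("15.00")},
    {"auction_id": "auc_1", "bidder_id": "usr_2", "bid_amount": Decimal("110.00")},
    {"auction_id": "auc_1", "bidder_id": "usr_2", "bid_amount": Decimal("110.00")},  # duplicate delivery
]
print(rebuild_state(log)["auc_1"])
```

Because the fold only ever moves the price upward, it tolerates duplicate deliveries, which is useful when the consumer offers at-least-once semantics.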

2. Database Write Backlog

If the PostgreSQL writer worker falls behind, the database state could lag behind the in-memory Redis cache state.

Mitigation: We separate active bidding from search and listing queries. Read requests are served from the Redis cache, while the database writer worker consumes bids from a Kafka queue and writes them asynchronously. Kafka retains the backlog, so the writer can drain it at its own pace once the spike passes, protecting the database from load spikes.

3. Out-of-Order Bid Delivery

Due to network delays, a bid placed at 11:59:59.001 might arrive at the gateway after a bid placed at 11:59:59.003.

Mitigation: We use server arrival timestamps to determine bid order. The gateway assigns a timestamp to each request upon receipt. The Lua script evaluates bids in the order they arrive at the server, ignoring client-side timestamps to prevent cheating.
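
A gateway-side stamper for this policy might look like the sketch below. `StampedBid` and `ArrivalStamper` are hypothetical names, and the example models a single gateway process; with several gateways, the final order is still the order in which Redis executes the scripts.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedBid:
    seq: int             # gateway-assigned arrival sequence number
    received_at_ns: int  # gateway clock at receipt, in nanoseconds
    bidder_id: str
    bid_amount: str


class ArrivalStamper:
    """Stamps each request on receipt; client timestamps are never read."""

    def __init__(self, clock=time.time_ns):
        self._seq = itertools.count(1)
        self._clock = clock

    def stamp(self, payload: dict) -> StampedBid:
        # Any client-supplied time field in the payload is deliberately ignored.
        return StampedBid(
            seq=next(self._seq),
            received_at_ns=self._clock(),
            bidder_id=payload["bidderId"],
            bid_amount=str(payload["bidAmount"]),
        )


stamper = ArrivalStamper()
# The first frame claims a later client time, but it arrived first.
first = stamper.stamp({"bidderId": "usr_a", "bidAmount": "120.00", "clientTime": "11:59:59.003"})
second = stamper.stamp({"bidderId": "usr_b", "bidAmount": "125.00", "clientTime": "11:59:59.001"})
print(first.seq, second.seq)  # 1 2
```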

Staff Engineer Perspective

Applying concurrency controls at scale requires understanding the physical limits of hardware and execution contexts.

Important

Atomic Bid Processing using Redis Lua Scripts: To prevent race conditions without database row locks, we use a Redis Lua script. The script runs atomically on the Redis node, validating and updating the bid in a single operation:

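-- Redis Lua script for atomic bid validation.
-- KEYS[1] = auction hash key, ARGV[1] = bid amount, ARGV[2] = bidder ID.
local auctionKey = KEYS[1]
local bidAmount = tonumber(ARGV[1])
local bidderId = ARGV[2]

-- tonumber returns nil for malformed input; reject before comparing.
if not bidAmount then
    return redis.error_reply('Bid amount is not a number')
end

local status = redis.call('HGET', auctionKey, 'status')
if status ~= 'ACTIVE' then
    return redis.error_reply('Auction is not active')
end

-- HGET yields false for a missing field, so fall back to '0'.
local currentPrice = tonumber(redis.call('HGET', auctionKey, 'currentPrice') or '0')
-- Assumption: the auction hash also stores a 'minIncrement' field.
local minIncrement = tonumber(redis.call('HGET', auctionKey, 'minIncrement') or '0')

-- Per the Place Bid requirement, a bid must be strictly greater than
-- the current price plus the minimum increment.
if bidAmount <= currentPrice + minIncrement then
    return redis.error_reply('Bid amount is too low')
end

-- Store the original string argument so the amount is kept exactly as sent.
redis.call('HSET', auctionKey, 'currentPrice', ARGV[1])
redis.call('HSET', auctionKey, 'highestBidderId', bidderId)
return redis.status_reply('Bid accepted')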
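
To unit-test the same check-and-set logic without a running Redis instance, it can help to mirror the script in a small in-memory model. The sketch below is such a test double, not the production path: `InMemoryAuction` is a hypothetical class, its lock stands in for Redis's one-script-at-a-time execution, and the optional `min_increment` parameter reflects the Place Bid requirement (it defaults to zero).

```python
import threading
from decimal import Decimal


class InMemoryAuction:
    """Test double for the auction hash plus the atomic validation step."""

    def __init__(self, current_price, min_increment=Decimal("0"), status="ACTIVE"):
        self.current_price = current_price
        self.min_increment = min_increment
        self.status = status
        self.highest_bidder_id = None
        self._lock = threading.Lock()  # stands in for serial script execution

    def place_bid(self, bidder_id, amount):
        with self._lock:  # the check and the update happen as one step
            if self.status != "ACTIVE":
                return "Auction is not active"
            if amount <= self.current_price + self.min_increment:
                return "Bid amount is too low"
            self.current_price = amount
            self.highest_bidder_id = bidder_id
            return "Bid accepted"


# 100 concurrent bidders, each bidding a distinct whole amount from 1 to 100.
auction = InMemoryAuction(current_price=Decimal("0"))
threads = [
    threading.Thread(target=auction.place_bid, args=(f"usr_{n}", Decimal(n)))
    for n in range(1, 101)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(auction.current_price, auction.highest_bidder_id)  # 100 usr_100
```

Whatever order the threads run in, the bid of 100 is always accepted when it arrives and nothing can replace it afterwards, which is the invariant the Lua script is meant to guarantee.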

Warning

Handling Distributed Clock Drift: In distributed systems, clock drift can cause servers to close auctions at slightly different times. If a worker closes the auction 100ms early, legitimate final bids will be rejected. We mitigate this by synchronizing every node against a common time source (e.g., Amazon Time Sync Service) to keep clock drift under 10ms. Before closing an auction, the worker verifies its timestamp against that synchronized NTP clock.

Verbal Script

Interviewer: "How would you design a bidding system that can handle 10,000 bids per second in the final second of an auction without dropping bids or allowing race conditions?"

Candidate: "I would use a hybrid architecture combining in-memory pre-validation with asynchronous database writes.

First, I would route all incoming bids through a single-threaded Redis node using a Lua script. Since Redis executes commands sequentially in a single thread, the Lua script can validate and update the bid price atomically, preventing race conditions without requiring database row locks.

Second, once the Redis Lua script accepts a bid, we write a validation event to a Kafka queue. A pool of database writer workers consumes events from the queue and writes the bid records to PostgreSQL asynchronously, protecting the database from write spikes.

Finally, we serve all read requests (e.g., users viewing the current highest bid) from the Redis cache, ensuring query traffic does not impact the database write path."

Interviewer: "What happens if the Redis node crashes and a replica is promoted, but some validated bids are lost before they are written to the database?"

Candidate: "We use Kafka as a buffer to prevent data loss.

When the Redis Lua script validates a bid, the Bid Validation Service writes the event to Kafka before returning a confirmation to the client.

If the Redis node crashes, the database writer worker continues to process events from the Kafka queue.

Once the backup Redis replica is promoted, a recovery job replays from Kafka the validated-bid events that the replica may have missed and rebuilds the active auction state, ensuring no confirmed bids are lost during the failover."

Interviewer: "How would you handle bid retraction or fraud detection asynchronously without blocking the hot path?"

Candidate: "I would delegate bid retraction and fraud detection entirely to an asynchronous worker pool downstream of our Kafka event broker.

The hot ingest path in Redis only performs basic syntactic and arithmetic validation (e.g., verifying that the auction is active and the bid exceeds the minimum threshold). A fraud detection worker consumes from the validated bids topic, runs machine learning models or rule engines to analyze the user's bidding behavior, and, if fraud is detected, issues a retraction event.

This retraction event updates PostgreSQL, marks the bid as 'VOIDED' in our ledger, and executes a Redis script to adjust the current price and highest bidder back to the previous legitimate state. This ensures that fraud checks do not degrade ingestion throughput or increase latency on the hot path."
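
The rollback step in that answer can be sketched as a recomputation over the bid ledger. This is an illustrative model under stated assumptions: `void_bid` is a hypothetical function, the ledger is an in-memory list rather than the `bids` table, and a `None` result means no legitimate bids remain, so the caller would fall back to the starting price.

```python
from decimal import Decimal
from typing import List, Optional, Tuple


def void_bid(ledger: List[dict], bid_id: str) -> Optional[Tuple[Decimal, str]]:
    """Void one bid, then recompute (price, bidder) from the remaining bids."""
    for bid in ledger:
        if bid["id"] == bid_id:
            bid["status"] = "VOIDED"
    # Rejected and voided bids can never become the highest bid again.
    candidates = [b for b in ledger if b["status"] in ("ACCEPTED", "OUTBID")]
    if not candidates:
        return None
    top = max(candidates, key=lambda b: b["amount"])
    for bid in candidates:
        bid["status"] = "ACCEPTED" if bid is top else "OUTBID"
    return top["amount"], top["bidder_id"]


ledger = [
    {"id": "bid_1", "bidder_id": "usr_1", "amount": Decimal("100.00"), "status": "OUTBID"},
    {"id": "bid_2", "bidder_id": "usr_2", "amount": Decimal("110.00"), "status": "OUTBID"},
    {"id": "bid_3", "bidder_id": "usr_3", "amount": Decimal("120.00"), "status": "ACCEPTED"},
]
restored = void_bid(ledger, "bid_3")
print(restored)  # (Decimal('110.00'), 'usr_2')
```

The returned pair is what the Redis adjustment script would write back as the current price and highest bidder.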