System Design: Designing a Real-time Bidding (RTB) Ad System

Real-time Bidding (RTB) is the operational foundation of modern digital advertising. When you load a website or open a mobile app, an automated auction runs in the background to decide exactly which advertisement you see. The entire process—from the moment the page starts loading to the ad being rendered—must complete within a strict latency window of less than 100 milliseconds.

This case study breaks down how to design a distributed, ultra-low-latency real-time bidding system capable of handling hundreds of billions of daily impressions.

1. Requirements & Core Constraints

To design an ad-bidding architecture at global scale, we must establish strict functional and non-functional requirements.

Functional Requirements

Ad Impression Ingestion: The system must accept ad placement requests from publishers (via Supply-Side Platforms).
Targeted Bid Execution: Advertisers (via Demand-Side Platforms) must be able to bid on individual impressions based on user demographics, geohash locations, and historical browsing behaviors.
Budget Tracking & Pacing: The system must dynamically pacing advertiser budgets across the day, ensuring they do not deplete a 10,000 USD daily budget in the first five minutes.
Ad Delivery & Win Notification: The auction winner's ad markup must be delivered to the client browser, and a win notification event must be fired back to the DSP for billing.
Fraud Filtering: Pre-bid analysis must filter out invalid traffic (bots, crawlers) before executing bids.

Non-Functional Requirements & SLAs

Ultra-Low Latency: The end-to-end auction process must complete within 100 milliseconds. The internal latency budget allocated to a Demand-Side Platform (DSP) is less than 45 milliseconds at P99.
Massive Scale & Throughput: The system must handle an average of 1 Million requests per second (RPS), with peaks reaching 1.5 Million RPS.
Strict Idempotency: Win notifications must be processed exactly once to prevent double-billing.
High Availability: The bidding gateway must achieve 99.999% availability to avoid direct loss of advertising revenue.
Eventual Consistency for Budgets: While bidding is highly available, budget consumption trackers must converge within seconds to prevent massive overspending.

Back-of-the-Envelope Estimates

Let's calculate the system scaling bounds for a platform processing 100 Billion impressions per day:

Average Request Throughput: $$\text{Average QPS} = \frac{100,000,000,000 \text{ impressions}}{86,400 \text{ seconds}} \approx 1,157,400 \text{ QPS}$$ $$\text{Peak QPS} = 1,157,400 \text{ QPS} \times 1.5 \approx 1,736,100 \text{ QPS}$$
Network Bandwidth (SSP to Ad Exchange): If each auction request payload is approximately 2 Kilobytes (containing user agent, IP, geohash, and publisher ad slot constraints) and the response is 1 Kilobyte: $$\text{Total Ingress Bandwidth} = 1,157,400 \text{ QPS} \times 2 \text{ KB} \approx 2.31 \text{ GB/sec}$$ $$\text{Total Egress Bandwidth} = 1,157,400 \text{ QPS} \times 1 \text{ KB} \approx 1.15 \text{ GB/sec}$$
DSP User Profile Storage (Aerospike): Assume the DSP maintains profiles for 1 Billion active users. Each profile stores:
- User ID (UUID): 16 bytes
- Demographic/Segment Tags (e.g., 20 integers): 80 bytes
- Dynamic History (last 50 categories visited): 200 bytes
- Device & Location State: 100 bytes
- Total Size per User Profile: $\approx 400$ bytes $$\text{Persistent Cache Sizing} = 1,000,000,000 \text{ users} \times 400 \text{ bytes} = 400 \text{ GB}$$ Accounting for indices and 3x active-active regional replication, the in-memory/SSD hybrid profile store requires approximately 1.2 Terabytes of RAM/NVMe storage.

2. API Design & Core Contracts

The communication between the Supply-Side Platform (SSP), Ad Exchange, and Demand-Side Platforms (DSP) must utilize an ultra-efficient binary serialization format like gRPC (Protocol Buffers) to minimize processing overhead. Below is the production-ready Proto3 schema defining the bidding interface.

syntax = "proto3";

package rtb.v1;

service BidderService {
  // Executes a low-latency bid assessment against the DSP engine
  rpc ProcessBid(BidRequest) returns (BidResponse);
  
  // Asynchronous callback to confirm auction win and trigger billing
  rpc ConfirmAuctionWin(WinNotificationRequest) returns (WinNotificationResponse);
}

message BidRequest {
  string request_id = 1;
  string user_id = 2;
  string ip_address = 3;
  string user_agent = 4;
  string geohash = 5;
  
  message AdSlot {
    string slot_id = 1;
    uint32 width = 2;
    uint32 height = 3;
    repeated string allowed_categories = 4;
    uint64 floor_price_micros = 5; // e.g., $1.50 CPM is represented as 1,500,000 micros
  }
  
  repeated AdSlot slots = 6;
  uint64 max_timeout_ms = 7;
}

message BidResponse {
  string request_id = 1;
  bool is_bid_placed = 2;
  
  message BidItem {
    string slot_id = 1;
    string creative_id = 2;
    uint64 bid_price_micros = 3; // Bid price in micros (1/1,000,000 of currency unit)
    string ad_markup = 4;        // HTML/JS payload containing ad assets
    string nurl = 5;             // Win notice URL triggered on win
  }
  
  repeated BidItem bids = 3;
  string error_message = 4;
}

message WinNotificationRequest {
  string auction_id = 1;
  string slot_id = 2;
  string creative_id = 3;
  uint64 final_settlement_price_micros = 4; // Second-price auction settlement value
  string billing_account_id = 5;
  string transaction_id = 6;
}

message WinNotificationResponse {
  bool success = 1;
  string message = 2;
}

3. High-Level Design (HLD)

The architecture of a real-time bidding ecosystem is highly distributed and must bypass traditional centralized bottlenecks. The design is separated into two primary frameworks: the Ad Exchange Auction Platform and the Demand-Side Platform (DSP) Bidding Platform.

End-to-End Auction Request Lifecycle

The diagram below details the operational sequence from the moment a user browser requests an ad slot down through the SSP, Ad Exchange, and multi-threaded DSP bidder nodes.

graph TD
    User([User Browser]) -->|1. Load Page & Request Ad| SSP[Supply-Side Platform]
    SSP -->|2. Send OpenRTB Request| Exchange[Ad Exchange Auction Coordinator]
    
    subgraph Ad Exchange Cluster
        Exchange -->|3. Route BidRequest in Parallel| DSP1[DSP 1: Ad-Tech Engine]
        Exchange -->|3. Route BidRequest in Parallel| DSP2[DSP 2: Ad-Tech Engine]
        Exchange -->|3. Route BidRequest in Parallel| DSP3[DSP 3: Ad-Tech Engine]
    end
    
    subgraph DSP 1 Internals
        DSP1 -->|4. Query Profile| Cache[(Aerospike User Profiles)]
        DSP1 -->|5. Check Pace| Redis[(Redis Budget Tracker)]
        DSP1 -->|6. ML Scoring| ModelServer[ML CTR Predictor]
    end
    
    DSP1 -->|7. Return BidResponse| Exchange
    DSP2 -->|7. Return BidResponse| Exchange
    DSP3 -->|7. Timeout / No Bid| Exchange
    
    Exchange -->|8. Determine Winner & Render ad| SSP
    SSP -->|9. Serve HTML Creative| User
    User -->|10. Trigger Win Beacon URL| Exchange
    Exchange -->|11. Async Win Notice| DSP1

DSP Bidding Platform Architecture

The Demand-Side Platform (DSP) requires a massive stateful cache and asynchronous background worker queues to process real-time bid computations, budget paces, and telemetry analytics without introducing blocking bottlenecks.

graph LR
    Exchange[Ad Exchange] -->|gRPC BidRequest| EdgeGate[Envoy Edge Gateway]
    EdgeGate -->|Internal Load Balancing| BiddingAgent[Bidding Engine Core]
    
    BiddingAgent -->|1. Read Profile| Aerospike[(Aerospike User Store)]
    BiddingAgent -->|2. Read Active Budgets| Redis[(Redis Cluster - Budget Pacing)]
    
    BiddingAgent -->|3. ML Inference| Triton[Triton GPU Inference Cluster]
    
    BiddingAgent -->|4. Push Event| Kafka[Kafka Event Bus]
    
    subgraph Event Analytics Pipeline
        Kafka -->|Stream Aggregation| Flink[Apache Flink Engine]
        Flink -->|Hourly Aggregates| ClickHouse[(ClickHouse Analytics)]
        Flink -->|Real-Time Spend Deducts| Redis
    end

4. Low-Level Design (LLD) & Data Models

Operating at peak performance requires specialized data stores. We store user targeting parameters in Aerospike (key-value flash optimization) and maintain campaign spend limits in a relational database like PostgreSQL (for strong ACID compliance) paired with a high-speed Redis layer for hot pacing counters.

Relational Schema (PostgreSQL): Campaign & Budget Configuration

-- Represents advertiser accounts
CREATE TABLE advertiser_accounts (
    account_id VARCHAR(64) PRIMARY KEY,
    company_name VARCHAR(255) NOT NULL,
    account_status VARCHAR(32) NOT NULL DEFAULT 'ACTIVE',
    currency VARCHAR(3) NOT NULL DEFAULT 'USD',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Represents individual advertising campaigns
CREATE TABLE campaigns (
    campaign_id VARCHAR(64) PRIMARY KEY,
    account_id VARCHAR(64) REFERENCES advertiser_accounts(account_id),
    campaign_name VARCHAR(255) NOT NULL,
    daily_budget_micros BIGINT NOT NULL,   -- Daily budget (e.g. 5,000,000,000 for $5000)
    lifetime_budget_micros BIGINT NOT NULL,
    start_time TIMESTAMP WITH TIME ZONE NOT NULL,
    end_time TIMESTAMP WITH TIME ZONE NOT NULL,
    pacing_strategy VARCHAR(32) NOT NULL DEFAULT 'STANDARD', -- STANDARD or ACCELERATED
    campaign_status VARCHAR(32) NOT NULL DEFAULT 'PAUSED',
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Tracks transactional billing ledger for campaigns
CREATE TABLE campaign_billing_ledger (
    transaction_id VARCHAR(64) PRIMARY KEY,
    campaign_id VARCHAR(64) REFERENCES campaigns(campaign_id),
    amount_spent_micros BIGINT NOT NULL,
    recorded_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Create index to allow fast lookup of active campaigns by schedule and status
CREATE INDEX idx_campaign_status_schedule ON campaigns (campaign_status, start_time, end_time);

In-Memory/Flash Cache Model (Aerospike): User Targeting

Aerospike stores user records using primary keys matching the user_id.

{
  "key": "user:908234710923847",
  "bins": {
    "segments": [101, 204, 305],      // Demographic segments (int array)
    "geo_last": "dr5reg",              // Geohash coordinate matching region
    "history": ["electronics", "shoes"], // Last visited categories
    "update_ts": 1779435600            // Unix Epoch timestamp of update
  }
}

Compilable Python Implementation: Dynamic Budget Pacing Controller

The budget pacing component is critical. It calculates a dynamic bid multiplier depending on the elapsed time of the day and the remaining balance. If a campaign is spending its budget too fast, the pacing controller lowers the bid multiplier to stretch the budget across the target period.

import time
import math
import logging

class BudgetPacingController:
    def __init__(self, campaign_id: str, daily_budget_micros: int, start_time_epoch: float, duration_seconds: float = 86400.0):
        self.campaign_id = campaign_id
        self.daily_budget_micros = daily_budget_micros
        self.start_time_epoch = start_time_epoch
        self.duration_seconds = duration_seconds
        logging.basicConfig(level=logging.INFO)
        self.logger = logging.getLogger(f"Pacing-{campaign_id}")

    def calculate_bid_multiplier(self, current_spend_micros: int, current_time_epoch: float) -> float:
        """
        Calculates a dynamic bidding multiplier based on budget spend rate.
        Returns a float between 0.0 (stop bidding) and 1.0 (bid full value).
        """
        elapsed_time = current_time_epoch - self.start_time_epoch
        
        # Ensure boundary cases do not throw errors
        if elapsed_time <= 0:
            return 1.0
        
        if elapsed_time >= self.duration_seconds:
            return 0.0 if current_spend_micros >= self.daily_budget_micros else 1.0

        # Calculate the ideal target spend percentage based on time progression
        time_progress_ratio = elapsed_time / self.duration_seconds
        target_spend_micros = self.daily_budget_micros * time_progress_ratio
        
        # Avoid division-by-zero errors if no budget has been spent yet
        if current_spend_micros <= 0:
            return 1.0

        # Calculate deviation factor
        spend_ratio = current_spend_micros / self.daily_budget_micros
        
        if spend_ratio >= 1.0:
            self.logger.warning(f"Campaign {self.campaign_id} budget depleted! Spend: {current_spend_micros}")
            return 0.0

        # Pacing optimization logic: Compare current spend rate to progress rate
        # spend_acceleration = current spend / target spend
        spend_acceleration = current_spend_micros / target_spend_micros

        # If spending too fast, damp the multiplier exponentially
        if spend_acceleration > 1.1:
            # Dampening factor scales inversely with spend acceleration
            multiplier = math.exp(1.0 - spend_acceleration)
            multiplier = max(0.1, min(1.0, multiplier))
            self.logger.info(f"Spending too fast ({spend_acceleration:.2f}x). Multiplier damped to {multiplier:.4f}")
            return multiplier
        
        # If spending too slow, keep the multiplier high (1.0) to win more auctions
        return 1.0

# Integration Verification Test Case
if __name__ == "__main__":
    # Simulate a campaign with a 10,000 USD daily budget ($10,000 = 10,000,000,000 micros)
    daily_budget = 10_000_000_000
    start_time = time.time()
    
    controller = BudgetPacingController(
        campaign_id="camp_9921",
        daily_budget_micros=daily_budget,
        start_time_epoch=start_time
    )

    # Test case 1: 50% through the day, but spent 80% of budget (Overspending)
    simulated_elapsed_time = 43200.0 # 12 hours
    simulated_spend = 8_000_000_000   # 8,000 USD
    current_epoch = start_time + simulated_elapsed_time
    
    multiplier = controller.calculate_bid_multiplier(simulated_spend, current_epoch)
    print(f"Calculated Bidding Multiplier: {multiplier:.4f} (Expected damping under 0.5)")
    
    # Test case 2: 50% through the day, spent 40% of budget (Healthy standard pacing)
    simulated_spend_healthy = 4_000_000_000 # 4,000 USD
    multiplier_healthy = controller.calculate_bid_multiplier(simulated_spend_healthy, current_epoch)
    print(f"Calculated Bidding Multiplier (Healthy): {multiplier_healthy:.4f} (Expected close to 1.0)")

5. Scaling Challenges & Bottlenecks

Operating a digital bidding engine at peak scales demands advanced optimizations across caching, database partitioning, and query routing layers.

Mitigation of Campaign Budget Overdraft

With thousands of bidding nodes checking campaign budgets concurrently, using traditional transaction isolation levels like SERIALIZABLE on core database tables introduces immediate locking bottlenecks.

The Solution: We implement a distributed ticket bucket architecture. Instead of calling a central database on every bid, the central database divides the daily campaign budget into smaller, local ticket pools (e.g., allocating a 100 USD sub-budget) and distributes these pools to regional Redis cache shards every 30 seconds. Bidding servers deduct budget locally from their nearest Redis shard using fast atomic operations (DECRBY). When a shard runs out of its sub-budget, it requests another allocation from the database. This limits the maximum possible budget overspend to the small, active sub-budget allocation size.

Hot Keys & User Profile Caching

If a celebrity loads a web page or a viral domain fires billions of impressions, a small subset of advertiser campaigns or publisher profiles becomes highly active.

The Solution: Consistent Hashing with virtual nodes distributes user profile keys evenly across an Aerospike cluster. Local in-memory caches (Guava/Caffeine caches inside the Bidding Agent JVM) act as L1 layers, storing highly active keys (like system floor-prices and globally popular campaign configs) for 1 second. This offloads 99% of hot key read traffic from the network storage fleet.

6. Technical Trade-offs & Compromises

Every architectural choice in low-latency environments involves critical tradeoffs.

Latency vs. Accuracy in Targeting

                  ┌──────────────────────────────┐
                  │       Latency Budget: 45ms   │
                  └──────────────┬───────────────┘
                                 │
         ┌───────────────────────┴───────────────────────┐
         ▼                                               ▼
┌─────────────────────────────────┐             ┌─────────────────────────────────┐
│     Fast Path (3ms)             │             │     Slow Path (40ms)            │
├─────────────────────────────────┤             ├─────────────────────────────────┤
│ • Simple KV Lookup (Aerospike)  │             │ • Complex Vector Search (ANN)   │
│ • Static Campaign Matching      │             │ • Multi-layered Neural Ranking  │
│ • Lower CTR (Click-Through)     │             │ • Higher CTR (Click-Through)    │
│ • Guaranteed low latency        │             │ • Risk of Timeout (No Bid)      │
└─────────────────────────────────┘             └─────────────────────────────────┘

When a bid request arrives, the DSP Bidding Agent must choose how deep to profile the user and score ad creatives:

The Tradeoff: Using advanced Approximate Nearest Neighbors (ANN) vector database lookups paired with deep-learning multi-stage ranking models provides high ad targeting precision (CTR), but can take up to 40ms to complete. In contrast, retrieving static demographic matches directly from a key-value store (Aerospike) takes under 3ms but yields lower conversion precision.
Operational Strategy: The system executes a hybrid model. If the bidding server's internal thread queues are shallow, it triggers the deep-learning ranking path. If queue depth rises or latency budgets run tight, the gateway dynamically falls back to the low-latency key-value targeting path, guaranteeing a bid response is delivered before the deadline.

CAP Posture: Availability vs. Consistency

Posturing: We prioritize Availability & Partition Tolerance (AP) for the bidding loop. If the database coordinating global budgets becomes partitioned or slow, the bidding agents must continue serving bids using cached sub-budgets. We compromise on immediate consistency, accepting minor budget overruns to avoid losing millions of dollars in unplaced bids due to system downtime.

7. Failure Scenarios & Operational Resiliency

Building a resilient RTB system requires expecting failures at all points of the network.

Handling Network Latency & Slow Bidders

If a single DSP node becomes slow or a regional fiber connection drops, the entire auction coordinator can experience head-of-line blocking while waiting for bid responses.

Mitigation: The Ad Exchange Auction Coordinator enforces a strict, hard timeout of 45ms on all outgoing gRPC calls. If a DSP fails to return its bid response within 45ms, the connection is instantly severed, the bidder is discarded from the auction pool, and the Exchange resolves the auction using the remaining bids. Bidders are protected on their end by local circuit breakers that trip if downstream caches are unresponsive.

Graceful Degradation Under Extreme Peak Load

During massive scale events (e.g., Super Bowl ads, global breaking news), request traffic can double in seconds.

Mitigation: The Envoy edge proxy gateways detect high server resource utilization and activate load shedding filters. The gateway immediately drops lower-value auction channels, rejects requests missing key device signatures, and redirects bids to cached fallback creatives without performing deep target profiling. This protects core computing resources from crashing under volume spikes.

8. Candidate Verbal Script

Mock Interview Sequence

Interviewer: How would you ensure that a Demand-Side Platform (DSP) does not overshoot its advertiser's daily budget when processing millions of bids per second across a global server fleet?

Candidate: "This is a classic concurrency problem where strong global consistency introduces a severe bottleneck. If we used centralized database locks to track and deduct budget balance on every bid, the system would immediately saturate.

To solve this, I would implement a Distributed Budget Allocation Engine. The central database (PostgreSQL) remains the single source of truth for global balances, but we decouple the real-time critical path from it. We break the daily campaign budget into small sub-budgets—for instance, allocating $10 units. These sub-budgets are distributed to regional Redis clusters that are co-located with our bidding agents.

When a bidding agent processes an auction, it performs an atomic decrement (DECRBY) against the local Redis shard. Because this operation runs in-memory within the same local region, latency remains sub-millisecond.

If a Redis shard exhausts its sub-budget, the agent stops bidding for that campaign until the background budget allocator pushes a new allocation. This design keeps the bidding loop highly available, bounds the maximum possible overspend of a campaign to the remaining sub-budget allocations, and completely protects our primary transactional database from real-time bidding traffic."

Interviewer: What happens if the regional Redis shard becomes unreachable? Does the bidding agent fail or fallback?

Candidate: "If a bidding agent loses connection to its local Redis budget shard, it instantly degrades to a Fail-Safe Mode. Rather than failing the request and losing bidding opportunities, the agent falls back to a locally cached memory map that holds the last known active campaigns from its local JVM cache.

However, to mitigate the risk of overspending while disconnected, the agent reduces its bid prices by a pre-configured safe multiplier (e.g., 0.5) and only bids on high-confidence impressions. As soon as the connection to the Redis shard is re-established, the agent syncs its local tracking logs to correct the balances."