System Design: Designing Airbnb (Hotel/Home Booking)

Designing a platform like Airbnb or Booking.com involves two distinct technical challenges: Search (helping users find the perfect place out of millions) and Concurrency (ensuring that once found, the place can be booked for specific dates without double-booking).

1. Requirements & Core Constraints

To build a robust reservation platform, we must first establish the operational boundaries and expectations of the system. The platform serves two primary user bases: listings owners (hosts) who publish properties, and travelers (guests) who browse and book those properties.

Functional Requirements

Property Management (Host Pathway): Property owners must be able to list new homes, upload high-resolution photos, modify property details, set seasonal pricing, and update availability calendars.
Search & Discovery (Traveler Pathway): Users must be able to search listings based on geographic coordinates, a dynamic radius, check-in and check-out dates, price ranges, guest capacity, and customized lists of amenities.
Booking Engine: Travelers must be able to reserve a property for a specific date range. This process must place a temporary transactional lease on the target dates while payment processing occurs.
Payment Processing: Integrates with external payment processors (e.g., Stripe, PayPal) to confirm bookings idempotently.
Review & Rating System: Users can rate and review properties after completing a stay.

Non-Functional Requirements

High Availability for Search: The browse and search pathway must be highly available (99.99% SLA) and responsive, as travelers browse hundreds of listings before booking a single room.
Strong Consistency for Bookings: Under no circumstances should two users be allowed to book the exact same listing for overlapping dates. We must prioritize consistency over availability (CP in the CAP theorem) during the final reservation step.
Low Latency Search: The p99 latency for geospatial and filter queries must remain under $150\text{ms}$ globally.
Idempotency: All booking and payment requests must be fully idempotent to prevent duplicate charges or double-reservations due to retried network operations.

Back-of-the-Envelope Estimation

To understand the scalability demands, we calculate traffic volume, query rates, bandwidth, and storage capacity over a 5-year operating lifespan:

System Scale & QPS:
- Listing Inventory: $10,000,000$ active properties globally.
- Daily Active Users (DAU): $100,000,000$ travelers.
- Search QPS: Assuming an average traveler performs 10 search queries per day: $$\text{Daily Search Volume} = 100,000,000 \times 10 = 1,000,000,000 \text{ searches/day}$$ $$\text{Average Search QPS} = \frac{1,000,000,000}{86,400} \approx 11,574 \text{ QPS}$$ Assuming a peak traffic multiplier of $2\times$ during high-intent booking seasons: $$\text{Peak Search QPS} \approx 23,148 \text{ QPS}$$
- Booking QPS: Assuming $1%$ of DAUs complete a booking daily ($1,000,000$ bookings per day): $$\text{Average Booking QPS} = \frac{1,000,000}{86,400} \approx 11.6 \text{ QPS}$$ With a peak booking multiplier of $2\times$: $$\text{Peak Booking QPS} \approx 23.2 \text{ QPS}$$
Storage Footprint (5-Year Plan):
- Listing Metadata: 10 Million properties $\times$ 10 KB of structured data (titles, descriptions, amenities, address) = 100 GB.
- Temporal Availability (Inventory Slots): To optimize temporal lookup performance, we pre-generate listing slots day-by-day. Keeping a rolling window of 2 years (730 days) of availability per listing: $$10,000,000 \text{ listings} \times 730 \text{ days} = 7,300,000,000 \text{ daily inventory slots}$$ At 60 bytes per daily slot (status, price, version): $$7,300,000,000 \text{ slots} \times 60 \text{ bytes} \approx 438 \text{ GB (active working database footprint)}$$
- Media Storage (Photos): Assuming 10 photos per listing, averaging 2 MB each: $$10,000,000 \text{ listings} \times 10 \text{ photos} \times 2\text{ MB} = 200 \text{ TB}$$ This media is stored in an object store (e.g., AWS S3) and distributed worldwide via CloudFront or Akamai CDN edge nodes.

2. API Design & Core Contracts

The system requires simple, highly-optimized API endpoints to separate query pathways from command-based reservation workflows.

Search Listings

POST /api/v1/listings/search Retrieves all open listings near a target location for a specific date range.

Request Payload:

{
  "latitude": 48.8566,
  "longitude": 2.3522,
  "radius_km": 5.0,
  "check_in": "2026-06-01",
  "check_out": "2026-06-05",
  "guests": 2,
  "filters": {
    "min_price": 50,
    "max_price": 250,
    "amenities": ["wifi", "air_conditioning"]
  },
  "page_size": 20,
  "cursor": "eyJvZmZzZXQiOjIwfQ=="
}

Response Payload:

{
  "listings": [
    {
      "listing_id": "list_83901a8f",
      "title": "Charming Parisian Loft near Seine",
      "latitude": 48.8584,
      "longitude": 2.3530,
      "price_per_night": 120.0,
      "rating": 4.92,
      "thumbnail_url": "https://cdn.codesprintpro.com/listings/83901a8f/thumb.jpg"
    }
  ],
  "next_cursor": "eyJvZmZzZXQiOjQwfQ=="
}

Reserve Listing (Hold Lease)

POST /api/v1/bookings/reserve Acquires a temporary checkout lease on a listing to prevent double-booking during payment.

Request Payload:

{
  "listing_id": "list_83901a8f",
  "check_in": "2026-06-01",
  "check_out": "2026-06-05",
  "guest_id": "usr_9981a2bc",
  "idempotency_key": "idem-uuid-9f8a-45c1-92b1"
}

Response Payload:

{
  "booking_id": "book_3389a01f",
  "listing_id": "list_83901a8f",
  "status": "PENDING_PAYMENT",
  "check_in": "2026-06-01",
  "check_out": "2026-06-05",
  "total_price": 480.00,
  "lease_expires_at": "2026-05-22T16:41:23Z"
}

3. High-Level Design (HLD)

The architecture is structured around Command Query Responsibility Segregation (CQRS). Since read traffic (searching and viewing properties) is vastly greater than write traffic (making a booking), the read pathway is entirely decoupled from the transactional database.

graph TD
    Client[Client App/Browser] -->|1. Search query| APIGateway[API Gateway & Rate Limiter]
    Client -->|4. Reserve Listing| APIGateway
    
    %% Read Path (Search)
    APIGateway -->|Read Path| SearchService[Search Service]
    SearchService -->|Query Geospatial & Dates| ESCluster[(Elasticsearch Cluster)]
    
    %% Write Path (Booking & Inventory)
    APIGateway -->|Write Path| BookingService[Booking & Reservation Service]
    BookingService -->|ACID Concurrency Check| InventoryDB[(PostgreSQL Primary)]
    
    %% Synchronization Pipeline (CDC)
    InventoryDB -->|Write Ahead Logs| CDC[Debezium CDC Broker]
    CDC -->|Stream mutations| Kafka[Kafka Event Bus]
    Kafka -->|Consume & Reindex| SyncWorker[Sync Index Workers]
    SyncWorker -->|Bulk Index Updates| ESCluster
    
    %% Payment & Leases
    BookingService -->|Acquire Temporary Lease| RedisCache[(Redis Lock Cluster)]
    BookingService -->|Trigger Charge| PaymentService[Payment Gateway Service]
    PaymentService -->|Verify Status| StripeAPIs[Stripe/PayPal API]

Flow Breakdown:

The Search Journey (Read Path): When a traveler searches for a property, the request passes through the API Gateway to the Search Service. This service queries the Elasticsearch Cluster, which index geographic coordinates and date ranges for instant matches.
The Booking Journey (Write Path): When a traveler clicks 'Book', the request routes to the Booking Service. This service coordinates reservations inside our strongly-consistent PostgreSQL Database by checking availability slots.
Change Data Capture Synchronization: To keep the search indices fresh, any database mutations inside PostgreSQL are streamed immediately via Debezium CDC to Apache Kafka. Reindexing workers consume these events in batches and execute bulk updates on the Elasticsearch Cluster.
Temporary Reservation Leases: To prevent other travelers from attempting to book the same room while the payment window is open (which averages 5-10 minutes), the system acquires a temporary lease inside the Redis Lock Cluster and sets a TTL.

Reservation Lease State Machine

Managing availability requires tracking active locks through a clear, linear state machine:

stateDiagram-v2
    [*] --> Available : Free for dates
    Available --> PendingHold : User reserves (POST /reserve)
    state PendingHold {
        [*] --> LockAcquired : Redis key set
        LockAcquired --> InventoryUpdated : DB Pessimistic Lock Checked
    }
    PendingHold --> Booked : Payment Clears (POST /confirm)
    PendingHold --> Available : Lease Expires / Payment Fails (TTL Cleanup)
    Booked --> Refunded : Booking Cancelled

4. Low-Level Design (LLD) & Data Models

Database Selection Rationale

Primary Source of Truth (PostgreSQL): An ACID-compliant relational engine is necessary to manage listings, bookings, and daily inventory. We use PostgreSQL to run serializable transactions and row-level locks, ensuring mathematical guarantees against double-bookings.
Search Engine (Elasticsearch): Elasticsearch is optimized for complex filter combinations (e.g., location, guest count, amenities, date bounds, price) with sub-second latencies.
High-Speed Cache & Lock coordinator (Redis): Storing active checkout leases as temporary keys with TTL limits avoids database bloating and shields PostgreSQL from heavy traffic during rapid shopping-cart abandonments.

SQL DDL Database Schemas

Listings Table

Stores core host property details, including geosearch structures.

CREATE TABLE listings (
    id VARCHAR(64) PRIMARY KEY,
    owner_id VARCHAR(64) NOT NULL,
    title VARCHAR(255) NOT NULL,
    description TEXT,
    price_per_night DECIMAL(12, 2) NOT NULL,
    max_guests INT NOT NULL,
    latitude DOUBLE PRECISION NOT NULL,
    longitude DOUBLE PRECISION NOT NULL,
    geo_point POINT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Spatial GIST Index to support fast geospatial range and circle queries
CREATE INDEX idx_listings_geo ON listings USING GIST (geo_point);

Daily Listing Inventory Table

To avoid expensive database overlap queries, we pre-generate a separate row for each listing day.

CREATE TABLE listing_inventory (
    listing_id VARCHAR(64) REFERENCES listings(id) ON DELETE CASCADE,
    inventory_date DATE NOT NULL,
    status VARCHAR(32) NOT NULL DEFAULT 'AVAILABLE', -- 'AVAILABLE', 'PENDING_LEASE', 'BOOKED'
    lease_expires_at TIMESTAMP WITH TIME ZONE NULL,
    version INT NOT NULL DEFAULT 1,
    PRIMARY KEY (listing_id, inventory_date)
);

-- High-performance index for temporal range queries
CREATE INDEX idx_inventory_lookup ON listing_inventory (listing_id, inventory_date, status);

Bookings Table

Stores traveler booking metadata and holds the financial transactional audit trail.

CREATE TABLE bookings (
    id VARCHAR(64) PRIMARY KEY,
    listing_id VARCHAR(64) REFERENCES listings(id) ON DELETE RESTRICT,
    guest_id VARCHAR(64) NOT NULL,
    check_in DATE NOT NULL,
    check_out DATE NOT NULL,
    total_price DECIMAL(12, 2) NOT NULL,
    status VARCHAR(32) NOT NULL DEFAULT 'PENDING_PAYMENT', -- 'PENDING_PAYMENT', 'CONFIRMED', 'CANCELLED'
    idempotency_key VARCHAR(128) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_bookings_guest ON bookings(guest_id);
CREATE INDEX idx_bookings_dates ON bookings(listing_id, check_in, check_out);

5. Scaling Challenges & Database Bottlenecks

As traffic scales to 100M active users, several architectural hotspots and bottlenecks emerge at the data layer:

Handling Geographic Data Sharding

A single database server cannot handle the global read/write load of 10M listings. We shard our databases geographically (by country or continent, e.g., US, EU, APAC). Since travelers almost always search and book properties within the same geographical region, 99% of our queries can be completed inside a single, local database shard. This avoids the latency overhead and complexity of cross-shard distributed transactions.

Preventing Hot Shards and Celeb Properties

Highly popular properties (e.g., unique luxury villas or heavily-marketed listings) receive sudden spikes in views and reservation attempts.

Cache-Aside Architecture: We cache static property metadata in Redis for 1 hour to prevent read traffic from hitting the database.
Write Coalescing: When checking availability, instead of hitting the primary PostgreSQL database immediately, the reservation engine checks a Redis distributed lock. If a transaction is already holding a lock on the property for those dates, competing booking requests are aborted early at the edge, saving database capacity.

6. Technical Trade-offs & Consistency Models

When designing a highly concurrent reservation engine, architects face crucial architectural tradeoffs regarding database transaction isolation and data replication strategies.

Tradeoff A: Pessimistic Row Locking vs. Optimistic Concurrency Control (OCC)

Pessimistic Locking (SELECT ... FOR UPDATE):
- Pros: Mathematically guarantees that a transaction holds exclusive control over inventory rows, completely eliminating the possibility of race conditions.
- Cons: Holds long-lived lock queues on the database, blocking other concurrent read/write threads and limiting booking throughput under high contention.
Optimistic Concurrency Control (OCC):
- Pros: Highly efficient in low-contention environments. It allows multiple users to read the same rows without locking, significantly reducing transaction latency.
- Cons: Under high contention (e.g., many users trying to book the same room simultaneously), OCC leads to high transaction abort rates, forcing clients to retry repeatedly and causing CPU spikes on the database.

Tradeoff B: CP vs. AP in Eventual Consistency

To scale search, the system replicates PostgreSQL updates to Elasticsearch asynchronously via a Kafka CDC pipeline. This is an AP (Eventual Consistency) design. During the CDC propagation delay, travelers may see stale search results, showing a recently booked property as still available.

To mitigate this, we accept eventual consistency for general searches, but enforce Strong Consistency (CP) during the final listing check: when a traveler clicks onto the detailed listing page, the page queries PostgreSQL directly to ensure the user only sees live, accurate availability details.

7. Resilience & Failure Scenarios

Operating at massive scale requires robust mitigation patterns for critical failure modes:

Scenario A: Network Partition During Payment Handoff (The Dual-Write Problem)

What happens if the Booking Service charges a user's credit card through Stripe, but the network drops before the Booking Service can write CONFIRMED to the database?

The Mitigation: Transactional Outbox Pattern. Instead of calling the Payment service synchronously inside our reservation transaction, the Booking Service writes a PaymentPendingEvent into a local Outbox table inside the same PostgreSQL transaction. A separate Outbox processor reads this table and publishes it to Kafka, ensuring that database updates and payment requests are decoupled and executed reliably.
Idempotency Keys: We generate a unique idempotency_key (derived from guest_id, listing_id, check_in, and check_out) before charging the user. If the network drops and the request is retried, Stripe recognizes the key and returns the existing transaction status rather than executing a duplicate charge.

Scenario B: Database Shard Outages

If a geographical database shard (e.g., the EU shard) goes down, we must prevent the entire global platform from crashing.

Regional Isolation (Bulkheads): We isolate our microservices by region. If the EU database shard crashes, the US and APAC services continue to operate without interruption.
Read-Only Mode: During a primary database outage, we route all queries to read-replicas, allowing travelers to continue searching and viewing listings, while only disabling new bookings.

8. Staff Engineer Perspective (Deep-Dive Callouts)

9. Candidate Verbal Script (Interview Guide)

Interviewer: "How would you design the database locking mechanism to prevent two users from booking the same property for the same dates?"

Candidate: "Double-booking is a critical business failure. We must prioritize strong consistency (CP in CAP) over latency during reservation.

To solve this, I would model the availability as discrete daily rows in our relational database (listing_inventory), sharded geographically by property location to keep transaction scopes localized. When a traveler initiates a booking, we start a transaction and lock those specific dates using a pessimistic lock:

SELECT status FROM listing_inventory 
WHERE listing_id = :id AND inventory_date BETWEEN :in AND :out 
FOR UPDATE;

This blocks any other transaction trying to modify or lock these exact dates for this property. If the query confirms all target dates are AVAILABLE, we update their status to PENDING_LEASE with a 10-minute expiration timestamp and release the lock.

If the lock request blocks for too long, or the dates are occupied, we abort quickly and return an HTTP 409 conflict code. This keeps our database queue short and responsive."

Interviewer: "Doesn't pessimistic locking create a massive database bottleneck when millions of users are searching and viewing listings at the same time?"

Candidate: "No, because we separate our read queries from our transactional write queries using a CQRS architecture.

Travelers searching and browsing use a decoupled search index powered by Elasticsearch. The read path does not touch PostgreSQL at all. PostgreSQL is only queried in two instances: when a user opens a specific property listing detail page—where we run a highly-indexed point-lookup on the current live state—and when they actually click 'Book'. Since booking traffic is a fraction of a percent of search traffic (about 11.6 QPS globally on average), PostgreSQL is shielded from massive search traffic spikes, ensuring that locking contention is exceptionally low."