System Design: Designing a Proximity Service (Yelp/Google Maps)

Designing a proximity service like Yelp, TripAdvisor, or Google Maps requires solving a fundamental data indexing problem: how to query millions of point-of-interest (POI) entities in a two-dimensional space based on a user's dynamic coordinates with sub-50ms latencies.

Standard database indexes (like B-Trees) are one-dimensional and fail to scale for geographic searches. This case study details the spatial indexing models, databases, and scaling techniques required to power global-scale proximity services.

1. Requirements & Core Constraints

To establish our technical architecture, we must first define the load parameters, latency SLAs, and scale constraints of our proximity service.

Functional Requirements

Add/Modify Business (Host Pathway): Owners must be able to register new business listings, update static metadata (menus, hours, pricing), and upload photos.
Proximity Discovery (Traveler Pathway): Travelers must be able to search for nearby businesses within a specific radius (e.g., 1 km, 5 km, 10 km) of their current coordinates.
Dynamic Filters: Travelers can narrow search results down by listing category, ratings, price range, and current operating status (open now).
Reviews & Ratings: Travelers can post ratings and text reviews for businesses.

Non-Functional Requirements

Sub-50ms Latency for Searches: Proximity search is a highly interactive user experience. Read queries must return results globally within $50\text{ms}$.
Massive Read-to-Write Ratio: The service is overwhelmingly read-heavy ($100,000 : 1$ read/write ratio). Searching nearby businesses is a constant stream of traffic, whereas registering a new restaurant is a rare event.
Eventual Consistency for POI Updates: It is acceptable if a newly added restaurant takes up to 5-10 minutes to appear in search results. Consistency is relaxed to ensure high read performance.
High Availability: A database primary failure should not degrade the search path. Read replicas and caching layers must continue to resolve queries.

Back-of-the-Envelope Estimation

System Scale & QPS:
- Points of Interest (POI): $50,000,000$ active business listings globally.
- Daily Active Users (DAU): $100,000,000$ active searchers.
- Search QPS: Assuming each traveler performs 5 searches per day: $$\text{Daily Search Volume} = 100,000,000 \times 5 = 500,000,000 \text{ searches/day}$$ $$\text{Average Search QPS} = \frac{500,000,000}{86,400} \approx 5,787 \text{ QPS}$$ Assuming a peak traffic multiplier of $2\times$ during peak dining/lunch hours: $$\text{Peak Search QPS} \approx 11,574 \text{ QPS}$$
- Listing Write QPS: Assuming $10,000$ new listings are added globally per day: $$\text{Write QPS} = \frac{10,000}{86,400} \approx 0.11 \text{ QPS (extremely low write load)}$$
Memory Footprint for Spatial Indexing:
- To achieve sub-50ms latencies, we keep the spatial index in-memory (e.g., via Redis or an in-memory Quadtree cache).
- Each listing index entry contains: listing_id (8 bytes), latitude (8 bytes), longitude (8 bytes), and geohash (8 bytes) = 32 bytes per POI.
- For 50 Million listings: $$\text{Index Memory Footprint} = 50,000,000 \times 32 \text{ bytes} \approx 1.6 \text{ GB}$$ This footprint is extremely light and can easily be cached on a single standard server instance.

2. API Design & Core Contracts

The system exposes clean, light REST endpoints to ensure fast search response times.

Proximity Search

GET /api/v1/places/nearby Retrieves all listings within a target radius, supporting category filters and cursor-based pagination.

Query Parameters:

latitude=37.7749
longitude=-122.4194
radius_km=5.0
category=restaurant
min_rating=4.0
limit=20
cursor=eyJvZmZzZXQiOjIwfQ==

Response Payload:

{
  "places": [
    {
      "place_id": "poi_99a8c7b0",
      "name": "The Golden Fork",
      "latitude": 37.7752,
      "longitude": -122.4188,
      "distance_km": 0.12,
      "average_rating": 4.65,
      "price_tier": 2,
      "thumbnail_url": "https://cdn.codesprintpro.com/places/poi_99a8c7b0/thumb.jpg"
    }
  ],
  "next_cursor": "eyJvZmZzZXQiOjQwfQ=="
}

Register New POI

POST /api/v1/places Registers a new Point of Interest. This is written directly to the relational database.

Request Payload:

{
  "name": "Bayside Gelato",
  "latitude": 37.8080,
  "longitude": -122.4177,
  "description": "Artisanal Italian gelato by the wharf.",
  "price_tier": 1,
  "category": "dessert",
  "amenities": ["outdoor_seating", "wifi"]
}

Response Payload:

{
  "place_id": "poi_11b8d2ef",
  "status": "APPROVED",
  "geohash": "9q8yyzh3",
  "created_at": "2026-05-22T16:45:00Z"
}

3. High-Level Design (HLD)

The proximity system relies on a CQRS pattern to shield PostgreSQL from heavy read traffic. Writes are persisted in a primary relational database, while read searches are resolved through high-performance cached spatial indices.

graph TD
    Client[Traveler Client Mobile/Web] -->|1. Search Nearby| APIGateway[API Gateway & Rate Limiter]
    Client -->|4. Register Business| APIGateway
    
    %% Read Path (Search)
    APIGateway -->|Read Path| SearchService[Search Nearby Service]
    SearchService -->|Query Nearest IDs| CacheCluster[(Redis Spatial Cache)]
    SearchService -->|Hydrate Listing Details| ReadReplicas[(PostgreSQL Read Replicas)]
    
    %% Write Path (Admin POI)
    APIGateway -->|Write Path| BusinessAdminService[Business Admin Service]
    BusinessAdminService -->|Create Listing| DBPrimary[(PostgreSQL Primary)]
    
    %% Async Reindexing Pipeline
    DBPrimary -->|CDC Stream| Debezium[Debezium CDC Broker]
    Debezium -->|Publish Mutations| Kafka[Kafka Event Bus]
    Kafka -->|Consume & Sync| IndexWorker[Index Sync Workers]
    IndexWorker -->|Update Geo Hashes| CacheCluster
    
    %% Review & Rating Aggregator
    Client -->|Submit Review| ReviewService[Review & Rating Service]
    ReviewService -->|Write Review| ReviewDB[(Review MongoDB Cluster)]
    ReviewDB -->|Aggregate Score| SparkJob[Spark Aggregator Engine]
    SparkJob -->|Batch Update Rating| DBPrimary

Flow Breakdown:

Writing POI Metadata (Write Path): When a business owner registers a place, the request goes to the Business Admin Service, which writes the record to the PostgreSQL Primary Database.
Synchronization Pipeline (CDC): Any new rows or updates inside PostgreSQL are streamed asynchronously via Debezium CDC to Apache Kafka. The Index Sync Workers consume these events, calculate the listing's Geohash, and update the Redis Spatial Cache.
Searching Nearby (Read Path): When a traveler searches for nearby places, the Search Service first queries the Redis Spatial Cache (using GEORADIUS or Geohash prefix lookups) to retrieve the closest matching listing IDs.
Detail Hydration: Once the Search Service has the nearest 20 listing IDs, it performs a fast primary key point lookup against the PostgreSQL Read Replicas or an elastic memory cache to hydrate the listing metadata (name, reviews, menus) and returns the result.

4. Low-Level Design (LLD) & Data Models

Database Selection Rationale

POI Metadata & Core Relationships (PostgreSQL): Storing business descriptions, menus, hours, and geographical coordinates is a relational problem. PostgreSQL handles ACID transactions, maintains referential integrity, and provides rich indexing support (e.g., GIST indexes on coordinates).
Spatial Index Cache (Redis): Redis stores geographic points inside a Sorted Set using Geohash-based scores. It provides fast in-memory spatial index operations like GEOADD, GEODIST, and GEORADIUS.
Reviews & Ratings Store (MongoDB): Reviews are unstructured, write-heavy documents. Storing them in MongoDB allows us to handle high-frequency writes and easily scale reviews independently.

SQL DDL Database Schemas

Businesses Table

CREATE TABLE businesses (
    id VARCHAR(64) PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price_tier INT CHECK (price_tier BETWEEN 1 AND 4),
    average_rating DECIMAL(3, 2) DEFAULT 0.00,
    review_count INT DEFAULT 0,
    latitude DOUBLE PRECISION NOT NULL,
    longitude DOUBLE PRECISION NOT NULL,
    geohash VARCHAR(12) NOT NULL,
    geo_point POINT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Indices for spatial searches and point-lookups
CREATE INDEX idx_businesses_geohash ON businesses (geohash);
CREATE INDEX idx_businesses_geo ON businesses USING GIST (geo_point);

Categories Table

CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(64) UNIQUE NOT NULL,
    slug VARCHAR(64) UNIQUE NOT NULL
);

Business-Category Join Table

CREATE TABLE business_categories (
    business_id VARCHAR(64) REFERENCES businesses(id) ON DELETE CASCADE,
    category_id INT REFERENCES categories(id) ON DELETE CASCADE,
    PRIMARY KEY (business_id, category_id)
);

CREATE INDEX idx_business_categories_lookup ON business_categories (category_id);

5. Geospatial Indexing Strategy Comparison

To design a high-performance proximity service, we compare three primary spatial indexing strategies:

graph TD
    %% Spatial Index Types Comparison
    Geohash[Geohashing: Base32 Grid]
    Quadtree[Quadtree: Recursive 2D Tree]
    H3[Google H3: Hexagonal Cells]

    Geohash -->|Pros| GH_Pros[Simple string prefixes, indexable in standard databases]
    Geohash -->|Cons| GH_Cons[Edge distortion, cells are rectangular rather than circular]

    Quadtree -->|Pros| QT_Pros[Dynamic partitioning, handles dense cities vs sparse deserts]
    Quadtree -->|Cons| QT_Cons[In-memory state management, highly complex to shard]

    H3 -->|Pros| H3_Pros[Uniform distances to all neighboring cells, no edge distortion]
    H3 -->|Cons| H3_Cons[Hexagons cannot be perfectly subdivided, complex coordinate math]

Detailed Evaluation Matrix:

Indexing Model	Data Structure	Neighbor Distance	Density Handling	Sharding Feasibility
Geohash	Base-32 String (1D representation of 2D coordinates)	Non-uniform (cell boundaries distort near poles)	Poor (fixed grid sizes regardless of city density)	Excellent (can shard databases by Geohash prefixes)
Quadtree	Hierarchical Tree (each node splits into 4 child nodes)	Non-uniform (requires traversing tree levels)	Excellent (nodes split only when POI density limits are reached)	Poor (requires keeping the full tree structure in a single server's memory)
Google H3	Hexagonal Grid System	Uniform (every adjacent hexagon shares the exact same center distance)	Good (supports multiple resolution levels hierarchically)	Good (can shard nodes by high-level hexagon indexes)

6. Scaling Challenges & High Write Concurrency

Scaling geospatial searches globally introduces two primary technical bottlenecks: the Boundary Problem and Quadtree Sharding.

A. Solving the Geohash Boundary Problem

If a user stands near the edge of a Geohash cell (e.g., cell 9q8yy), the nearest restaurant might be just 50 meters away, but located inside the adjacent Geohash cell (9q8yz). A simple prefix search inside 9q8yy will miss it.

+-----------+-----------+-----------+
|  9q8yx    |  9q8yy    |  9q8yz    |
| (Neighbor)| (Neighbor)| (Neighbor)|
+-----------+-----------+-----------+
|  9q8yu    |  9q8yv    |  9q8yw    |
| (Neighbor)| (User * ) | (Neighbor)|
+-----------+-----------+-----------+
|  9q8ys    |  9q8yt    |  9q8yr    |
| (Neighbor)| (Neighbor)| (Neighbor)|
+-----------+-----------+-----------+
* User is near the edge of cell 9q8yv. The system must query all 8 surrounding cells.

The Mitigation: Eight-Neighbor Querying. The Search Service must calculate the Geohash of the user's coordinates, determine the 8 surrounding neighbor cells, and run a combined index lookup across all 9 cells. The results are then merged and sorted by distance in memory before returning them to the client.

B. Sharding the Spatial Index

While a Geohash index fits on a single server, a global Quadtree becomes too large for a single machine's memory.

The Mitigation: Pre-Sharding by Region. Instead of maintaining a single, global Quadtree, we split the tree into independent, localized Quadtrees based on geographic regions (e.g., cities or high-level Geohash prefixes). All searches in Paris are routed to a dedicated Paris Quadtree server instance. This regional bulkhead isolation prevents a single server failure from bringing down our global proximity service.

7. Technical Trade-offs & Consistency Models

Tradeoff A: In-Memory Quadtree vs. Redis Geospatial Cache

In-Memory Quadtree (Custom Build):
- Pros: Provides maximum performance and absolute control over node splits, permitting customized search radius algorithms.
- Cons: Highly complex to implement and debug. Failover recovery is slow because the tree must be completely rebuilt from PostgreSQL rows upon node reboot.
Redis Geospatial (GEOADD / GEORADIUS):
- Pros: Fully managed, production-tested, and supports automatic replication and clustering out of the box.
- Cons: Operates on a fixed grid system, making it harder to customize search criteria for extremely dense cities versus empty rural areas.

Tradeoff B: AP vs. CP in Dynamic Reviews

When a traveler submits a review, we must decide when it becomes visible. If we enforce Strong Consistency (CP), the review is written to MongoDB and the business's average rating is recalculated in real time. This creates a massive write lock queue under high traffic. We choose Eventual Consistency (AP). Reviews are accepted immediately, but rating aggregations are calculated asynchronously in batches via a Spark/Flink pipeline every 10 minutes. This decouples the search path from review spikes and ensures the read QPS remains blazing-fast.

8. Resilience & Failure Scenarios

Operating global geographic search indices requires robust recovery mechanisms for critical failures:

Scenario A: Rebuilding a Crashed Spatial Index Server

If a Redis Spatial node crashes and loses its in-memory data, it must be rebuilt quickly to prevent search outages.

The Recovery Plan: We enable AOF (Append Only File) persistence with secondary read replicas. Upon reboot, the node restores its state from the persistent AOF log on disk. If the disk is corrupted, a backup worker pulls POI coordinate rows directly from PostgreSQL and repopulates the Redis Geospatial index in batches.

Scenario B: Hotspot Search Storms (e.g., Times Square in New Year)

During major public events, millions of users in the same tiny area query the same spatial grid simultaneously, creating a thundering herd on local search nodes.

The Mitigation: We cache popular localized search query results (e.g., "Food near Times Square") inside a local memory cache (Aerate/Guava) on the Search Service instances for 30 seconds. This prevents duplicate spatial index lookups from hitting the Redis or Quadtree cluster, shielding our index layer from traffic spikes.

9. Staff Engineer Perspective (Deep-Dive Callouts)

Trade-off

Geohash Length Selection Selecting the right Geohash length is a critical balance between database performance and search precision:

A length of 6 represents a $1.2\text{km} \times 0.6\text{km}$ cell. This is highly precise, but in sparse rural areas, a search radius of 10km forces the system to query hundreds of neighbor cells, slowing down search speeds.
A length of 5 represents a $4.9\text{km} \times 4.9\text{km}$ cell. This is the sweet spot. It minimizes the number of neighbor lookups during medium-range searches, and we filter out extraneous results in-memory.

[!PITFALL] Edge Case Failures in Quadtree Splits If a Quadtree node has a split threshold of 100 listings, and a host registers a new business in a highly dense area, the node splits. If hosts repeatedly register and delete businesses right around the threshold, the node will constantly split and merge, creating high CPU overhead. We mitigate this by implementing a hysteresis buffer (e.g., split at 120 listings, but only merge when count drops below 80 listings).

10. Candidate Verbal Script (Interview Guide)

Interviewer: "How would you design the spatial index for a proximity service like Yelp, and how does it scale as the number of listings grows?"

Candidate: "A standard database index (like B-Tree) only works on one dimension. To perform fast spatial queries, we must convert 2D coordinates into a 1D index representation. I would compare two primary approaches: Geohashing and Quadtrees.

For a production-grade, globally distributed system, I would select Geohashing using Redis Geospatial cache. Geohashing divides the world into a grid and assigns each cell a base-32 string prefix. This allows us to perform spatial queries using fast index prefix lookups.

To prevent double-counting or missing businesses near cell boundaries, the search engine calculates the traveler's current Geohash and queries that cell plus the 8 surrounding neighbor cells. The results are then merged and sorted by distance in memory before returning them to the client.

To scale the system as listings grow to 50M+, we shard the database geographically by Geohash prefixes (e.g., country or region). This keeps all spatial searches highly localized and ensures that read queries are resolved through fast, sharded Redis read-replicas with sub-50ms latencies."

Interviewer: "What happens if a new restaurant is added? How does the spatial index stay updated without slowing down searches?"

Candidate: "We use a CQRS architecture to separate write workflows from read searches.

When a restaurant is registered, the request is written directly to our relational PostgreSQL database (our primary source of truth). This write pathway has a very low load (average of 0.11 QPS globally), so PostgreSQL is never bottlenecked by search traffic.

To update the search index, we use an asynchronous Change Data Capture (CDC) pipeline. Debezium streams PostgreSQL transaction logs to Apache Kafka. Reindexing workers consume these events, calculate the listing's Geohash, and update the Redis Geospatial cache in background batches.

This asynchronous update ensures that new businesses appear in search results within a few minutes, while shielding our read search path from write lock contention."