# Sharding & Partitioning: Scaling the Storage Layer

## Mental Model
Think of the database as the system's source of truth, where data persistence, consistency, and retrieval speed must all be balanced.
To scale a database beyond a single machine, we have two primary tools: Partitioning and Sharding. While they are often used together, they solve fundamentally different problems.
## 1. Logical Table Partitioning (Single Node)
Table partitioning is a database-native feature where a single massive logical table is physically broken down into smaller, individual tables on the same server.
```mermaid
graph TD
    subgraph "Single Database Server"
        Table[Logical: Orders Table]
        P1[Partition: Orders_2023]
        P2[Partition: Orders_2024]
        P3[Partition: Orders_2025]
        Table --> P1
        Table --> P2
        Table --> P3
    end
```
### The B-Tree Index Memory Problem
Why do we partition? To keep the B-Tree index in RAM.
As a table grows, its index grows with it. Once the index becomes larger than the server's available RAM, the database must perform disk I/O to search it. This turns a 0.1 ms lookup into a 10 ms lookup, a 100x performance penalty.
By partitioning, each smaller table has a smaller index that fits in memory, keeping "Index-Only Scans" fast.
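In databases like PostgreSQL this routing is declarative (`PARTITION BY RANGE`), but the idea can be sketched as a lookup from a row's date to the physical table that stores it. The partition names below are the hypothetical `Orders_YYYY` tables from the diagram above:

```python
from datetime import date

# Hypothetical yearly range partitions, mirroring the Orders diagram above.
PARTITIONS = {2023: "Orders_2023", 2024: "Orders_2024", 2025: "Orders_2025"}

def partition_for(order_date: date) -> str:
    """Return the physical partition holding rows for this date."""
    try:
        return PARTITIONS[order_date.year]
    except KeyError:
        raise ValueError(f"no partition covers {order_date}") from None

partition_for(date(2024, 6, 1))  # → "Orders_2024"
```

Because each partition covers a narrow date range, a query filtered on that range touches only one small index instead of one giant one.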
## 2. Physical Database Sharding (Multi-Node)
Sharding is the process of breaking up a large database into smaller chunks (shards) and distributing them across multiple independent server nodes.
```mermaid
graph LR
    subgraph "Shared-Nothing Architecture"
        LB[Shard Proxy / App]
        S1[(Node 1: Shard A)]
        S2[(Node 2: Shard B)]
        S3[(Node 3: Shard C)]
        LB -- "user_id: 1-100" --> S1
        LB -- "user_id: 101-200" --> S2
        LB -- "user_id: 201-300" --> S3
    end
```
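The range-based routing in the diagram can be sketched in a few lines; the node names and `user_id` ranges here are the illustrative ones from the figure, not a real topology:

```python
# Hypothetical range-based router matching the diagram above.
RANGES = [
    (1, 100, "Node 1: Shard A"),
    (101, 200, "Node 2: Shard B"),
    (201, 300, "Node 3: Shard C"),
]

def shard_for(user_id: int) -> str:
    """Return the node that owns this user_id's range."""
    for lo, hi, node in RANGES:
        if lo <= user_id <= hi:
            return node
    raise ValueError(f"user_id {user_id} is outside all shard ranges")

shard_for(150)  # → "Node 2: Shard B"
```

In production this mapping lives in the shard proxy's metadata store, so ranges can be split and moved without redeploying the application.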
### The "Staff" Perspective on Sharding
Sharding is a "Nuclear Option." It allows you to aggregate the compute power (CPU/RAM) and storage capacity of many servers. It is the only way to handle write-heavy applications at the scale of Instagram or Uber.
## 3. Comparison Matrix
| Feature | Partitioning | Sharding |
|---|---|---|
| Location | Single Server | Multiple Servers |
| Resources | Shares one CPU/RAM/Disk | Aggregates many CPUs/RAM/Disks |
| Complexity | Low (Internal to DB) | High (Requires external routing) |
| Maintenance | Easy | Hard (Data rebalancing) |
## 4. The Shard Key Selection Playbook
The most critical decision in sharding is picking the Shard Key.
- **The Hotspot Trap:** If you shard by `created_at`, all new writes hit the same shard (the latest one). This is a bottleneck.
- **The Celebrity Problem:** If you shard by `user_id`, a "Celebrity" user might overload a single shard with traffic.
**Staff Recommendation:** Use a hash-based shard key on a high-cardinality column (like `order_id` or a UUID) to ensure even distribution across all nodes.
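A minimal sketch of that recommendation, assuming a stable cryptographic hash (Python's built-in `hash()` is salted per process, so it is unsuitable for routing). The shard count and key format are hypothetical:

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

def shard_index(key: str) -> int:
    """Map a high-cardinality key (e.g. an order ID) to a shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Hashing spreads even strictly sequential IDs evenly across shards,
# sidestepping the hotspot trap that range-sharding on created_at creates.
counts = Counter(shard_index(f"order-{i}") for i in range(100_000))
# Each of the 4 shards receives close to 25,000 keys.
```

The trade-off: hash routing destroys range locality, so queries like "all orders from last week" must fan out to every shard.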
## 5. Verbal Interview Script (Communication)
**Interviewer:** "How do you decide when a system has outgrown partitioning and needs sharding?"
**You:** "I monitor two primary metrics: I/O Wait and Write Saturation. If our query latency is increasing because the indexes no longer fit in the server's buffer pool, I first try partitioning. However, if the Total Write RPS (Requests Per Second) exceeds the capacity of a single disk controller, or if we have reached the vertical limit of the largest cloud instance (e.g., 256GB RAM), then sharding becomes mandatory. Sharding transforms the single-node bottleneck into a 'Shared-Nothing' architecture, giving us near-infinite horizontal scale at the cost of cross-shard join complexity."
## 6. Optimization Checklist
- **Denormalization:** Avoid joins across shards by duplicating data.
- **Consistent Hashing:** Use it to minimize data movement during "Shard Splits."
- **Failover:** Each shard should be a primary-replica set for high availability.
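A toy consistent-hashing ring with virtual nodes illustrates the checklist's second point: adding a shard moves only the keys the new shard takes over (roughly 1/N of them), rather than reshuffling everything as naive `hash % N` routing would. The class and shard names are hypothetical:

```python
import bisect
import hashlib

def _h(s: str) -> int:
    """Stable 64-bit hash (built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

class Ring:
    """Toy consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node claims `vnodes` points on the ring.
        self._points = sorted((_h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # A key belongs to the first vnode clockwise of its hash (wrapping around).
        i = bisect.bisect(self._hashes, _h(key)) % len(self._hashes)
        return self._points[i][1]

before = Ring(["shard-a", "shard-b", "shard-c"])
after = Ring(["shard-a", "shard-b", "shard-c", "shard-d"])
keys = [f"user-{i}" for i in range(10_000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
# Only the keys now owned by shard-d move: roughly a quarter of them.
```

Every key that moves lands on `shard-d`; the other three shards keep their remaining keys untouched, which is exactly what makes shard splits cheap.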
## Key Takeaways
- **The Hotspot Trap:** If you shard by `created_at`, all new writes hit the same shard (the latest one). This is a bottleneck.
- **The Celebrity Problem:** If you shard by `user_id`, a "Celebrity" user might overload a single shard with traffic.
- **Case Study:** How Discord Shards Cassandra for Billions of Messages