Lesson 7 of 107 4 min

Database Scaling Part 2: Partitioning vs. Sharding

Learn the difference between logical table partitioning and physical database sharding. Which one do you need? Master the B-Tree index memory trade-offs.

Reading Mode

Hide the curriculum rail and keep the lesson centered for focused reading.

Sharding & Partitioning: Scaling the Storage Layer

Mental Model

The source of truth where data persistence, consistency, and retrieval speed must be balanced.

To scale a database beyond a single machine, we have two primary tools: Partitioning and Sharding. While they are often used together, they solve fundamentally different problems.

1. Logical Table Partitioning (Single Node)

Table partitioning is a database-native feature where a single massive logical table is physically broken down into smaller, individual tables on the same server.

graph TD
    subgraph "Single Database Server"
        Table[Logical: Orders Table]
        P1[Partition: Orders_2023]
        P2[Partition: Orders_2024]
        P3[Partition: Orders_2025]
        
        Table --> P1
        Table --> P2
        Table --> P3
    end

The B-Tree Index Memory Problem

Why do we partition? To keep the B-Tree index in RAM.

As a table grows, its index grows. If the index becomes larger than the server's available RAM, the database must perform Disk I/O to search the index. This turns a 0.1ms lookup into a 10ms lookup—a 100x performance penalty.

By partitioning, each smaller table has a smaller index that fits into memory, ensuring "Index-Only Scans" remain fast.


2. Physical Database Sharding (Multi-Node)

Sharding is the process of breaking up a large database into smaller chunks (shards) and distributing them across multiple independent server nodes.

graph LR
    subgraph "Shared-Nothing Architecture"
        LB[Shard Proxy / App]
        S1[(Node 1: Shard A)]
        S2[(Node 2: Shard B)]
        S3[(Node 3: Shard C)]
        
        LB -- user_id: 1-100 --> S1
        LB -- user_id: 101-200 --> S2
        LB -- user_id: 201-300 --> S3
    end

The "Staff" Perspective on Sharding

Sharding is a "Nuclear Option." It allows you to aggregate the compute power (CPU/RAM) and storage capacity of many servers. It is the only way to handle write-heavy applications at the scale of Instagram or Uber.


3. Comparison Matrix

Feature Partitioning Sharding
Location Single Server Multiple Servers
Resources Shares one CPU/RAM/Disk Aggregates many CPUs/RAM/Disks
Complexity Low (Internal to DB) High (Requires external routing)
Maintenance Easy Hard (Data rebalancing)

4. The Shard Key Selection Playbook

The most critical decision in sharding is picking the Shard Key.

  • The Hotspot Trap: If you shard by created_at, all new writes hit the same shard (the latest one). This is a bottleneck.
  • The Celebrity Problem: If you shard by user_id, a "Celebrity" user might overload a single shard with traffic.

Staff Recommendation: Use a Hash-based Shard Key on a high-cardinality column (like order_id or uuid) to ensure even distribution across all nodes.

5. Verbal Interview Script (Communication)

Interviewer: "How do you decide when a system has outgrown partitioning and needs sharding?"

You: "I monitor two primary metrics: I/O Wait and Write Saturation. If our query latency is increasing because the indexes no longer fit in the server's buffer pool, I first try partitioning. However, if the Total Write RPS (Requests Per Second) exceeds the capacity of a single disk controller, or if we have reached the vertical limit of the largest cloud instance (e.g., 256GB RAM), then sharding becomes mandatory. Sharding transforms the single-node bottleneck into a 'Shared-Nothing' architecture, giving us near-infinite horizontal scale at the cost of cross-shard join complexity."

6. Optimization Checklist

  1. Denormalization: Avoid joins across shards by duplicating data.
  2. Consistent Hashing: Use it to minimize data movement during "Shard Splits."
  3. Failover: Each shard should be a primary-replica set for high availability.

Key Takeaways

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.