Lesson 2 of 12 14 minDatabase Core

MySQL Schema Design Case Study: Building a Scalable E-Commerce Database

A complete MySQL schema design case study for an e-commerce platform handling 1 million orders/day. Covers normalized schema design, indexing strategy, soft deletes, audit trails, and zero-downtime schema evolution.

Reading Mode

Hide the curriculum rail and keep the lesson centered for focused reading.

Key Takeaways

  • Normalize first, then denormalize only where query patterns prove it is necessary.
  • Composite indexes must match your query patterns exactly — the column order is not arbitrary.
  • Every production schema needs soft deletes, audit columns, and a migration strategy that supports zero-downtime deploys.
Recommended Prerequisites
mysql-mastery-01-fundamentalsmysql-mastery-02-joins-aggregationsmysql-mastery-03-indexing-performancemysql-mastery-04-transactions-acidmysql-mastery-11-production-incident-playbook

Premium outcome

From relational fundamentals to query plans, transactions, and scaling.

Backend engineers who want stronger SQL fundamentals and production database depth.

What you unlock

  • Better mental models for indexes, joins, transactions, and query optimization
  • A complete path through core MySQL operational and architectural concepts
  • More confidence choosing schema and storage trade-offs in interviews and production

Database schema design is architecture. Like all architectural decisions, it must be driven by the query patterns and access patterns you actually have—not the ones you imagine you might need.

In this case study, we deconstruct and redesign an e-commerce database schema for a high-growth platform transitioning from $10\text{M}$ ARR to processing 1 million orders per day. The naive weekend-build schema is crashing under production scale: checkout queries take $8$ seconds, flash sales lock inventory tables, and reporting queries crash the primary instance. We walk through a complete, production-ready MySQL 8.0 database architecture built on InnoDB, detailed composite indexing math, and zero-downtime schema migration strategies.


System Requirements and Goals

Designing a relational schema for a million daily orders requires establishing firm transactional boundaries and operational bounds.

1. Functional Requirements

  • Catalog Browsing: Buyers must browse products, filter by category, and see current prices and real-time inventory levels.
  • Resilient Order Checkout: Safely execute multi-item purchases. Inventory reservation must be highly concurrent, preventing overselling during high-demand flash sales.
  • Audit Trails: Record complete historical pricing records and inventory transaction movements; prices paid at checkout must remain immutable.
  • Soft Deletes: Ensure soft-deletion standards across critical tables (users, products) to preserve accounting and tax integrity.

2. Non-Functional Requirements

  • Transaction Latency: $99%$ of checkout transaction writes must complete in under $50\text{ ms}$ on the database layer.
  • Scale Capability: Core tables must scale to sustain over $100\text{ Million}$ rows without query degradation.
  • Operational Isolation: Long-running analytical queries must not affect OLTP write latencies.
  • Zero Downtime: Schema migrations must execute online without locking tables or locking out buyers.

High-Level Design Architecture

To support $1,000,000$ orders per day, the database tier must scale writes independently of reads. The architecture below showcases the decoupling of transactional OLTP write paths from analytical reporting pipelines using primary-replica replication:

graph TD
    %% Define Nodes
    Client[Mobile/Web Buyers] -->|HTTPS| API[API Gateway]
    
    subgraph "Application Layer"
        API -->|Route Checkout Writes| OrderSvc[Order Microservice]
        API -->|Route Read Queries| ReadSvc[Catalog & Search Service]
    end

    subgraph "Relational Database Layer"
        OrderSvc -->|Primary OLTP Writes| MySQLPrimary[MySQL Primary Instance - Master]
        MySQLPrimary -->|Semi-Synchronous Binlog Replication| MySQLReplica1[MySQL Read Replica 1]
        MySQLPrimary -->|Semi-Synchronous Binlog Replication| MySQLReplica2[MySQL Read Replica 2]
        
        ReadSvc -->|Load Balanced Reads| MySQLReplica1
        ReadSvc -->|Load Balanced Reads| MySQLReplica2
    end

    subgraph "Analytics & Data Warehouse"
        MySQLReplica2 -->|Debezium CDC Streams| Kafka[Apache Kafka Telemetry]
        Kafka -->|Bulk Insert| ClickHouse[(ClickHouse Data Warehouse)]
        ClickHouse -->|BI Tools| AnalyticsPanel[Internal Dashboards]
    end

    %% Styling
    classDef client fill:#2c3e50,stroke:#fff,stroke-width:1px,color:#fff;
    classDef app fill:#2980b9,stroke:#fff,stroke-width:1px,color:#fff;
    classDef database fill:#e67e22,stroke:#fff,stroke-width:2px,color:#fff;
    classDef analytics fill:#27ae60,stroke:#fff,stroke-width:1px,color:#fff;
    
    class Client client;
    class API,OrderSvc,ReadSvc app;
    class MySQLPrimary,MySQLReplica1,MySQLReplica2 database;
    class Kafka,ClickHouse,AnalyticsPanel analytics;

API Design and Interface Contracts

Transactional operations require robust REST endpoints that map neatly to relational write bounds.

1. Checkout Order Creation Payload (POST /api/v1/orders)

Client applications submit structured checkout payloads containing user profiles, shipping paths, and a specific array of item identifiers:

{
  "userId": 9920194,
  "shippingAddressId": 48201,
  "currency": "USD",
  "items": [
    {
      "productId": 2049,
      "quantity": 2
    },
    {
      "productId": 5092,
      "quantity": 1
    }
  ]
}

2. Transaction Response Contract

The application returns HTTP 201 Created upon successful transaction commit, providing structured metadata:

{
  "orderId": 882019401,
  "status": "pending",
  "subtotal": 129.97,
  "taxAmount": 10.40,
  "shippingAmount": 0.00,
  "totalAmount": 140.37,
  "createdAt": "2026-05-23T02:30:00.042Z"
}

Low-Level Design & Component Mechanics

To achieve maximum data integrity and low latency, we construct a fully normalized relational Entity-Relationship Diagram (ERD), followed by concrete DDL declarations.

1. Database Entity-Relationship Diagram (ERD)

erDiagram
    users ||--o{ addresses : "has"
    users ||--o{ orders : "places"
    addresses ||--o{ orders : "receives"
    products ||--o{ product_prices : "has historical"
    products ||--o{ order_items : "sold in"
    products ||--o{ inventory_transactions : "audited by"
    orders ||--|{ order_items : "contains"
    orders ||--o{ inventory_transactions : "references"

    users {
        bigint id PK
        varchar email UK
        varchar name
        enum status
        datetime created_at
        datetime updated_at
        datetime deleted_at
    }

    addresses {
        bigint id PK
        bigint user_id FK
        varchar line1
        varchar city
        varchar postal_code
        char country
    }

    products {
        bigint id PK
        varchar sku UK
        varchar name
        enum status
        datetime created_at
    }

    product_prices {
        bigint id PK
        bigint product_id FK
        decimal price
        char currency
        datetime valid_from
        datetime valid_until
    }

    orders {
        bigint id PK
        bigint user_id FK
        bigint shipping_address_id FK
        enum status
        decimal total_amount
        varchar payment_intent_id
        datetime created_at
    }

    order_items {
        bigint id PK
        bigint order_id FK
        bigint product_id FK
        int quantity
        decimal unit_price
    }

    inventory_transactions {
        bigint id PK
        bigint product_id FK
        int delta
        enum type
        bigint reference_id FK
        datetime created_at
    }

2. High-Performance MySQL 8.0 DDL Implementations

Below is the complete SQL implementation schema, utilizing optimized InnoDB parameters, appropriate character sets, foreign keys, and covering indexes.

-- Create database with high-performance default collations
CREATE DATABASE IF NOT EXISTS ecommerce
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_0900_ai_ci;

USE ecommerce;

-- 1. Users Table: standardized constraints and soft deletes
CREATE TABLE users (
  id          BIGINT UNSIGNED AUTO_INCREMENT,
  email       VARCHAR(320) NOT NULL,              -- RFC 5321 max length
  name        VARCHAR(255) NOT NULL,
  status      ENUM('active', 'suspended', 'deleted') NOT NULL DEFAULT 'active',
  
  -- Mandatory Audit Columns
  created_at  DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  updated_at  DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
  deleted_at  DATETIME(3) NULL DEFAULT NULL,
  
  CONSTRAINT pk_users PRIMARY KEY (id),
  CONSTRAINT uq_users_email UNIQUE (email)        -- Prevents duplicate registrations
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- 2. Addresses Table: normalized to support multiple delivery targets
CREATE TABLE addresses (
  id          BIGINT UNSIGNED AUTO_INCREMENT,
  user_id     BIGINT UNSIGNED NOT NULL,
  line1       VARCHAR(255) NOT NULL,
  line2       VARCHAR(255) NULL,
  city        VARCHAR(100) NOT NULL,
  state       VARCHAR(100) NOT NULL,
  postal_code VARCHAR(20) NOT NULL,
  country     CHAR(2) NOT NULL,                  -- ISO 3166-1 alpha-2 code
  is_default  BOOLEAN NOT NULL DEFAULT FALSE,
  deleted_at  DATETIME(3) NULL DEFAULT NULL,
  
  CONSTRAINT pk_addresses PRIMARY KEY (id),
  CONSTRAINT fk_addresses_user FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE RESTRICT,
  INDEX idx_addresses_user_default (user_id, is_default)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- 3. Products Table: indexed catalog listings
CREATE TABLE products (
  id          BIGINT UNSIGNED AUTO_INCREMENT,
  sku         VARCHAR(64) NOT NULL,              -- Unique stock keeping unit code
  name        VARCHAR(512) NOT NULL,
  description TEXT NULL,
  category_id BIGINT UNSIGNED NOT NULL,
  status      ENUM('active', 'draft', 'discontinued') NOT NULL DEFAULT 'draft',
  
  created_at  DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  updated_at  DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
  deleted_at  DATETIME(3) NULL DEFAULT NULL,
  
  CONSTRAINT pk_products PRIMARY KEY (id),
  CONSTRAINT uq_products_sku UNIQUE (sku),
  INDEX idx_products_category_status (category_id, status), -- Compound: catalog filters
  FULLTEXT INDEX ft_products_name (name)                    -- Full-text keyword search
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- 4. Product Prices Table: historical pricing registry
CREATE TABLE product_prices (
  id          BIGINT UNSIGNED AUTO_INCREMENT,
  product_id  BIGINT UNSIGNED NOT NULL,
  price       DECIMAL(12,4) NOT NULL,            -- Supports multi-currency decimal scales
  currency    CHAR(3) NOT NULL DEFAULT 'USD',    -- ISO 4217 currency key
  valid_from  DATETIME(3) NOT NULL,
  valid_until DATETIME(3) NULL,                  -- NULL means currently active
  
  CONSTRAINT pk_prices PRIMARY KEY (id),
  CONSTRAINT fk_prices_product FOREIGN KEY (product_id) REFERENCES products (id) ON DELETE RESTRICT,
  CONSTRAINT chk_price_positive CHECK (price >= 0.0000),
  INDEX idx_prices_product_timeline (product_id, valid_from, valid_until) -- Fast historic lookup
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- 5. Orders Table: core OLTP transaction registry
CREATE TABLE orders (
  id                  BIGINT UNSIGNED AUTO_INCREMENT,
  user_id             BIGINT UNSIGNED NOT NULL,
  shipping_address_id BIGINT UNSIGNED NOT NULL,
  status              ENUM('pending', 'confirmed', 'processing', 'shipped', 'delivered', 'cancelled', 'refunded') 
                      NOT NULL DEFAULT 'pending',
  currency            CHAR(3) NOT NULL DEFAULT 'USD',
  subtotal            DECIMAL(12,4) NOT NULL,
  tax_amount          DECIMAL(12,4) NOT NULL DEFAULT 0.0000,
  shipping_amount     DECIMAL(12,4) NOT NULL DEFAULT 0.0000,
  total_amount        DECIMAL(12,4) NOT NULL,
  
  -- Stripe Payment Tracking
  payment_intent_id   VARCHAR(255) NULL,
  paid_at             DATETIME(3) NULL,
  
  -- Audit & Life-cycle
  created_at          DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  updated_at          DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
  cancelled_at        DATETIME(3) NULL,
  cancel_reason       VARCHAR(512) NULL,
  
  CONSTRAINT pk_orders PRIMARY KEY (id),
  CONSTRAINT fk_orders_user FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE RESTRICT,
  CONSTRAINT fk_orders_address FOREIGN KEY (shipping_address_id) REFERENCES addresses (id) ON DELETE RESTRICT,
  INDEX idx_orders_user_created (user_id, created_at DESC), -- "My Orders" customer portal
  INDEX idx_orders_status_created (status, created_at),     -- Worker queue pull
  INDEX idx_orders_payment_intent (payment_intent_id)       -- Webhook idempotency check
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- 6. Order Items Table: normalized order details (no raw text blobs)
CREATE TABLE order_items (
  id          BIGINT UNSIGNED AUTO_INCREMENT,
  order_id    BIGINT UNSIGNED NOT NULL,
  product_id  BIGINT UNSIGNED NOT NULL,
  quantity    INT UNSIGNED NOT NULL,
  unit_price  DECIMAL(12,4) NOT NULL,            -- Frozen price snapshot at checkout
  line_total  DECIMAL(12,4) NOT NULL,
  
  CONSTRAINT pk_order_items PRIMARY KEY (id),
  CONSTRAINT fk_items_order FOREIGN KEY (order_id) REFERENCES orders (id) ON DELETE RESTRICT,
  CONSTRAINT fk_items_product FOREIGN KEY (product_id) REFERENCES products (id) ON DELETE RESTRICT,
  INDEX idx_items_order (order_id),
  INDEX idx_items_product (product_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- 7. Inventory Transactions Table: append-only ledger for flash sales
CREATE TABLE inventory_transactions (
  id           BIGINT UNSIGNED AUTO_INCREMENT,
  product_id   BIGINT UNSIGNED NOT NULL,
  delta        INT NOT NULL,                     -- Positive: restock, Negative: sales reservation
  type         ENUM('purchase', 'sale', 'return', 'adjustment', 'reservation') NOT NULL,
  reference_id BIGINT UNSIGNED NULL,             -- Corresponds to order_id or ingestion batch
  created_at   DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  
  CONSTRAINT pk_inventory PRIMARY KEY (id),
  CONSTRAINT fk_inv_product FOREIGN KEY (product_id) REFERENCES products (id) ON DELETE RESTRICT,
  INDEX idx_inv_product_timeline (product_id, created_at, delta) -- Fast sum aggregations
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Scaling Challenges & Production Bottlenecks

1. The Naive Stock Update Deadlock

In standard e-commerce designs, products have a single stock column. When $10,000$ concurrent buyers attempt a checkout transaction during a flash sale, they run:

-- Anti-Pattern: Triggers table locks and deadlocks under concurrent execution
UPDATE products SET stock = stock - 1 WHERE id = 1290;

Why it crashes:

InnoDB locks rows in index order. If transaction $T_1$ updates Product A then Product B, and transaction $T_2$ updates Product B then Product A, a Deadlock occurs. Under high write concurrency, lock escalation forces checkout latency to skyrocket and crashes transaction pipelines.

The Ledger Solution:

We decouple inventory updates into a high-concurrency, append-only inventory ledger (inventory_transactions). Instead of locking the products table row for writes, checkouts append a negative delta (delta = -2). Appends execute instantly using row-level inserts, eliminating lock contention.

2. High-Cardinality Indexing Memory Limits

Adding indexes to speed up every query pattern creates a secondary bottleneck: Index Bloat. Every index consumes memory within the InnoDB buffer pool. If indexes exceed physical memory sizes, MySQL is forced to swap pages to disk, causing query speeds to plunge.

Back-of-the-Envelope Estimation:

  • Table Size: $100,000,000$ order rows.
  • Primary Key: BIGINT ($8$ bytes) + Secondary Key on payment_intent_id (VARCHAR(255) averaging $32$ bytes).
  • Storage Footprint: $$\text{Leaf Node Size} = 100,000,000 \times (8\text{ B} + 32\text{ B} + 38\text{ B overhead}) \approx 7.8\text{ GB per index}$$
  • Result: Adding five unnecessary indexes will demand over $40\text{ GB}$ of dedicated RAM just to store index maps in the buffer pool.
  • Mitigation: Leverage Covering Indexes (e.g., idx_orders_user_created) and keep index columns narrow.

Technical Trade-offs & Strategic Compromises

When architecting a database schema at scale, we must balance normalization strictness against query performance.

Database Design Choice Pros Cons OLTP Write/Read Impact
Fully Normalized Schema (3NF) * Guarantees absolute data consistency.
* Zero duplicate data prevents anomalies.
* Requires complex, multi-join queries to assemble orders, increasing read latency. * OLTP Write: Extremely Fast (Single table writes)
* OLTP Read: Slow (Demands Joins)
Pessimistic Locking (SELECT FOR UPDATE) * Zero chance of overselling inventory during concurrent sales. * Blocks parallel checkout threads, creating queue bottlenecks. * Write Latency: High under load
* Data Safety: 100% Guaranteed
Optimistic Concurrency Control (OCC) * Non-blocking read-writes; highly scalable. * High rate of transaction rollbacks (retry overhead) during massive concurrent flash sales. * Scalability: Excellent
* Conflict Cost: High under hotkeys
Append-Only Inventory Ledger * Eliminates row lock contentions and deadlocks completely. * Requires summing transactions to compute current stock quantities. * Write Throughput: Ultra-High
* Compute Overhead: Medium (requires cached summary tables)

Failure Scenarios and Fault Tolerance

1. Flash Sale Hotspots and Inventory Overselling

When inventory drops to $0$, how do we prevent overselling without locking the inventory transaction table?

Fault-Tolerance Mitigation:

We execute atomic inventory allocation using a single check-and-insert query within a transaction. We calculate the sum of available stock in real-time, executing the insert only if available inventory satisfies the purchase volume:

START TRANSACTION;

-- Atomic check-and-reserve transaction
INSERT INTO inventory_transactions (product_id, delta, type, reference_id)
SELECT 2049, -2, 'reservation', 882019401
FROM (
  -- Sum all historical inventory transactions for the product
  SELECT IFNULL(SUM(delta), 0) AS available 
  FROM inventory_transactions 
  WHERE product_id = 2049
) AS stock
WHERE stock.available >= 2;

-- The application checks affected rows:
-- If Rows Affected = 1: Reservation succeeded. Commit transaction.
-- If Rows Affected = 0: Out of stock! Rollback transaction immediately.

2. Replica Lag Cascades

If a database replica experiences replication lag (due to long-running analytical queries), a buyer might purchase a product, receive a success page, and refresh the screen—only to see the order missing because the read replica has not synced the write.

Fault-Tolerance Mitigation:

  1. Write-to-Read Pinning: Pin a buyer's requests to the MySQL Primary node for $5$ seconds following any write command.
  2. Session Consistency Tokens: Attach the latest transaction binlog position (GTID) to the user's session token. Replicas must evaluate whether they have reached this position before responding; if not, the query routes to the Primary.

Staff Engineer Perspective

[!TIP] Leverage pt-online-schema-change for Large Tables: Never run direct ALTER TABLE commands on active tables containing more than $1\text{ Million}$ rows. It locks tables, starving connection pools. Use Open-Source schema change tools (like Percona's pt-osc or GitHub's gh-ost). These tools build a shadow table, sync delta logs via triggers/binlogs asynchronously, and perform an atomic cutover in milliseconds.


Verbal Script & Mock Interview

Verbal Script: High-Concurrency Schema Design

Interviewer: "We are designing a database schema for an e-commerce platform that processes 1 million orders a day. During flash sales, thousands of users try to buy the same item concurrently. How do you design the schema to prevent race conditions like overselling, while keeping checkout transactions under 50ms?"

Candidate: "To design a database capable of handling $1\text{ Million}$ orders a day while keeping checkout times under $50\text{ ms}$, we must address two core scaling challenges: schema normalization boundaries and write lock contention.

First, I would design a highly normalized schema using InnoDB. A critical decision is separating Product Details from Product Pricing and Inventory. To resolve the 'price change after purchase' vulnerability, I will design a product_prices table that tracks historic, interval-based pricing with valid_from and valid_until timestamps. When an order is placed, the price paid is frozen as an immutable copy directly within the order_items table.

Second, to eliminate the hot-key deadlock bottleneck during concurrent flash sales, I will completely avoid the anti-pattern of updating a single stock column in the products table. Instead, I will implement an Append-Only Inventory Ledger called inventory_transactions. When a buyer completes a checkout, rather than acquiring a write lock on the product row, we append an atomic, row-level reservation insert. The query sums all historical transactions for that product and only succeeds if the available quantity is sufficient:

INSERT INTO inventory_transactions (product_id, delta, type, reference_id)
SELECT ?, -?, 'reservation', ?
FROM (
  SELECT IFNULL(SUM(delta), 0) AS available 
  FROM inventory_transactions WHERE product_id = ?
) AS stock
WHERE stock.available >= ?;

If the rows affected returned is $1$, the transaction is committed; if $0$, we immediately abort and return an out-of-stock response, taking less than $10\text{ ms}$ with zero lock contention.

Third, I will configure a robust indexing strategy. For our highest-frequency queries, such as retrieving customer order history, I will implement composite indexes matching the query pattern. By creating an index on orders (user_id, created_at DESC), we cover both the filter and the sort, allowing the query to complete in single-digit milliseconds via index-only scans, bypassing expensive filesorts.

Finally, to handle analytical reporting without impacting OLTP checkout performance, I will configure semi-synchronous primary-replica replication. I will direct all transactional checkouts to the Primary node, while load-balancing catalog browsing and analytics queries to read replicas, protecting our checkout pipelines from database resource starvation."


Knowledge Check

MySQL · 3 Questions

Test Your Understanding

Ready to test yourself?

Answer 3 quick questions to reinforce what you just learned. Takes under 2 minutes.

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.