MySQL Mastery: Indexing and Performance

Deep dive into MySQL Indexing. Understand B-Trees, Clustered vs Non-Clustered indexes, and how to read the EXPLAIN plan.

Mental Model

An index works like the index at the back of a textbook: it tells you which page a word appears on. Without it, you have to read every single page to find a specific word (Full Table Scan).

As a table grows into the millions of rows, an unindexed query that used to take 10 ms can take several seconds, because the work grows with the table size. The first fix to try is almost always an index.

1. How Indexes Work (The B-Tree)

By default, MySQL (using the InnoDB engine) stores indexes as B-Trees (more precisely, a B+Tree variant in which the leaf pages are linked together). A B-Tree is a self-balancing tree data structure that keeps data sorted and allows searches, insertions, and deletions in $O(\log N)$ time, as well as efficient in-order (sequential) scans.

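When you create an index on an email column, MySQL creates a separate data structure that stores the emails in sorted order, each paired with a reference to its row. In InnoDB that reference is the row's Primary Key value rather than a physical disk address.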

CREATE INDEX idx_users_email ON users(email);
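To make the statement above concrete, here is a minimal sketch. The users table definition (column names, types, and the status and created_at columns) is an assumption made for illustration; only the email column and the index name come from the lesson.

```sql
-- Hypothetical table, assumed for illustration only.
CREATE TABLE users (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    email      VARCHAR(255)    NOT NULL,
    status     VARCHAR(16)     NOT NULL,
    created_at DATETIME        NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Build the secondary B-Tree index on email.
CREATE INDEX idx_users_email ON users (email);

-- List the table's indexes: expect an entry for PRIMARY (on id)
-- and one for idx_users_email (on email).
SHOW INDEX FROM users;
```

SHOW INDEX is a quick way to confirm which indexes exist before reasoning about a query plan.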

2. Clustered vs Non-Clustered Indexes

In MySQL's InnoDB storage engine, this distinction is critical for performance.

Clustered Index (The Primary Key)

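The actual row data is stored inside the leaf nodes of the Primary Key B-Tree. This means looking up a row by its Primary Key is the most direct way to fetch a full row in InnoDB: once the B-Tree search reaches the leaf, the row is already there, with no second lookup. (If a table has no Primary Key, InnoDB clusters on the first UNIQUE index whose columns are all NOT NULL, or otherwise on a hidden row ID.)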

Non-Clustered Index (Secondary Index)

If you search by email (using idx_users_email), the leaf node of that index does not contain the full row. It contains only the indexed value (the email) and the Primary Key value of that row. MySQL then takes that Primary Key value, goes to the Clustered Index, and looks up the actual row data. This is called a Double Lookup.
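The Double Lookup can be pictured as two separate queries. This is only a conceptual sketch: InnoDB performs both steps internally within one statement, and the users table with primary key id is assumed for illustration.

```sql
-- Assumes a users table with primary key id and the secondary index idx_users_email.

-- What you write:
SELECT * FROM users WHERE email = 'john@example.com';

-- Roughly what InnoDB does, expressed as two queries.
-- Step 1: search the secondary index. Its leaf entries hold (email, id),
--         so this step needs nothing but the index.
SELECT id FROM users WHERE email = 'john@example.com';

-- Step 2: use that id to fetch the full row from the clustered index.
--         42 is a placeholder for whatever id step 1 returned.
SELECT * FROM users WHERE id = 42;
```

The cost of step 2 is paid once per matching row, which is why a secondary index that matches many rows can be slower than it first appears.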

3. The Covering Index

How do we avoid the slow Double Lookup? We use a Covering Index. If your query only selects columns that are already present in the Secondary Index, MySQL doesn't need to visit the Clustered Index at all.

-- Creating a composite index
CREATE INDEX idx_users_status_email ON users(status, email);

-- This query is "Covered" by the index! No double lookup required.
SELECT email FROM users WHERE status = 'ACTIVE';
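One way to check whether a query is covered is to look at the Extra column of EXPLAIN, which reports "Using index" when MySQL can answer from the index alone. The sketch below assumes a users table with primary key id plus email, status, and a hypothetical created_at column.

```sql
-- Assumes users(id PRIMARY KEY, email, status, created_at) and idx_users_status_email.

-- Covered: every referenced column (status, email) is stored in the index.
-- The Extra column should include "Using index".
EXPLAIN SELECT email FROM users WHERE status = 'ACTIVE';

-- Also covered: InnoDB secondary index entries carry the primary key value,
-- so id is available without visiting the clustered index.
EXPLAIN SELECT id, email FROM users WHERE status = 'ACTIVE';

-- Not covered: created_at is not part of the index, so MySQL has to read
-- the clustered index to produce it.
EXPLAIN SELECT email, created_at FROM users WHERE status = 'ACTIVE';
```

The same reasoning explains why SELECT * almost never benefits from a covering index: it asks for every column in the table.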

4. Reading the EXPLAIN Plan

The EXPLAIN keyword is a Staff Engineer's best friend. Prefix any query with EXPLAIN to see how MySQL intends to execute it.

EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

Key Columns to Watch:

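  • type: The access type, i.e. how MySQL reads rows from each table. ALL is a Full Table Scan (usually the worst case on a large table). index is a Full Index Scan (usually bad, though cheaper when the index covers the query). range reads only a bounded slice of an index. ref or eq_ref means rows are found through an index lookup (good). const means at most one row can match, because a Primary Key or unique index is compared against a constant (ideal).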
  • possible_keys: Indexes MySQL considered using.
  • key: The actual index MySQL decided to use.
  • rows: An estimate of how many rows MySQL has to examine to find the result. (Lower is better).
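The access types above can be compared side by side. The expected values in the comments are what MySQL typically reports on a populated table; the optimizer may choose differently on very small or empty tables. The table layout (primary key id, a non-unique idx_users_email, and an unindexed created_at column) is assumed for illustration.

```sql
-- Assumes a populated users table with primary key id, a non-unique index
-- idx_users_email on email, and an unindexed created_at column (hypothetical).

-- Primary key equality (when the row exists): typically type = const, key = PRIMARY, rows = 1.
EXPLAIN SELECT * FROM users WHERE id = 42;

-- Equality on a non-unique secondary index: typically type = ref, key = idx_users_email.
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

-- Filter on an unindexed column: typically type = ALL, key = NULL.
EXPLAIN SELECT * FROM users WHERE created_at >= '2026-01-01';
```

Comparing the rows estimate across the three plans gives a rough sense of how much work each access type saves.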

Practice Question

Scenario: You have a table logs with columns id, user_id, action, created_at. You frequently run SELECT action FROM logs WHERE user_id = 123.

Question: What is the optimal index to create to make this query a "Covered Query"?

Answer:
CREATE INDEX idx_userid_action ON logs(user_id, action);

By including action in the index alongside user_id, the index "covers" the query. MySQL finds the user_id, grabs the action directly from the index node, and returns the result without ever reading the actual table row from disk.
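Column order in the composite index matters as much as column membership. The sketch below assumes a plausible definition for the logs table (the column types are not given in the scenario) and contrasts the answer with the reversed column order.

```sql
-- Hypothetical definition of the logs table from the scenario.
CREATE TABLE logs (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    BIGINT UNSIGNED NOT NULL,
    action     VARCHAR(64)     NOT NULL,
    created_at DATETIME        NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Filter column first: MySQL can seek straight to user_id = 123
-- and read action from the same index entries.
CREATE INDEX idx_userid_action ON logs (user_id, action);

-- Extra should include "Using index", confirming the query is covered.
EXPLAIN SELECT action FROM logs WHERE user_id = 123;

-- For comparison only (left commented out): the same columns in the opposite
-- order still contain both values, but the index is sorted by action first,
-- so it cannot be used to seek directly to user_id = 123.
-- CREATE INDEX idx_action_userid ON logs (action, user_id);
```

As a rule of thumb, put the columns used for equality filtering first and the columns that are only selected last.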

Technical Trade-offs: Architectural Decision

Strategy      | Scalability | Complexity | Operational Cost | Performance
Monolithic    | Low         | Low        | Low              | Fast (Local)
Microservices | Very High   | High       | High             | Slower (Network)
Serverless    | Infinite    | Medium     | Variable         | Variable (Cold Starts)

Production Readiness Checklist

Before deploying this architecture to a production environment, ensure the following Staff-level criteria are met:

  • High Availability: Have we eliminated single points of failure across all layers?
  • Observability: Are we exporting structured JSON logs, custom Prometheus metrics, and OpenTelemetry traces?
  • Circuit Breaking: Do all synchronous service-to-service calls have timeouts and fallbacks (e.g., via Resilience4j)?
  • Idempotency: Can our APIs handle retries safely without causing duplicate side effects?
  • Backpressure: Does the system gracefully degrade or return HTTP 429 when resources are saturated?

Verbal Interview Script

Interviewer: "We added an index to a column, but EXPLAIN shows MySQL is still doing a Full Table Scan. Why?"

Candidate: "There are several reasons the query optimizer might ignore an index. The most common is low cardinality: if the column is a boolean (is_active), the index isn't selective enough, and scanning the table sequentially is cheaper than walking the index and then doing a separate clustered-index lookup for a large share of the rows. Another reason is applying a function to the indexed column in the WHERE clause (e.g., WHERE YEAR(created_at) = 2026), which hides the raw column value from the optimizer and makes the predicate non-SARGable. Finally, if we use a LIKE '%term' wildcard at the beginning of a string, there is no fixed prefix to seek on in the sorted B-Tree, so MySQL has to scan every entry."
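The last two points in the answer can be illustrated with before-and-after queries. The indexes on logs(created_at) and users(email) are assumptions made for this sketch, as are the table and column names beyond those already used in the lesson.

```sql
-- Assumes hypothetical indexes on logs(created_at) and users(email).

-- Non-SARGable: the function wraps created_at, so MySQL cannot seek
-- into the index on that column.
SELECT id FROM logs WHERE YEAR(created_at) = 2026;

-- SARGable rewrite: a half-open range on the bare column.
SELECT id FROM logs
WHERE created_at >= '2026-01-01'
  AND created_at <  '2027-01-01';

-- Leading wildcard: no fixed prefix to seek on.
SELECT id FROM users WHERE email LIKE '%@example.com';

-- Trailing wildcard: the fixed prefix 'john' can be used as an index range.
SELECT id FROM users WHERE email LIKE 'john%';
```

Running EXPLAIN on each pair is a reasonable way to confirm the difference on a real dataset.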
