Lesson 2 of 10 4 min

MySQL Mastery: Joins and Aggregations

Learn how to combine and aggregate data across multiple tables using INNER JOIN, LEFT JOIN, GROUP BY, and HAVING.

Reading Mode

Hide the curriculum rail and keep the lesson centered for focused reading.

Mental Model

Joins are the bridges that connect isolated islands of data. Aggregations are the funnels that summarize those islands into actionable insights.

A database with a single table is just a slow Excel spreadsheet. The true power of a Relational Database is querying across multiple tables simultaneously.

1. The Anatomy of a JOIN

Let's assume we have two tables: users and orders.

INNER JOIN (The Intersection)

Returns only the records that have matching values in both tables.

SELECT users.email, orders.total_amount
FROM users
INNER JOIN orders ON users.id = orders.user_id;

If a user hasn't placed any orders, they will not appear in this result.

LEFT JOIN (The Baseline)

Returns ALL records from the left table (users), and the matched records from the right table (orders). If there is no match, the result is NULL on the right side.

SELECT users.email, orders.total_amount
FROM users
LEFT JOIN orders ON users.id = orders.user_id;

This is how you find users who have never placed an order: you check WHERE orders.id IS NULL.

2. Aggregations: Summarizing Data

Sometimes you don't want every individual row; you want the total sum, the average, or the count.

SELECT COUNT(*) as total_users FROM users;
SELECT SUM(total_amount) as revenue FROM orders WHERE status = 'DELIVERED';

3. GROUP BY and HAVING

What if we want to know the total revenue generated by each user? We use GROUP BY.

SELECT users.email, SUM(orders.total_amount) as user_ltv
FROM users
INNER JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.email;

The Difference Between WHERE and HAVING

  • WHERE filters rows before the grouping happens.
  • HAVING filters the groups after the aggregation.

Find users who have spent more than $1,000:

SELECT users.email, SUM(orders.total_amount) as user_ltv
FROM users
INNER JOIN orders ON users.id = orders.user_id
WHERE orders.status = 'DELIVERED'
GROUP BY users.id, users.email
HAVING SUM(orders.total_amount) > 1000;

Practice Question

Scenario: You have employees and departments. Question: Write a query to find all departments that currently have 0 employees assigned to them.

View Answer
SELECT d.name
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
WHERE e.id IS NULL;

By using a LEFT JOIN from departments, we ensure all departments are returned. Filtering WHERE e.id IS NULL grabs the ones that failed to join with any employee.

Technical Trade-offs: Architectural Decision

Strategy Scalability Complexity Operational Cost Performance
Monolithic Low Low Low Fast (Local)
Microservices Very High High High Slower (Network)
Serverless Infinite Medium Variable Variable (Cold Starts)

Production Readiness Checklist

Before deploying this architecture to a production environment, ensure the following Staff-level criteria are met:

  • High Availability: Have we eliminated single points of failure across all layers?
  • Observability: Are we exporting structured JSON logs, custom Prometheus metrics, and OpenTelemetry traces?
  • Circuit Breaking: Do all synchronous service-to-service calls have timeouts and fallbacks (e.g., via Resilience4j)?
  • Idempotency: Can our APIs handle retries safely without causing duplicate side effects?
  • Backpressure: Does the system gracefully degrade or return HTTP 429 when resources are saturated?

Verbal Interview Script

Interviewer: "What is the performance difference between a Subquery and a JOIN?"

Candidate: "Historically, MySQL struggled with subqueries, often executing them iteratively for every row in the outer query, leading to an $O(N^2)$ performance bottleneck. Modern MySQL optimizers try to rewrite subqueries as JOINs internally. However, as a best practice, explicitly writing an INNER JOIN allows the query optimizer to build a more efficient execution plan, utilizing indexes properly and performing hash or nested-loop joins instead of correlated subquery scans."

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.