Lesson 13 of 21 · 3 min

Project Case Study: Designing YouTube (Video Streaming at Global Scale)

How does YouTube handle millions of video uploads and billions of views? A technical deep dive into Video Transcoding, CDNs, Adaptive Bitrate Streaming, and storage optimization.


Case Study: Design YouTube (Video Streaming)

Mental Model

Connecting isolated components into a resilient, scalable, and observable distributed system.

Designing a video streaming platform like YouTube or Netflix is the ultimate test of your ability to handle High Bandwidth and Global Content Delivery.

1. Requirement Clarification

Functional

  • Users can upload videos.
  • Users can watch videos on any device (Web, Mobile).
  • Users can search for videos.
  • View counts and real-time analytics.

Non-Functional

  • Scalability: Handle 1M+ uploads/day.
  • Availability: 99.99%.
  • Reliability: No loss of uploaded videos.
  • Latency: No buffering during playback.
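
To ground the "1M+ uploads/day" requirement, here is a quick back-of-envelope storage estimate. The average length, per-minute size, and replication factor are assumed values for illustration, not figures from the lesson:

```python
# Back-of-envelope storage estimate (all averages are assumptions):
UPLOADS_PER_DAY = 1_000_000
AVG_MINUTES = 10            # assumed average video length
MB_PER_MINUTE = 50          # assumed raw 1080p footprint
REPLICATION_FACTOR = 3      # typical blob-store durability setting

# 1M * 10 min * 50 MB = 500,000,000 MB = 500 TB of raw video per day
raw_tb_per_day = UPLOADS_PER_DAY * AVG_MINUTES * MB_PER_MINUTE / 1_000_000
total_tb_per_day = raw_tb_per_day * REPLICATION_FACTOR
print(f"~{raw_tb_per_day:,.0f} TB/day raw, ~{total_tb_per_day:,.0f} TB/day replicated")
```

Even before transcoded renditions are added, the raw ingest alone lands in the hundreds of terabytes per day, which is why blob storage (S3) rather than a database is the only realistic home for the files.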

2. High-Level Architecture

  1. Ingestion: Receives the raw video file.
  2. Transcoding: Converts video into multiple formats and resolutions (360p, 720p, 4K).
  3. Storage: Metadata in SQL, Raw files in Blob Storage (S3).
  4. CDN: Serves content from edge nodes near the user.

3. The Transcoding Pipeline

Transcoding is CPU-intensive. We use an Asynchronous Pipeline:

  • Raw Video → S3 → Kafka → Workers → Transcoded Segments → S3.
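
A minimal sketch of the fan-out step in that pipeline: one raw upload becomes one independent job per target resolution, so a fleet of workers can transcode renditions in parallel. The `TranscodeJob` schema and field names are hypothetical; in production these would be Kafka messages consumed by workers running a tool such as ffmpeg:

```python
from dataclasses import dataclass

RESOLUTIONS = ["360p", "720p", "1080p", "4k"]  # assumed rendition ladder

@dataclass(frozen=True)
class TranscodeJob:
    video_id: str
    source_key: str   # blob-storage key of the raw upload
    resolution: str

def fan_out(video_id: str, source_key: str) -> list[TranscodeJob]:
    """Split one upload into independent per-resolution jobs so each
    rendition can be transcoded on a separate worker."""
    return [TranscodeJob(video_id, source_key, r) for r in RESOLUTIONS]

jobs = fan_out("vid42", "raw/vid42.mp4")  # 4 jobs, one per resolution
```

Because each job is self-describing and independent, a failed 4k transcode can be retried without re-running the 360p one.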

4. Adaptive Bitrate Streaming (DASH/HLS)

The system doesn't send one giant file. It breaks the video into 2-5 second segments. The player automatically switches between resolutions based on the user's network speed.
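
The switching logic can be illustrated with a simplified client-side heuristic: pick the highest-bitrate rendition that fits within a safety fraction of the measured throughput. The bitrate table and safety factor are assumed values, and real players (DASH/HLS clients) use considerably more sophisticated estimators:

```python
RENDITIONS = {        # assumed bitrates in kbps per rendition
    "360p": 800,
    "720p": 2500,
    "1080p": 5000,
    "4k": 15000,
}

def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Choose the best rendition that leaves ~20% headroom so a small
    throughput dip does not immediately cause a rebuffer."""
    budget = measured_kbps * safety
    fitting = [(kbps, name) for name, kbps in RENDITIONS.items() if kbps <= budget]
    # If even the lowest rendition exceeds the budget, fall back to it anyway.
    return max(fitting)[1] if fitting else min(RENDITIONS, key=RENDITIONS.get)

print(pick_rendition(4000))   # budget 3200 kbps -> "720p"
```

Since the decision is re-evaluated every 2-5 second segment, the player can step down mid-video when the network degrades and step back up when it recovers.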

5. View Count (The Big Data Problem)

Writing every view to a single DB row turns a viral video into a write hot spot that can overwhelm the database.

  • Fix: Use a distributed counter (e.g., Redis INCR) and periodically flush aggregates to the main DB.
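
The counter-and-flush pattern can be sketched with an in-memory stand-in for Redis. In production, each app server would call `INCR views:{video_id}` and a periodic job would drain the counts into the main DB; the class below only illustrates the shape of that flow:

```python
from collections import Counter

class ViewCounter:
    """In-memory stand-in for a Redis-backed distributed view counter."""

    def __init__(self) -> None:
        self._pending = Counter()

    def record_view(self, video_id: str) -> None:
        self._pending[video_id] += 1   # O(1) increment, no DB write on the hot path

    def flush(self) -> dict:
        """Drain pending counts; the caller applies one aggregate UPDATE
        per video instead of one write per view."""
        drained = dict(self._pending)
        self._pending.clear()
        return drained

c = ViewCounter()
for _ in range(1000):
    c.record_view("viral")
c.record_view("niche")
batch = c.flush()   # {"viral": 1000, "niche": 1}
```

A viral video that receives a million views between flushes still costs the database exactly one row update.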

Final Takeaway

Video streaming is about Decoupling Ingestion from Delivery. Ingestion needs reliable pipelines; delivery needs a massive global CDN.

Technical Trade-offs: Messaging Systems

Pattern                      | Ordering               | Durability | Throughput | Complexity
-----------------------------|------------------------|------------|------------|-----------
Log-based (Kafka)            | Strict (per partition) | High       | Very High  | High
Memory-based (Redis Pub/Sub) | None                   | Low        | High       | Very Low
Push-based (RabbitMQ)        | Fair                   | Medium     | Medium     | Medium

Key Takeaways

  • Decouple ingestion from delivery: uploads flow through an asynchronous transcoding pipeline; playback is served from a global CDN.
  • Transcode every upload into multiple resolutions so Adaptive Bitrate Streaming can switch quality per 2-5 second segment.
  • Absorb write-heavy traffic like view counts with distributed counters that flush aggregates to the main DB, not per-event writes.

Production Readiness Checklist

Before deploying this architecture to a production environment, ensure the following Staff-level criteria are met:

  • High Availability: Have we eliminated single points of failure across all layers?
  • Observability: Are we exporting structured JSON logs, custom Prometheus metrics, and OpenTelemetry traces?
  • Circuit Breaking: Do all synchronous service-to-service calls have timeouts and fallbacks (e.g., via Resilience4j)?
  • Idempotency: Can our APIs handle retries safely without causing duplicate side effects?
  • Backpressure: Does the system gracefully degrade or return HTTP 429 when resources are saturated?

Verbal Interview Script

Interviewer: "How would you ensure high availability and fault tolerance for this specific architecture?"

Candidate: "To achieve our 99.99% availability target, we must eliminate all Single Points of Failure (SPOF). I would deploy the API Gateway and stateless microservices across multiple Availability Zones (AZs) behind an active-active load balancer. For the data layer, I would use asynchronous replication to a read-replica in a different region for disaster recovery. Furthermore, it's not enough to just deploy redundantly; we must protect the system from cascading failures. I would implement strict timeouts, retry mechanisms with exponential backoff and jitter, and Circuit Breakers (using a library like Resilience4j) on all synchronous network calls between microservices."
