
Case Study: Design Instagram (Photo Sharing)

Master the architecture of a global photo-sharing app. Learn about feed generation, media storage, and sharding billions of images.


Key Takeaways

  • **Upload**: Users can upload images.
  • **Follow**: Users can follow others.
  • **NewsFeed**: Users see a feed of photos from people they follow.

Recommended Prerequisites

  • System Design Interview Framework (System Design Module 2: The Interview Framework, PEDAL)


Mental Model

Social photo-sharing platforms scale on media read throughput and fast, eventually consistent feed delivery. You are serving media through globally distributed edge caches and storing raw blobs in decoupled object stores. You are also building a hybrid fan-out engine that trades write amplification for typical authors against read-time merge cost for celebrity authors.


Requirements and System Goals

To design a highly available, global media platform like Instagram, we first pin the system's boundaries to explicit scale targets and latency budgets.

1. Functional Requirements

  • Photo Upload & Metadata Registration: Users can upload photos, add captions, and register media metadata.
  • Social Graph Management: Users can follow other users, creating a directed graph of relationships.
  • Dynamic News Feed Delivery: Users can view a reverse-chronological feed containing the latest photos from all people they follow.
  • Content Discovery & Search: Users can search posts by hashtag and look up other user profiles.

2. Non-Functional Requirements & Performance Budgets

  • High Read Availability: Social feeds target 99.999% ("Five Nines") availability for reads. Feed generation must stay resilient even when regional replication queues lag.
  • Low-Latency Feed Generation: The home news feed must be returned within a P99 latency budget of 200ms.
  • Data Reliability & Durability: Raw images and metadata must never be lost once an upload is acknowledged to the client (target 99.999999999%, "eleven nines", durability for raw assets).
  • Consistent Write Latency: Uploading an image and registering its metadata must complete within a P99 of 500ms, measured from the closest geographic gateway.

3. Back-of-the-Envelope Estimation: 5-Year Storage Capacity

To properly design our storage array and select between relational databases, caches, and object stores, we calculate expected ingestion volumes over 5 years.

  • Active User Base: Assume 100,000,000 daily active users (DAU).
  • Write Traffic Model:
    • On average, 2% of active users upload a photo daily.
    • Daily Upload Volume: $100,000,000 \times 0.02 = 2,000,000 \text{ photos/day}$.
  • Storage Footprint Per Upload:
    • Each photo is resized by the media pipeline into multiple viewport renditions (e.g., thumbnail, mobile-standard, desktop-retina), totaling an average composite size of 4 Megabytes (MB).
    • Daily Raw Assets Ingestion: $2,000,000 \times 4 \text{ MB} = 8,000,000 \text{ MB/day} \approx 8 \text{ Terabytes (TB) per day}$.
    • 5-Year Raw Assets Storage Requirement: $8 \text{ TB/day} \times 365 \text{ days/year} \times 5 \text{ years} = 14,600 \text{ TB} = 14.6 \text{ Petabytes (PB)}$ of raw object storage, before replication.
  • Metadata Database Sizing:
    • Each post metadata entry (Post ID, User ID, Image URL, Caption, Timestamp, Version) averages 500 bytes of structured data.
    • Daily Metadata Volume: $2,000,000 \times 500 \text{ bytes} = 1 \text{ Gigabyte (GB) per day}$.
    • 5-Year Metadata Storage: $1 \text{ GB/day} \times 365 \text{ days/year} \times 5 \text{ years} = 1,825 \text{ GB} \approx 1.83 \text{ TB}$ of metadata records. This fits comfortably in a sharded relational database or a NoSQL store.
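The arithmetic above is easy to rerun when an assumption changes, such as the upload ratio or the rendition size. The following is a minimal sketch. It assumes decimal (SI) units, as the estimate does, and it ignores replication and deletions. The function name `estimate_capacity` is hypothetical.

```python
from dataclasses import dataclass

# Decimal (SI) units, as in the estimate above: 1 TB = 1,000,000 MB, 1 GB = 10^9 bytes.
MB_PER_TB = 1_000_000
BYTES_PER_GB = 1_000_000_000
DAYS_PER_YEAR = 365


@dataclass(frozen=True)
class CapacityEstimate:
    photos_per_day: int
    media_tb_per_day: float
    media_tb_total: float
    metadata_gb_per_day: float
    metadata_gb_total: float


def estimate_capacity(dau: int, upload_ratio: float, mb_per_photo: int,
                      metadata_bytes_per_post: int, years: int) -> CapacityEstimate:
    """Back-of-the-envelope storage sizing (no replication, no deletes)."""
    photos_per_day = round(dau * upload_ratio)
    media_mb_per_day = photos_per_day * mb_per_photo
    metadata_bytes_per_day = photos_per_day * metadata_bytes_per_post
    days = DAYS_PER_YEAR * years
    return CapacityEstimate(
        photos_per_day=photos_per_day,
        media_tb_per_day=media_mb_per_day / MB_PER_TB,
        media_tb_total=media_mb_per_day * days / MB_PER_TB,
        metadata_gb_per_day=metadata_bytes_per_day / BYTES_PER_GB,
        metadata_gb_total=metadata_bytes_per_day * days / BYTES_PER_GB,
    )
```

Calling `estimate_capacity(100_000_000, 0.02, 4, 500, 5)` reproduces the figures above:

  • 2,000,000 photos/day
  • 8 TB/day of media
  • 14,600 TB of media over five years
  • 1,825 GB of metadata over five years

Multiply by the replication factor to size physical capacity.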

API Interfaces and Service Contracts

We expose RESTful and GraphQL endpoints for photo registration, social graph updates, and feed retrieval. The REST contracts are shown below.

1. Photo Upload and Metadata Registration

POST /api/v1/posts

Request Payload:

{
  "userId": "usr_7721b-88cd",
  "clientTimestamp": 1780000000000,
  "caption": "Chasing sunsets in Kyoto #travel #kyoto",
  "mediaMetadata": {
    "imageFormat": "webp",
    "rawBytesCount": 4194304,
    "s3PresignedUploadKey": "raw-media/usr_7721b-88cd/posts/uuid7_9918a.webp"
  }
}

Response Payload (201 Created):

{
  "postId": "pst_019a-uuid7-9918a",
  "userId": "usr_7721b-88cd",
  "mediaUrl": "https://cdn.codesprintpro.com/posts/usr_7721b-88cd/uuid7_9918a.webp",
  "registeredAt": "2026-05-31T11:23:00Z"
}
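Before it returns `201 Created`, the Post Service has to trust the object key the client sends. A minimal sketch of that check follows. It assumes the key layout `raw-media/{userId}/posts/{file}` and the CDN mapping seen in the example payloads, and it assumes a hypothetical format allow-list and size cap.

None of these names are a documented contract:

  • `media_url_for`
  • `ALLOWED_FORMATS`
  • `max_bytes`

```python
from dataclasses import dataclass

# Assumptions inferred from the example payloads above (not a documented contract).
CDN_BASE = "https://cdn.codesprintpro.com/posts"
ALLOWED_FORMATS = {"webp", "jpeg", "png"}  # hypothetical allow-list


@dataclass(frozen=True)
class UploadRequest:
    user_id: str
    image_format: str
    raw_bytes_count: int
    upload_key: str


def media_url_for(req: UploadRequest, max_bytes: int = 20 * 1024 * 1024) -> str:
    """Validate a registration request and derive its CDN URL (hypothetical helper)."""
    if req.image_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {req.image_format}")
    if not 0 < req.raw_bytes_count <= max_bytes:
        raise ValueError("rawBytesCount out of range")
    # The object key must sit under the caller's own prefix, so one user
    # cannot register a blob that another user uploaded.
    prefix = f"raw-media/{req.user_id}/posts/"
    if not req.upload_key.startswith(prefix):
        raise PermissionError("upload key does not belong to this user")
    filename = req.upload_key[len(prefix):]
    if "/" in filename or not filename.endswith("." + req.image_format):
        raise ValueError("malformed upload key")
    return f"{CDN_BASE}/{req.user_id}/{filename}"
```

With the request above, the function returns the same `mediaUrl` shown in the 201 response. A request that points at another user's prefix raises `PermissionError`.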

2. Paginated News Feed Retrieval

GET /api/v1/users/{userId}/feed

Parameters:

  • limit (int, default: 20): Number of post metadata objects to return.
  • cursor (string, optional): Base64-encoded token containing the time-sortable postId and createdAt of the last item in the previous page.

Response Payload (200 OK):

{
  "posts": [
    {
      "postId": "pst_019b-uuid7-8812b",
      "authorId": "usr_8829a",
      "mediaUrl": "https://cdn.codesprintpro.com/posts/usr_8829a/uuid7_8812b.webp",
      "caption": "Morning coffee!",
      "likesCount": 1420,
      "createdAt": 1780000100000
    }
  ],
  "nextCursor": "eyJwb3N0SWQiOiJwc3RfMDE5Yi11dWlkNy04ODEyYiIsImNyZWF0ZWRBdCI6MTc4MDAwMDEwMDAwMH0="
}
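The `nextCursor` token is Base64-encoded JSON holding the last item's `postId` and `createdAt`. A minimal sketch of how a feed service might produce and consume it follows. The helper names `encode_cursor` and `decode_cursor` are hypothetical.

```python
import base64
import json
from typing import Tuple


def encode_cursor(post_id: str, created_at_ms: int) -> str:
    """Pack the last item's sort keys into an opaque Base64 token."""
    payload = json.dumps({"postId": post_id, "createdAt": created_at_ms},
                         separators=(",", ":"))
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(token: str) -> Tuple[str, int]:
    """Recover (postId, createdAt) so the next page resumes after that item."""
    data = json.loads(base64.b64decode(token, validate=True))
    return data["postId"], int(data["createdAt"])
```

The next page then starts strictly after that `(createdAt, postId)` pair (keyset pagination). Unlike offset pagination, this position stays stable when new posts are prepended to the feed between requests.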

3. Follow User Event Contract

POST /api/v1/users/{followerId}/follows

Request Payload:

{
  "targetUserId": "usr_8829a",
  "timestamp": 1780000050000
}

Response Payload (200 OK):

{
  "followerId": "usr_7721b-88cd",
  "targetUserId": "usr_8829a",
  "relationshipState": "ACTIVE"
}
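Clients retry follow requests on flaky networks, so the endpoint should be idempotent. A second identical request should return the same `ACTIVE` state without creating a duplicate edge. The minimal in-memory sketch below illustrates this. The class `FollowGraph` and its self-follow rule are hypothetical. Set insertion plays the role that the `(follower_id, followed_id)` primary key plays in the relational schema.

```python
from typing import Dict, Set


class FollowGraph:
    """In-memory stand-in for the follows table (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.following: Dict[str, Set[str]] = {}  # follower -> accounts they follow
        self.followers: Dict[str, Set[str]] = {}  # author -> followers (used by fan-out)

    def follow(self, follower_id: str, target_user_id: str) -> Dict[str, str]:
        if follower_id == target_user_id:
            raise ValueError("users cannot follow themselves")
        # Adding to a set is idempotent, so a retried request creates no duplicate edge.
        self.following.setdefault(follower_id, set()).add(target_user_id)
        self.followers.setdefault(target_user_id, set()).add(follower_id)
        return {"followerId": follower_id, "targetUserId": target_user_id,
                "relationshipState": "ACTIVE"}
```

Keeping both directions makes "whom do I follow?" (pull reads) and "who follows this author?" (push fan-out) single lookups.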

High-Level Design and Visualizations

Our global photo-sharing architecture decouples heavy media processing streams from high-throughput feed lookups.

graph TD
    subgraph Client Layer
        User((User Client)) --> CDN[Cloudflare Global CDN]
        User --> DNS[Geo-DNS Router]
    end
    
    subgraph Routing and Gateway
        DNS --> LB[Elastic Load Balancer]
        LB --> Gateway[Kong API Gateway]
    end
    
    subgraph Microservices Cluster
        Gateway -->|Register Post| PostSvc[Post Service]
        Gateway -->|Manage Graph| GraphSvc[Social Graph Service]
        Gateway -->|Deliver Feed| FeedSvc[Feed Generation Service]
        Gateway -->|Stream Media| MediaSvc[Media Ingestion Service]
    end
    
    subgraph Media Pipeline
        MediaSvc -->|Presigned Upload| S3[(Amazon S3 Object Store)]
        MediaSvc -->|Trigger Async Resize| Kafka[Kafka Media Topic]
        Kafka --> Worker[Media Processing Workers]
        Worker -->|Save Resized Viewports| S3
    end
    
    subgraph Database and Caching Layers
        PostSvc -->|Write Metadata| PostDB[(Sharded PostgreSQL)]
        GraphSvc -->|Read/Write Follows| GraphCache[(Redis Graph Cache)]
        GraphSvc -->|Persist Graph| Neo4j[(Neo4j Graph DB)]
        FeedSvc -->|Query Active Feed| FeedCache[(Redis Feed Cache)]
        PostDB -.->|Invalidate / Warm Cache| FeedCache
    end

Low-Level Design and Schema Strategies

To support sharded lookups, per-user reverse-chronological queries, and fast profile fetches, we define a relational metadata layout and a NoSQL single-table alternative.

1. Relational PostgreSQL Schema (Sharded Metadata Database)

To prevent ID collisions across shards without a central ID allocator, the system uses time-ordered UUIDv7 values, generated by the application layer, as primary keys.

-- PostgreSQL user profile metadata
CREATE TABLE users (
    id UUID PRIMARY KEY, -- UUIDv7 generated by the application layer (gen_random_uuid() yields random UUIDv4)
    username VARCHAR(30) UNIQUE NOT NULL,
    display_name VARCHAR(100) NOT NULL,
    profile_pic_url VARCHAR(255),
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

-- PostgreSQL post metadata table
-- Assumes shards are keyed by user_id, so a user's posts live on the same shard as their profile
CREATE TABLE posts (
    id UUID PRIMARY KEY, -- UUIDv7 generated by the application layer
    user_id UUID NOT NULL REFERENCES users(id),
    media_url VARCHAR(255) NOT NULL,
    caption TEXT,
    likes_count INT NOT NULL DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL,
    version INT NOT NULL DEFAULT 1
);

-- Serves per-user reverse-chronological timelines
CREATE INDEX idx_user_posts_created ON posts(user_id, created_at DESC);

-- PostgreSQL social graph follows table
-- Foreign keys cannot span shards: if followed_id can live on a different shard than
-- follower_id, drop that REFERENCES clause and enforce the relationship in the application.
CREATE TABLE follows (
    follower_id UUID NOT NULL REFERENCES users(id),
    followed_id UUID NOT NULL REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    PRIMARY KEY (follower_id, followed_id)
);

-- Serves "who follows this author?" lookups during fan-out
CREATE INDEX idx_follows_followed ON follows(followed_id);
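Because the schema expects the application layer to supply UUIDv7 keys, the following sketch shows how such an ID is assembled, using the bit layout from RFC 9562. It is hand-rolled so it does not depend on a particular library version, and the function name `uuid7` is our own.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7: 48-bit ms timestamp, version 7, RFC variant, random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: rand_a
    value |= 0b10 << 62                            # bits 63..62: RFC variant
    value |= rand_b                                # bits 61..0: rand_b
    return uuid.UUID(int=value)
```

The timestamp occupies the most significant bits, so IDs from later milliseconds sort after earlier ones, both as 128-bit integers and as hex strings. That ordering is what keeps B-tree inserts appending near the right edge of the index.

Within a single millisecond, this sketch orders IDs randomly. RFC 9562 describes optional counter-based methods for strict monotonicity.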

2. DynamoDB NoSQL Single-Table Design Alternative

For massive horizontal scalability free from sharding administration, the metadata and social graph are mapped onto a single Amazon DynamoDB table using prefix partitions.

| PK (Partition Key) | SK (Sort Key) | mediaUrl | caption | timestamp (epoch s) | type |
|---|---|---|---|---|---|
| USER#usr_7721b | METADATA | - | - | 1780000000 | UserRecord |
| USER#usr_7721b | POST#pst_019a | https://cdn...webp | Kyoto travel... | 1780000100 | PostRecord |
| USER#usr_7721b | FOLLOWED#usr_8829a | - | - | 1780000050 | FollowRelationship |
| USER#usr_8829a | FOLLOWER#usr_7721b | - | - | 1780000050 | FollowRelationship |

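  • Single-Query Retrieval: Items that share a partition key are stored sorted by sort key, so related records can be read in one round trip. The key condition PK = USER#usr_7721b AND begins_with(SK, POST#) returns only the user's posts. To fetch the profile and all posts together, query PK = USER#usr_7721b AND SK >= METADATA instead; METADATA sorts after the FOLLOWED#/FOLLOWER# items and before the POST# items. Either query reads a contiguous range of the partition in roughly $O(\log N + K)$ time and avoids SQL multi-table joins.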
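To make the sort-key behavior concrete, the sketch below models one partition as a list kept sorted by sort key. A real table would be queried with a `KeyConditionExpression` such as `PK = :pk AND begins_with(SK, :prefix)`. The class and method names are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Item = Dict[str, object]


class SingleTablePartition:
    """Items sharing one partition key, kept sorted by sort key (hypothetical model)."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._items: List[Item] = []

    def put(self, sk: str, item: Item) -> None:
        i = bisect.bisect_left(self._keys, sk)
        if i < len(self._keys) and self._keys[i] == sk:
            self._items[i] = item  # overwrite an existing key, as a put would
        else:
            self._keys.insert(i, sk)
            self._items.insert(i, item)

    def query_begins_with(self, prefix: str) -> List[Tuple[str, Item]]:
        # Binary search to the first key >= prefix, then scan while the prefix matches.
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._items[i]))
            i += 1
        return out

    def query_from(self, start: str) -> List[Tuple[str, Item]]:
        # Equivalent of the key condition "SK >= start".
        i = bisect.bisect_left(self._keys, start)
        return list(zip(self._keys[i:], self._items[i:]))
```

Loading the three `USER#usr_7721b` rows from the table makes the difference visible:

  • A `POST#` prefix query returns only the post.
  • A range query starting at `METADATA` returns the profile row and the post, skipping the follow edge.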

Scaling and Operational Challenges

1. The Multi-Million Follower Problem: Hybrid Fan-out

Generating a feed involves aggregating posts from all followed users. We compare the trade-offs of the Push and Pull models.

A. Pull Model (Fan-out on Read)

When User A requests their feed, the system queries the post index of every account they follow, pulls their latest posts, and merges them in memory.

  • Query Latency Bounds: If User A follows $1,000$ users, the feed generator must wait on $1,000$ index lookups before it can merge, and the slowest of those queries sets the response time. This fan-in causes severe tail-latency spikes (P99 can exceed 800ms), far over the 200ms budget.
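The merge step of the pull model can be sketched as a k-way heap merge of per-author timelines, each already sorted newest-first (as `idx_user_posts_created` would return them). The function name `merge_timelines` is hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

Post = Tuple[int, str]  # (created_at_ms, post_id)


def merge_timelines(timelines: Iterable[List[Post]], limit: int = 20) -> List[Post]:
    """Fan-out-on-read: k-way merge of newest-first timelines, keeping the top `limit`."""
    merged = heapq.merge(*timelines, key=lambda p: p[0], reverse=True)
    return list(islice(merged, limit))
```

Once the inputs are in memory, the merge costs only about O(limit · log k) heap operations. For 1,000 followed accounts, the expensive part is fetching the 1,000 timelines, not merging them.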

B. Push Model (Fan-out on Write)

When User B uploads a photo, the system immediately writes the post metadata to the pre-generated feed caches (Redis) of all their followers.

  • Write Overhead Storm: If User B is a celebrity with $50,000,000$ followers, a single upload generates $50,000,000$ Redis writes. These writes can saturate the Redis cluster, back up network buffers, and delay every other feed update queued behind them.

C. The Staff Engineer's Hybrid Solution

The feed service inspects the author's follower count when a photo is uploaded:

sequenceDiagram
    autonumber
    participant Client as User Client
    participant GW as API Gateway
    participant PostS as Post Service
    participant FeedS as Feed Service
    participant Cache as Redis Feed Caches

    Client->>GW: Upload Photo Request
    GW->>PostS: Register Post & Write Metadata
    PostS->>FeedS: Notify Async (Post Event)
    
    rect rgb(240, 240, 245)
        Note over FeedS: Is the author a celebrity? (followers > 100,000)
        alt Normal User (Followers <= 100k)
            FeedS->>Cache: Push Post ID to all follower caches (Fan-out on Write)
        else Celebrity (Followers > 100k)
            FeedS->>FeedS: Skip Push. Store celebrity post strictly in DB.
        end
    end
    
    Note over Client: Client reads home feed
    Client->>FeedS: GET /feed
    FeedS->>Cache: Pull pre-generated feed cache
    FeedS->>FeedS: Fetch active celebrity posts at runtime
    FeedS->>Client: Merge & Deliver Feed (Latency < 200ms)
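The flow in the diagram can be condensed into an in-memory sketch. It assumes the following:

  • Plain dicts stand in for Redis and the post database. In production, the bounded feed might be a Redis list maintained with `LPUSH` plus `LTRIM`.
  • Posts arrive in time order.
  • `cache_len` is a hypothetical cap.
  • The 100,000-follower cutoff comes from the diagram.

```python
import heapq
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set, Tuple

Post = Tuple[int, str]  # (created_at_ms, post_id)


class HybridFeed:
    """In-memory sketch of hybrid fan-out; dicts stand in for Redis and the post DB."""

    def __init__(self, celebrity_threshold: int = 100_000, cache_len: int = 500) -> None:
        self.celebrity_threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        # Bounded per-user feed cache (newest first).
        self.feed_cache: Dict[str, Deque[Post]] = defaultdict(lambda: deque(maxlen=cache_len))
        # Newest-first post list per author, standing in for the post database.
        self.author_posts: Dict[str, List[Post]] = defaultdict(list)

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) > self.celebrity_threshold

    def publish(self, author: str, post: Post) -> int:
        """Store the post, then fan out on write for normal authors only.

        Returns the number of feed-cache writes performed.
        """
        self.author_posts[author].insert(0, post)
        if self.is_celebrity(author):
            return 0  # skip the push; followers pull this post at read time
        for follower in self.followers[author]:
            self.feed_cache[follower].appendleft(post)
        return len(self.followers[author])

    def read(self, user: str, limit: int = 20) -> List[Post]:
        """Merge the pre-built cache with followed celebrities' recent posts."""
        celebrity_timelines = [self.author_posts[a] for a in self.following[user]
                               if self.is_celebrity(a)]
        merged = heapq.merge(self.feed_cache[user], *celebrity_timelines,
                             key=lambda p: p[0], reverse=True)
        out: List[Post] = []
        seen: Set[str] = set()
        for post in merged:
            if len(out) >= limit:
                break
            if post[1] not in seen:  # an author who crossed the threshold may appear twice
                seen.add(post[1])
                out.append(post)
        return out
```

`publish` returns the number of cache writes, which makes the write amplification explicit:

  • A normal author costs one write per follower.
  • A celebrity costs zero writes, and each celebrity post is merged in at read time instead.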

Architectural Trade-offs and Ingestion Decisions

Deciding between push, pull, and hybrid caching topologies dictates database costs and feed latency bounds.

| Operational Dimension | Pure Push Model (Fan-out on Write) | Pure Pull Model (Fan-out on Read) | Hybrid Fan-out Architecture |
|---|---|---|---|
| P99 Read Latency | Low (sub-50ms directly from cache) | High (vulnerable to index merge storms) | Low (cache read plus celebrity merge; target P99 under 150ms) |
| Write Amplification | Severe ($O(N)$ cache writes per post for $N$ followers) | Minimal (one write per post, no fan-out) | Controlled (fan-out limited to non-celebrity authors) |
| Celebrity Posts | Overloads cache clusters (hotspots) | Negligible write impact | Handled (celebrity posts bypass fan-out) |
| Memory Footprint | Massive (buffers every user's feed) | Low (no pre-generated caches) | Medium (stores feed caches for active users) |

Failure Modes and Fault Tolerance Strategies

1. The Celebrity Post Thundering Herd

When a high-profile celebrity (e.g., $100\text{M}$ followers) uploads a photo, they do not push their post to their followers' feed caches. Instead, those $100\text{M}$ followers must pull this celebrity post at runtime.

  • The Danger: When the celebrity posts, millions of clients request the same new post almost simultaneously. Each cache miss becomes a database query, and the resulting Thundering Herd can exhaust connection pools and CPU on the shard that holds the post.
  • Staff Mitigation: Implement request coalescing in the Feed Service, such as Go's singleflight package or an equivalent mutex-based coalescer in Node. Suppose $10,000$ concurrent requests on one instance ask for the same celebrity post metadata. The coalescer executes exactly one DB fetch and hands the result to every waiting request, cutting that burst from $10,000$ queries to $1$ (a 99.99% reduction). Coalescing is per process, so the database still sees up to one in-flight query per key from each Feed Service instance.
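Go's `golang.org/x/sync/singleflight` exposes this behavior as `Group.Do`. The following is a minimal Python analogue for thread-based servers, written for illustration rather than taken from any library. The first caller for a key becomes the leader and runs the fetch. Concurrent callers for the same key block until the leader finishes and then share its result or exception.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State for one in-flight fetch that duplicate callers wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Coalesce concurrent calls for the same key into one execution (per process)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._calls[key] = call
        if is_leader:
            try:
                call.value = fn()          # the single database fetch
            except BaseException as exc:   # followers observe the same failure
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]   # later requests trigger a fresh fetch
                call.done.set()
        else:
            call.done.wait()               # block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.value
```

The entry is removed as soon as the fetch completes, so coalescing only covers requests that overlap in time. Requests that arrive afterwards start a new fetch.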

2. S3 Media Upload Failures & Dead-Letter Queues (DLQ)

Clients upload raw images to Amazon S3 using presigned URLs. If the S3 upload succeeds but the client's subsequent metadata registration to POST /api/v1/posts fails (e.g., due to a network drop), the result is an Orphaned Media Blob in S3. At scale, these orphans can waste terabytes of storage.

  • Staff Mitigation: Implement an asynchronous metadata-reconciliation worker. Every S3 object write emits an s3:ObjectCreated event notification to an Amazon SQS queue. A background worker consumes from SQS and, after a 10-minute grace period, checks whether a corresponding post entry exists in the PostgreSQL metadata table. If no database record is found, it deletes the orphaned S3 blob. Messages that repeatedly fail processing are moved by the queue's redrive policy to a dead-letter queue (DLQ) for inspection instead of being retried forever.
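The reconciliation decision can be sketched as a pure function over a batch of events. In this sketch, `has_post_record` and `delete_object` are hypothetical callbacks. They would wrap a metadata lookup and an S3 delete (for example boto3's `s3.delete_object(Bucket=..., Key=...)`). Events still inside the grace period are returned so the caller can leave them on the queue.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

GRACE_PERIOD_S = 10 * 60  # 10-minute grace period from the text


@dataclass(frozen=True)
class ObjectCreated:
    key: str           # S3 object key, e.g. raw-media/<user>/posts/<file>.webp
    created_at: float  # epoch seconds when the object landed


def reconcile(events: List[ObjectCreated],
              has_post_record: Callable[[str], bool],
              delete_object: Callable[[str], None],
              now: Optional[float] = None) -> List[ObjectCreated]:
    """Delete blobs that still lack a metadata row after the grace period.

    Returns events that are too young to judge, so the caller can retry them later.
    """
    now = time.time() if now is None else now
    pending: List[ObjectCreated] = []
    for ev in events:
        if now - ev.created_at < GRACE_PERIOD_S:
            pending.append(ev)       # client may still be registering the post
        elif not has_post_record(ev.key):
            delete_object(ev.key)    # orphan: upload succeeded, registration never did
    return pending
```

Passing `now` explicitly keeps the function deterministic and easy to test. Failures raised by either callback propagate, so SQS redelivers the message and eventually routes it to the DLQ.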



Production Readiness Checklist

Before launching a global photo-sharing system:

  • Canary Hybrid Thresholds: The celebrity cutoff value is set dynamically based on current Redis cluster write latency.
  • Edge CDN Compression: All media URLs point to edge locations that serve WebP, negotiated dynamically from the client's Accept header.
  • UUIDv7 Key Monotonicity: Metadata primary keys are generated in time order, so B-Tree inserts land near the right edge of the index and keep index pages dense.
  • SingleFlight Query Coalescing Active: Concurrent reads for the same hot key are coalesced in the Feed Service to prevent database thundering herds.


Verbal Script

Interviewer: "How would you design a feed generation system for a social photo-sharing app that scales to 100M daily active users and handles celebrity hotspots?"

Candidate: "To design a highly available feed system for 100M daily active users, my core focus is balancing write throughput during ingestion with sub-200ms read latency.

I would implement a Hybrid Fan-out Architecture that separates standard users from high-profile celebrities.

For normal users with fewer than 100,000 followers, we use a Push Model (Fan-out on Write). When they post an image, we immediately fan out and append their post ID to all of their followers' pre-generated feed caches in Redis. When an active user opens the app, their feed is already assembled and comes back from a single sub-50ms cache read.

However, for celebrities with millions of followers, a push model would crash our cache cluster. For these users, we switch to a Pull Model (Fan-out on Read). We skip the cache push entirely during upload. Instead, when a follower requests their feed, we pull their pre-generated cache, query the celebrity's recent post metadata, and merge them in-memory before returning the payload.

To prevent the Celebrity Thundering Herd, where millions of concurrent users hit our database for the same celebrity post, I would implement SingleFlight query coalescing in the Feed Service. Each service instance then sends at most one in-flight query per celebrity post to the database and shares the result in memory with all waiting requests.

Finally, we decouple media uploads by returning a fast presigned URL to the client. The client uploads the image directly to Amazon S3, and our media microservices trigger asynchronous resizers using Kafka and worker pools, protecting our main web request threads from CPU exhaustion."
