
What is Load Balancing? A Simple Guide for Backend Engineers

Learn the fundamentals of load balancing, how it works, and the core algorithms used in modern distributed systems.

Key Takeaways

  • Load balancers act as traffic coordinators, spreading incoming requests across a pool of backend servers.
  • Layer 4 balancing operates at the TCP layer, while Layer 7 balancing inspects HTTP/HTTPS headers and cookies.
  • High Availability configurations (Active-Passive or Active-Active) prevent the load balancer itself from becoming a single point of failure.
Recommended Prerequisites
System Design Module 3: Scalability Basics

Load Balancing & Reverse Proxies

When you build a web application, it typically starts on a single server. A client makes an HTTP request, DNS resolves your domain to the server's IP address, and the server returns the response. This setup is simple, but it has two serious weaknesses: it cannot scale beyond the capacity of one machine, and it is a single point of failure. If your traffic spikes or the physical host crashes, your application goes offline.

To solve this, you must run multiple copies of your application across a cluster of servers.

A Load Balancer acts as a traffic coordinator sitting in front of these servers, intercepting all incoming client requests and routing them across the backends. This ensures high availability, redundancy, and efficient utilization of compute resources.


Requirements and System Goals

Designing a load balancing infrastructure requires satisfying precise functional and operational requirements:

Functional Requirements

  • Smart Traffic Distribution: Distribute incoming client connections across backends using configurable algorithms (e.g., Round Robin, Least Connections, IP Hashing).
  • Active Health Checking: Periodically query upstream backend servers to verify they are healthy, dynamically removing unresponsive nodes from the active pool.
  • SSL/TLS Termination: Decrypt incoming HTTPS traffic at the load balancer level, passing plaintext HTTP to the internal application network to save server CPU cycles.
  • Session Persistence (Stickiness): Support routing requests from a specific client to the same backend server when stateful session caching is required.
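
The three algorithms named above can be sketched in a few lines. This is a minimal illustration, not how Nginx or HAProxy implement them; the `Backend`, `RoundRobin`, `LeastConnections`, and `IPHash` names are hypothetical, and weights are left out for brevity.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    address: str
    active_connections: int = 0


class RoundRobin:
    """Cycle through the backends in a fixed order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend currently serving the fewest connections."""

    def __init__(self, backends):
        self._backends = backends

    def pick(self, client_ip=None):
        return min(self._backends, key=lambda b: b.active_connections)


class IPHash:
    """Map a client IP to a backend with a stable (non-randomized) hash."""

    def __init__(self, backends):
        self._backends = backends

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._backends)
        return self._backends[index]


pool = [Backend("10.0.1.10:8080"), Backend("10.0.1.11:8080"), Backend("10.0.1.12:8080")]
rr = RoundRobin(pool)
print([rr.pick().address for _ in range(4)])
# ['10.0.1.10:8080', '10.0.1.11:8080', '10.0.1.12:8080', '10.0.1.10:8080']
```

Round Robin ignores load, Least Connections reacts to it, and IP Hash trades even distribution for a stable client-to-server mapping.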

Non-Functional Requirements

  • Sub-Millisecond Routing Overhead: The load balancer must add negligible processing latency (less than 1.0 millisecond p99) to the request-response path.
  • High Packet-Throughput Capacity: Support scaling horizontally or utilizing optimized transport layers to handle millions of concurrent connections.
  • No Single Point of Failure (SPOF): The load balancing tier itself must be redundantly configured using active-active or active-passive topologies.
  • Elastic Configuration Hot-Reloading: Allow adding or removing upstream servers dynamically without dropping active client connections.

API Interfaces and Service Contracts

Rather than standard JSON APIs, load balancers are configured using declarative structures. Below is the service contract for configuring an upstream server pool and routing rules in Nginx and HAProxy formats:

1. Nginx L7 Upstream and Routing Configuration

This configuration tells the proxy how to group backend servers and route path-based traffic:

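# Define the backend application cluster
upstream payment_backend_pool {
    least_conn; # Use Least Connections routing algorithm

    server 10.0.1.10:8080 max_fails=3 fail_timeout=10s weight=3;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=10s weight=1;
    server 10.0.1.12:8080 backup; # Standby server only used if others fail

    # Cache up to 32 idle upstream connections per worker process.
    # Without this directive, the keepalive settings in the location block have no effect.
    keepalive 32;
}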

server {
    listen 443 ssl;
    server_name api.codesprintpro.com;

    ssl_certificate /etc/ssl/certs/api_bundle.crt;
    ssl_certificate_key /etc/ssl/private/api.key;
    
    # Path-based routing (Layer 7 behavior)
    location /v1/payments {
        proxy_pass http://payment_backend_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        
        # Keepalive connection settings to avoid TCP handshake overhead
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

2. HAProxy Health Check and Timeout Configuration

HAProxy configures health checks explicitly using a check interval and rise/fall thresholds:

backend payment_servers
    mode http
    balance roundrobin
    option httpchk GET /healthz
    http-check expect status 200
    
    # Health checks run every 2000ms. A server is marked down after 3 consecutive failures
    # and marked up again after 2 consecutive successes.
    server srv1 10.0.1.10:8080 check inter 2000 fall 3 rise 2
    server srv2 10.0.1.11:8080 check inter 2000 fall 3 rise 2

High-Level Design and Visualizations

To understand load balancing, backend engineers must distinguish between Layer 4 (Transport Layer) and Layer 7 (Application Layer) routing paths.

Layer 4 vs Layer 7 Routing Mechanics

graph TD
    Client[Client Browser] -->|1. HTTPS Request| LB[Load Balancer]
    
    subgraph L4_Layer["Layer 4 (TCP Level Proxy)"]
        LB -.->|Bypasses content payload<br/>Direct TCP stream proxy| L4_Target[Backend Server 1]
    end
    
    subgraph L7_Layer["Layer 7 (Application Level Proxy)"]
        LB -->|Terminates SSL/TLS<br/>Inspects HTTP path /v1/checkout and headers| L7_Target[Backend Server 2]
    end

  • Layer 4 (L4): Operates at the transport level (TCP/UDP). The load balancer forwards packets or relays the TCP stream without reading the application payload. It chooses a backend from IP addresses and TCP/UDP port numbers alone.
  • Layer 7 (L7): Operates at the application level (HTTP/HTTPS/gRPC). The load balancer terminates the SSL/TLS connection, reads the HTTP headers, inspects cookies, evaluates paths, and makes intelligent routing decisions.
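
To make the difference concrete, the sketch below contrasts the inputs each layer can use for its decision. It is a simplified illustration under stated assumptions: the function names, the `checkout_backend_pool` route, and the `default_pool` fallback are hypothetical, and the L7 function assumes TLS has already been terminated.

```python
import hashlib
from typing import NamedTuple


class Connection(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


BACKENDS = ["10.0.1.10:8080", "10.0.1.11:8080"]

# Hypothetical path-prefix routes for the L7 case.
L7_ROUTES = {
    "/v1/payments": "payment_backend_pool",
    "/v1/checkout": "checkout_backend_pool",
}


def l4_pick(conn: Connection) -> str:
    """Layer 4: decide from addresses and ports only; the payload is never read."""
    key = f"{conn.src_ip}:{conn.src_port}->{conn.dst_ip}:{conn.dst_port}"
    digest = hashlib.sha256(key.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]


def l7_pick(raw_request: bytes, default_pool: str = "default_pool") -> str:
    """Layer 7: parse the decrypted HTTP request line and route by path prefix."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    _method, path, _version = request_line.split(" ", 2)
    # Try the longest prefix first so more specific routes win.
    for prefix in sorted(L7_ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return L7_ROUTES[prefix]
    return default_pool


request = b"GET /v1/payments/42 HTTP/1.1\r\nHost: api.codesprintpro.com\r\n\r\n"
print(l7_pick(request))  # payment_backend_pool
```

The L4 function never sees `request` at all, which is exactly why it is cheaper and why it cannot route by path.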

High-Availability Load Balancer Pipeline (Active-Active)

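To prevent the load balancer itself from being a Single Point of Failure (SPOF), we combine BGP Anycast routing with ECMP on the edge routers and Keepalived heartbeats for failure detection. Keepalived itself speaks VRRP rather than BGP, so in this design its state change is assumed to trigger the route withdrawal through a BGP daemon or a notify script on the load balancer.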

sequenceDiagram
    autonumber
    participant User as Client Browser
    participant Router as Edge Router (BGP Anycast)
    participant LB1 as LB Master 1 (Active)
    participant LB2 as LB Master 2 (Active)
    participant Keepalived as Keepalived Daemon
    participant Servers as Upstream Application Pool

    Note over User: DNS resolves to virtual IP (VIP) shared by both LBs
    User->>Router: Send HTTPS request to VIP: 198.51.100.1
    Note over Router: BGP Anycast routes traffic uniformly<br/>across both LBs using ECMP routing
    Router->>LB1: Route packet
    Router->>LB2: Route packet
    LB1->>Servers: Route to Server 1 (Round Robin)
    LB2->>Servers: Route to Server 2 (Round Robin)
    
    Keepalived->>LB1: Monitor Master 1 Heartbeat
    Keepalived->>LB2: Monitor Master 2 Heartbeat
    
    Note over LB1: Master 1 experiences network drop!
    Keepalived->>Router: Update BGP route maps immediately
    Router->>LB2: Redirect all subsequent VIP traffic to Master 2 only
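
The router's side of this sequence can be approximated in code. The sketch below assumes ECMP picks a next hop by hashing each flow's 5-tuple with a simple modulo, which is a common but not universal strategy; `ecmp_next_hop` and the flow values are hypothetical.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick one next hop per flow by hashing its 5-tuple."""
    if not next_hops:
        raise RuntimeError("no route to VIP: every load balancer has withdrawn")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:8], "big") % len(next_hops)]


# Each flow is (src_ip, src_port, dst_ip, dst_port, protocol).
flows = [("203.0.113.7", 50000 + i, "198.51.100.1", 443, "tcp") for i in range(1000)]

advertised = ["LB1", "LB2"]
before = [ecmp_next_hop(f, advertised) for f in flows]

# LB1 fails its heartbeat and its route to the VIP is withdrawn.
advertised = [lb for lb in advertised if lb != "LB1"]
after = [ecmp_next_hop(f, advertised) for f in flows]

print(before.count("LB1"), before.count("LB2"))  # roughly even split
print(set(after))  # {'LB2'}
```

All packets of one flow hash to the same load balancer, so a TCP connection is not split across nodes while the set of next hops stays stable.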

Low-Level Design and Schema Strategies

To monitor backend health and manage weight allocations, the control plane around the load balancer maintains a routing metadata table.

Relational Schema (PostgreSQL tracking model)

While HAProxy and Nginx manage routing state inside process memory, configuration engines (like Consul Template or custom control planes) persist routing registries:

-- Upstream backend targets registry
-- gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension
CREATE TABLE upstream_servers (
    server_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    pool_name VARCHAR(128) NOT NULL,    -- e.g., 'payment_backend_pool'
    ip_address VARCHAR(45) NOT NULL,     -- Supports IPv4 & IPv6
    port INT NOT NULL CHECK (port BETWEEN 1 AND 65535),
    weight INT NOT NULL DEFAULT 1 CHECK (weight >= 0), -- Lower weight = fewer requests
    is_backup BOOLEAN NOT NULL DEFAULT FALSE,
    health_status VARCHAR(32) NOT NULL
        CHECK (health_status IN ('HEALTHY', 'UNHEALTHY', 'DRAINING')),
    consecutive_failures INT NOT NULL DEFAULT 0,
    last_checked_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (pool_name, ip_address, port) -- One row per endpoint per pool
);

-- Index for assembling a pool's routing table by health status
CREATE INDEX idx_upstream_routing ON upstream_servers (pool_name, health_status);
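
A control plane would read these rows and turn them into the in-memory routing table a proxy actually uses. The sketch below shows one plausible way to do that; `UpstreamRow` and `build_routing_table` are hypothetical names, the rows are hard-coded instead of queried, and the backup rule mirrors the `backup` flag from the Nginx example (backups are used only when no primary is healthy).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamRow:
    pool_name: str
    ip_address: str  # IPv4 only in this sketch; IPv6 would need brackets
    port: int
    weight: int
    is_backup: bool
    health_status: str  # 'HEALTHY', 'UNHEALTHY', 'DRAINING'


def build_routing_table(rows, pool_name):
    """Return (address, weight) pairs eligible for new connections in one pool."""
    healthy = [r for r in rows if r.pool_name == pool_name and r.health_status == "HEALTHY"]
    primaries = [r for r in healthy if not r.is_backup]
    # Fall back to backup servers only when every primary is out of rotation.
    chosen = primaries or [r for r in healthy if r.is_backup]
    return [(f"{r.ip_address}:{r.port}", r.weight) for r in chosen]


rows = [
    UpstreamRow("payment_backend_pool", "10.0.1.10", 8080, 3, False, "HEALTHY"),
    UpstreamRow("payment_backend_pool", "10.0.1.11", 8080, 1, False, "UNHEALTHY"),
    UpstreamRow("payment_backend_pool", "10.0.1.12", 8080, 1, True, "HEALTHY"),
]
print(build_routing_table(rows, "payment_backend_pool"))
# [('10.0.1.10:8080', 3)]
```

Servers in the DRAINING state are excluded from new connections here, which lets existing connections finish before the node is removed.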

Scaling and Operational Challenges

Managing load balancers at high traffic volumes exposes performance limits in network bandwidth and CPU cores.

Packet Ingestion and TCP Handshake Overhead Calculations

Let us compare the CPU cost of a Layer 4 (L4) load balancer and a Layer 7 (L7) load balancer, each handling 1,000,000 (1 million) new TCP connections per second. The per-connection cycle counts below are illustrative assumptions, not measurements.

Let:

  • $R$ = Connection rate = $1,000,000$ connections/second.
  • Network Packet Size = $1500 \text{ bytes}$ (standard Ethernet MTU). It is listed for context only; the CPU estimate below is per connection, not per packet.
  • CPU Operations:
    • L4 Balancing: Executes packet forwarding. It does not terminate the connection. It translates the destination IP (NAT - Network Address Translation) and forwards the packet.
      • Average cost: $5,000$ CPU cycles per connection.
    • L7 Balancing: Terminates the incoming TCP connection, terminates the SSL/TLS session, decodes the HTTP request, processes routing rules, opens a new TCP connection to the backend, and proxies the bytes.
      • Average cost: $500,000$ CPU cycles per connection (representing a 100x increase due to cryptography and content parsing).

First, calculate the total CPU cycles required for L4 Load Balancing:

$$\text{Cycles}_{\text{L4}} = 1,000,000 \text{ connections/sec} \times 5,000 \text{ cycles} = 5,000,000,000 \text{ cycles/second} = 5.0 \text{ GHz}$$

Assuming a single modern CPU core runs at 2.5 GHz:

$$\text{Cores}_{\text{L4}} = \frac{5.0 \text{ GHz}}{2.5 \text{ GHz}} = 2 \text{ CPU Cores}$$

Now, calculate the total CPU cycles required for L7 Load Balancing with SSL/TLS termination:

$$\text{Cycles}_{\text{L7}} = 1,000,000 \text{ connections/sec} \times 500,000 \text{ cycles} = 500,000,000,000 \text{ cycles/second} = 500 \text{ GHz}$$

$$\text{Cores}_{\text{L7}} = \frac{500 \text{ GHz}}{2.5 \text{ GHz}} = 200 \text{ CPU Cores}$$
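
The same arithmetic can be wrapped in a small helper so you can vary the assumptions. This is only a back-of-the-envelope sketch; `cores_required` is a hypothetical name, and the cycle counts and 2.5 GHz clock are the assumed values from the calculation above.

```python
import math


def cores_required(connections_per_sec, cycles_per_connection, core_hz=2.5e9):
    """Return the whole number of cores needed to sustain the connection rate."""
    cycles_per_sec = connections_per_sec * cycles_per_connection
    return math.ceil(cycles_per_sec / core_hz)


print(cores_required(1_000_000, 5_000))    # 2   (L4)
print(cores_required(1_000_000, 500_000))  # 200 (L7 with TLS termination)
```

The estimate ignores memory bandwidth, interrupts, and kernel overhead, so treat it as a lower bound for capacity planning rather than a sizing guarantee.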

Scaling Insights

  • L4 Balancing needs only about 2 CPU cores under these assumptions because it does not inspect payloads. It is highly optimized for raw packet throughput.
  • L7 Balancing needs about 200 CPU cores under the same assumptions because of TLS decryption and HTTP parsing overhead.
  • Architecture Choice: In high-traffic systems, deploy a multi-tiered load balancing strategy. Place an L4 load balancer (like IPVS or AWS NLB) at the front edge to distribute packets across a horizontally scaled tier of L7 load balancers (like Envoy, HAProxy, or AWS ALB) for intelligent application-level routing.

Trade-offs and Architectural Alternatives

Backend engineers must evaluate the trade-offs between load balancing layers and deployment topologies:

| Dimension | Layer 4 Load Balancing (NLB / IPVS) | Layer 7 Load Balancing (ALB / HAProxy) | DNS Round-Robin |
| --- | --- | --- | --- |
| Operating Layer | Transport level (TCP/UDP). | Application level (HTTP/HTTPS/gRPC). | DNS resolver level. |
| SSL/TLS Decryption | None (passed through to backends). | Yes (terminated at the LB). | None. |
| Routing Intelligence | Low (routes based on IP and port only). | High (routes by path, headers, cookies, query parameters). | Minimal (returns the list of IPs in rotating or random order). |
| CPU/Memory Overhead | Ultra-low (NAT and packet forwarding). | High (terminates connection state, decodes protocol). | Zero (no proxy hop in the request path). |
| Failover Convergence Time | Fast (sub-second to a few seconds, bounded by the health-check interval and fall threshold). | Fast (sub-second to a few seconds, bounded by the health-check interval and fall threshold). | Very slow (clients and resolvers cache DNS records, sometimes beyond the TTL, for minutes to hours). |
| Security Inspections | Minimal. | High (supports Web Application Firewalls (WAF) and DDoS protection). | None. |

Failure Modes and Fault Tolerance Strategies

1. Health Check Flapping

A backend server is near its resource limits (CPU or memory). When a health check arrives, it fails to respond within the 500ms timeout, so the load balancer marks it dead and stops sending traffic. The server's load drops, it recovers, and it starts responding to health checks again. The load balancer reintroduces it to the active pool, it is immediately saturated again, and the loop repeats. This oscillation is called flapping.

  • Resolution Strategy: Implement Hysteresis thresholds and grace periods. A server must fail 3 consecutive checks to be marked dead (fall threshold), but it must pass 5 consecutive healthy checks to be marked live (rise threshold). Furthermore, implement slow-start warming, where the load balancer progressively increases traffic weight to a newly recovered server over a 5-minute window.
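
The fall/rise thresholds and the slow-start ramp can be sketched as follows. This is a minimal illustration with hypothetical names (`HealthTracker`, `slow_start_weight`); the linear ramp is an assumption, and real proxies may use a different curve or a minimum starting weight.

```python
class HealthTracker:
    """Track one backend's health with separate fall and rise thresholds."""

    def __init__(self, fall=3, rise=5):
        self.fall = fall
        self.rise = rise
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Feed one health-check result and return the (possibly updated) state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        threshold = self.fall if self.healthy else self.rise
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


def slow_start_weight(full_weight, seconds_since_recovery, window_seconds=300):
    """Ramp a recovered server's weight linearly to full_weight over the window."""
    fraction = min(1.0, max(0.0, seconds_since_recovery / window_seconds))
    return full_weight * fraction


tracker = HealthTracker(fall=3, rise=5)
print([tracker.record(False) for _ in range(3)])  # [True, True, False]
print(slow_start_weight(100, 150))                # 50.0
```

Because the rise threshold is higher than the fall threshold, a single lucky health check cannot pull a struggling server back into rotation.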

2. Upstream Backend Starvation (Cascading Outage)

If 2 out of 5 servers in a pool crash, the remaining 3 servers must absorb 100% of the traffic. If each server was already running at 70% capacity, each survivor now needs 5 × 70% / 3 ≈ 117% of its capacity. The overload saturates their CPU threads, causing them to fail health checks. The load balancer marks them dead as well, taking the entire application offline.

  • Resolution Strategy:
    • Load Shedding: If all upstream servers are saturated, the load balancer must return HTTP 429 Too Many Requests or HTTP 503 Service Unavailable directly, rather than passing traffic through to crash the servers.
    • Fail-Open Status: If all health checks fail, the load balancer should bypass health checks and send traffic to all nodes, avoiding a complete blackout.
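
The overload arithmetic and both mitigation rules fit in a short sketch. The function names and the in-flight request limit are hypothetical; a real proxy would derive the limit from measured backend capacity.

```python
def utilization_after_failures(total_servers, failed_servers, utilization_before):
    """Per-server utilization once survivors absorb the failed servers' share."""
    survivors = total_servers - failed_servers
    if survivors <= 0:
        raise ValueError("no surviving servers")
    return utilization_before * total_servers / survivors


def should_shed(in_flight, max_in_flight):
    """Load shedding: reject at the proxy (HTTP 429/503) once the limit is reached."""
    return in_flight >= max_in_flight


def routable_backends(health_by_backend):
    """Fail open: if every backend is marked unhealthy, route to all of them anyway."""
    healthy = [b for b, ok in health_by_backend.items() if ok]
    return healthy or list(health_by_backend)


print(round(utilization_after_failures(5, 2, 0.70), 3))  # 1.167
print(routable_backends({"srv1": False, "srv2": False}))  # ['srv1', 'srv2']
```

A result above 1.0 means the survivors cannot carry the load, which is the signal to provision headroom or shed requests before health checks start failing.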

Staff Engineer Perspective

The Danger of Session Stickiness (IP Hashing)

Interview candidates often suggest IP Hashing to achieve session stickiness for stateful apps. While simple, this creates major hotspots in production:

  • If your clients connect from behind a single corporate NAT gateway or VPN, large numbers of users will share a handful of outbound IP addresses.
  • IP Hashing will route all these requests to the same backend node, overloading it while the rest of the cluster sits idle.
  • Best Practice: Avoid IP hashing. If stickiness is required, use L7 Cookie-Based Stickiness, where the load balancer inserts a tracking cookie (e.g., SERVERID=node_01) into the client's first HTTP response, routing subsequent requests based on the cookie header.
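
The hotspot is easy to reproduce. The sketch below simulates 900 users behind one NAT address (a hypothetical figure) and compares IP hashing with cookie-based stickiness; `pick_by_ip_hash` and `pick_by_cookie` are illustrative names, and the first-request balancing uses plain round robin.

```python
import hashlib
import itertools
from collections import Counter

NODES = ["node_01", "node_02", "node_03"]
_round_robin = itertools.cycle(NODES)


def pick_by_ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def pick_by_cookie(cookies):
    """Return (node, set_cookie); set_cookie is None if the client is already pinned."""
    node = cookies.get("SERVERID")
    if node in NODES:
        return node, None
    node = next(_round_robin)        # first request: balance normally
    return node, f"SERVERID={node}"  # pin later requests via the cookie


nat_ip = "203.0.113.50"  # one shared outbound address
ip_hash_load = Counter(pick_by_ip_hash(nat_ip) for _ in range(900))
cookie_load = Counter(pick_by_cookie({})[0] for _ in range(900))

print(len(ip_hash_load))             # 1: every user lands on the same node
print(sorted(cookie_load.values()))  # [300, 300, 300]
```

A cookie that names a node no longer in the pool is ignored, so the client is simply rebalanced and pinned again.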

SSL Session Resumption (Tickets and IDs)

Terminating SSL/TLS handshakes consumes significant CPU resources due to asymmetric cryptography. To optimize this:

  • Enable TLS session resumption through Session IDs and Session Tickets.
  • When a client completes a full handshake, the load balancer encrypts the session state with a server-side ticket key and sends it to the client as a session ticket.
  • On subsequent connections, the client sends this ticket back. The load balancer decrypts it and resumes the session without repeating the full cryptographic handshake. As a rough illustration, this can cut handshake latency from around 150ms to less than 10ms and reduce load balancer CPU usage by up to 80%; actual gains depend on network round-trip time, cipher suite, and the share of returning clients.

Verbal Script

Interviewer: "How would you design a load balancing architecture for a system handling millions of concurrent users, and what are the trade-offs?"

Candidate:

"To design a high-throughput load balancing architecture, I would implement a multi-tiered topology to balance raw packet speed with application-level routing intelligence.

At the edge, I would configure a tier of Layer 4 (L4) load balancers running IPVS or using AWS Network Load Balancer (NLB). L4 balancers operate at the transport layer, routing packets based on IP and port using NAT without terminating the TCP connections. This keeps their CPU overhead extremely low: based on a rough estimate of 5,000 CPU cycles per connection, about 2 cores at 2.5 GHz can handle 1 million new connections per second.

These L4 balancers distribute connections across a horizontally scaled pool of Layer 7 (L7) software load balancers, such as HAProxy or Envoy. The L7 load balancers terminate SSL/TLS sessions, decrypt traffic, and execute path-based routing—for example, directing /v1/payments to a dedicated payment service.

To prevent the load balancing tier itself from becoming a Single Point of Failure (SPOF), I would configure the L4 load balancers in an Active-Active setup using BGP Anycast routing and ECMP (Equal-Cost Multi-Path) on our edge routers. This assigns a single Virtual IP (VIP) to all active L4 balancers. If one node crashes, Keepalived heartbeats detect the failure and its route is withdrawn, so the edge router redirects packets to the remaining active nodes in under a second.

Additionally, to prevent cascading outages on our backend servers, I would configure health checks with hysteresis thresholds to avoid flapping, and enforce strict load-shedding policies directly on the L7 proxies. If backend thread saturation is reached, the proxies return HTTP 429 errors directly, preserving the availability of our core application servers."

