System DesignAdvancedguide

WebSocket Fleet Management: Handling Millions of Connections

How to scale a WebSocket infrastructure to handle 10 million concurrent open TCP connections. Learn about gateway routing, backplanes, memory tuning, and failover patterns.

Sachin SarawgiApril 20, 202611 min read11 minute lesson

Reading Mode

Reduce distractions and widen the article focus for long-form reading.

Key Takeaways

What you will learn

**Connection Sizing:** Sizing for 10 million connections requires optimizing the Linux kernel file descriptors and socket buffer memory footprints (30GB+ bare minimum RAM).

**Decoupled Backplane:** WebSocket gateway nodes must remain stateless, delegating connection maps and message distribution to a Redis/Kafka pub/sub mesh.

**Thundering Herd Mitigation:** Implement adaptive randomized jitter during connection reconnections to prevent server crash cascades.

Mental Model

Connecting isolated components into a resilient, scalable, and observable distributed web.

Unlike traditional stateless HTTP gateways, WebSocket gateways are stateful. Every active user maintains a persistent, open TCP connection to a specific server. Scaling this architecture to handle millions of concurrent connections requires optimizing the Linux kernel network stack, designing a highly resilient message backplane, and managing distributed routing states across dynamic gateway fleets.


System Requirements

To scale a WebSocket gateway fleet, we establish capacity requirements and resource calculations:

Capacity Estimations (Sizing for 10 Million Connections)

  • File Descriptors (FD): Every open socket consumes exactly 1 file descriptor. 10M connections require configuring the OS limits to support at least 10,500,000 FDs, allowing safety buffer room for internal microservice network sockets and filesystem locks.
  • Memory Footprint: By default, the Linux kernel allocates 4KB for read and 4KB for write TCP buffers per socket: $$\text{Memory} = 10,000,000 \times 8\text{KB} = 80,000,000\text{KB} \approx 80\text{GB}$$ Tuning the TCP buffer limits to a minimum of 2KB per socket reduces this baseline socket memory cost down to 20--30GB.
  • Network Throughput: Assuming 10% of users are active (sending/receiving a 500-byte message every 5 seconds), the network throughput scales as: $$\text{Ingress Throughput} = 1,000,000 \times 100\text{ bytes/second} = 100\text{MB/s} \approx 800\text{Mbps}$$
  • Bandwidth Sizing: The egress bandwidth must sustain greater than 1Gbps peak capacity, requiring multi-NIC configurations on the load balancers.

Non-Functional Requirements

  • Gateway Node Capacity: Limit connection allocation to a maximum of 100,000 connections per physical gateway node, requiring a fleet of exactly 100 active nodes.
  • Health Tracking: The system must detect dead connections within 30 seconds via active heartbeat ping-pong loops.
  • Blast Radius: The failure of a single gateway node must not sever more than 1% of the global connection pool.
  • Connection Handshake SLA: The gateway fleet must sustain a connection rate of greater than 50,000 new connections per second during sudden cluster failovers.

API Design and Interface Contracts

To coordinate connection routing across stateless microservices, the gateway exposes connection registry contracts. Below is a structured JSON API payload representing the routing state published by a gateway node when a client establishes a persistent connection:

1. Connection Registry Publish Payload (Gateway to Redis/Routing Layer)

POST /api/v1/registry/register

{
  "client_id": "usr-881100-active",
  "gateway_node_id": "ws-node-east-44",
  "connection_status": "CONNECTED",
  "ip_address": "10.0.4.12",
  "established_at": "2026-05-23T10:00:00.123Z",
  "ping_latency_ms": 12
}

2. Route Message Dispatch (Internal Service to Gateway Node API)

POST /api/v1/gateway/dispatch

{
  "target_client_id": "usr-881100-active",
  "message_id": "msg-990088-dispatch",
  "event_type": "CHAT_MESSAGE",
  "payload": {
    "sender_id": "usr-990022",
    "text": "Hello world!"
  }
}

High-Level Architecture

WebSocket fleets require decoupling client connections from internal service routing using a Message Backplane. Client applications establish connection upgrades via an Application Load Balancer (ALB) that terminates TLS. The ALB distributes connections across stateless WebSocket Gateway nodes. The application services publish events to a central Redis Pub/Sub cluster.

1. WebSocket Fleet Architecture

Stateless application servers do not know which physical gateway node holds a target user's active TCP connection. They publish messages to a shared Redis Pub/Sub Backplane. Each gateway node subscribes to a channel matching its own node ID, consuming and delivering messages to the local socket.

graph TD
    Client1[Client A] -->|TCP Connection| GW1[WebSocket Gateway 1]
    Client2[Client B] -->|TCP Connection| GW2[WebSocket Gateway 2]
    
    subgraph GatewayFleet["WebSocket Gateway Fleet"]
        GW1
        GW2
    end
    
    subgraph Backplane["Pub/Sub Broker Backplane"]
        Redis[(Redis Pub/Sub Clusters)]
    end
    
    AppServer[Application Microservices] -->|Publish Target: GW1| Redis
    Redis -->|Deliver| GW1
    Redis -->|Deliver| GW2
    
    %% Style annotations
    classDef gate fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    class GW1,GW2 gate;

2. Client Connection Lifecycle

Every client handshake is authenticated at the Load Balancer before routing to a gateway instance. Once connected, a bi-directional ping/pong loop keeps the connection alive.

sequenceDiagram
    autonumber
    actor Client
    participant LB as ALB (TLS Termination)
    participant GW as WebSocket Gateway Node
    participant Registry as Redis Connection Registry

    Client->>LB: Upgrade to WebSocket (HTTP 101 Request)
    LB->>GW: Forward Handshake (Core affinity mapping)
    GW->>Registry: Set user-route -> "ws-node-east-44"
    GW-->>Client: Handshake Completed (TCP Established)
    
    loop Heartbeat Loop (every 30s)
        Client->>GW: PING (Client heartbeat)
        GW-->>Client: PONG
    end

Low-Level Design and Schema

Below is a production-ready, compilable Go snippet representing a high-performance WebSocket Connection Registry. It stores active client socket pointers in memory using sharded maps and mutex locks to prevent resource contention under concurrent read/write operations:

package main

import (
	"sync"
	"net"
)

// ShardedConnectionRegistry splits the locking map into 16 shards
// to reduce global mutex lock contention under high write/delete loads.
type ShardedConnectionRegistry struct {
	shards [16]*RegistryShard
}

type RegistryShard struct {
	sync.RWMutex
	connections map[string]net.Conn
}

func NewShardedRegistry() *ShardedConnectionRegistry {
	r := &ShardedConnectionRegistry{}
	for i := 0; i < 16; i++ {
		r.shards[i] = &RegistryShard{
			connections: make(map[string]net.Conn),
		}
	}
	return r
}

func (r *ShardedConnectionRegistry) getShard(clientId string) *RegistryShard {
	// Simple hashing function to distribute users across shards
	hash := 0
	for _, char := range clientId {
		hash += int(char)
	}
	if hash < 0 {
		hash = -hash
	}
	return r.shards[hash%16]
}

func (r *ShardedConnectionRegistry) Register(clientId string, conn net.Conn) {
	shard := r.getShard(clientId)
	shard.Lock()
	defer shard.Unlock()
	shard.connections[clientId] = conn
}

func (r *ShardedConnectionRegistry) Deregister(clientId string) {
	shard := r.getShard(clientId)
	shard.Lock()
	defer shard.Unlock()
	if conn, exists := shard.connections[clientId]; exists {
		conn.Close()
		delete(shard.connections, clientId)
	}
}

func (r *ShardedConnectionRegistry) Get(clientId string) (net.Conn, bool) {
	shard := r.getShard(clientId)
	shard.RLock()
	defer shard.RUnlock()
	conn, exists := shard.connections[clientId]
	return conn, exists
}

Relational Schema (Connection Auditing Database)

To log connection events, lifetimes, and geo-distributions, we maintain an audit database schema:

CREATE TABLE gateway_fleet_nodes (
    gateway_id VARCHAR(100) PRIMARY KEY,
    datacenter_region VARCHAR(50) NOT NULL,
    active_connections_count INT NOT NULL DEFAULT 0,
    health_status VARCHAR(20) NOT NULL DEFAULT 'HEALTHY',
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE active_client_connections (
    connection_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id VARCHAR(100) NOT NULL,
    gateway_id VARCHAR(100) REFERENCES gateway_fleet_nodes(gateway_id) ON DELETE CASCADE,
    client_ip VARCHAR(45) NOT NULL,
    ping_latency_ms INT NOT NULL DEFAULT 0,
    established_at TIMESTAMP WITH TIME ZONE NOT NULL,
    last_heartbeat TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_connections_client ON active_client_connections(client_id);
CREATE INDEX idx_connections_gateway ON active_client_connections(gateway_id);

Scaling Challenges and Capacity Estimation

Stateful scaling introduces distinct low-level operating system constraints that cap capacity margins:

1. Linux Kernel File Descriptor Limits

By default, the Linux operating system caps the maximum open file descriptors per process to 1024, causing connections to fail immediately at scale.

  • Mitigation: Edit /etc/security/limits.conf to set soft and hard limits: * soft nofile 1048576 and * hard nofile 1048576, and configure the system variables sys.fs.file-max = 2097152 in sysctl.conf. This allocates enough room to handle the file descriptor quotas.

2. TCP Buffer Squeezing (Memory Depletion)

If the Linux kernel allocates large default read and write buffers per TCP socket, memory exhaustion occurs under high load.

  • Mitigation: Tune the socket memory allocations in /etc/sysctl.conf to reduce the TCP buffer footprint down to 2KB boundaries. This ensures we don't hit memory barriers:
    net.ipv4.tcp_rmem = 2048 4096 8192
    net.ipv4.tcp_wmem = 2048 4096 8192
    

3. Ephemeral Port Exhaustion

When gateway servers establish upstream connections or proxy requests to backend microservices, they consume local ephemeral ports. By default, Linux reserves ports 32768 to 60999 (28,231 ports), meaning a single gateway trying to establish more than 28,000 simultaneous outgoing connections to the same target IP address will crash due to port exhaustion.

  • Mitigation: Edit /etc/sysctl.conf to expand the port range to the absolute maximum limits: net.ipv4.ip_local_port_range = 1024 65535, and configure TCP socket reuse rules: net.ipv4.tcp_tw_reuse = 1.

Architectural Trade-offs

Evaluating the optimal persistent connection pattern requires balancing performance limits:

Technology Client-to-Server Direction Push Latency Memory per Socket Connection State
WebSocket Supported (Full Duplex) Low (Sub-5ms) Medium (~30KB) Stateful (Persistent TCP)
Server-Sent Events (SSE) Not Supported (One Way) Low (Sub-5ms) Low (~8KB) Stateful (Persistent HTTP)
HTTP Long Polling Supported (via request cycles) High (Depends on poll interval) Extremely High Stateless (Simulated)
WebTransport (HTTP/3) Supported (Multiplexed UDP) Incredibly Low Low Stateful (Multiplexed UDP stream)

Trade-off Evaluation

  1. WebSocket vs. SSE: SSE runs over standard HTTP/2 streams. It is unidirectional (server-to-client only), but it has a much lower memory footprint (approximately 8KB per connection vs. 30KB for WebSockets). It also features built-in reconnection logic and passes easily through proxies and firewalls that often block raw WebSocket upgrades.
  2. WebSocket vs. WebTransport: WebTransport utilizes HTTP/3 over QUIC (UDP). This eliminates head-of-line blocking (where a single dropped packet stalls the entire connection) and reduces connection handshake latency to a single round-trip. However, WebTransport is a draft standard and lacks comprehensive client-library support across legacy web browsers.

Failure Scenarios and Resilience

Resilience configurations must prevent catastrophic cascades during gateway outages:

Scenario A: The Thundering Herd Storm

If a single gateway server holding 100,000 connections crashes, all 100,000 clients will immediately attempt to reconnect to the remaining nodes in the fleet. This will saturate CPU thread pools and knock out the rest of the fleet.

  • Resiliency Mitigation: Enforce Exponential Backoff and Jitter at the client-side reconnection loop. Reconnection retries must be randomized over a wide window. The formula for the retry interval $t$ is: $$t = T_{\text{base}} \times 2^{\text{attempt}} \pm \text{RandomJitter}$$ By adding a wide random jitter, the clients spread their reconnection attempts over several minutes, allowing the load balancer to distribute the spike smoothly.

Scenario B: Redis Pub/Sub Bottlenecking

If every gateway node subscribes to a single global Redis channel to parse messages, the throughput becomes limited by the single-threaded CPU processing speed of the Redis node.

  • Resiliency Mitigation: Group channels dynamically by partition. Map user routes to localized shards (e.g. channel:ws-node-east-44), ensuring a gateway node only consumes messages addressed to its own connections.

Scenario C: Socket Leakage (Zombie Connections)

If clients drop off the network without cleanly closing their sockets (e.g., losing signal inside a tunnel), the gateway server keeps the TCP socket open indefinitely. This leaks memory and file descriptors.

  • Resiliency Mitigation: Implement aggressive heartbeat checks. If a client fails to respond to a ping with a pong message within two heartbeat intervals (e.g., 60 seconds), the gateway server terminates the connection, closes the file descriptor, and releases the allocated buffer memory.

Staff Engineer Perspective


Verbal Script

Verbal Script: Scaling Persistent Sockets

Interviewer: "How would you design a push system to deliver real-time notifications to 10 million concurrent active users?"

Candidate: "To handle 10 million concurrent users, I would design a stateful, horizontally scalable WebSocket fleet consisting of approximately 100 gateway nodes, with each node managing up to 100,000 open TCP connections. To decouple the stateless application servers from this stateful fleet, I would deploy a Redis Pub/Sub backplane as a message broker. When an application server needs to send a notification to User A, it queries a Redis routing registry to identify which gateway node currently holds User A's socket, then publishes the message to a channel dedicated to that specific node."

Interviewer: "Excellent. How would you optimize the operating system on the gateway nodes to support 100,000 open TCP sockets?"

Candidate: "First, we must configure the Linux OS limits. I would set the process file descriptor limit to at least 1,000,000 in limits.conf. Second, to prevent memory depletion, we must tune the TCP socket buffers. Standard Linux TCP allocations can consume up to 80GB of RAM for 10M sockets. By modifying /etc/sysctl.conf to decrease the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem limits down to a 2KB minimum baseline, we can reduce the global memory footprint down to 20--30GB, allowing standard servers to operate efficiently. Third, we expand the local port range to 1024-65535 to prevent ephemeral port exhaustion."

Interviewer: "What fails when a gateway node goes down, and how do we prevent the re-connection storm from bringing down the rest of our fleet?"

Candidate: "When a gateway node holding 100,000 sockets crashes, the clients will immediately disconnect and attempt to reconnect. If they reconnect simultaneously, this thundering herd will crash the remaining gateway nodes. To prevent this, we must enforce exponential backoff with randomized jitter on the client-side reconnection logic. This spreads the reconnection attempts over a 2 to 3 minute window. On the gateway side, the load balancer should rate-limit incoming handshake requests, shedding load to maintain cluster stability while the fleet autoscales."

Practical engineering notes

Get the next backend guide in your inbox

One useful note when a new deep dive is published: system design tradeoffs, Java production lessons, Kafka debugging, database patterns, and AI infrastructure.

No spam. Just practical notes you can use at work.

Sachin Sarawgi

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Keep Learning

Move through the archive without losing the thread.

Related Articles

More deep dives chosen from shared tags, category overlap, and reading difficulty.

System DesignAdvanced

API Pagination at Scale: Why OFFSET 100,000 is a Database Killer

Designing a paginated API seems simple. Standard frameworks make it trivial: just use LIMIT 20 OFFSET 100. This works perfectly during development and for the first few pages of small tables. However, once your data scal…

Apr 20, 202611 min read
Deep DiveBackend Systems Mastery
#databases#java#performance
System DesignAdvanced

Event-Driven Architecture: CQRS and Event Sourcing in Practice

Mental Model In traditional CRUD (Create, Read, Update, Delete) architectures, the same database model is used for both writing and reading data. Under high traffic, this creates locking contention and complex SQL joins…

Apr 20, 202610 min read
Deep Dive
#performance#system-design
System DesignAdvanced

Bypassing the Kernel: User-Space Networking for Sub-Microsecond Performance

Mental Model For ultra-low-latency distributed systems—such as high-frequency trading (HFT) matching engines, real-time telemetry filters, and high-performance packet routers—even the optimized Linux kernel is too slow.…

Apr 20, 202611 min read
Deep DivePerformance & Optimization Mastery
#performance#system-design
System DesignAdvanced

HyperLogLog at Scale: Billion-Cardinality Estimation

Mental Model > Connecting isolated components into a resilient, scalable, and observable distributed web. Counting unique items (such as Daily Active Users - DAUs, unique page views, or IP addresses) is a classic problem…

Apr 20, 202614 min read
Deep Dive
#performance#system-design

More in System Design

Category-based suggestions if you want to stay in the same domain.