Mental Model
Connecting isolated components into a resilient, scalable, and observable distributed web.
Unlike traditional stateless HTTP gateways, WebSocket gateways are stateful. Every active user maintains a persistent, open TCP connection to a specific server. Scaling this architecture to handle millions of concurrent connections requires optimizing the Linux kernel network stack, designing a highly resilient message backplane, and managing distributed routing states across dynamic gateway fleets.
System Requirements
To scale a WebSocket gateway fleet, we establish capacity requirements and resource calculations:
Capacity Estimations (Sizing for 10 Million Connections)
- File Descriptors (FD): Every open socket consumes exactly 1 file descriptor. 10M connections require configuring the OS limits to support at least 10,500,000 FDs, allowing safety buffer room for internal microservice network sockets and filesystem locks.
- Memory Footprint: By default, the Linux kernel allocates 4KB for read and 4KB for write TCP buffers per socket: $$\text{Memory} = 10,000,000 \times 8\text{KB} = 80,000,000\text{KB} \approx 80\text{GB}$$ Tuning the TCP buffer limits to a minimum of 2KB per socket reduces this baseline socket memory cost down to 20--30GB.
- Network Throughput: Assuming 10% of users are active (sending/receiving a 500-byte message every 5 seconds), the network throughput scales as: $$\text{Ingress Throughput} = 1,000,000 \times 100\text{ bytes/second} = 100\text{MB/s} \approx 800\text{Mbps}$$
- Bandwidth Sizing: The egress bandwidth must sustain greater than 1Gbps peak capacity, requiring multi-NIC configurations on the load balancers.
Non-Functional Requirements
- Gateway Node Capacity: Limit connection allocation to a maximum of 100,000 connections per physical gateway node, requiring a fleet of exactly 100 active nodes.
- Health Tracking: The system must detect dead connections within 30 seconds via active heartbeat ping-pong loops.
- Blast Radius: The failure of a single gateway node must not sever more than 1% of the global connection pool.
- Connection Handshake SLA: The gateway fleet must sustain a connection rate of greater than 50,000 new connections per second during sudden cluster failovers.
API Design and Interface Contracts
To coordinate connection routing across stateless microservices, the gateway exposes connection registry contracts. Below is a structured JSON API payload representing the routing state published by a gateway node when a client establishes a persistent connection:
1. Connection Registry Publish Payload (Gateway to Redis/Routing Layer)
POST /api/v1/registry/register
{
"client_id": "usr-881100-active",
"gateway_node_id": "ws-node-east-44",
"connection_status": "CONNECTED",
"ip_address": "10.0.4.12",
"established_at": "2026-05-23T10:00:00.123Z",
"ping_latency_ms": 12
}
2. Route Message Dispatch (Internal Service to Gateway Node API)
POST /api/v1/gateway/dispatch
{
"target_client_id": "usr-881100-active",
"message_id": "msg-990088-dispatch",
"event_type": "CHAT_MESSAGE",
"payload": {
"sender_id": "usr-990022",
"text": "Hello world!"
}
}
High-Level Architecture
WebSocket fleets require decoupling client connections from internal service routing using a Message Backplane. Client applications establish connection upgrades via an Application Load Balancer (ALB) that terminates TLS. The ALB distributes connections across stateless WebSocket Gateway nodes. The application services publish events to a central Redis Pub/Sub cluster.
1. WebSocket Fleet Architecture
Stateless application servers do not know which physical gateway node holds a target user's active TCP connection. They publish messages to a shared Redis Pub/Sub Backplane. Each gateway node subscribes to a channel matching its own node ID, consuming and delivering messages to the local socket.
graph TD
Client1[Client A] -->|TCP Connection| GW1[WebSocket Gateway 1]
Client2[Client B] -->|TCP Connection| GW2[WebSocket Gateway 2]
subgraph GatewayFleet["WebSocket Gateway Fleet"]
GW1
GW2
end
subgraph Backplane["Pub/Sub Broker Backplane"]
Redis[(Redis Pub/Sub Clusters)]
end
AppServer[Application Microservices] -->|Publish Target: GW1| Redis
Redis -->|Deliver| GW1
Redis -->|Deliver| GW2
%% Style annotations
classDef gate fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
class GW1,GW2 gate;
2. Client Connection Lifecycle
Every client handshake is authenticated at the Load Balancer before routing to a gateway instance. Once connected, a bi-directional ping/pong loop keeps the connection alive.
sequenceDiagram
autonumber
actor Client
participant LB as ALB (TLS Termination)
participant GW as WebSocket Gateway Node
participant Registry as Redis Connection Registry
Client->>LB: Upgrade to WebSocket (HTTP 101 Request)
LB->>GW: Forward Handshake (Core affinity mapping)
GW->>Registry: Set user-route -> "ws-node-east-44"
GW-->>Client: Handshake Completed (TCP Established)
loop Heartbeat Loop (every 30s)
Client->>GW: PING (Client heartbeat)
GW-->>Client: PONG
end
Low-Level Design and Schema
Below is a production-ready, compilable Go snippet representing a high-performance WebSocket Connection Registry. It stores active client socket pointers in memory using sharded maps and mutex locks to prevent resource contention under concurrent read/write operations:
package main
import (
"sync"
"net"
)
// ShardedConnectionRegistry splits the locking map into 16 shards
// to reduce global mutex lock contention under high write/delete loads.
type ShardedConnectionRegistry struct {
shards [16]*RegistryShard
}
type RegistryShard struct {
sync.RWMutex
connections map[string]net.Conn
}
func NewShardedRegistry() *ShardedConnectionRegistry {
r := &ShardedConnectionRegistry{}
for i := 0; i < 16; i++ {
r.shards[i] = &RegistryShard{
connections: make(map[string]net.Conn),
}
}
return r
}
func (r *ShardedConnectionRegistry) getShard(clientId string) *RegistryShard {
// Simple hashing function to distribute users across shards
hash := 0
for _, char := range clientId {
hash += int(char)
}
if hash < 0 {
hash = -hash
}
return r.shards[hash%16]
}
func (r *ShardedConnectionRegistry) Register(clientId string, conn net.Conn) {
shard := r.getShard(clientId)
shard.Lock()
defer shard.Unlock()
shard.connections[clientId] = conn
}
func (r *ShardedConnectionRegistry) Deregister(clientId string) {
shard := r.getShard(clientId)
shard.Lock()
defer shard.Unlock()
if conn, exists := shard.connections[clientId]; exists {
conn.Close()
delete(shard.connections, clientId)
}
}
func (r *ShardedConnectionRegistry) Get(clientId string) (net.Conn, bool) {
shard := r.getShard(clientId)
shard.RLock()
defer shard.RUnlock()
conn, exists := shard.connections[clientId]
return conn, exists
}
Relational Schema (Connection Auditing Database)
To log connection events, lifetimes, and geo-distributions, we maintain an audit database schema:
CREATE TABLE gateway_fleet_nodes (
gateway_id VARCHAR(100) PRIMARY KEY,
datacenter_region VARCHAR(50) NOT NULL,
active_connections_count INT NOT NULL DEFAULT 0,
health_status VARCHAR(20) NOT NULL DEFAULT 'HEALTHY',
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE active_client_connections (
connection_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id VARCHAR(100) NOT NULL,
gateway_id VARCHAR(100) REFERENCES gateway_fleet_nodes(gateway_id) ON DELETE CASCADE,
client_ip VARCHAR(45) NOT NULL,
ping_latency_ms INT NOT NULL DEFAULT 0,
established_at TIMESTAMP WITH TIME ZONE NOT NULL,
last_heartbeat TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_connections_client ON active_client_connections(client_id);
CREATE INDEX idx_connections_gateway ON active_client_connections(gateway_id);
Scaling Challenges and Capacity Estimation
Stateful scaling introduces distinct low-level operating system constraints that cap capacity margins:
1. Linux Kernel File Descriptor Limits
By default, the Linux operating system caps the maximum open file descriptors per process to 1024, causing connections to fail immediately at scale.
- Mitigation: Edit
/etc/security/limits.confto set soft and hard limits:* soft nofile 1048576and* hard nofile 1048576, and configure the system variablessys.fs.file-max = 2097152insysctl.conf. This allocates enough room to handle the file descriptor quotas.
2. TCP Buffer Squeezing (Memory Depletion)
If the Linux kernel allocates large default read and write buffers per TCP socket, memory exhaustion occurs under high load.
- Mitigation: Tune the socket memory allocations in
/etc/sysctl.confto reduce the TCP buffer footprint down to 2KB boundaries. This ensures we don't hit memory barriers:net.ipv4.tcp_rmem = 2048 4096 8192 net.ipv4.tcp_wmem = 2048 4096 8192
3. Ephemeral Port Exhaustion
When gateway servers establish upstream connections or proxy requests to backend microservices, they consume local ephemeral ports. By default, Linux reserves ports 32768 to 60999 (28,231 ports), meaning a single gateway trying to establish more than 28,000 simultaneous outgoing connections to the same target IP address will crash due to port exhaustion.
- Mitigation: Edit
/etc/sysctl.confto expand the port range to the absolute maximum limits:net.ipv4.ip_local_port_range = 1024 65535, and configure TCP socket reuse rules:net.ipv4.tcp_tw_reuse = 1.
Architectural Trade-offs
Evaluating the optimal persistent connection pattern requires balancing performance limits:
| Technology | Client-to-Server Direction | Push Latency | Memory per Socket | Connection State |
|---|---|---|---|---|
| WebSocket | Supported (Full Duplex) | Low (Sub-5ms) | Medium (~30KB) | Stateful (Persistent TCP) |
| Server-Sent Events (SSE) | Not Supported (One Way) | Low (Sub-5ms) | Low (~8KB) | Stateful (Persistent HTTP) |
| HTTP Long Polling | Supported (via request cycles) | High (Depends on poll interval) | Extremely High | Stateless (Simulated) |
| WebTransport (HTTP/3) | Supported (Multiplexed UDP) | Incredibly Low | Low | Stateful (Multiplexed UDP stream) |
Trade-off Evaluation
- WebSocket vs. SSE: SSE runs over standard HTTP/2 streams. It is unidirectional (server-to-client only), but it has a much lower memory footprint (approximately 8KB per connection vs. 30KB for WebSockets). It also features built-in reconnection logic and passes easily through proxies and firewalls that often block raw WebSocket upgrades.
- WebSocket vs. WebTransport: WebTransport utilizes HTTP/3 over QUIC (UDP). This eliminates head-of-line blocking (where a single dropped packet stalls the entire connection) and reduces connection handshake latency to a single round-trip. However, WebTransport is a draft standard and lacks comprehensive client-library support across legacy web browsers.
Failure Scenarios and Resilience
Resilience configurations must prevent catastrophic cascades during gateway outages:
Scenario A: The Thundering Herd Storm
If a single gateway server holding 100,000 connections crashes, all 100,000 clients will immediately attempt to reconnect to the remaining nodes in the fleet. This will saturate CPU thread pools and knock out the rest of the fleet.
- Resiliency Mitigation: Enforce Exponential Backoff and Jitter at the client-side reconnection loop. Reconnection retries must be randomized over a wide window. The formula for the retry interval $t$ is: $$t = T_{\text{base}} \times 2^{\text{attempt}} \pm \text{RandomJitter}$$ By adding a wide random jitter, the clients spread their reconnection attempts over several minutes, allowing the load balancer to distribute the spike smoothly.
Scenario B: Redis Pub/Sub Bottlenecking
If every gateway node subscribes to a single global Redis channel to parse messages, the throughput becomes limited by the single-threaded CPU processing speed of the Redis node.
- Resiliency Mitigation: Group channels dynamically by partition. Map user routes to localized shards (e.g.
channel:ws-node-east-44), ensuring a gateway node only consumes messages addressed to its own connections.
Scenario C: Socket Leakage (Zombie Connections)
If clients drop off the network without cleanly closing their sockets (e.g., losing signal inside a tunnel), the gateway server keeps the TCP socket open indefinitely. This leaks memory and file descriptors.
- Resiliency Mitigation: Implement aggressive heartbeat checks. If a client fails to respond to a ping with a pong message within two heartbeat intervals (e.g., 60 seconds), the gateway server terminates the connection, closes the file descriptor, and releases the allocated buffer memory.
Staff Engineer Perspective
Verbal Script
Verbal Script: Scaling Persistent Sockets
Interviewer: "How would you design a push system to deliver real-time notifications to 10 million concurrent active users?"
Candidate: "To handle 10 million concurrent users, I would design a stateful, horizontally scalable WebSocket fleet consisting of approximately 100 gateway nodes, with each node managing up to 100,000 open TCP connections. To decouple the stateless application servers from this stateful fleet, I would deploy a Redis Pub/Sub backplane as a message broker. When an application server needs to send a notification to User A, it queries a Redis routing registry to identify which gateway node currently holds User A's socket, then publishes the message to a channel dedicated to that specific node."
Interviewer: "Excellent. How would you optimize the operating system on the gateway nodes to support 100,000 open TCP sockets?"
Candidate: "First, we must configure the Linux OS limits. I would set the process file descriptor limit to at least 1,000,000 in limits.conf. Second, to prevent memory depletion, we must tune the TCP socket buffers. Standard Linux TCP allocations can consume up to 80GB of RAM for 10M sockets. By modifying /etc/sysctl.conf to decrease the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem limits down to a 2KB minimum baseline, we can reduce the global memory footprint down to 20--30GB, allowing standard servers to operate efficiently. Third, we expand the local port range to 1024-65535 to prevent ephemeral port exhaustion."
Interviewer: "What fails when a gateway node goes down, and how do we prevent the re-connection storm from bringing down the rest of our fleet?"
Candidate: "When a gateway node holding 100,000 sockets crashes, the clients will immediately disconnect and attempt to reconnect. If they reconnect simultaneously, this thundering herd will crash the remaining gateway nodes. To prevent this, we must enforce exponential backoff with randomized jitter on the client-side reconnection logic. This spreads the reconnection attempts over a 2 to 3 minute window. On the gateway side, the load balancer should rate-limit incoming handshake requests, shedding load to maintain cluster stability while the fleet autoscales."
