Stateless Auth: Managing JWT Blacklisting at Scale

Mental Model

Connecting isolated components into a resilient, scalable, and observable distributed web.

JSON Web Tokens (JWTs) are a widely used standard for stateless authentication. Because they are self-contained and cryptographically signed, any microservice can validate them locally without hitting a central database. However, this creates a critical security gap: how do we revoke a compromised token before it expires? If we query a database on every request, the architecture is no longer stateless. Solving this requires a hybrid, space-efficient gating architecture that combines Bloom Filters at the API Gateway with a distributed Redis Cluster.

1. Functional & Non-Functional Requirements

To design a high-throughput token revocation system, we define these operational requirements:

Capacity Estimations (Sizing for 1 Million Revoked Tokens)

The DB Bottleneck: Checking a database on each of 100,000 requests per second (RPS) would saturate database connection pools.
Bloom Filter Sizing: We want to store 1,000,000 revoked token IDs (JTI, the JWT ID claim) in a local in-memory Bloom Filter with a target false positive rate ($p$) of 0.1% (0.001). The required bit array size ($m$) is: $$m = -\frac{n \ln(p)}{(\ln 2)^2} = -\frac{1{,}000{,}000 \times \ln(0.001)}{0.48045} \approx 14{,}377{,}587 \text{ bits} \approx 1.8\text{ MB } (1.71\text{ MiB})$$
Hash Functions: The optimal number of hash functions ($k$) is: $$k = \frac{m}{n} \ln 2 \approx 14.38 \times 0.693 \approx 9.97 \approx 10 \text{ hash functions}$$ Storing 1,000,000 revoked tokens therefore takes only about 1.8 MB of RAM inside the API Gateway, with a target P99 lookup latency of under 50 microseconds.
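The sizing arithmetic above can be reproduced with a few lines of Java. This is a minimal sketch: the class and method names are hypothetical, and it rounds $m$ up where a plain integer cast would round down, so the last digit may differ by one from the figure in the text.

```java
// Illustrative helper; BloomSizing and its method names are hypothetical.
final class BloomSizing {
    /** Optimal bit-array size m = -n * ln(p) / (ln 2)^2, rounded up. */
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Optimal hash count k = (m / n) * ln 2, rounded to the nearest integer (at least 1). */
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Expected false positive rate (1 - e^(-kn/m))^k after n insertions. */
    static double expectedFpRate(long m, int k, long n) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        long m = optimalBits(n, 0.001);
        int k = optimalHashes(m, n);
        System.out.printf("m=%d bits (%.2f MB), k=%d, p(n)=%.5f, p(2n)=%.5f%n",
                m, m / 8.0 / 1_000_000, k,
                expectedFpRate(m, k, n), expectedFpRate(m, k, 2 * n));
    }
}
```

The last value printed shows why the filter must not be overfilled: with twice the planned number of entries, the expected false positive rate rises from 0.1% to more than 5%.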

Non-Functional Requirements

Lookup Latency SLA: Lookup times must remain under 100 microseconds for valid tokens.
Fast Revocation Propagation: A token revoked in Region A must be gated at all regional API Gateways within 2 seconds.
Zero False Negatives: A revoked token must never be incorrectly classified as "valid". A Bloom filter never returns a false negative for an entry it contains, so this holds once the revocation has reached the local filter.

2. Interface Design & APIs

To coordinate token revocations, the auth service and the gateway agree on explicit payloads. Below is the JSON contract for the revocation command, followed by the gateway filter configuration:

Revocation Command (Auth Service to Gateway Coordinator)

{
  "event_id": "evt-880099-revocation",
  "jti": "jti-uuid-998811-token",
  "reason": "USER_LOGOUT",
  "expires_at_epoch": 1779580800,
  "revoked_at": "2026-05-23T10:00:00.123Z"
}
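One way a consumer might model this payload is sketched below. The class is hypothetical (it is not part of the contract above); its fields simply mirror the JSON keys. The point it illustrates is that a blacklist entry only needs to live until the token would have expired anyway, so the Redis TTL can be derived from expires_at_epoch.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical in-memory view of the revocation command; field names mirror the JSON keys.
final class RevocationEvent {
    final String eventId;
    final String jti;
    final String reason;
    final long expiresAtEpoch; // token expiry, in seconds since the Unix epoch
    final Instant revokedAt;

    RevocationEvent(String eventId, String jti, String reason, long expiresAtEpoch, Instant revokedAt) {
        this.eventId = eventId;
        this.jti = jti;
        this.reason = reason;
        this.expiresAtEpoch = expiresAtEpoch;
        this.revokedAt = revokedAt;
    }

    /** How long the blacklist entry must live: until the token's own expiry, never negative. */
    Duration remainingTtl(Instant now) {
        Duration left = Duration.between(now, Instant.ofEpochSecond(expiresAtEpoch));
        return left.isNegative() ? Duration.ZERO : left;
    }

    /** A token that has already expired needs no blacklist entry at all. */
    boolean needsBlacklisting(Instant now) {
        return !remainingTtl(now).isZero();
    }
}
```

Writing the Redis key with this TTL lets expired revocations disappear on their own instead of accumulating.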

Gateway Filter Configuration (YAML)

gateway_jwt_filter:
  bloom_filter:
    capacity: 1000000
    false_positive_rate: 0.001
    sync_interval_ms: 1000
  redis_cluster:
    endpoints: ["redis-node-1.auth:6379", "redis-node-2.auth:6379"]
    connection_timeout_ms: 50
    fallback_allow_on_timeout: true

3. High-Level Design & Topology

The core of stateless blacklisting is Space-Efficient Gating.

1. The Stateful Revocation Paradox

In standard architectures, checking a central database for every incoming request removes the stateless advantages of JWTs, creating a massive single point of failure (SPOF) and adding database network latencies.

graph TD
    Client[Client Request] -->|Validate JWT| GW[API Gateway]
    GW -->|Saturating DB Network Hop| DB[(Central Session Database)]
    DB -->|Validate state| GW
    GW -->|Accept/Reject| API[Downstream API]
    
    %% Style annotations
    style DB fill:#ffebee,stroke:#c62828,stroke-width:2px;

2. In-Memory Bloom Filter + Redis Gating Architecture

With Bloom Filters, the API Gateway evaluates the incoming token locally.

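Case A (about 99.9% of requests carrying non-revoked tokens): The Bloom Filter returns "not in set", an answer that is always correct. Once the usual signature and expiry checks pass, the gateway accepts the JWT with zero network hops.
Case B (False Positive or Revoked): The Bloom Filter returns "maybe in set". The gateway performs a fast, targeted query to the Redis Cluster to confirm: if Redis holds the JTI, the token is rejected; if not, the match was a false positive and the request proceeds.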
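To get a feel for how much traffic still reaches Redis under this scheme, the two cases can be turned into a back-of-the-envelope estimate. The sketch below is illustrative only: the class name is hypothetical, and the share of requests carrying genuinely revoked tokens is an assumed input, not a figure from this article.

```java
// Back-of-the-envelope estimate; RedisLoadEstimate is a hypothetical helper.
final class RedisLoadEstimate {
    /**
     * Expected Redis lookups per second.
     *
     * @param rps           total gateway requests per second
     * @param falsePositive Bloom filter false positive rate (e.g. 0.001)
     * @param revokedShare  fraction of requests carrying a genuinely revoked token (assumed input)
     */
    static double redisLookupsPerSecond(double rps, double falsePositive, double revokedShare) {
        double revokedHits = rps * revokedShare; // Case B: always reach Redis
        double falsePositiveHits = rps * (1.0 - revokedShare) * falsePositive; // Case B: spurious matches
        return revokedHits + falsePositiveHits;
    }

    public static void main(String[] args) {
        System.out.println(redisLookupsPerSecond(100_000, 0.001, 0.0));
        System.out.println(redisLookupsPerSecond(100_000, 0.001, 0.01));
    }
}
```

At 100,000 RPS with no revoked traffic, false positives alone cost about 100 Redis lookups per second. If 1% of requests replay revoked tokens, the total grows to roughly 1,100 per second, so heavy replay of revoked tokens, not the false positive rate, is what drives Redis load.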

sequenceDiagram
    autonumber
    participant Client
    participant GW as API Gateway (In-Memory Bloom Filter)
    participant Redis as Redis Cluster (State Store)
    participant API as Downstream Service

    Client->>GW: Request + Bearer JWT (JTI=X)
    
    rect rgb(240, 248, 255)
        Note over GW: Step 1: Query Local Bloom Filter
        alt JTI not in Bloom Filter (100% Accurate)
            GW->>API: Route Request (Stateless Speed)
        else JTI maybe in Bloom Filter (Revoked or 0.1% False Positive)
            GW->>Redis: Step 2: Query Redis for JTI=X
            alt JTI found in Redis (Token Revoked)
                Redis-->>GW: Returns JTI Found == TRUE
                GW-->>Client: HTTP 401 Unauthorized (Blocked)
            else JTI not in Redis (False Positive)
                Redis-->>GW: Returns JTI Found == FALSE
                GW->>API: Route Request
            end
        end
    end

4. Low-Level Design & Data Models

Below is a compilable Java sketch of the JWT Blacklist Validator. It implements a thread-safe local BitSet Bloom Filter combined with a Redis verification path; the hash function and the Redis client are placeholders that a production deployment would replace:

package com.codesprintpro.auth;

import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class JwtBlacklistValidator {
    private final BitSet bloomFilter;
    private final int bitArraySize;
    private final int numHashFunctions;
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    
    // Stub standing in for a real Redis cluster client
    private final RedisClusterClientStub redisClient;

    public JwtBlacklistValidator(int expectedElements, double falsePositiveRate, RedisClusterClientStub redisClient) {
        this.bitArraySize = (int) (-expectedElements * Math.log(falsePositiveRate) / Math.pow(Math.log(2), 2));
        this.numHashFunctions = (int) (Math.round((double) bitArraySize / expectedElements * Math.log(2)));
        this.bloomFilter = new BitSet(bitArraySize);
        this.redisClient = redisClient;
    }

    /**
     * Registers a revoked JTI into the local Bloom Filter.
     */
    public void registerRevocationLocal(String jti) {
        this.rwLock.writeLock().lock();
        try {
            byte[] bytes = jti.getBytes(StandardCharsets.UTF_8);
            for (int i = 0; i < this.numHashFunctions; i++) {
                int hash = getMurmurHash(bytes, i) % this.bitArraySize;
                this.bloomFilter.set(Math.abs(hash), true);
            }
        } finally {
            this.rwLock.writeLock().unlock();
        }
    }

    /**
     * Checks whether a JWT ID has been revoked. Employs space-efficient gating:
     * returns false immediately if the JTI is not in the Bloom filter, and
     * confirms with Redis on a match.
     */
    public boolean isTokenRevoked(String jti) {
        this.rwLock.readLock().lock();
        boolean maybeInFilter = true;
        try {
            byte[] bytes = jti.getBytes(StandardCharsets.UTF_8);
            for (int i = 0; i < this.numHashFunctions; i++) {
                int hash = getMurmurHash(bytes, i) % this.bitArraySize;
                if (!this.bloomFilter.get(Math.abs(hash))) {
                    maybeInFilter = false; // 100% accurate: not revoked
                    break;
                }
            }
        } finally {
            this.rwLock.readLock().unlock();
        }

        if (!maybeInFilter) {
            return false; // Definitely not revoked
        }

        // The match may be a genuine revocation or a false positive; Redis is the source of truth
        return this.redisClient.exists(jti);
    }

    private static int getMurmurHash(byte[] data, int seed) {
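        // Simplified multiply-and-xorshift mixer used as a placeholder so the class compiles.
        // This is NOT MurmurHash3; swap in a real, well-distributed hash for production use.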
        int h = seed;
        for (byte b : data) {
            h ^= b;
            h *= 0x5bd1e995;
            h ^= h >>> 15;
        }
        return h;
    }
}

// Stub for compilation only: always answers "not found"
class RedisClusterClientStub {
    public boolean exists(String key) { return false; }
}
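The placeholder hash above is the weakest part of the listing. One common alternative, shown here as a sketch rather than as this article's method, is double hashing: derive two base hashes once and combine them as g_i = h1 + i * h2 (mod m) to get all k bit positions. The class name is hypothetical, and MD5 is used purely as a convenient, well-mixed 128-bit hash from the JDK, not for any security property.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper illustrating double hashing for Bloom filter indexes.
final class DoubleHashing {
    /** Returns k bit positions in [0, m) for the given JTI using g_i = h1 + i * h2 (mod m). */
    static int[] indexes(String jti, int k, int m) {
        byte[] digest;
        try {
            // One 128-bit digest yields both base hashes.
            digest = MessageDigest.getInstance("MD5").digest(jti.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is expected on every Java platform", e);
        }
        ByteBuffer buf = ByteBuffer.wrap(digest);
        long h1 = buf.getLong();
        long h2 = buf.getLong() | 1L; // force odd so the stride is never zero

        int[] positions = new int[k];
        for (int i = 0; i < k; i++) {
            // floorMod keeps the result non-negative even when the sum overflows or is negative
            positions[i] = (int) Math.floorMod(h1 + (long) i * h2, (long) m);
        }
        return positions;
    }
}
```

Both the insert path and the lookup path must use the same index function; mixing two different functions would make previously registered JTIs undetectable.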

5. Scaling Bottlenecks & Mitigations

Gating stateless auth introduces distinct scale constraints:

1. Bloom Filter Saturation (False Positive Drift)

Over time, as more revoked JTIs are added to the Bloom Filter, a growing share of its bits are set. Once the filter holds more than the 1,000,000 entries it was sized for, the false positive rate ($p$) climbs above the 0.1% target, causing the gateway to make more Redis round trips and degrading performance.

Mitigation: Implement Time-to-Live (TTL) Resets. Since JWTs have a fixed expiry, we only need to keep a JTI in the blacklist until the token naturally expires. A standard Bloom Filter cannot delete entries, so periodically (e.g., daily) rebuild it from scratch, skipping revocations whose tokens have already expired.
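The rebuild step might look like the following sketch: a fresh bit set is populated from unexpired revocations only, then swapped in atomically so readers never see a half-built filter. All names are hypothetical, the index function is a toy chosen for brevity, and revocations that arrive while a rebuild is in progress are out of scope here.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a periodically rebuilt Bloom filter.
final class RebuildableBlacklist {
    private final int bits;
    private final int hashes;
    private final AtomicReference<BitSet> active;

    RebuildableBlacklist(int bits, int hashes) {
        this.bits = bits;
        this.hashes = hashes;
        this.active = new AtomicReference<>(new BitSet(bits));
    }

    /** Builds a fresh filter from unexpired revocations and swaps it in; returns entries kept. */
    int rebuild(Map<String, Long> expiryEpochByJti, long nowEpochSeconds) {
        BitSet fresh = new BitSet(bits);
        int kept = 0;
        for (Map.Entry<String, Long> entry : expiryEpochByJti.entrySet()) {
            if (entry.getValue() <= nowEpochSeconds) {
                continue; // token already expired: normal expiry validation rejects it anyway
            }
            for (int i = 0; i < hashes; i++) {
                fresh.set(index(entry.getKey(), i));
            }
            kept++;
        }
        active.set(fresh); // the fresh BitSet is never mutated after publication
        return kept;
    }

    boolean mightBeRevoked(String jti) {
        BitSet snapshot = active.get();
        for (int i = 0; i < hashes; i++) {
            if (!snapshot.get(index(jti, i))) {
                return false;
            }
        }
        return true;
    }

    // Toy index function for illustration only; use a well-distributed hash in practice.
    private int index(String jti, int i) {
        int h1 = jti.hashCode();
        int h2 = (Integer.rotateLeft(h1, 16) * 0x9E3779B1) | 1;
        return Math.floorMod(h1 + i * h2, bits);
    }
}
```

Swapping a whole new filter avoids locking readers during the rebuild, at the cost of briefly holding two filters in memory.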

2. Multi-Gateway Synchronizations

When a token is revoked, all gateway instances running in separate availability zones must synchronize their local in-memory Bloom Filters.

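Mitigation: Implement Redis Pub/Sub broadcasting. When a user logs out, the auth service writes the revocation to Redis and publishes the JTI to a global Pub/Sub channel. All gateway instances subscribe to this channel and update their local Bloom Filters, typically within milliseconds. Because Redis Pub/Sub is fire-and-forget, a gateway that is disconnected when a message is published never receives it, so each gateway should also resynchronize from Redis periodically, for example on the sync_interval_ms schedule in the gateway configuration.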
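The gateway side of this synchronization can be sketched without committing to a particular Redis client. In the hypothetical class below, a concurrent set stands in for the local Bloom Filter, onRevocationMessage represents whatever callback the Pub/Sub subscriber invokes, and reconcile represents a periodic catch-up against a snapshot read from Redis. None of these names come from a real library.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, client-agnostic sketch of gateway-side revocation sync.
final class RevocationSync {
    // Stand-in for the local Bloom filter so the sketch stays self-contained.
    private final Set<String> localRevoked = ConcurrentHashMap.newKeySet();

    /** Invoked by the Pub/Sub subscriber for each published JTI (fast path). */
    void onRevocationMessage(String jti) {
        localRevoked.add(jti);
    }

    /** Periodic catch-up against Redis; returns how many missed JTIs were added (slow path). */
    int reconcile(Set<String> redisSnapshot) {
        int missed = 0;
        for (String jti : redisSnapshot) {
            if (localRevoked.add(jti)) {
                missed++;
            }
        }
        return missed;
    }

    boolean isLocallyRevoked(String jti) {
        return localRevoked.contains(jti);
    }
}
```

A non-zero return value from reconcile is a useful metric: it counts revocations the gateway would have missed had it relied on Pub/Sub alone.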

6. Strategic Trade-offs & Alternatives

Evaluating the session management pattern requires balancing scalability limits:

Pattern	Read Latency	Backing Store Pressure	Revocation Speed	Implementation Complexity
Stateless JWT (No Revocation)	Very Low (Local verify)	Zero	Poor (Must wait for TTL expiration)	Very Low
Stateful DB Sessions	High (Read query per request)	Extremely High	Instant	Low
Redis Session Gating	Medium (5ms network hop)	High (Redis load scales with RPS)	Instant	Medium
Bloom Filter + Redis Gating	Very Low (Microseconds)	Minimal (Redis hit only on Bloom Filter matches)	Near-instant (Bounded by propagation delay)	High (Requires synchronization)

7. Failure Scenarios & Resiliency

Security gateways must gracefully degrade during infrastructure partitions:

Scenario A: Redis Cluster Outage

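If the Redis cluster crashes, the gateway cannot resolve requests whose JTI matches the Bloom Filter (the roughly 0.1% of false positives plus any genuinely revoked tokens), causing authentication errors.

Resiliency Mitigation: Configure Safe-Fail Defaults. If the Redis check exceeds its timeout (e.g., 50 ms), the gateway defaults to "Allow" and alerts engineers via Prometheus. This is a fail-open choice that favors availability: signature and expiration checks still run, but they cannot detect a revocation, so a genuinely revoked token may pass until Redis recovers. Security-sensitive routes may prefer to fail closed instead.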
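The timeout policy can be sketched with a bounded wait around the Redis lookup. The class and parameter names below are hypothetical; redisLookup stands in for whatever call checks the JTI in Redis, and allowOnTimeout plays the role of the fallback_allow_on_timeout setting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch of a bounded Redis check with a configurable failure policy.
final class RedisGate {
    /**
     * Returns true if the token should be treated as revoked.
     * On timeout or error, the answer depends on allowOnTimeout:
     * true = fail open (treat as not revoked), false = fail closed (treat as revoked).
     */
    static boolean isRevoked(Supplier<Boolean> redisLookup, long timeoutMs, boolean allowOnTimeout) {
        CompletableFuture<Boolean> lookup = CompletableFuture.supplyAsync(redisLookup);
        try {
            return lookup.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            lookup.cancel(true); // marks the future cancelled; it does not interrupt the running lookup
            return !allowOnTimeout;
        } catch (ExecutionException e) {
            return !allowOnTimeout; // Redis error: apply the same policy as a timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return !allowOnTimeout;
        }
    }
}
```

Keeping the policy as an explicit parameter makes it possible to fail open on low-risk routes and fail closed on sensitive ones, rather than hard-coding a single behavior for the whole gateway.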

Scenario B: Gateway Bootstrapping Lag

When a new gateway instance scales out, its local in-memory Bloom Filter is empty. An empty filter answers "not in set" for every JTI, so the node would accept revoked tokens unless it falls back to querying Redis for every request.

Resiliency Mitigation: Implement Warm-Up Bootstrapping. During startup, the new gateway node queries the Redis cluster to download the active blacklist dataset and builds its local Bloom Filter before accepting external HTTP traffic.
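A readiness gate is one simple way to enforce "no traffic before the filter is built". In the sketch below, every name is hypothetical: activeJtis stands in for the blacklist entries read from Redis, and registerLocal for the method that adds a JTI to the local Bloom Filter.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical readiness gate for gateway warm-up.
final class WarmUpGate {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Loads every active JTI into the local filter, then marks the node ready; returns count loaded. */
    int bootstrap(Iterable<String> activeJtis, Consumer<String> registerLocal) {
        int loaded = 0;
        for (String jti : activeJtis) {
            registerLocal.accept(jti);
            loaded++;
        }
        ready.set(true); // flipped only after the full dataset has been applied
        return loaded;
    }

    /** Intended to back the health check so the load balancer sends no traffic before warm-up ends. */
    boolean isReady() {
        return ready.get();
    }
}
```

If bootstrap throws partway through, the gate stays closed, which is the safe default for a node whose filter is incomplete.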

8. Staff Engineer Perspective

9. Mock Interview Dialogue

Verbal Interview Script

Interviewer: "If JSON Web Tokens are designed to be stateless, why is token revocation a major architectural challenge, and how would you design a revocation system at scale?"

Candidate: "JWTs are self-contained and validated using cryptographic signatures on the local node, eliminating database queries. However, this means once a JWT is issued, it is valid until its expiration date. If a user's token is stolen or they log out, we cannot invalidate it locally. Checking a central session database for every request removes the stateless speed benefit. To solve this, I would design a space-efficient gating architecture at the API Gateway using an in-memory Bloom Filter backed by a Redis Cluster as our source of truth."

Interviewer: "How does the Bloom Filter improve lookup performance under high workloads?"

Candidate: "A Bloom Filter has zero false negatives. If it returns 'not in set', we are 100% mathematically certain that the token has not been revoked. For 99.9% of our valid user traffic, the gateway processes the request locally in microseconds with zero network hops. If the Bloom Filter returns 'maybe in set' (which only occurs for our 0.1% false positive rate or actually revoked tokens), only then does the gateway query the Redis Cluster to verify. This reduces Redis lookup load by 99.9%, keeping read latencies low while guaranteeing instant session revocation."

Sachin Sarawgi
