
Stateless Auth: Managing JWT Blacklisting at Scale

The stateless JWT paradox. Learn how to design a highly scalable token revocation system using Bloom Filters, Redis Clusters, and gateway caching.

Sachin Sarawgi | April 20, 2026 | 9 min read


Key Takeaways

What you will learn

**The Stateless Paradox:** JWTs are designed to be stateless, but instant revocation requires a stateful lookup on every request, which gives up the latency advantage of local validation.

**Bloom Filter Gating:** An in-memory Bloom Filter at the API Gateway clears more than 99% of valid tokens instantly with zero network hops; only "maybe revoked" answers go on to Redis.

**Eventual Sync:** Revoked token IDs are broadcast via Redis Pub/Sub to keep every gateway's Bloom Filter in sync.

Mental Model

Connecting isolated components into a resilient, scalable, and observable distributed web.

JSON Web Tokens (JWTs) are the standard for stateless authentication. Because they are self-contained and cryptographically signed, any microservice can validate them locally without hitting a central database. However, this creates a critical security loophole: how do we revoke a compromised token before its expiration? If we query a database for every request, our architecture is no longer stateless. Solving this requires a hybrid, space-efficient gating architecture combining Bloom Filters at the API Gateway with a distributed Redis Cluster.


1. Functional & Non-Functional Requirements

To design a high-throughput token revocation system, we define these operational requirements:

Capacity Estimations (Sizing for 1 Million Revoked Tokens)

  • The DB Bottleneck: Checking a database for 100,000 requests per second (RPS) would saturate database connection pools.
  • Bloom Filter Sizing: We want to store 1,000,000 revoked token IDs (JTI, the JWT ID claim) in a local in-memory Bloom Filter with a target false positive rate ($p$) of 0.1% (0.001). The required bit array size ($m$) is: $$m = -\frac{n \ln(p)}{(\ln 2)^2} = -\frac{1{,}000{,}000 \times \ln(0.001)}{0.48045} \approx 14{,}377{,}587 \text{ bits} \approx 1.8\text{ MB}$$
  • Hash Functions: The optimal number of hash functions ($k$) is: $$k = \frac{m}{n} \ln(2) = 14.38 \times 0.693 \approx 10 \text{ hash functions}$$ Storing 1,000,000 revoked token IDs therefore takes only about 1.8 MB (1.71 MiB) of RAM inside each API Gateway instance. A lookup costs just $k$ hash computations and bit reads, which we budget at a P99 under 50 microseconds.
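The sizing arithmetic above can be reproduced with a few lines of Java. This is a minimal sketch: the class name BloomSizing and its method names are hypothetical, and it rounds $m$ up to a whole bit rather than truncating.

```java
/** Sizing helper for a standard Bloom filter (hypothetical class name). */
final class BloomSizing {
    private BloomSizing() {}

    /** Bits needed for n elements at false positive rate p: m = -n ln(p) / (ln 2)^2. */
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Optimal hash count: k = (m / n) ln 2, rounded to the nearest integer (at least 1). */
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        long m = optimalBits(n, 0.001);
        int k = optimalHashes(m, n);
        // Prints the bit count, its size in MiB, and the hash count.
        System.out.printf("m = %d bits (%.2f MiB), k = %d%n", m, m / 8.0 / (1024 * 1024), k);
    }
}
```

Re-running the helper with a different capacity or false positive target shows how quickly memory grows: halving $p$ adds roughly 1.44 bits per stored JTI.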

Non-Functional Requirements

  • Lookup Latency SLA: Lookup times must remain under 100 microseconds for valid tokens.
  • Fast Revocation Propagation: A token revoked in Region A must be gated at all regional API Gateways within 2 seconds.
  • Zero False Negatives: Once a revoked JTI has been added to a gateway's filter, it must never be classified as "valid" (a Bloom Filter never returns "not in set" for an element that was inserted).

2. Interface Design & APIs

To coordinate token revocations, the auth service and the gateway agree on two contracts: the JSON revocation payload sent by the auth service, and the YAML configuration of the gateway lookup filter.

Revocation Command (Auth Service to Gateway Coordinator)

{
  "event_id": "evt-880099-revocation",
  "jti": "jti-uuid-998811-token",
  "reason": "USER_LOGOUT",
  "expires_at_epoch": 1779534000,
  "revoked_at": "2026-05-23T10:00:00.123Z"
}
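On the consuming side, the payload maps naturally onto a small typed value. The sketch below assumes Java 16+ (records); the RevocationEvent type and its helper methods are hypothetical and only illustrate how expires_at_epoch bounds the lifetime of a blacklist entry.

```java
import java.time.Duration;
import java.time.Instant;

/** Typed view of the revocation payload; component names mirror the JSON keys. */
record RevocationEvent(String eventId, String jti, String reason,
                       long expiresAtEpoch, Instant revokedAt) {

    RevocationEvent {
        if (jti == null || jti.isBlank()) {
            throw new IllegalArgumentException("jti is required");
        }
    }

    /** How long the blacklist entry must live: until the token would expire anyway. */
    Duration remainingTtl(Instant now) {
        Duration left = Duration.between(now, Instant.ofEpochSecond(expiresAtEpoch));
        return left.isNegative() ? Duration.ZERO : left;
    }

    /** A token that has already expired needs no blacklist entry at all. */
    boolean needsBlacklisting(Instant now) {
        return !remainingTtl(now).isZero();
    }
}
```

Using the remaining token lifetime as the TTL of the Redis key means the store cleans itself up and never holds entries for tokens that would fail expiry validation anyway.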

Gateway Filter Configuration (YAML)

gateway_jwt_filter:
  bloom_filter:
    capacity: 1000000
    false_positive_rate: 0.001
    sync_interval_ms: 1000
  redis_cluster:
    endpoints: ["redis-node-1.auth:6379", "redis-node-2.auth:6379"]
    connection_timeout_ms: 50
    fallback_allow_on_timeout: true

3. High-Level Design & Topology

The core of stateless blacklisting is Space-Efficient Gating.

1. The Stateful Revocation Paradox

In standard architectures, checking a central database for every incoming request removes the stateless advantages of JWTs, creating a massive single point of failure (SPOF) and adding database network latencies.

graph TD
    Client[Client Request] -->|Validate JWT| GW[API Gateway]
    GW -->|Saturating DB Network Hop| DB[(Central Session Database)]
    DB -->|Validate state| GW
    GW -->|Accept/Reject| API[Downstream API]
    
    %% Style annotations
    style DB fill:#ffebee,stroke:#c62828,stroke-width:2px;

2. In-Memory Bloom Filter + Redis Gating Architecture

With Bloom Filters, the API Gateway evaluates the incoming token locally.

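  • Case A (about 99.9% of requests carrying non-revoked tokens): The Bloom Filter returns "not in set", which is always correct for a filter that is in sync. The gateway accepts the JWT instantly with zero network hops.
  • Case B (revoked token or false positive): The Bloom Filter returns "maybe in set". The gateway performs a fast, targeted query to the Redis Cluster. If the JTI is found, the request is rejected with HTTP 401; if not, it was a false positive and the request proceeds.
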
sequenceDiagram
    autonumber
    participant Client
    participant GW as API Gateway (In-Memory Bloom Filter)
    participant Redis as Redis Cluster (State Store)
    participant API as Downstream Service

    Client->>GW: Request + Bearer JWT (JTI=X)
    
    rect rgb(240, 248, 255)
        Note over GW: Step 1: Query Local Bloom Filter
        alt JTI not in Bloom Filter (definitely not revoked)
            GW->>API: Route Request (Stateless Speed)
        else JTI maybe in Bloom Filter (revoked, or ~0.1% false positive)
            GW->>Redis: Step 2: Query Redis for JTI=X
            alt JTI found in Redis (token revoked)
                Redis-->>GW: Found
                GW-->>Client: HTTP 401 Unauthorized (Blocked)
            else JTI not in Redis (false positive)
                Redis-->>GW: Not found
                GW->>API: Route Request
            end
        end
    end
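
To put numbers on the two cases, a rough estimate of the residual Redis load helps. Here $R$ is the gateway request rate, $p$ the Bloom Filter false positive rate, and $r$ the fraction of requests carrying a genuinely revoked token. The symbol $r$ is introduced only for this estimate, and its value below is an assumption, not a measurement.

$$Q_{\text{Redis}} = R\,\bigl(r + (1 - r)\,p\bigr) \approx R\,(r + p) \quad \text{for small } r$$

With $R = 100{,}000$ RPS, $p = 0.001$, and $r \approx 0$, this gives $Q_{\text{Redis}} \approx 100$ lookups per second, compared with 100,000 per second if every request were checked against the store.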

4. Low-Level Design & Data Models

Below is a compilable Java sketch of the JWT Blacklist Validator. It implements a thread-safe local BitSet Bloom Filter combined with a Redis verification fallback path. The hash function and the Redis client are placeholders that must be replaced before production use:

package com.codesprintpro.auth;

import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class JwtBlacklistValidator {
    private final BitSet bloomFilter;
    private final int bitArraySize;
    private final int numHashFunctions;
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    
    // Stub standing in for a real Redis Cluster client
    private final RedisClusterClientStub redisClient;

    public JwtBlacklistValidator(int expectedElements, double falsePositiveRate, RedisClusterClientStub redisClient) {
        this.bitArraySize = (int) (-expectedElements * Math.log(falsePositiveRate) / Math.pow(Math.log(2), 2));
        this.numHashFunctions = (int) (Math.round((double) bitArraySize / expectedElements * Math.log(2)));
        this.bloomFilter = new BitSet(bitArraySize);
        this.redisClient = redisClient;
    }

    /**
     * Registers a revoked JTI into the local Bloom Filter.
     */
    public void registerRevocationLocal(String jti) {
        this.rwLock.writeLock().lock();
        try {
            byte[] bytes = jti.getBytes(StandardCharsets.UTF_8);
            for (int i = 0; i < this.numHashFunctions; i++) {
                int hash = getMurmurHash(bytes, i) % this.bitArraySize;
                this.bloomFilter.set(Math.abs(hash), true);
            }
        } finally {
            this.rwLock.writeLock().unlock();
        }
    }

    /**
     * Checks whether a JWT ID has been revoked. Employs space-efficient gating:
     * returns false immediately if the JTI is not in the Bloom filter, and falls
     * back to Redis when the filter reports a possible match.
     */
    public boolean isTokenRevoked(String jti) {
        this.rwLock.readLock().lock();
        boolean maybeInFilter = true;
        try {
            byte[] bytes = jti.getBytes(StandardCharsets.UTF_8);
            for (int i = 0; i < this.numHashFunctions; i++) {
                int hash = getMurmurHash(bytes, i) % this.bitArraySize;
                if (!this.bloomFilter.get(Math.abs(hash))) {
                    maybeInFilter = false; // 100% accurate: not revoked
                    break;
                }
            }
        } finally {
            this.rwLock.readLock().unlock();
        }

        if (!maybeInFilter) {
            return false; // Token is valid
        }

        // Fallback: ask Redis whether this is a real revocation or a Bloom filter false positive
        return this.redisClient.exists(jti);
    }

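    private static int getMurmurHash(byte[] data, int seed) {
        // Simplified seeded mixing hash, kept as a placeholder so the class compiles
        // without dependencies. It is NOT a real MurmurHash3; use a vetted
        // implementation in production.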
        int h = seed;
        for (byte b : data) {
            h ^= b;
            h *= 0x5bd1e995;
            h ^= h >>> 15;
        }
        return h;
    }
}

// Stub so the example compiles; replace with a real Redis Cluster client
class RedisClusterClientStub {
    public boolean exists(String key) { return false; }
}
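
Since the hash in the class above is only a placeholder, here is a sketch of what a replacement could look like: a standard MurmurHash3 x86_32 routine plus Kirsch-Mitzenmacher double hashing, which derives all $k$ bit positions from two base hashes. The class BloomHashing and the method indexes are hypothetical names, and the seeding scheme is one reasonable choice among several, not something the validator prescribes.

```java
import java.nio.charset.StandardCharsets;

/** Hash helpers for a Bloom filter (hypothetical class, independent of the validator). */
final class BloomHashing {
    private BloomHashing() {}

    /** MurmurHash3 x86_32 over a byte array. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int fullBlocksEnd = data.length & ~3; // length rounded down to a multiple of 4
        for (int i = 0; i < fullBlocksEnd; i += 4) {
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        int tail = 0;
        switch (data.length & 3) { // remaining 1 to 3 bytes, intentional fall-through
            case 3: tail ^= (data[fullBlocksEnd + 2] & 0xff) << 16;
            case 2: tail ^= (data[fullBlocksEnd + 1] & 0xff) << 8;
            case 1: tail ^= (data[fullBlocksEnd] & 0xff);
                    tail *= c1; tail = Integer.rotateLeft(tail, 15); tail *= c2; h ^= tail;
        }
        h ^= data.length;
        h ^= h >>> 16; h *= 0x85ebca6b; // final avalanche
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Double hashing: index_i = (h1 + i * h2) mod m, for i = 0 .. k-1. */
    static int[] indexes(String jti, int k, int m) {
        byte[] bytes = jti.getBytes(StandardCharsets.UTF_8);
        int h1 = murmur3_32(bytes, 0);
        int h2 = murmur3_32(bytes, h1) | 1; // odd stride, so the step is never zero
        int[] out = new int[k];
        for (int i = 0; i < k; i++) {
            out[i] = Math.floorMod(h1 + i * h2, m); // floorMod keeps the index non-negative
        }
        return out;
    }
}
```

Both the registration path and the lookup path must derive indexes the same way; mixing two index schemes would silently introduce false negatives.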

5. Scaling Bottlenecks & Mitigations

Gating stateless auth introduces distinct scale constraints:

1. Bloom Filter Saturation (False Positive Drift)

Over time, as more revoked JTIs are added to the Bloom Filter than it was sized for, the fraction of set bits rises past the roughly 50% that the optimal sizing assumes. The false positive rate ($p$) climbs above the 0.1% target, so the gateway makes more Redis round trips and performance degrades.
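
The drift can be quantified with the standard approximation $p(n) = (1 - e^{-kn/m})^k$. The helper below is a small sketch (BloomDrift is a hypothetical name) that evaluates it for the filter sized earlier, with $m \approx 14.38$ million bits and $k = 10$.

```java
/** Expected false positive rate of a Bloom filter after n insertions (hypothetical helper). */
final class BloomDrift {
    private BloomDrift() {}

    /** p(n) = (1 - e^{-k n / m})^k for m bits and k hash functions. */
    static double falsePositiveRate(long n, long m, int k) {
        return Math.pow(1.0 - Math.exp(-((double) k * n) / m), k);
    }

    public static void main(String[] args) {
        long m = 14_377_588L; // bits sized for 1,000,000 entries at p = 0.001
        int k = 10;
        for (long n : new long[] {500_000L, 1_000_000L, 2_000_000L, 4_000_000L}) {
            System.out.printf("n = %,d -> p = %.4f%%%n", n, 100 * falsePositiveRate(n, m, k));
        }
    }
}
```

At the design capacity of 1,000,000 entries the formula gives the intended 0.1%. At twice that load it gives roughly 5.7%, which is why the filter cannot simply be left to grow.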

  • Mitigation: Implement Time-to-Live (TTL) Resets. A standard Bloom Filter cannot delete entries, but since JWTs have a fixed expiry, a JTI only needs to stay in the blacklist until the token naturally expires. Periodically (e.g., daily) rebuild the Bloom Filter from scratch, skipping revocations whose token expiry has already passed, and swap the new filter in atomically.
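
A rebuild of this kind might look like the sketch below. It assumes the live revocations are available as a map from JTI to expiry (epoch seconds); the class RebuildableRevocationFilter is hypothetical, and its index function is a simplified stand-in for a real hash.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Periodic rebuild of the revocation filter (hypothetical class; hashing is simplified). */
final class RebuildableRevocationFilter {
    private final int m;
    private final int k;
    private final AtomicReference<BitSet> active;

    RebuildableRevocationFilter(int m, int k) {
        this.m = m;
        this.k = k;
        this.active = new AtomicReference<>(new BitSet(m));
    }

    /**
     * Builds a fresh BitSet from entries that have not expired, then swaps it in.
     * Readers keep using the old BitSet until the single atomic swap.
     *
     * @param revocations map of JTI to token expiry in epoch seconds
     * @return number of entries carried into the new filter
     */
    int rebuild(Map<String, Long> revocations, long nowEpochSeconds) {
        BitSet fresh = new BitSet(m);
        int kept = 0;
        for (Map.Entry<String, Long> e : revocations.entrySet()) {
            if (e.getValue() > nowEpochSeconds) { // expired tokens fail validation anyway
                for (int i = 0; i < k; i++) {
                    fresh.set(index(e.getKey(), i));
                }
                kept++;
            }
        }
        active.set(fresh);
        return kept;
    }

    boolean mightContain(String jti) {
        BitSet snapshot = active.get();
        for (int i = 0; i < k; i++) {
            if (!snapshot.get(index(jti, i))) {
                return false;
            }
        }
        return true;
    }

    // Simplified double hashing on String.hashCode(); a stand-in for a real hash function.
    private int index(String jti, int i) {
        int h1 = jti.hashCode();
        int h2 = (Integer.rotateLeft(h1, 16) * 0x9E3779B1) | 1;
        return Math.floorMod(h1 + i * h2, m);
    }
}
```

The sketch deliberately omits live inserts. In a real gateway, revocations that arrive while a rebuild is running would have to be applied to the new filter as well, or they would be lost at the swap.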

2. Multi-Gateway Synchronizations

When a token is revoked, all gateway instances running in separate availability zones must synchronize their local in-memory Bloom Filters.

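  • Mitigation: Implement Redis Pub/Sub broadcasting. When a user logs out, the auth service writes the revocation to Redis and then publishes the JTI to a global Pub/Sub channel. All gateway instances subscribe to this channel and update their local Bloom Filters within milliseconds. Because Redis Pub/Sub delivery is fire-and-forget, a gateway that is briefly disconnected misses messages, so a periodic re-sync from Redis (for example on the sync_interval_ms schedule) is still needed as a backstop.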
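
The write-then-publish flow can be modeled in-process to make the ordering and its failure mode visible. Everything in this sketch is a stand-in: RevocationBus plays the role of the Redis keyspace plus the Pub/Sub channel, and GatewayReplica uses a plain Set where a real gateway would hold its Bloom Filter. No real Redis client API is used.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** In-process stand-in for the Redis store plus Pub/Sub channel (hypothetical). */
final class RevocationBus {
    private final Set<String> store = ConcurrentHashMap.newKeySet(); // plays the Redis keyspace
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> gateway) {
        subscribers.add(gateway);
    }

    /** Write to the source of truth first, then broadcast, so a "maybe" always finds the key. */
    void revoke(String jti) {
        store.add(jti);
        for (Consumer<String> gateway : subscribers) {
            gateway.accept(jti);
        }
    }

    boolean existsInStore(String jti) {
        return store.contains(jti);
    }
}

/** A gateway's local view; a Set stands in for the Bloom filter to keep the sketch short. */
final class GatewayReplica {
    private final Set<String> localFilter = ConcurrentHashMap.newKeySet();

    /** Idempotent: applying the same revocation twice leaves the filter unchanged. */
    void onRevocation(String jti) {
        localFilter.add(jti);
    }

    boolean mightBeRevoked(String jti) {
        return localFilter.contains(jti);
    }
}
```

A replica that subscribes after revoke has been called never sees that JTI, which mirrors the at-most-once delivery of a Pub/Sub channel and shows why the store, not the broadcast, has to remain the source of truth.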

6. Strategic Trade-offs & Alternatives

Evaluating the session management pattern requires balancing scalability limits:

| Pattern | Read Latency | Database Pressure | Revocation Latency | Implementation Complexity |
|---|---|---|---|---|
| Stateless JWT (No Revocation) | Very low (local verify) | Zero | Poor (must wait for token expiry) | Very Low |
| Stateful DB Sessions | High (read query per request) | Extremely High | Instant | Low |
| Redis Session Gating | Medium (~5 ms network hop) | High (Redis load scales with RPS) | Instant | Medium |
| Bloom Filter + Redis Gating | Very low (microseconds) | Minimal (Redis hit only for revoked tokens and false positives) | Near-instant (bounded by filter sync) | High (requires synchronization) |

7. Failure Scenarios & Resiliency

Security gateways must gracefully degrade during infrastructure partitions:

Scenario A: Redis Cluster Outage

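If the Redis cluster is unavailable, the gateway cannot resolve requests for which the Bloom Filter answers "maybe in set": the roughly 0.1% false positives plus every genuinely revoked token. Without a fallback, these requests fail with authentication errors.

  • Resiliency Mitigation: Choose the failure default explicitly. With fallback_allow_on_timeout enabled, a Redis check that exceeds the 50 ms timeout defaults to "Allow". Signature and expiration checks still apply and engineers are alerted via Prometheus, but a genuinely revoked token that reaches this path is admitted until Redis recovers. Endpoints that cannot tolerate that risk should fail closed (reject) instead.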
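
One way to express the timeout policy in code is sketched below. The Redis call is abstracted as a Predicate so the example stays dependency-free; BoundedRevocationCheck and its constructor parameters are hypothetical, with allowOnTimeout mirroring the fallback_allow_on_timeout setting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

/** Bounded Redis verification with an explicit outage policy (hypothetical class). */
final class BoundedRevocationCheck {
    private final Predicate<String> redisExists; // stand-in for a real Redis EXISTS call
    private final long timeoutMs;
    private final boolean allowOnTimeout;        // mirrors fallback_allow_on_timeout

    BoundedRevocationCheck(Predicate<String> redisExists, long timeoutMs, boolean allowOnTimeout) {
        this.redisExists = redisExists;
        this.timeoutMs = timeoutMs;
        this.allowOnTimeout = allowOnTimeout;
    }

    /** Returns true when the request should be rejected. */
    boolean isRevoked(String jti) {
        CompletableFuture<Boolean> lookup =
                CompletableFuture.supplyAsync(() -> redisExists.test(jti));
        try {
            return lookup.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            lookup.cancel(true);
            return !allowOnTimeout; // fail open (allow) or fail closed (reject)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return !allowOnTimeout;
        }
    }
}
```

Keeping the policy in a single boolean makes it easy to run most routes fail-open while wiring a second, fail-closed instance for sensitive endpoints such as payments or admin APIs.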

Scenario B: Gateway Bootstrapping Lag

When a new gateway instance scales out, its local in-memory Bloom Filter is empty. Every lookup returns "not in set", so revoked tokens pass unchecked until the filter is populated.

  • Resiliency Mitigation: Implement Warm-Up Bootstrapping. During startup, the new gateway node queries the Redis cluster to download the active blacklist dataset, building its local Bloom Filter before accepting external HTTP traffic.
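
The order of operations during warm-up matters: if the node loads the snapshot first and subscribes afterwards, a revocation published in between is lost. The sketch below subscribes first and relies on filter inserts being idempotent. GatewayWarmUp is a hypothetical class, the subscription and snapshot sources are passed in as plain functions, and a Set again stands in for the Bloom Filter.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Start-up sequence for a new gateway node (hypothetical; a Set stands in for the Bloom filter). */
final class GatewayWarmUp {
    private final Set<String> localFilter = ConcurrentHashMap.newKeySet();
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /**
     * @param subscribe    registers a listener for live revocations (stand-in for a Pub/Sub subscription)
     * @param loadSnapshot returns every non-expired revoked JTI (stand-in for reading the Redis store)
     */
    void start(Consumer<Consumer<String>> subscribe, Supplier<Iterable<String>> loadSnapshot) {
        // 1. Subscribe first so revocations published during the snapshot load are not lost.
        subscribe.accept(localFilter::add);
        // 2. Load the snapshot; inserts are idempotent, so overlap with live events is harmless.
        for (String jti : loadSnapshot.get()) {
            localFilter.add(jti);
        }
        // 3. Only now report ready, for example to a load balancer health check.
        ready.set(true);
    }

    boolean isReady() {
        return ready.get();
    }

    boolean mightBeRevoked(String jti) {
        return localFilter.contains(jti);
    }
}
```

The readiness flag is what keeps external traffic away until step 3; how it is exposed (health endpoint, service registry, or orchestration probe) depends on the deployment.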

8. Staff Engineer Perspective


9. Mock Interview Dialogue

Verbal Interview Script

Interviewer: "If JSON Web Tokens are designed to be stateless, why is token revocation a major architectural challenge, and how would you design a revocation system at scale?"

Candidate: "JWTs are self-contained and validated using cryptographic signatures on the local node, eliminating database queries. However, this means once a JWT is issued, it is valid until its expiration date. If a user's token is stolen or they log out, we cannot invalidate it locally. Checking a central session database for every request removes the stateless speed benefit. To solve this, I would design a space-efficient gating architecture at the API Gateway using an in-memory Bloom Filter backed by a Redis Cluster as our source of truth."

Interviewer: "How does the Bloom Filter improve lookup performance under high workloads?"

Candidate: "A Bloom Filter has zero false negatives. If an up-to-date filter returns 'not in set', the token has definitely not been revoked. For 99.9% of our valid user traffic, the gateway processes the request locally in microseconds with zero network hops. If the Bloom Filter returns 'maybe in set' (which only occurs for our 0.1% false positive rate or actually revoked tokens), only then does the gateway query the Redis Cluster to verify. This cuts Redis lookup load by roughly 99.9%, keeping read latencies low while revocations take effect as soon as they reach the gateways."


