Mental Model
Connecting isolated components into a resilient, scalable, and observable distributed web.
JSON Web Tokens (JWTs) are a widely used standard for stateless authentication. Because they are self-contained and cryptographically signed, any microservice can validate them locally without hitting a central database. However, this creates a critical security gap: how do we revoke a compromised token before it expires? If we query a database on every request, the architecture is no longer stateless. One way to solve this is a hybrid, space-efficient gating architecture that combines Bloom Filters at the API Gateway with a distributed Redis Cluster.
1. Functional & Non-Functional Requirements
To design a high-throughput token revocation system, we define these operational requirements:
Capacity Estimations (Sizing for 1 Million Revoked Tokens)
- The DB Bottleneck: Checking a database for 100,000 requests per second (RPS) would saturate database connection pools.
- Bloom Filter Sizing: We want to store 1,000,000 revoked token identifiers (the `jti` claim, or JWT ID) in a local in-memory Bloom Filter with a target false positive rate ($p$) of 0.1% (0.001). The required bit array size ($m$) is: $$m = -\frac{n \ln(p)}{(\ln 2)^2} = -\frac{1{,}000{,}000 \times \ln(0.001)}{0.48045} \approx 14{,}377{,}587 \text{ bits} \approx 1.8\text{ MB } (\text{about } 1.71\text{ MiB})$$
- Hash Functions: The optimal number of hash functions ($k$) is: $$k = \frac{m}{n} \ln(2) \approx 14.38 \times 0.693 \approx 9.97 \approx 10 \text{ hash functions}$$ Storing 1,000,000 revoked tokens therefore takes only about 1.8 MB of RAM inside each API Gateway instance. Each lookup is at most 10 in-memory bit probes, so the design targets a P99 lookup latency of under 50 microseconds.
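The two sizing formulas above can be checked with a few lines of Java. This is a minimal sketch: the class and method names (`BloomSizing`, `optimalBits`, `optimalHashes`) are illustrative and not part of any library, and the bit count is rounded up here, so it may differ by one bit from a truncated result.

```java
final class BloomSizing {
    /** Bits needed for n elements at false positive rate p: m = -n ln(p) / (ln 2)^2, rounded up. */
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Optimal hash count: k = (m / n) ln 2, rounded to the nearest integer, at least 1. */
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;   // expected number of revoked JTIs
        double p = 0.001;      // target false positive rate
        long m = optimalBits(n, p);
        int k = optimalHashes(m, n);
        // Bytes are rounded up to the next whole byte.
        System.out.printf("bits=%d bytes=%d hashes=%d%n", m, (m + 7) / 8, k);
    }
}
```

Changing `n` or `p` in `main` shows how quickly memory grows: halving the false positive rate adds roughly 1.44 bits per stored element.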
Non-Functional Requirements
- Lookup Latency SLA: Lookup times must remain under 100 microseconds for valid tokens.
- Fast Revocation Propagation: A token revoked in Region A must be gated at all regional API Gateways within 2 seconds.
- Zero False Negatives: A revoked token must never be classified as "valid" once its JTI has reached a gateway's local filter. Bloom filters guarantee zero false negatives for inserted elements, so the only exposure is the propagation window above.
2. Interface Design & APIs
To coordinate token revocations, the auth service and gateway expose explicit API payloads. Below is a structured JSON API contract defining the revocation payload, followed by the gateway lookup filter contract:
Revocation Command (Auth Service to Gateway Coordinator)
{
  "event_id": "evt-880099-revocation",
  "jti": "jti-uuid-998811-token",
  "reason": "USER_LOGOUT",
  "expires_at_epoch": 1716422400,
  "revoked_at": "2026-05-23T10:00:00.123Z"
}
Gateway Filter Configuration (YAML)
gateway_jwt_filter:
  bloom_filter:
    capacity: 1000000
    false_positive_rate: 0.001
    sync_interval_ms: 1000
  redis_cluster:
    endpoints: ["redis-node-1.auth:6379", "redis-node-2.auth:6379"]
    connection_timeout_ms: 50
    fallback_allow_on_timeout: true
3. High-Level Design & Topology
The core of stateless blacklisting is Space-Efficient Gating.
1. The Stateful Revocation Paradox
In standard architectures, checking a central database for every incoming request removes the stateless advantages of JWTs, creating a massive single point of failure (SPOF) and adding database network latencies.
graph TD
Client[Client Request] -->|Validate JWT| GW[API Gateway]
GW -->|Saturating DB Network Hop| DB[(Central Session Database)]
DB -->|Validate state| GW
GW -->|Accept/Reject| API[Downstream API]
%% Style annotations
style DB fill:#ffebee,stroke:#c62828,stroke-width:2px;
2. In-Memory Bloom Filter + Redis Gating Architecture
With Bloom Filters, the API Gateway evaluates the incoming token locally.
- Case A (99.9% of requests): The Bloom Filter returns "not in set", which is always correct for JTIs the filter has ingested. The gateway accepts the JWT immediately with zero network hops.
- Case B (False Positive or Revoked): The Bloom Filter returns "maybe in set". The gateway performs a fast, targeted query to the Redis Cluster and rejects the request only if Redis confirms the revocation.
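As a rough back-of-envelope check of the two cases, assume the 100,000 RPS figure from the capacity estimate and let $r$ be the fraction of requests that carry genuinely revoked tokens ($r$ is a symbol introduced here for illustration only). The expected Redis query rate is then:

$$Q_{\text{Redis}} = \text{RPS} \times \big(r + (1 - r)\,p\big) \approx 100{,}000 \times 0.001 = 100 \text{ lookups/s} \quad \text{when } r \approx 0$$

Under these assumptions Redis sees on the order of a hundred lookups per second rather than one hundred thousand.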
sequenceDiagram
    autonumber
    participant Client
    participant GW as API Gateway (In-Memory Bloom Filter)
    participant Redis as Redis Cluster (State Store)
    participant API as Downstream Service
    Client->>GW: Request + Bearer JWT (JTI=X)
    rect rgb(240, 248, 255)
        Note over GW: Step 1: Query Local Bloom Filter
        alt JTI not in Bloom Filter (100% Accurate)
            GW->>API: Route Request (Stateless Speed)
        else JTI maybe in Bloom Filter (0.1% False Positive Rate)
            GW->>Redis: Step 2: Query Redis for JTI=X
            alt JTI found in Redis (Token Revoked)
                Redis-->>GW: Returns JTI Found == TRUE
                GW-->>Client: HTTP 401 Unauthorized (Blocked)
            else JTI not found in Redis (False Positive)
                Redis-->>GW: Returns JTI Found == FALSE
                GW->>API: Route Request
            end
        end
    end
4. Low-Level Design & Data Models
Below is a compilable Java sketch of the JWT Blacklist Validator. It implements a thread-safe local BitSet Bloom Filter combined with a Redis client verification fallback path. The hash function and the Redis client are placeholders that must be replaced before production use:
package com.codesprintpro.auth;

import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class JwtBlacklistValidator {
    private final BitSet bloomFilter;
    private final int bitArraySize;
    private final int numHashFunctions;
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    // Stub standing in for a real Redis Cluster client
    private final RedisClusterClientStub redisClient;

    public JwtBlacklistValidator(int expectedElements, double falsePositiveRate, RedisClusterClientStub redisClient) {
        // m = -n * ln(p) / (ln 2)^2
        this.bitArraySize = (int) (-expectedElements * Math.log(falsePositiveRate) / Math.pow(Math.log(2), 2));
        // k = (m / n) * ln 2
        this.numHashFunctions = (int) (Math.round((double) bitArraySize / expectedElements * Math.log(2)));
        this.bloomFilter = new BitSet(bitArraySize);
        this.redisClient = redisClient;
    }

    /**
     * Registers a revoked JTI into the local Bloom Filter.
     */
    public void registerRevocationLocal(String jti) {
        byte[] bytes = jti.getBytes(StandardCharsets.UTF_8);
        this.rwLock.writeLock().lock();
        try {
            for (int i = 0; i < this.numHashFunctions; i++) {
                this.bloomFilter.set(bitIndex(bytes, i));
            }
        } finally {
            this.rwLock.writeLock().unlock();
        }
    }

    /**
     * Checks whether a JWT ID has been revoked. Employs space-efficient gating:
     * returns false immediately if the JTI is not in the Bloom filter, and
     * falls back to Redis when the filter reports a possible match.
     */
    public boolean isTokenRevoked(String jti) {
        byte[] bytes = jti.getBytes(StandardCharsets.UTF_8);
        boolean maybeInFilter = true;
        this.rwLock.readLock().lock();
        try {
            for (int i = 0; i < this.numHashFunctions; i++) {
                if (!this.bloomFilter.get(bitIndex(bytes, i))) {
                    maybeInFilter = false; // Definitely not in the local filter
                    break;
                }
            }
        } finally {
            this.rwLock.readLock().unlock();
        }
        if (!maybeInFilter) {
            return false; // Token is not revoked
        }
        // Fallback: ask Redis whether this is a real revocation or a false positive
        return this.redisClient.exists(jti);
    }

    // Maps the hash for the given seed to a non-negative bit index
    private int bitIndex(byte[] bytes, int seed) {
        return Math.floorMod(getMurmurHash(bytes, seed), this.bitArraySize);
    }

    private static int getMurmurHash(byte[] data, int seed) {
        // Simplified seeded mixing hash used as a placeholder so the class compiles.
        // It is not a real MurmurHash3; use a vetted implementation in production.
        int h = seed;
        for (byte b : data) {
            h ^= b;
            h *= 0x5bd1e995;
            h ^= h >>> 15;
        }
        return h;
    }
}

// Stub so the example compiles; a real client would query Redis
class RedisClusterClientStub {
    public boolean exists(String key) { return false; }
}
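Since the class above relies on a placeholder hash, one common alternative is double hashing: compute a single 64-bit hash and derive all $k$ bit positions from its two 32-bit halves. The sketch below illustrates that swapped-in technique using 64-bit FNV-1a as a simple stand-in hash. The class name `DoubleHashBloomFilter` is hypothetical, the code is not thread-safe, and a production system would more likely use a vetted Bloom filter library.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class DoubleHashBloomFilter {
    private final BitSet bits;
    private final int size;    // m: number of bits
    private final int hashes;  // k: number of probes per element

    DoubleHashBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    /** 64-bit FNV-1a over the input bytes (stand-in hash for this sketch). */
    static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;      // FNV offset basis
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;           // FNV prime
        }
        return h;
    }

    /** i-th bit position: (h1 + i * h2) mod m, using the two 32-bit halves of the hash. */
    private int index(long hash64, int i) {
        int h1 = (int) hash64;
        int h2 = (int) (hash64 >>> 32);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String jti) {
        long h = fnv1a64(jti.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < hashes; i++) {
            bits.set(index(h, i));
        }
    }

    /** False means definitely absent; true means present or a false positive. */
    boolean mightContain(String jti) {
        long h = fnv1a64(jti.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(h, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Hashing the JTI once and deriving the probes arithmetically avoids running $k$ separate hash passes over the same bytes on every request.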
5. Scaling Bottlenecks & Mitigations
Gating stateless auth introduces distinct scale constraints:
1. Bloom Filter Saturation (False Positive Drift)
Over time, as more revoked JTIs are added to the Bloom Filter, the number of active bits reaches saturation. The false positive rate ($p$) climbs above the 0.1% SLA, causing the gateway to execute more Redis query hops, degrading performance.
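To make the drift concrete, the standard approximation for the false positive rate after $n$ insertions into a filter with $m$ bits and $k$ hash functions is shown below. The doubled load of 2,000,000 entries is an illustrative assumption, not a figure from the capacity estimate:

$$p(n) \approx \left(1 - e^{-kn/m}\right)^{k}$$

$$p(1{,}000{,}000) \approx \left(1 - e^{-0.696}\right)^{10} \approx 0.001 \qquad p(2{,}000{,}000) \approx \left(1 - e^{-1.391}\right)^{10} \approx 0.751^{10} \approx 0.057$$

With $m = 14{,}377{,}587$ and $k = 10$, doubling the number of stored JTIs raises the false positive rate from about 0.1% to roughly 5.7%, which at 100,000 RPS would mean thousands of Redis lookups per second instead of about a hundred.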
- Mitigation: Implement Time-to-Live (TTL) Resets. Since JWTs have a fixed expiry date, we only need to keep a JTI in the blacklist until the token naturally expires. Periodically (e.g., daily) rebuild the Bloom Filter from scratch, ignoring events whose token expiry dates have already passed, keeping the filter clean.
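The rebuild step can be sketched as a filter over the stored revocation events. This is a minimal illustration that assumes each event carries the `expires_at_epoch` value from the revocation payload; the names `BlacklistRebuild`, `ttlSeconds`, and `liveRevocations` are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class BlacklistRebuild {
    /** Remaining lifetime of a revocation entry; zero or negative means the token has already expired. */
    static long ttlSeconds(long expiresAtEpoch, long nowEpoch) {
        return expiresAtEpoch - nowEpoch;
    }

    /** Returns the JTIs that still need to be inserted into a freshly built filter. */
    static Set<String> liveRevocations(Map<String, Long> expiryByJti, long nowEpoch) {
        Set<String> live = new HashSet<>();
        for (Map.Entry<String, Long> entry : expiryByJti.entrySet()) {
            if (ttlSeconds(entry.getValue(), nowEpoch) > 0) {
                live.add(entry.getKey());
            }
        }
        return live;
    }
}
```

The same `ttlSeconds` value could serve as the expiry on the corresponding Redis key, so that the source of truth shrinks on its own as tokens expire.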
2. Multi-Gateway Synchronizations
When a token is revoked, all gateway instances running in separate availability zones must synchronize their local in-memory Bloom Filters.
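- Mitigation: Implement Redis Pub/Sub broadcasting. When a user logs out, the auth service writes the revocation to Redis and publishes the JTI to a global Pub/Sub channel. All gateway instances subscribe to this channel and update their local Bloom Filters, typically within milliseconds. Because Redis Pub/Sub is fire-and-forget (a disconnected subscriber misses messages), gateways should also reconcile against Redis periodically, for example on the `sync_interval_ms` schedule from the filter configuration.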
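The broadcast flow can be modeled in-process to show the moving parts. In this sketch a `Set` stands in for both Redis and each gateway's Bloom filter, and a direct method call stands in for the Pub/Sub channel. All names (`RevocationFanout`, `GatewayNode`, `reconcile`) are hypothetical, and the `reconcile` step is an assumed safeguard for gateways that missed a broadcast.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class RevocationFanout {
    /** One gateway's local view of revoked JTIs (a Set stands in for the Bloom filter). */
    static final class GatewayNode {
        private final Set<String> localRevoked = ConcurrentHashMap.newKeySet();

        /** Called for each broadcast message. */
        void onRevocation(String jti) { localRevoked.add(jti); }

        /** Periodic catch-up from the source of truth; covers missed broadcasts. */
        void reconcile(Set<String> sourceOfTruth) { localRevoked.addAll(sourceOfTruth); }

        boolean maybeRevoked(String jti) { return localRevoked.contains(jti); }
    }

    private final Set<String> store = ConcurrentHashMap.newKeySet();            // stands in for Redis
    private final List<GatewayNode> subscribers = new CopyOnWriteArrayList<>(); // stands in for the channel

    void subscribe(GatewayNode node) { subscribers.add(node); }

    /** Write to the store first, then notify every currently subscribed gateway. */
    void revoke(String jti) {
        store.add(jti);
        for (GatewayNode node : subscribers) {
            node.onRevocation(jti);
        }
    }

    /** Immutable copy of the current revocation set. */
    Set<String> snapshot() { return Set.copyOf(store); }
}
```

Writing to the store before publishing means a gateway that misses the message can still recover the JTI on its next reconciliation pass.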
6. Strategic Trade-offs & Alternatives
Evaluating the session management pattern requires balancing scalability limits:
| Pattern | Read Latency | Datastore Pressure | Revocation Speed | Implementation Complexity |
|---|---|---|---|---|
| Stateless JWT (No Revocation) | Very low (local verify) | Zero | Poor (must wait for TTL expiration) | Very low |
| Stateful DB Sessions | High (read query per request) | Extremely high | Instant | Low |
| Redis Session Gating | Medium (5 ms network hop) | High (Redis load scales with RPS) | Instant | Medium |
| Bloom Filter + Redis Gating | Very low (microseconds) | Minimal (Redis hit only on filter matches: false positives and revoked tokens) | Near-instant (once filters are synchronized) | High (requires synchronization) |
7. Failure Scenarios & Resiliency
Security gateways must gracefully degrade during infrastructure partitions:
Scenario A: Redis Cluster Outage
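If the Redis cluster crashes, the gateway cannot resolve requests whose JTI matches the Bloom Filter. That covers the roughly 0.1% of valid traffic that hits a false positive, plus every genuinely revoked token.
- Resiliency Mitigation: Configure Safe-Fail Defaults. If the Redis check exceeds the timeout (e.g., the 50 ms `connection_timeout_ms`), the gateway defaults to "Allow" (`fallback_allow_on_timeout: true`), relying on signature and expiration checks while alerting engineers via Prometheus. This fail-open choice favors availability: a revoked token can pass while Redis is unreachable. Where that risk is unacceptable, set the fallback to deny so that only the small share of requests matching the filter is rejected.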
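The timeout policy can be sketched with a bounded wait on the Redis lookup. This is an illustration under stated assumptions: the lookup is passed in as a `Supplier` rather than a real Redis client call, and `RedisCheckPolicy` with its `failOpen` flag is a hypothetical name mirroring `fallback_allow_on_timeout`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class RedisCheckPolicy {
    /**
     * Runs the lookup with a deadline and returns true if the token should be treated as revoked.
     * On timeout or error, failOpen=true treats the token as not revoked (allow),
     * while failOpen=false treats it as revoked (deny).
     */
    static boolean isRevoked(Supplier<Boolean> redisLookup, long timeoutMs, boolean failOpen) {
        CompletableFuture<Boolean> future = CompletableFuture.supplyAsync(redisLookup);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Marks the future as cancelled; it does not interrupt a lookup already running.
            future.cancel(true);
            return !failOpen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return !failOpen;
        }
    }
}
```

Keeping the policy in one place makes the security trade-off explicit and lets operators flip it per route, for example failing closed on payment endpoints and open on read-only ones.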
Scenario B: Gateway Bootstrapping Lag
When a new gateway instance scales out, its local in-memory Bloom Filter is empty. An empty filter answers "not in set" for every JTI, so the gateway would accept revoked tokens without ever consulting Redis, unless it falls back to querying Redis for all requests.
- Resiliency Mitigation: Implement Warm-Up Bootstrapping. During startup, the new gateway node downloads the active blacklist dataset from the Redis cluster and builds its local Bloom Filter before accepting external HTTP traffic.
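One way to enforce the warm-up ordering is a readiness flag that stays false until the snapshot has been loaded. The sketch below assumes the load balancer consults a health check backed by `isReady()`. The class `WarmUpGate` is hypothetical, and a `Set` again stands in for the Bloom filter.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

final class WarmUpGate {
    private final Set<String> localRevoked = ConcurrentHashMap.newKeySet(); // stands in for the Bloom filter
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Loads the snapshot of active revocations, then flips readiness. */
    void bootstrap(Collection<String> activeRevokedJtis) {
        localRevoked.addAll(activeRevokedJtis);
        ready.set(true);
    }

    /** Intended to back the health check that gates external traffic. */
    boolean isReady() { return ready.get(); }

    /** Refuses to answer before warm-up so an empty filter is never trusted. */
    boolean maybeRevoked(String jti) {
        if (!ready.get()) {
            throw new IllegalStateException("filter not warmed up");
        }
        return localRevoked.contains(jti);
    }
}
```

Throwing before warm-up turns a silent security gap into a visible startup error, which is usually easier to detect in deployment pipelines.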
8. Staff Engineer Perspective
9. Mock Interview Dialogue
Verbal Interview Script
Interviewer: "If JSON Web Tokens are designed to be stateless, why is token revocation a major architectural challenge, and how would you design a revocation system at scale?"
Candidate: "JWTs are self-contained and validated using cryptographic signatures on the local node, eliminating database queries. However, this means once a JWT is issued, it is valid until its expiration date. If a user's token is stolen or they log out, we cannot invalidate it locally. Checking a central session database for every request removes the stateless speed benefit. To solve this, I would design a space-efficient gating architecture at the API Gateway using an in-memory Bloom Filter backed by a Redis Cluster as our source of truth."
Interviewer: "How does the Bloom Filter improve lookup performance under high workloads?"
Candidate: "A Bloom Filter has zero false negatives. If it returns 'not in set', we are 100% mathematically certain that the token has not been revoked. For 99.9% of our valid user traffic, the gateway processes the request locally in microseconds with zero network hops. If the Bloom Filter returns 'maybe in set' (which only occurs for our 0.1% false positive rate or actually revoked tokens), only then does the gateway query the Redis Cluster to verify. This reduces Redis lookup load by 99.9%, keeping read latencies low while guaranteeing instant session revocation."
