1. Core Requirements & Scale Constraints
In a large microservice ecosystem, every single incoming API request must be authorized before it can access sensitive resources. While Authentication (AuthN) verifies who you are (typically completed once at the API Gateway via JWT verification), Authorization (AuthZ) determines what you are allowed to do.
Designing a centralized, enterprise-grade Authorization Service requires scaling to billions of resource-permission tuples while operating directly in the synchronous path of every system request.
Request Flow:
Client ---> API Gateway ---> Product Service ---> [Synchronous gRPC AuthZ Check] ---> AuthZ Service
                                                                                            |
                                                                                   Evaluates relations
                                                                                   against Tuple Store
Functional Requirements
- Permission Checking: Expose a high-performance endpoint, Check(user_id, resource_id, relation), returning a simple boolean flag (ALLOWED/DENIED).
- Policy Models: Support multiple paradigm models:
  - RBAC (Role-Based Access Control): Permissions linked to user roles (e.g. Role: Editor can write to Document).
  - ABAC (Attribute-Based Access Control): Permissions based on dynamic context attributes (e.g. Allow only if request_ip is within the corporate subnet).
  - ReBAC (Relationship-Based Access Control): Permissions derived from graph relationships (e.g. User A is the editor of Folder X which contains Document Y, therefore User A can edit Document Y).
- Tuple Writing: Expose APIs to add, remove, and list permission tuples (e.g., granting user membership to a group).
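To make the difference between the three policy models concrete, here is a minimal Python sketch. The in-memory dictionaries, the 10.0.0.0/8 subnet, and all function names are assumptions made for illustration; a real service would read this data from the tuple store and a policy configuration.

```python
import ipaddress

# Hypothetical in-memory policy data, for illustration only.
ROLE_PERMISSIONS = {"editor": {("document", "read"), ("document", "write")}}
CORPORATE_SUBNET = ipaddress.ip_network("10.0.0.0/8")  # assumed corporate range
FOLDER_EDITORS = {"folder_x": {"user_a"}}        # folder -> users with the editor relation
FOLDER_CONTENTS = {"folder_x": {"document_y"}}   # folder -> documents it contains


def rbac_allows(role: str, resource_type: str, action: str) -> bool:
    """RBAC: the decision depends only on the caller's role."""
    return (resource_type, action) in ROLE_PERMISSIONS.get(role, set())


def abac_allows(request_ip: str) -> bool:
    """ABAC: the decision depends on an attribute of the request itself."""
    return ipaddress.ip_address(request_ip) in CORPORATE_SUBNET


def rebac_allows(user_id: str, document_id: str) -> bool:
    """ReBAC: the user may edit a document if they edit a folder containing it."""
    return any(
        user_id in FOLDER_EDITORS.get(folder, set())
        for folder, documents in FOLDER_CONTENTS.items()
        if document_id in documents
    )
```

In this sketch only the ReBAC check needs to walk a relationship, which is why it dominates the rest of the design.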
Non-Functional SLAs
- Low-Latency Evaluation: Permission checks must operate with $p95 \le 2\text{ ms}$ and $p99 \le 5\text{ ms}$. Since this service is queried on virtually every microservice request, any latency here degrades the entire platform.
- Ultra-High Availability: Maintain "Five Nines" ($99.999\%$) availability. An outage in the Authorization Service blocks access to all downstream product lines.
- Causal Consistency: Protect against the "New Enemy" problem (preventing a user from reading a file using stale cached data if their permission was revoked in a separate concurrent transaction).
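To put the availability target in perspective, the following sketch converts an availability percentage into a yearly downtime budget. It assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(availability: float) -> float:
    """Return the allowed downtime per year for a given availability fraction."""
    return MINUTES_PER_YEAR * (1.0 - availability)


five_nines = downtime_budget_minutes(0.99999)
print(f"{five_nines:.2f} minutes of downtime per year")
```

Under these assumptions, "Five Nines" leaves a budget of a little over five minutes of total unavailability per year.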
Back-of-the-Envelope Estimates
- Scale Metrics:
- Total Users: 100 Million.
- Total Managed Objects: 10 Billion.
- Permission Relation Tuples: 50 Billion.
- Request QPS:
- Standard user action rate: Average 50,000 product API requests per second.
- If each API request generates an average of 2 internal authorization checks (due to nested resources):
- Average Check Throughput: $100,000\text{ queries per second (QPS)}$.
- Peak Check Throughput (3x multiplier): $300,000\text{ QPS}$.
- Write Throughput (Tuple updates):
- Average: $200\text{ writes/sec}$.
- Peak: $1,000\text{ writes/sec}$.
- Cache Memory Estimation:
- We cache active user permissions at the edge, storing permissions for 10 Million monthly active users.
- Assuming an average of 100 permission nodes per user: $1\text{ Billion cached nodes}$.
- Average node key-value payload size (compressed): 120 bytes.
- Total Memory footprint: $1\text{ Billion} \times 120\text{ bytes} \approx 120\text{ Gigabytes}$ of distributed RAM.
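The estimates above can be reproduced with a few lines of arithmetic. The variable names are illustrative; the inputs are the figures stated in the list.

```python
# Check throughput
avg_api_qps = 50_000          # average product API requests per second
checks_per_request = 2        # internal authorization checks per API request
peak_multiplier = 3

avg_check_qps = avg_api_qps * checks_per_request    # 100,000 QPS
peak_check_qps = avg_check_qps * peak_multiplier    # 300,000 QPS

# Cache memory
cached_users = 10_000_000     # monthly active users held in cache
nodes_per_user = 100          # permission nodes per user
bytes_per_node = 120          # compressed key-value payload

cached_nodes = cached_users * nodes_per_user        # 1,000,000,000 nodes
cache_bytes = cached_nodes * bytes_per_node         # 120,000,000,000 bytes
cache_gb = cache_bytes / 1e9                        # 120.0 (decimal gigabytes)
```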
2. API Design & Core Contracts
Due to strict latency requirements, the internal authorization check API must be exposed via high-performance binary gRPC/protobuf protocols rather than JSON-over-HTTP.
Protobuf Service Contract (authz.proto)
syntax = "proto3";

package codesprintpro.authz;

service AuthorizationService {
  // Low-latency permission evaluation
  rpc CheckPermission(CheckRequest) returns (CheckResponse);
  // Create or delete relationship tuples
  rpc WriteTuple(WriteTupleRequest) returns (WriteTupleResponse);
}

message CheckRequest {
  string user_id = 1;
  string namespace = 2; // e.g. "document", "repository"
  string object_id = 3; // e.g. "doc_99018"
  string relation = 4;  // e.g. "viewer", "editor", "owner"
  string zookie = 5;    // Cryptographic consistency token
}

message CheckResponse {
  bool allowed = 1;
  string zookie = 2; // Updated consistency token
}

message WriteTupleRequest {
  enum Operation {
    OP_INSERT = 0;
    OP_DELETE = 1;
  }
  Operation op = 1;
  string namespace = 2;
  string object_id = 3;
  string relation = 4;
  string subject_id = 5; // Can be a specific user_id or a nested relation (e.g. "group:admin#member")
}

message WriteTupleResponse {
  string zookie = 1; // Commit zookie token
}
Sample HTTP/REST Proxy Representation
For external integrations, a REST Gateway can convert JSON payloads into protobuf calls.
POST /v1/authz/check
Content-Type: application/json
{
"user_id": "usr_889231",
"namespace": "document",
"object_id": "doc_99018",
"relation": "editor",
"zookie": "1c7b8d0a:0000028a"
}
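As a rough illustration of what the REST gateway does with such a payload, the sketch below validates the JSON body and maps it onto a plain Python stand-in for the CheckRequest message. The dataclass, the required-field list, and the function name are assumptions; a real gateway would populate the generated protobuf class instead.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckRequest:
    """Plain-Python stand-in for the protobuf CheckRequest message."""
    user_id: str
    namespace: str
    object_id: str
    relation: str
    zookie: str = ""  # proto3 strings default to empty when unset


REQUIRED_FIELDS = ("user_id", "namespace", "object_id", "relation")


def parse_check_request(body: str) -> CheckRequest:
    """Validate a JSON body and convert it into a CheckRequest."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return CheckRequest(
        user_id=payload["user_id"],
        namespace=payload["namespace"],
        object_id=payload["object_id"],
        relation=payload["relation"],
        zookie=payload.get("zookie", ""),
    )
```

Treating the zookie as optional here is a design assumption: a request without one simply carries no freshness requirement.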
3. High-Level Design (HLD)
The authorization architecture separates the Policy Administration Point (PAP), which registers metadata tuples, from the highly optimized Policy Decision Point (PDP), which evaluates permissions at sub-millisecond speeds.
graph TD
%% Actors
Client[Client App]
API_Gateway[API Gateway]
ProductService[Product Microservice]
Client -->|API Call| API_Gateway
API_Gateway --> ProductService
%% Microservice to Sidecar Cache
subgraph MicroserviceNode ["Product Microservice Host Node"]
ProductService -->|1. Check Local Cache| SidecarCache[Local Memory Cache LRU]
end
%% High-Latency Failover
ProductService -->|2. Cache Miss gRPC Check| AuthZEngine[Central AuthZ Evaluation Engine]
%% HLD Internals
subgraph CentralAuthZ ["Central AuthZ Clustering"]
AuthZEngine
EvaluationCache[(Redis Cluster Cache)]
AuthZEngine -->|Read/Write Graph Paths| EvaluationCache
end
%% Storage Layer
subgraph DatabaseLayer ["Globally Replicated Database Layer"]
CockroachDB[(CockroachDB Globally Replicated)]
end
AuthZEngine -->|Read Master Truth| CockroachDB
%% Policy Administration Point
subgraph PAP ["Policy Administration Point"]
AdminDashboard[Admin Policy Portal]
TupleWriter[Tuple Writer Service]
AdminDashboard --> TupleWriter
TupleWriter -->|Write Atomic Tuples| CockroachDB
end
%% Event Bus & Logging
Kafka{Apache Kafka Event Bus}
AuthZEngine -->|Publish Audit Log| Kafka
Kafka --> SecurityAnalytics[SIEM / Compliance Logging]
Request Evaluation Cycle
- Sidecar Cache Lookup: The product service first queries a local in-memory LRU cache inside its own host process. This handles $90\%$ of standard check operations in under $0.5\text{ ms}$.
- Central Engine Query: If a cache miss occurs, the sidecar triggers a multiplexed gRPC check to the central AuthZ Engine.
- Graph Evaluation: The engine queries a distributed Redis Evaluation Cache to check for pre-resolved relationship subgraphs. If missing, it executes a recursive query on the master database.
- Master Database: Relationship tuples are stored in a strongly consistent database (e.g., CockroachDB) that replicates data globally across regions using Raft consensus, ensuring serializable transactions.
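The evaluation cycle above can be sketched as a tiered lookup. This is a simplified model, not a production design: the shared cache is a plain dictionary standing in for Redis, `load_from_db` is a hypothetical callback standing in for the recursive database query, and TTLs and Zookie freshness checks are left out.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Optional, Tuple


class LRUCache:
    """Small in-process LRU cache, standing in for the sidecar cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, bool]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bool]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: bool) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def tiered_check(
    key: Hashable,
    local_cache: LRUCache,
    shared_cache: Dict[Hashable, bool],
    load_from_db: Callable[[Hashable], bool],
) -> Tuple[bool, str]:
    """Return (decision, tier that answered): local, shared, or database."""
    cached = local_cache.get(key)
    if cached is not None:
        return cached, "local"
    if key in shared_cache:
        decision, tier = shared_cache[key], "shared"
    else:
        decision, tier = load_from_db(key), "database"
        shared_cache[key] = decision
    local_cache.put(key, decision)
    return decision, tier
```

Returning the answering tier alongside the decision makes it easy to measure the hit rate of each layer.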
4. Low-Level Design (LLD) & Data Models
Relationship-Based Access Control (ReBAC) Modeling
Permissions are modeled as directed relation tuples representing a graph:
object#relation@subject
- Example: document:doc_abc#editor@user_123 (User 123 is an editor of doc_abc).
- Example: document:doc_abc#viewer@group:marketing#member (All members of the marketing group are viewers of doc_abc).
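A small parser makes the tuple notation concrete. The sketch below assumes the exact textual form used in the examples above (namespace:object_id#relation@subject, where the subject is either a bare user ID or a namespace:id#relation userset); the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationTuple:
    namespace: str
    object_id: str
    relation: str
    subject_namespace: Optional[str]  # None for a bare user ID
    subject_id: str
    subject_relation: Optional[str]   # e.g. "member" for a group userset


def parse_tuple(text: str) -> RelationTuple:
    """Parse 'namespace:object_id#relation@subject' into its parts."""
    object_part, at, subject = text.partition("@")
    object_ref, hash_sep, relation = object_part.partition("#")
    namespace, colon, object_id = object_ref.partition(":")
    if not (at and hash_sep and colon and subject and relation and object_id):
        raise ValueError(f"malformed relation tuple: {text!r}")

    subject_ref, _, subject_relation = subject.partition("#")
    subject_namespace, colon, subject_id = subject_ref.partition(":")
    if not colon:
        # Bare subject such as "user_123": no namespace prefix.
        subject_namespace, subject_id = None, subject_ref
    return RelationTuple(
        namespace, object_id, relation,
        subject_namespace, subject_id, subject_relation or None,
    )
```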
Master Database Schema (CockroachDB / Spanner)
-- Namespace definitions (e.g. document, group, folder)
CREATE TABLE namespaces (
    namespace_name VARCHAR(100) PRIMARY KEY,
    config JSONB NOT NULL, -- Defines schema relations and path evaluations
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Core Relationship Tuples (The source of truth)
CREATE TABLE relation_tuples (
    namespace VARCHAR(100) NOT NULL REFERENCES namespaces(namespace_name),
    object_id VARCHAR(255) NOT NULL,
    relation VARCHAR(100) NOT NULL,
    subject_namespace VARCHAR(100) NOT NULL,
    subject_id VARCHAR(255) NOT NULL,
    -- Empty string means "no subject relation" (a direct user); e.g. "member" if referencing a group.
    -- Stored as NOT NULL because a PRIMARY KEY takes plain, non-null columns, not expressions such as COALESCE.
    subject_relation VARCHAR(100) NOT NULL DEFAULT '',
    tx_timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (namespace, object_id, relation, subject_namespace, subject_id, subject_relation)
);

-- Reverse lookup: "which objects does this subject have access to?"
CREATE INDEX idx_tuples_subject ON relation_tuples(subject_namespace, subject_id);
-- Forward lookup: "who has any relation to this object?"
CREATE INDEX idx_tuples_object ON relation_tuples(namespace, object_id);

-- System Transaction Log for Zookie Generation
CREATE TABLE transaction_records (
    tx_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tx_token VARCHAR(255) NOT NULL UNIQUE, -- The Zookie String representation
    committed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
Cache Model Topology (Redis Cluster)
We cache resolved relationships to avoid executing deep recursive SQL joins.
- Cache Key: authz:path_cache:{namespace}:{object_id}:{relation}
- Cache Value: A flat list of resolved subject IDs (e.g., ["user_123", "user_456"]) with an associated Zookie token.
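One possible encoding of this cache entry is sketched below. The key layout follows the pattern above; the JSON value shape (a "subjects" list plus a "zookie" string) and the helper names are assumptions, since the serialization format is not specified here.

```python
import json
from typing import Iterable, Set, Tuple


def path_cache_key(namespace: str, object_id: str, relation: str) -> str:
    """Build the cache key for the resolved subjects of one object relation."""
    return f"authz:path_cache:{namespace}:{object_id}:{relation}"


def encode_cache_value(subject_ids: Iterable[str], zookie: str) -> str:
    """Serialize resolved subjects together with the Zookie they were read at."""
    return json.dumps({"subjects": sorted(subject_ids), "zookie": zookie})


def decode_cache_value(raw: str) -> Tuple[Set[str], str]:
    """Inverse of encode_cache_value."""
    data = json.loads(raw)
    return set(data["subjects"]), data["zookie"]
```

Storing the Zookie next to the subject list lets a reader decide whether the entry is fresh enough for a given request.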
5. Scaling Challenges & System Bottlenecks
Graph Evaluation Loops and Recursive Depth Limits
When evaluating relationship graphs, checking user permissions can require resolving deep, nested paths. For instance, a user might belong to a group, which is a member of another group, which has access to a folder, which contains a file.
Evaluating Check(user_123, file_9901, "edit") requires a recursive lookup through the group and folder hierarchy:
user_123 ----(member-of)---> Group A ----(member-of)---> Group B ----(owner-of)---> Folder X ----(contains)---> File 9901
Evaluating this path naively runs into two massive scaling issues:
- Deep Loops: If a circular relation exists (e.g., Group A belongs to Group B, and Group B belongs to Group A), the evaluation thread enters an infinite recursive loop.
- Latency Spikes: Deep paths require multiple sequential database hops.
Mitigation Strategies:
- Loop Detection: The Evaluation Engine tracks the path stack using a local Set<String> visited_nodes during recursion. If a node is hit twice, the evaluation path terminates immediately to break the cycle.
- Graph Flattening / Path Caching: Instead of traversing the tree on every query, we store intermediate relationship paths. For example, when adding Group A to Group B, we asynchronously traverse the tree and precompute transitive memberships, caching them in Redis.
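The graph-flattening idea can be sketched as a breadth-first expansion of "member-of" edges. This is an in-memory illustration only: `direct_parents` is a hypothetical adjacency map, and the asynchronous job and the Redis write described above are not shown.

```python
from collections import deque
from typing import Dict, Set


def flatten_memberships(direct_parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each node to every group it belongs to, directly or transitively.

    direct_parents maps a user or group to the groups it is a direct member of.
    Cycles are tolerated because each group is expanded at most once per start node.
    """
    flattened: Dict[str, Set[str]] = {}
    for start, parents in direct_parents.items():
        reachable: Set[str] = set()
        queue = deque(parents)
        while queue:
            group = queue.popleft()
            if group in reachable:
                continue  # already expanded; this also breaks membership cycles
            reachable.add(group)
            queue.extend(direct_parents.get(group, ()))
        flattened[start] = reachable
    return flattened
```

With the flattened map in a cache, a membership check becomes a single set lookup instead of a chain of sequential hops, at the cost of recomputation whenever group membership changes.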
graph TD
%% Recursion Loop Engine
subgraph EngineLoop ["Recursive Evaluation Engine"]
CheckCall["Check(user_123, file_9901, 'view')"]
PathStack["Initialize path stack: [] & visited = {}"]
CheckCall --> PathStack
NextHop["Fetch direct relationships from DB/Cache"]
PathStack --> NextHop
IsVisited{"Is subject in visited?"}
NextHop --> IsVisited
IsVisited -->|Yes: Loop Detected| Terminate[Return false for this path]
IsVisited -->|No| PushStack[Add to visited & push to stack]
PushStack --> DeepPath{"Is relation matching?"}
DeepPath -->|Yes| Success[Return True]
DeepPath -->|No & Deep Limit Exceeded| Terminate
DeepPath -->|No & Limit Safe| NextHop
end
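The recursive evaluation with loop detection and a depth limit can be sketched as follows. The tuple store is modeled as an in-memory dictionary, folder inheritance is expressed as a nested userset tuple rather than a namespace rewrite rule, and MAX_DEPTH = 10 is an assumed limit; all names are illustrative.

```python
from typing import Dict, Optional, Set, Tuple, Union

Userset = Tuple[str, str]        # (object, relation), e.g. ("group_a", "member")
Subject = Union[str, Userset]    # a user ID or a nested userset
TupleStore = Dict[Userset, Set[Subject]]

MAX_DEPTH = 10  # assumed recursion limit


def check(
    store: TupleStore,
    user: str,
    obj: str,
    relation: str,
    visited: Optional[Set[Userset]] = None,
    depth: int = 0,
) -> bool:
    """Return True if `user` holds `relation` on `obj`, following nested usersets."""
    if depth > MAX_DEPTH:
        return False  # depth limit exceeded: give up on this path
    visited = set() if visited is None else visited
    node = (obj, relation)
    if node in visited:
        return False  # loop detected: this userset was already expanded
    visited.add(node)

    for subject in store.get(node, set()):
        if subject == user:
            return True  # direct match
        if isinstance(subject, tuple) and check(
            store, user, subject[0], subject[1], visited, depth + 1
        ):
            return True  # match found through a nested userset
    return False
```

Because the visited set is shared across branches, each userset is expanded at most once per check, which bounds the work even when the graph contains cycles.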
The "New Enemy" Problem (Out-of-Order Cache Updates)
Suppose a user is removed from a sensitive role (e.g., fired from the company). This update is committed to the master database. However, the update takes 5 seconds to propagate to a remote region's evaluation cache. If the user makes a request in that region immediately after being fired, they will still be authorized to read sensitive resources.
Google Zanzibar solved this using Zookies:
- When a policy write transaction commits (e.g., removing a user's role), the write response carries an opaque token called a Zookie, which encodes the logical commit timestamp of that transaction.
- The product service stores this Zookie inside the user's session context or JWT.
- On the next permission check, the product service passes the Zookie alongside the check query.
- The Evaluation Engine ensures it evaluates the request using database replicas that have caught up to at least the timestamp specified in the Zookie. If a local cache or database replica is stale, it bypasses it and reads from the master node.
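The freshness rule in the last step can be sketched as a read-source selection. The sketch assumes the Zookie has already been decoded into an integer commit timestamp and that each replica reports the timestamp it has applied up to; the Replica class and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    name: str
    applied_ts: int  # highest commit timestamp this replica has applied


def pick_read_source(
    replicas: Sequence[Replica],
    zookie_ts: Optional[int],
    primary: str = "primary",
) -> str:
    """Choose where to read from so the result is at least as fresh as the Zookie."""
    if zookie_ts is None:
        # No freshness requirement: any replica will do.
        return replicas[0].name if replicas else primary
    for replica in replicas:
        if replica.applied_ts >= zookie_ts:
            return replica.name  # replica has caught up to the Zookie
    return primary  # every replica is stale: fall back to the primary


replicas = [Replica("eu-west", applied_ts=100), Replica("us-east", applied_ts=140)]
print(pick_read_source(replicas, zookie_ts=120))  # us-east
print(pick_read_source(replicas, zookie_ts=200))  # primary
```

The same comparison applies to cache entries: an entry stamped with a timestamp older than the request's Zookie is skipped rather than served.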
6. Staff Engineer Perspective (Operational Trade-offs)
Consistency vs. Latency in Permission Revocation
Stack Protection Against Cold Cache Stampedes
7. Candidate Verbal Script (Mock Interview Guide)
This verbatim dialogue shows how a top-performing candidate navigates requirements and structures their design during a live architectural session.
Interviewer: "We want to design a centralized authorization system for a scale of 100 Million active users. If this system is in the critical request path of every microservice, how do you prevent it from becoming a single point of failure and a massive latency bottleneck?"
Candidate: "To make sure this authorization service does not become a bottleneck, I will design a tiered evaluation architecture. Sitewide latency targets must be under $5\text{ ms}$ for $p99$.
First, I will decouple the evaluation engine by deploying an AuthZ Sidecar Daemon directly alongside every microservice container. The sidecar maintains a local, high-performance in-memory LRU cache storing resolved user-permission states. For $90\%$ of standard check calls, the microservice will query this local sidecar over IPC (Inter-Process Communication) or localhost gRPC, returning an answer in under $0.5\text{ ms}$ without ever traversing the network."
Interviewer: "That handles the happy path. But what about the other $10\%$ cache misses? They will hit the central engines. How do you scale the evaluation engine when resolving deep relationship graphs?"
Candidate: "For cache misses, the sidecar falls back to a central pool of stateless Evaluation Engine Nodes. These nodes are sharded based on a Consistent Hashing Ring of the object_id. This ensures that permission checks for the same resource (e.g. a popular shared workspace) are routed to the same cluster nodes, maximizing the hit rate of their local caches.
Additionally, to avoid performing slow recursive database joins (for instance, tracing group memberships up a nested hierarchy), the evaluation engines use a distributed Redis Cluster to cache evaluated graph paths rather than raw database tuples. We cache resolved path sub-segments (e.g. user_123 -> member -> group_abc) with a strict TTL, allowing us to evaluate deep permission trees in memory."
Interviewer: "Excellent. How do you handle the 'New Enemy' problem? If a user's permissions are revoked in CockroachDB, but the edge cache in a remote region still has a cached 'ALLOWED' result, how do you prevent unauthorized access without invalidating the entire global cache?"
Candidate: "We can solve this by implementing Google Zanzibar's Zookie protocol. A Zookie is a lightweight consistency token that represents a logical transaction commit timestamp in our database.
When a permission write transaction occurs—such as revoking a user from a workspace—the database commits the change and returns a Zookie (e.g., zookie_commit_99182). The calling service passes this Zookie to the user client, which includes it in subsequent requests.
When the client performs a request that triggers an auth check, the sidecar daemon compares the incoming Zookie with the timestamp of its cached entry. If the cached entry's timestamp is older than the Zookie, the sidecar knows its cache is stale. It immediately bypasses the local cache, fetches the latest state from our consistent database store, and updates its local cache with the newer Zookie. This provides causal consistency, guaranteeing that once a permission is revoked, subsequent requests will reflect the change immediately."
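The object_id routing the candidate describes for the Evaluation Engine pool can be sketched as a consistent-hash ring with virtual nodes. The class name, the choice of SHA-256, and the count of 64 virtual nodes per engine are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Iterable


class HashRing:
    """Consistent-hash ring mapping object IDs to evaluation engine nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        if not self._ring:
            raise ValueError("at least one node is required")
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, object_id: str) -> str:
        """Return the first node clockwise from the object's hash position."""
        index = bisect.bisect_right(self._points, self._hash(object_id))
        return self._ring[index % len(self._ring)][1]
```

Checks for the same object always land on the same engine, and removing an engine only remaps the objects that engine owned, so the remaining engines keep their warm caches.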