System Design: Building a Highly Available Authorization Service (RBAC/ABAC/Zanzibar)

How do you design a low-latency authorization engine that handles billions of resources? Learn about Google Zanzibar-style relationship graphs, RBAC, ABAC, and Zookie consistency.

Key Takeaways

  • **Decoupling Authentication vs Authorization:** Authentication verifies user identity (AuthN), whereas a low-latency authorization engine (AuthZ) verifies permissions based on fine-grained access policies.
  • **Zanzibar-style Relationship Graphs:** Graph-based permission systems model authorization as directed relations (user -> member -> owner), solving deep hierarchical permission checks through recursive path evaluation.
  • **Low-Latency Evaluation Topologies:** Gating every microservice API request requires authorization checks that complete in single-digit milliseconds (sub-millisecond on a local cache hit), which is achieved with distributed edge caches, hash-ring sharding, and consistent read tokens.

Recommended Prerequisites

  • System Design Interview Framework
  • System Design: Designing a Distributed Lock Service

1. Core Requirements & Scale Constraints

In a large microservice ecosystem, every single incoming API request must be authorized before it can access sensitive resources. While Authentication (AuthN) verifies who you are (typically completed once at the API Gateway via JWT verification), Authorization (AuthZ) determines what you are allowed to do.

Designing a centralized, enterprise-grade Authorization Service requires scaling to billions of resource-permission tuples while operating directly in the synchronous path of every system request.

Request Flow:
Client ---> API Gateway ---> Product Service ---> [Synchronous gRPC AuthZ Check] ---> AuthZ Service
                                                                                            |
                                                                                    Evaluates relations
                                                                                    against Tuple Store

Functional Requirements

  • Permission Checking: Expose a high-performance endpoint: Check(user_id, resource_id, relation) returning a simple boolean flag (ALLOWED / DENIED).
  • Policy Models: Support multiple paradigm models:
    • RBAC (Role-Based Access Control): Permissions linked to user roles (e.g. Role: Editor can write to Document).
    • ABAC (Attribute-Based Access Control): Permissions based on dynamic context attributes (e.g. Allow only if request_ip is within the corporate subnet).
    • ReBAC (Relationship-Based Access Control): Permissions derived from graph relationships (e.g. User A is the editor of Folder X which contains Document Y, therefore User A can edit Document Y).
  • Tuple Writing: Expose APIs to add, remove, and list permission tuples (e.g., granting user membership to a group).
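
As a rough illustration of how the three policy models differ, the sketch below evaluates one toy rule of each kind. Every name and data value in it (`ROLE_PERMISSIONS`, `CORPORATE_SUBNET`, the sample users and folders) is hypothetical and only mirrors the examples in the list above; it is not the service's actual policy engine.

```python
import ipaddress

# RBAC: permissions hang off roles (hypothetical sample data).
ROLE_PERMISSIONS = {
    "editor": {"document:read", "document:write"},
    "viewer": {"document:read"},
}
USER_ROLES = {"user_a": {"editor"}, "user_b": {"viewer"}}

def rbac_allows(user_id: str, permission: str) -> bool:
    """Allowed if any of the user's roles carries the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user_id, set())
    )

# ABAC: the decision depends on attributes of the request itself.
CORPORATE_SUBNET = ipaddress.ip_network("10.0.0.0/8")  # assumed subnet

def abac_allows(request_ip: str) -> bool:
    """Allowed only if the caller's IP is inside the corporate subnet."""
    return ipaddress.ip_address(request_ip) in CORPORATE_SUBNET

# ReBAC: the permission is derived from relationships between objects.
FOLDER_EDITORS = {"folder_x": {"user_a"}}
DOCUMENT_PARENT = {"doc_y": "folder_x"}

def rebac_can_edit(user_id: str, document_id: str) -> bool:
    """Editors of the containing folder may edit the document."""
    parent = DOCUMENT_PARENT.get(document_id)
    return parent is not None and user_id in FOLDER_EDITORS.get(parent, set())
```

In practice a single check may combine all three, for example a ReBAC path lookup followed by an ABAC condition on the request context.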

Non-Functional SLAs

  • Low Evaluation Latency: Permission checks must meet $p95 \le 2\text{ ms}$ and $p99 \le 5\text{ ms}$ end to end, with local cache hits answered in under a millisecond. Since this service is queried on virtually every microservice request, any latency here degrades the entire platform.
  • Ultra-High Availability: Maintain "Five Nines" ($99.999\%$) availability. An outage in the Authorization Service blocks access to all downstream product lines.
  • Causal Consistency: Protect against the "New Enemy" problem (preventing a user from reading a file using stale cached data if their permission was revoked in a separate concurrent transaction).
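
To make the availability target concrete, the short calculation below converts an availability fraction into a yearly downtime budget. The helper name `downtime_budget_seconds` is illustrative, and the year is assumed to have 365 days.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years

def downtime_budget_seconds(availability: float) -> float:
    """Allowed downtime per year for a given availability fraction."""
    return SECONDS_PER_YEAR * (1.0 - availability)

five_nines = downtime_budget_seconds(0.99999)
print(f"{five_nines:.0f} s/year, about {five_nines / 60:.2f} minutes")
```

Five nines leaves roughly 315 seconds, a little over five minutes, of total unavailability per year, which is why the rest of the design leans so heavily on caching and replication.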

Back-of-the-Envelope Estimates

  • Scale Metrics:
    • Total Users: 100 Million.
    • Total Managed Objects: 10 Billion.
    • Permission Relation Tuples: 50 Billion.
  • Request QPS:
    • Standard user action rate: Average 50,000 product API requests per second.
    • If each API request generates an average of 2 internal authorization checks (due to nested resources):
    • Average Check Throughput: $100,000\text{ queries per second (QPS)}$.
    • Peak Check Throughput (3x multiplier): $300,000\text{ QPS}$.
  • Write Throughput (Tuple updates):
    • Average: $200\text{ writes/sec}$.
    • Peak: $1,000\text{ writes/sec}$.
  • Cache Memory Estimation:
    • We cache active user permissions at the edge, storing permissions for 10 Million monthly active users.
    • Assuming an average of 100 permission nodes per user: $1\text{ Billion cached nodes}$.
    • Average node key-value payload size (compressed): 120 bytes.
    • Total Memory footprint: $1\text{ Billion} \times 120\text{ bytes} \approx 120\text{ Gigabytes}$ of distributed RAM.
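
The arithmetic above can be reproduced in a few lines, which makes it easy to re-run the estimate with different assumptions. The variable names are illustrative; the input numbers are the ones stated in this section.

```python
# Check throughput
api_rps = 50_000            # average product API requests per second
checks_per_request = 2      # internal authorization checks per request
peak_multiplier = 3

avg_check_qps = api_rps * checks_per_request
peak_check_qps = avg_check_qps * peak_multiplier

# Edge cache footprint
cached_users = 10_000_000   # monthly active users kept in cache
nodes_per_user = 100        # permission nodes per user
bytes_per_node = 120        # compressed key-value payload

cached_nodes = cached_users * nodes_per_user
cache_gb = cached_nodes * bytes_per_node / 10**9  # decimal gigabytes

print(avg_check_qps, peak_check_qps, cached_nodes, cache_gb)
```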

2. API Design & Core Contracts

Due to strict latency requirements, the internal authorization check API must be exposed via high-performance binary gRPC/protobuf protocols rather than JSON-over-HTTP.

Protobuf Service Contract (authz.proto)

syntax = "proto3";

package codesprintpro.authz;

service AuthorizationService {
  // Low-latency permission evaluation
  rpc CheckPermission(CheckRequest) returns (CheckResponse);
  
  // Create or delete relationship tuples
  rpc WriteTuple(WriteTupleRequest) returns (WriteTupleResponse);
}

message CheckRequest {
  string user_id = 1;
  string namespace = 2;       // e.g. "document", "repository"
  string object_id = 3;       // e.g. "doc_99018"
  string relation = 4;        // e.g. "viewer", "editor", "owner"
  string zookie = 5;          // Opaque consistency token (encodes a commit timestamp)
}

message CheckResponse {
  bool allowed = 1;
  string zookie = 2;          // Updated consistency token
}

message WriteTupleRequest {
  enum Operation {
    OP_UNSPECIFIED = 0;       // proto3 default value; servers should reject it
    OP_INSERT = 1;
    OP_DELETE = 2;
  }
  Operation op = 1;
  string namespace = 2;
  string object_id = 3;
  string relation = 4;
  string subject_id = 5;      // Can be a specific user_id or a nested relation (e.g. "group:admin#member")
}

message WriteTupleResponse {
  string zookie = 1;          // Commit zookie token
}

Sample HTTP/REST Proxy Representation

For external integrations, a REST Gateway can convert JSON payloads into protobuf calls.

POST /v1/authz/check
Content-Type: application/json

{
  "user_id": "usr_889231",
  "namespace": "document",
  "object_id": "doc_99018",
  "relation": "editor",
  "zookie": "1c7b8d0a:0000028a"
}
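
A minimal sketch of the validation step such a gateway might perform before building the protobuf message is shown below. The `CheckRequest` dataclass is a plain-Python stand-in for the generated protobuf class, and treating an empty `zookie` as "no consistency requirement" is an assumption, not something the contract above specifies.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class CheckRequest:
    """Stand-in for the generated protobuf CheckRequest message."""
    user_id: str
    namespace: str
    object_id: str
    relation: str
    zookie: str = ""  # assumption: empty means no consistency requirement

REQUIRED_FIELDS = ("user_id", "namespace", "object_id", "relation")

def parse_check_request(body: str) -> CheckRequest:
    """Validate a JSON body and map it onto the request fields."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return CheckRequest(
        user_id=payload["user_id"],
        namespace=payload["namespace"],
        object_id=payload["object_id"],
        relation=payload["relation"],
        zookie=payload.get("zookie", ""),
    )
```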

3. High-Level Design (HLD)

The authorization architecture separates the Policy Administration Point (PAP), which registers metadata tuples, from the highly optimized Policy Decision Point (PDP), which evaluates permissions within a latency budget of a few milliseconds.

graph TD
    %% Actors
    Client[Client App]
    API_Gateway[API Gateway]
    ProductService[Product Microservice]

    Client -->|API Call| API_Gateway
    API_Gateway --> ProductService

    %% Microservice to Sidecar Cache
    subgraph MicroserviceNode ["Product Microservice Host Node"]
        ProductService -->|1. Check Local Cache| SidecarCache[Local Memory Cache LRU]
    end

    %% Higher-Latency Fallback on Cache Miss
    ProductService -->|2. Cache Miss gRPC Check| AuthZEngine[Central AuthZ Evaluation Engine]

    %% HLD Internals
    subgraph CentralAuthZ ["Central AuthZ Clustering"]
        AuthZEngine
        EvaluationCache[(Redis Cluster Cache)]
        AuthZEngine -->|Read/Write Graph Paths| EvaluationCache
    end

    %% Storage Layer
    subgraph DatabaseLayer ["Globally Replicated Database Layer"]
        CockroachDB[(CockroachDB Globally Replicated)]
    end
    
    AuthZEngine -->|Read Master Truth| CockroachDB

    %% Policy Administration Point
    subgraph PAP ["Policy Administration Point"]
        AdminDashboard[Admin Policy Portal]
        TupleWriter[Tuple Writer Service]
        AdminDashboard --> TupleWriter
        TupleWriter -->|Write Atomic Tuples| CockroachDB
    end

    %% Event Bus & Logging
    Kafka{Apache Kafka Event Bus}
    AuthZEngine -->|Publish Audit Log| Kafka
    Kafka --> SecurityAnalytics[SIEM / Compliance Logging]

Request Evaluation Cycle

  1. Sidecar Cache Lookup: The product service first queries a local in-memory LRU cache held by an AuthZ sidecar on its own host. This handles $90\%$ of standard check operations in under $0.5\text{ ms}$.
  2. Central Engine Query: If a cache miss occurs, the sidecar triggers a multiplexed gRPC check to the central AuthZ Engine.
  3. Graph Evaluation: The engine queries a distributed Redis Evaluation Cache to check for pre-resolved relationship subgraphs. If missing, it executes a recursive query on the master database.
  4. Master Database: Relationship tuples are stored in a strongly consistent database (e.g., CockroachDB) that replicates data across regions using Raft consensus and offers serializable transactions.
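
The lookup order in the cycle above can be sketched as a read-through chain. This is a simplified illustration, assuming a plain dictionary in place of the Redis cache and a callback in place of the database query; `LRUCache`, `tiered_check`, and `load_from_db` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple

class LRUCache:
    """Tiny in-memory LRU cache standing in for the sidecar cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key: str) -> Optional[bool]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bool) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

def tiered_check(
    key: str,
    local: LRUCache,
    shared: Dict[str, bool],
    load_from_db: Callable[[str], bool],
) -> Tuple[bool, str]:
    """Return (decision, tier that answered), filling caches on the way back."""
    hit = local.get(key)
    if hit is not None:
        return hit, "local"
    if key in shared:
        local.put(key, shared[key])
        return shared[key], "shared"
    decision = load_from_db(key)
    shared[key] = decision
    local.put(key, decision)
    return decision, "database"
```

Note that the sketch caches denials as well as approvals; whether that is acceptable depends on how quickly newly granted permissions must become visible.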

4. Low-Level Design (LLD) & Data Models

Relationship-Based Access Control (ReBAC) Modeling

Permissions are modeled as directed relation tuples representing a graph: object#relation@subject

  • Example: document:doc_abc#editor@user_123 (User 123 is an editor of doc_abc).
  • Example: document:doc_abc#viewer@group:marketing#member (All members of the marketing group are viewers of doc_abc).
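
To show how the textual form maps onto the individual fields, here is a small parser sketch. It assumes that a subject without a namespace prefix (as in the first example) belongs to a `user` namespace; that default, and the names `RelationTuple` and `parse_tuple`, are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional

class RelationTuple(NamedTuple):
    namespace: str
    object_id: str
    relation: str
    subject_namespace: str
    subject_id: str
    subject_relation: Optional[str]

# namespace:object_id#relation@subject
_TUPLE_RE = re.compile(
    r"^(?P<ns>[^:#@]+):(?P<obj>[^:#@]+)#(?P<rel>[^:#@]+)@(?P<subject>.+)$"
)

def parse_tuple(text: str) -> RelationTuple:
    match = _TUPLE_RE.match(text)
    if match is None:
        raise ValueError(f"not a relation tuple: {text!r}")
    # The subject is either a bare user id or "namespace:id#relation".
    subject_part, _, subject_relation = match["subject"].partition("#")
    if ":" in subject_part:
        subject_namespace, subject_id = subject_part.split(":", 1)
    else:
        subject_namespace, subject_id = "user", subject_part  # assumed default
    return RelationTuple(
        match["ns"], match["obj"], match["rel"],
        subject_namespace, subject_id, subject_relation or None,
    )
```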

Master Database Schema (CockroachDB / Spanner)

-- Namespace definitions (e.g. document, group, folder)
CREATE TABLE namespaces (
    namespace_name VARCHAR(100) PRIMARY KEY,
    config JSONB NOT NULL, -- Defines schema relations and path evaluations
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Core Relationship Tuples (The source of truth)
CREATE TABLE relation_tuples (
    namespace VARCHAR(100) NOT NULL REFERENCES namespaces(namespace_name),
    object_id VARCHAR(255) NOT NULL,
    relation VARCHAR(100) NOT NULL,
    subject_namespace VARCHAR(100) NOT NULL,
    subject_id VARCHAR(255) NOT NULL,
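    subject_relation VARCHAR(100) NOT NULL DEFAULT '', -- '' for a direct subject; e.g. "member" when referencing a group
    tx_timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    -- A primary key lists plain NOT NULL columns, so '' stands in for "no subject relation" instead of COALESCE
    PRIMARY KEY (namespace, object_id, relation, subject_namespace, subject_id, subject_relation)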
);

CREATE INDEX idx_tuples_subject ON relation_tuples(subject_namespace, subject_id);
CREATE INDEX idx_tuples_object ON relation_tuples(namespace, object_id);

-- System Transaction Log for Zookie Generation
CREATE TABLE transaction_records (
    tx_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tx_token VARCHAR(255) NOT NULL UNIQUE, -- The Zookie String representation
    committed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

Cache Model Topology (Redis Cluster)

We cache resolved relationships to avoid executing deep recursive SQL joins.

  • Cache Key: authz:path_cache:{namespace}:{object_id}:{relation}
  • Cache Value: A flat list of resolved subject IDs (e.g., ["user_123", "user_456"]) with an associated Zookie token.
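
A possible encoding of this key and value is sketched below. The JSON layout and the helper names are assumptions; the source only specifies the key template and that the value holds subject IDs plus a Zookie.

```python
import json
from typing import Iterable, Set, Tuple

def path_cache_key(namespace: str, object_id: str, relation: str) -> str:
    """Build the cache key from the template above."""
    return f"authz:path_cache:{namespace}:{object_id}:{relation}"

def encode_cache_value(subject_ids: Iterable[str], zookie: str) -> str:
    """Serialize resolved subjects together with the Zookie they reflect."""
    return json.dumps({"subjects": sorted(subject_ids), "zookie": zookie})

def decode_cache_value(raw: str) -> Tuple[Set[str], str]:
    data = json.loads(raw)
    return set(data["subjects"]), data["zookie"]
```

The braces in the key template are placeholders. If literal braces were kept in the key, Redis Cluster would treat the text between them as a hash tag and use only that part to choose the slot.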

5. Scaling Challenges & System Bottlenecks

Graph Evaluation Loops and Recursive Depth Limits

When evaluating relationship graphs, checking user permissions can require resolving deep, nested paths. For instance, a user might belong to a group, which is a member of another group, which has access to a folder, which contains a file.

Checking Check(user_123, file_9901, "edit") requires a recursive lookup up the folder/tenant tree:

user_123 ----(member-of)---> Group A ----(member-of)---> Group B ----(owner-of)---> Folder X ----(contains)---> File 9901

Evaluating this path naively runs into two massive scaling issues:

  1. Cycles: If a circular relation exists (e.g., Group A belongs to Group B, and Group B belongs to Group A), a naive evaluation thread recurses forever.
  2. Latency Spikes: Deep paths require multiple sequential database hops.

Mitigation Strategies:

  • Loop Detection: The Evaluation Engine tracks the path stack using a local Set<String> visited_nodes during recursion. If a node is hit twice, the evaluation path terminates immediately to break the cycle.
  • Graph Flattening / Path Caching: Instead of traversing the tree on every query, we store intermediate relationship paths. For example, when adding Group A to Group B, we asynchronously traverse the tree and precompute transitive memberships, caching them in Redis.
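
The flattening step can be sketched as a breadth-first walk over member-of edges. This is a simplified in-memory illustration: `direct_parents` is a hypothetical adjacency map, and a real implementation would write the result to the cache rather than return it.

```python
from collections import deque
from typing import Dict, Set

def flatten_memberships(direct_parents: Dict[str, Set[str]], subject: str) -> Set[str]:
    """Return every group reachable from `subject` via member-of edges."""
    reachable: Set[str] = set()
    queue = deque([subject])
    while queue:
        node = queue.popleft()
        for parent in direct_parents.get(node, set()):
            if parent not in reachable:  # also guards against cycles
                reachable.add(parent)
                queue.append(parent)
    return reachable
```

Because each group is enqueued at most once, the walk terminates even when groups contain each other.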

graph TD
    %% Recursion Loop Engine
    subgraph EngineLoop ["Recursive Evaluation Engine"]
        CheckCall["Check(user_123, file_9901, 'edit')"]
        PathStack["Initialize path stack: [] & visited = {}"]
        
        CheckCall --> PathStack
        NextHop["Fetch direct relationships from DB/Cache"]
        PathStack --> NextHop
        
        IsVisited{"Is subject in visited?"}
        NextHop --> IsVisited
        
        IsVisited -->|Yes: Loop Detected| Terminate[Return false for this path]
        IsVisited -->|No| PushStack[Add to visited & push to stack]
        
        PushStack --> DeepPath{"Is relation matching?"}
        DeepPath -->|Yes| Success[Return True]
        DeepPath -->|No & Deep Limit Exceeded| Terminate
        DeepPath -->|No & Limit Safe| NextHop
    end
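
The evaluation loop in the diagram can be sketched as a recursive function that tracks the nodes on the current path and enforces a depth limit. The in-memory `graph`, the `object#relation` string form for nested subjects, and the `MAX_DEPTH` value are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Userset = Tuple[str, str]          # (object, relation)
Graph = Dict[Userset, List[str]]   # subjects: a user id or "object#relation"

MAX_DEPTH = 10  # hypothetical recursion limit

def check(
    graph: Graph,
    user: str,
    obj: str,
    relation: str,
    on_path: Optional[Set[Userset]] = None,
    depth: int = 0,
) -> bool:
    """Return True if `user` reaches (obj, relation) through the graph."""
    if depth > MAX_DEPTH:
        return False  # depth limit exceeded: give up on this path
    on_path = set() if on_path is None else on_path
    node = (obj, relation)
    if node in on_path:
        return False  # cycle detected: this path cannot make progress
    on_path.add(node)
    try:
        for subject in graph.get(node, []):
            if subject == user:
                return True
            if "#" in subject:  # nested userset, follow it one hop deeper
                child_obj, child_rel = subject.split("#", 1)
                if check(graph, user, child_obj, child_rel, on_path, depth + 1):
                    return True
        return False
    finally:
        on_path.discard(node)  # leave the path when unwinding
```

For example, with tuples linking `file:9901#edit` to `folder:x#owner`, then to `group:b#member`, then to `group:a#member`, which contains `user_123`, the call `check(graph, "user_123", "file:9901", "edit")` walks four hops and succeeds, while a back edge from Group A to Group B is cut off by the cycle check.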

The "New Enemy" Problem (Out-of-Order Cache Updates)

Suppose a user is removed from a sensitive role (e.g., fired from the company). This update is committed to the master database. However, the update takes 5 seconds to propagate to a remote region's evaluation cache. If the user makes a request in that region immediately after being fired, they will still be authorized to read sensitive resources.

Google Zanzibar solved this using Zookies:

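  1. When a policy write transaction commits (e.g., removing a user's role), the Database Layer returns an opaque token called a Zookie, which encodes the commit timestamp of that write.
  2. The product service stores this Zookie so that it can accompany later checks. Zanzibar clients keep it alongside the affected resource version; a session context or JWT only helps for requests from the session that saw the write.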
  3. On the next permission check, the product service passes the Zookie alongside the check query.
  4. The Evaluation Engine ensures it evaluates the request using database replicas that have caught up to at least the timestamp specified in the Zookie. If a local cache or database replica is stale, it bypasses it and reads from the master node.
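
Step 4 amounts to a freshness comparison before choosing where to read. The sketch below assumes a Zookie is simply a hex-encoded, non-negative commit timestamp and that each replica reports the latest timestamp it has applied; real Zookies are opaque, and `Replica`, `encode_zookie`, and `pick_replica` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Replica:
    name: str
    applied_ts: int          # latest commit timestamp this replica has applied
    is_primary: bool = False

def encode_zookie(commit_ts: int) -> str:
    """Illustrative encoding only; assumes a non-negative timestamp."""
    return f"{commit_ts:016x}"

def decode_zookie(zookie: str) -> int:
    return int(zookie, 16)

def pick_replica(replicas: List[Replica], zookie: str) -> Replica:
    """Prefer a caught-up replica; otherwise fall back to the primary."""
    required_ts = decode_zookie(zookie) if zookie else 0
    fresh = [r for r in replicas if not r.is_primary and r.applied_ts >= required_ts]
    if fresh:
        return fresh[0]
    return next(r for r in replicas if r.is_primary)
```

A check without a Zookie can be served by any replica, which is what keeps the common path cheap; only checks that carry a recent Zookie pay for the fresher read.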

6. Staff Engineer Perspective (Operational Trade-offs)

Consistency vs. Latency in Permission Revocation

Stack Protection Against Cold Cache Stampedes


7. Candidate Verbal Script (Mock Interview Guide)

This verbatim dialogue shows how a top-performing candidate navigates requirements and structures their design during a live architectural session.


Interviewer: "We want to design a centralized authorization system for a scale of 100 Million active users. If this system is in the critical request path of every microservice, how do you prevent it from becoming a single point of failure and a massive latency bottleneck?"

Candidate: "To make sure this authorization service does not become a bottleneck, I will design a tiered evaluation architecture. Sitewide latency targets must be under $5\text{ ms}$ for $p99$.

First, I will decouple the evaluation engine by deploying an AuthZ Sidecar Daemon directly alongside every microservice container. The sidecar maintains a local, high-performance in-memory LRU cache storing resolved user-permission states. For $90\%$ of standard check calls, the microservice will query this local sidecar over IPC (Inter-Process Communication) or localhost gRPC, returning an answer in under $0.5\text{ ms}$ without ever traversing the network."

Interviewer: "That handles the happy path. But what about the other $10\%$ of checks that miss the cache? They will hit the central engines. How do you scale the evaluation engine when resolving deep relationship graphs?"

Candidate: "For cache misses, the sidecar falls back to a central pool of stateless Evaluation Engine Nodes. These nodes are sharded based on a Consistent Hashing Ring of the object_id. This ensures that permission checks for the same resource (e.g. a popular shared workspace) are routed to the same cluster nodes, maximizing the hit rate of their local caches.

Additionally, to avoid performing slow recursive database joins (for instance, tracing group memberships up a nested hierarchy), the evaluation engines use a distributed Redis Cluster to cache evaluated graph paths rather than raw database tuples. We cache resolved path sub-segments (e.g. user_123 -> member -> group_abc) with a strict TTL, allowing us to evaluate deep permission trees in memory."

Interviewer: "Excellent. How do you handle the 'New Enemy' problem? If a user's permissions are revoked in CockroachDB, but the edge cache in a remote region still has a cached 'ALLOWED' result, how do you prevent unauthorized access without invalidating the entire global cache?"

Candidate: "We can solve this by implementing Google Zanzibar's Zookie protocol. A Zookie is a lightweight consistency token that represents a logical transaction commit timestamp in our database.

When a permission write transaction occurs—such as revoking a user from a workspace—the database commits the change and returns a Zookie (e.g., zookie_commit_99182). The calling service passes this Zookie to the user client, which includes it in subsequent requests.

When the client performs a request that triggers an auth check, the sidecar daemon compares the incoming Zookie with the timestamp of its cached entry. If the cached entry's timestamp is older than the Zookie, the sidecar knows its cache is stale. It immediately bypasses the local cache, fetches the latest state from our consistent database store, and updates its local cache with the newer Zookie. This provides causal consistency, guaranteeing that any request carrying the post-revocation Zookie will see the revocation."
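
The consistent-hash routing the candidate describes could be sketched as follows. This is a minimal ring with virtual nodes, assuming SHA-256 as the hash and 64 virtual nodes per engine; the class name `HashRing` and the engine names are hypothetical.

```python
import bisect
import hashlib
from typing import List

def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        # Each node owns several points so load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, object_id: str) -> str:
        """Route to the first ring point clockwise from the key's hash."""
        index = bisect.bisect_right(self._points, _hash(object_id)) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["engine-1", "engine-2", "engine-3"])
print(ring.node_for("doc_99018"))
```

Adding an engine only moves the keys that now fall on the new engine's ring points, so the local caches on the existing engines stay mostly warm.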

