Lesson 90 of 105 12 minFlagship

System Design: Building an Audit Log System for Compliance and Debugging

Design a production audit log system with immutable events, schema design, write paths, search, retention, tamper resistance, PII controls, partitioning, and compliance tradeoffs.

Reading Mode

Hide the curriculum rail and keep the lesson centered for focused reading.

Key Takeaways

  • record user actions
  • record system actions
  • show audit history for a resource
Recommended Prerequisites
System Design Interview Framework

Premium outcome

From vague architecture answers to staff-level trade-off thinking.

Backend engineers preparing for senior, staff, and architecture rounds.

What you unlock

  • A reusable system design answer framework for ambiguous prompts
  • Clear language for consistency, scaling, and reliability trade-offs
  • Case-study depth across feeds, payments, storage, and messaging systems

Audit logs answer a mission-critical question for enterprise operations: who did what, to which resource, from where, and when? Whether investigating security breaches, addressing client compliance inquiries, analyzing debugging events, or running forensic investigations, a robust audit log platform is indispensable.

Unlike normal application logging, which is optimized for debug readability and stored temporarily, an audit log system serves as a long-term, legally compliance-grade, tamper-resistant record of truth. This case study details the system design of a high-throughput, immutable audit log platform at the scale of 100,000 servers.


1. Requirements & Core Constraints

Functional Requirements

  • Capture Audit Events: Record granular actions performed by human actors (e.g., plan tier changes, database mutations) and machine systems (e.g., automated API key rotations).
  • Search & Filter: Provide search capabilities by actor ID, target resource, tenant ID, action type, and date/time range.
  • Export Utility: Allow tenants to export compliance logs (e.g., CSV or JSON formats) over custom time-frame ranges.
  • Durable Retention: Maintain logs based on strict retention categories (e.g., billing edits retained for 7 years, admin views for 90 days).
  • Cryptographic Immutability: Proactively verify that audit records have not been altered or deleted by malicious internal actors or database administrators.

Non-Functional Requirements

  • Ultra-High Write Availability: The platform must handle continuous high-throughput writes from 100,000 application servers without blocking core user operations.
  • Low-Latency Search: Queries for recent logs (past 30 days) must complete in under 200 milliseconds.
  • Tamper Evidence: Provide mathematical guarantees (using hash chains or Merkle Trees) that any tampering at the database layer is immediately detected.
  • GDPR Compliance (PII Isolation): Provide a way to erase or anonymize personal identifiable information without breaking the structural integrity of immutable log hashes.

Back-of-the-Envelope Capacity Estimation

1. Ingress Scale Calculations

  • Active host servers: 100,000 virtual server nodes.
  • Events per server: Let's assume an average of 100 audit-worthy actions occur per server every second.
  • Total Ingestion Throughput (Write QPS):
    • 100,000 servers * 100 events/sec = 10,000,000 events/second (10 Million QPS) during peak hours.
  • Average Event JSON Payload: ~500 bytes.
  • Total Write Bandwidth:
    • 10,000,000 QPS * 500 bytes = 5 Gigabytes/second.
    • Sizing Kafka: 5 GB/s requires at least 250 shards/partitions assuming 20 MB/s per partition bandwidth limits.

2. Storage Footprint Calculations (ClickHouse & Compressed Storage)

  • Raw daily log volume:
    • 10,000,000 events/sec * 86,400 seconds * 500 bytes = 432 Terabytes raw daily.
  • ClickHouse Compression: Columnar DBMS like ClickHouse achieve excellent compression ratios (up to 80% under standard dictionaries). Let's assume an active 5x compression factor.
  • Compressed daily footprint:
    • 432 TB / 5 = ~86.4 Terabytes/day.
  • Annual compressed database storage (ClickHouse):
    • 86.4 TB * 365 days = ~31.5 Petabytes/year.

2. API Design & Core Contracts

The platform exposes gRPC APIs for high-throughput client ingestion, and REST interfaces for querying compliance audit history.

API 1: Push Ingestion Payload (gRPC Protocol)

service AuditLogService {
  rpc RecordEvent(RecordEventRequest) returns (RecordEventResponse);
}

message RecordEventRequest {
  string tenant_id = 1;
  string actor_id = 2;
  string actor_type = 3; // USER, SYSTEM
  string action = 4; // CREATE, UPDATE, DELETE
  string resource_id = 5;
  string resource_type = 6;
  string client_ip = 7;
  string user_agent = 8;
  string metadata_json = 9;
  int64 occurred_at_timestamp = 10;
}

message RecordEventResponse {
  string event_id = 1;
  string verification_hash = 2;
  bool status = 3;
}

API 2: Query Tenant Audit History (REST Endpoint)

Retrieves historically filtered audit logs for customer portals.

  • HTTP Method: GET
  • Path: /api/v1/tenants/t_9823/audit-logs
  • Query Parameters:
    • actor_id=u_2913
    • action=ROLE_UPGRADE
    • start_time=1779421200
    • end_time=1779435600
    • limit=20

Response Payload

{
  "tenant_id": "t_9823",
  "total_records": 1,
  "records": [
    {
      "event_id": "evt_981a28cb20c",
      "actor": {
        "id": "u_2913",
        "type": "USER",
        "email_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
      },
      "action": "ROLE_UPGRADE",
      "resource": {
        "id": "res_admin_role",
        "type": "ROLE"
      },
      "result": "SUCCESS",
      "client_ip": "198.51.100.42",
      "occurred_at": "2026-05-22T17:44:00Z",
      "block_index": 12891,
      "block_hash": "a4b2c938de1209fb32c21980894ab7f8a12903fe2"
    }
  ]
}

3. High-Level Design (HLD)

Our architecture is split into a Fault-Tolerant Asynchronous Ingest pipeline and a Cryptographic Ledger verification segment.

Ingestion Flow & ClickHouse Topology

graph TD
    %% Application Ingestion Path
    subgraph Client Application Nodes
        Server1[App Server Pool] -->|Log Event / Local Outbox| Outbox[(Local Outbox DB)]
        Outbox -->|Background Forwarder| KafkaPool[Apache Kafka Cluster]
    end

    %% Streaming Ingestion Engine
    subgraph Streaming & Aggregation
        KafkaPool -->|High Throughput Stream| Flink[Apache Flink / Kafka Connect]
        Flink -->|Structured Streaming Write| ClickHouse[(ClickHouse Columnar Cluster)]
        Flink -->|Cryptographic Aggregation| LedgerService[Cryptographic Ledger Chainer]
    end

    %% Permanent Storage
    subgraph Secure Archive
        LedgerService -->|Merkle Tree Blocks| S3[AWS S3 - WORM Lock]
        ClickHouse -->|Cold Partitions| S3
    end

    %% Search and View Path
    subgraph Search Query Flow
        Dashboard[Compliance UI] -->|Query REST API| SearchGateway[Search Service]
        SearchGateway -->|Scan| ClickHouse
    end

Tamper Evidence Verification Logic

To prove that our logs have not been retroactively altered, the cryptographic ledger chainer aggregates logs into sequential blocks, generates a Merkle Tree for each block, and chains the root hashes together.

sequenceDiagram
    autonumber
    participant App as Application Server
    participant Kafka as Kafka Broker
    participant Chainer as Ledger Chainer Service
    participant ClickHouse as ClickHouse DB
    participant S3 as Amazon S3 (WORM)

    App->>Kafka: Publish audit events (e.g. Evt1, Evt2)
    Kafka->>Chainer: Fetch logs batch (Block Index: 45)
    Chainer->>Chainer: Build Merkle Tree root hash for Block 45
    Chainer->>Chainer: Chain root hash: SHA-256(Root_45 + Block_44_Hash)
    Chainer->>ClickHouse: Save event logs with block signatures
    Chainer->>S3: Archive sealed Block 45 metadata & Merkle Tree Root
    S3-->>Chainer: Write Confirmation (Locked via WORM policy)

4. Low-Level Design (LLD) & Data Models

Database Schema (ClickHouse Columnar Storage)

ClickHouse columnar partitions allow rapid lookups on indexed fields while enabling dense bitwise compression on metadata columns.

-- ClickHouse Audit Logs Table
CREATE TABLE default.audit_logs (
    event_id UUID,
    tenant_id String,
    actor_id String,
    actor_type Enum8('USER' = 1, 'SYSTEM' = 2),
    action LowCardinality(String),
    resource_id String,
    resource_type LowCardinality(String),
    result Enum8('SUCCESS' = 1, 'FAILURE' = 2),
    occurred_at DateTime64(3, 'UTC'),
    request_id String,
    client_ip String,
    metadata String,
    block_index UInt64,
    block_hash FixedString(32)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(occurred_at)
PRIMARY KEY (tenant_id)
ORDER BY (tenant_id, occurred_at, action, actor_id)
SETTINGS index_granularity = 8192;

Compilable Java Implementation: Cryptographic Audit Log Chainer

This class chains individual audit logs into a hash block to detect retro-modification at the database layer.

package com.codesprintpro.audit;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class CryptographicAuditChainer {

    public static class AuditLog {
        public final String eventId;
        public final String payload;
        public String previousHash;
        public String currentHash;

        public AuditLog(String eventId, String payload) {
            this.eventId = eventId;
            this.payload = payload;
        }
    }

    public static class AuditBlock {
        public final int blockIndex;
        public final List<AuditLog> logs = new ArrayList<>();
        public String previousBlockHash;
        public String blockRootHash;

        public AuditBlock(int blockIndex, String previousBlockHash) {
            this.blockIndex = blockIndex;
            this.previousBlockHash = previousBlockHash;
        }

        public void addLog(AuditLog log, String prevLogHash) throws NoSuchAlgorithmException {
            log.previousHash = prevLogHash;
            log.currentHash = calculateSHA256(log.eventId + log.payload + log.previousHash);
            logs.add(log);
        }

        public void sealBlock() throws NoSuchAlgorithmException {
            // Build simple cryptographic Merkle verification root
            StringBuilder blockHashInput = new StringBuilder();
            blockHashInput.append(previousBlockHash);
            for (AuditLog log : logs) {
                blockHashInput.append(log.currentHash);
            }
            this.blockRootHash = calculateSHA256(blockHashInput.toString());
        }
    }

    public static String calculateSHA256(String input) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] encodedHash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
        return bytesToHex(encodedHash);
    }

    private static String bytesToHex(byte[] hash) {
        StringBuilder hexString = new StringBuilder(2 * hash.length);
        for (byte b : hash) {
            String hex = Integer.toHexString(0xff & b);
            if (hex.length() == 1) {
                hexString.append('0');
            }
            hexString.append(hex);
        }
        return hexString.toString();
    }

    public static void main(String[] args) {
        try {
            // Setup Genesis Block
            AuditBlock block0 = new AuditBlock(0, "0000000000000000000000000000000000000000000000000000000000000000");
            
            AuditLog log1 = new AuditLog("evt_001", "User t_1 assigned Admin Role");
            block0.addLog(log1, block0.previousBlockHash);

            AuditLog log2 = new AuditLog("evt_002", "User t_2 assigned Premium Role");
            block0.addLog(log2, log1.currentHash);

            block0.sealBlock();
            System.out.println("Genesis Block root hash: " + block0.blockRootHash);

            // Verify tampering detection
            String originalHash = block0.blockRootHash;
            
            // Simulating a DB Hacker modifying log1 payload
            AuditLog tamperedLog1 = new AuditLog("evt_001", "User t_1 assigned SUPER-Admin Role"); // altered
            tamperedLog1.previousHash = block0.previousBlockHash;
            String tamperedLog1Hash = calculateSHA256(tamperedLog1.eventId + tamperedLog1.payload + tamperedLog1.previousHash);

            // Re-calculate block hash with altered record
            StringBuilder tamperedHashInput = new StringBuilder();
            tamperedHashInput.append(block0.previousBlockHash);
            tamperedHashInput.append(tamperedLog1Hash);
            tamperedHashInput.append(log2.currentHash);
            String tamperedBlockRoot = calculateSHA256(tamperedHashInput.toString());

            System.out.println("Tampered Block root hash: " + tamperedBlockRoot);
            System.out.println("Has tampering been detected? " + (!originalHash.equals(tamperedBlockRoot)));

        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
    }
}

5. Scaling Challenges & Bottlenecks

ClickHouse Hot-Partition Ingestion Bottlenecks

  • Problem: Directly writing single logs to ClickHouse at a peak rate of 10 Million QPS degrades database health rapidly. ClickHouse is designed for large block insertion, not millions of concurrent single-row inserts.
  • Mitigation: Implement Micro-batching Aggregations via Kafka Connect or Apache Flink. Logs are buffered in memory and flushed to ClickHouse in optimized batches of 100,000 rows or once every 5 seconds. Flink provides exactly-once processing guarantees, preventing duplicate logs during failovers.

Storage Compression Tuning

  • Problem: Storing Petabytes of raw data in cloud systems incurs astronomical storage costs.
  • Mitigation: Configure ClickHouse Tiered Storage TTLs. The system retains hot data (past 30 days) on high-speed NVMe drives. As partitions cross the 30-day mark, a background task automatically shifts partitions to AWS S3, compressed using ZSTD compression filters to reduce storage overhead by up to 10x.

6. Technical Trade-offs & Compromises

Kafka Log Ingestion vs. Database Outbox

  • Direct Kafka writes: Extremely high throughput and low latency, but risks lost events if the application database transaction commits and the application server crashes before publishing to Kafka.
  • Database Outbox Pattern: Guarantees zero log loss by writing both business data and the audit event inside the same database transaction. The downside is increased write load on the primary transactional database (PostgreSQL).
  • Decision: We use Database Outbox for high-priority compliance events (RBAC changes, API key creations, billing transactions) and Direct Kafka Streams for high-volume, lower-priority observability logs (read queries, system status checks).

7. Failure Scenarios & Operational Resiliency

1. ClickHouse Storage Full (Node Saturation)

  • Scenario: Disk saturation on a master ClickHouse node freezes all incoming log writes, blocking Flink streaming buffers.
  • Resiliency Plan: Configure Kafka to act as a durable write buffer. If ClickHouse halts, Kafka holds logs for up to 7 days. Flink pauses partition offsets and resumes ingestion once storage partitions are cleared or shifted to S3 cold blocks.

2. Hash Chain Corruptions

  • Scenario: A storage controller bug introduces bit rot, corrupting a historical block hash and throwing integrity alarms.
  • Resiliency Plan: We replicate block metadata and Merkle roots to a highly durable, write-once-read-many (WORM) AWS S3 bucket with Object Lock policies enabled. In the event of a local hash check alarm, the chainer queries the replica hashes in S3 to verify if the alert represents local bit rot or database tampering.

3. Kafka Network Partition (Split-Brain)

  • Scenario: Network splits isolate Kafka brokers, leading to out-of-order logs across consumer groups.
  • Resiliency Plan: Ingested events include an incrementing sequence number assigned at the application level. ClickHouse deduplicates messages using the ReplacingMergeTree engine based on event ID, resolving any network-driven out-of-order duplicates.

8. Candidate Verbal Script

Below is a mock interview walkthrough demonstrating how a candidate should execute this system design interview.

Interviewer: "Design a highly available and tamper-resistant Audit Log System for 100,000 servers."

Candidate: "I will design an audit log system that acts as a secure, legally-compliant ledger of truth. To handle writes from 100,000 host servers, we expect an ingestion peak of 10 Million QPS. I will implement an Asynchronous micro-batched architecture.

Instead of writing directly to our data layers, application servers will publish audit events into partitioned Apache Kafka streams. A dedicated streaming pipeline (using Apache Flink) will ingest these logs and execute micro-batch flushes into our primary query engine, ClickHouse. ClickHouse is a columnar database designed to handle high-throughput analytical query speeds, which lets us query recent logs within 200 milliseconds.

To achieve tamper resistance, I will build a Cryptographic Ledger Chainer. Flink will bundle logs into distinct sequential blocks, build a Merkle Tree for each block, and calculate a chained SHA-256 root hash that references the previous block hash. These roots will be archived inside an AWS S3 bucket locked with strict WORM write policies. If an intruder attempts to alter a historical database row, the computed Merkle root will mismatch, instantly triggering high-priority security alerts.

To align with GDPR compliance laws without altering historical hash roots, I will use cryptographic shredding. PII like usernames or emails are hashed using user-specific salt keys. Upon deletion request, we delete the salt key from our secure Vault, rendering the audit log anonymized while preserving the validity of the hash chain."

Interviewer: "Why use ClickHouse instead of Elasticsearch/OpenSearch for log querying?"

Candidate: "While Elasticsearch is superb for unstructured full-text search, it carries a high storage and indexing overhead. In an audit log system, queries are highly structured (e.g., filter by tenant ID and resource ID). ClickHouse is a columnar DBMS that compresses structured data by up to 5x more than Elasticsearch, saving Petabytes of storage across years of logs. ClickHouse also performs parallelized scan queries across indexed partition keys at speeds comparable to inverted indexes, offering massive cost savings without sacrificing search latency."


Key Takeaways

  • record user actions
  • record system actions
  • show audit history for a resource

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.