
Event-Driven Architecture: CQRS and Event Sourcing in Practice

A practical guide to CQRS and Event Sourcing. Learn how to segregate command and query paths, design append-only event stores, and build read-side projections.

Sachin Sarawgi · April 20, 2026 · 10 min read


Key Takeaways

What you will learn

**CQRS Segregation:** Commands (writes) and Queries (reads) are split into separate execution paths and database tables, so each path can be optimized independently.

**Event Sourcing:** State is not stored as a single mutable row; instead, it is rebuilt dynamically by replaying a sequence of immutable event records.

**Eventually Consistent Projections:** Read-optimized views are updated asynchronously from the event stream, so there is a short window after each write during which reads can return stale data.

Mental Model

In traditional CRUD (Create, Read, Update, Delete) architectures, the same database model is used for both writing and reading data. Under high traffic, lock contention on writes and complex SQL joins on reads compete for the same database and degrade its throughput. CQRS (Command Query Responsibility Segregation) and Event Sourcing resolve this by segregating write commands from read queries, and storing state as a sequence of immutable events rather than mutable rows.

Instead of executing UPDATE accounts SET balance = balance + 100 on a single row, an event-sourced system appends a FundsDeposited event to an immutable log. The current state is a derived value, computed by reading all past events in sequence. By decoupling the write path from the read path, we can optimize both independently, scaling writes to high velocities while serving complex read queries with low latency.


System Requirements

To implement a production-grade CQRS/ES banking system, we define these requirements:

Functional Requirements

  • Immutable State Ledger: Every state mutation (depositing money, withdrawing, locking account) must be stored as an immutable event record.
  • Dynamic Projections: The system must build a read-optimized view showing the user's current account balance.
  • Command Validation: The write path must validate command business rules (e.g. preventing withdrawals exceeding current balance) before committing events.
  • Auditing and Replay: The system must allow reconstructing the system state at any historical point in time by replaying events up to that specific timestamp.
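
The Auditing and Replay requirement can be sketched as a fold over the event history that stops at a cutoff timestamp. This is a minimal illustration, not the system's actual replay code: the LedgerEvent record, its occurredAt field, and the FUNDS_WITHDRAWN event type are assumptions introduced here, and the sketch assumes events are ordered by sequence with non-decreasing timestamps (Java 16+ for records).

```java
import java.time.Instant;
import java.util.List;

// Hypothetical event shape for this sketch: an amount plus the commit time.
record LedgerEvent(int sequence, String type, long amountCents, Instant occurredAt) {}

final class PointInTimeReplay {
    /**
     * Folds the events committed at or before the cutoff into a balance.
     * Assumes the history is ordered by sequence and timestamps never go backwards.
     */
    static long balanceAt(List<LedgerEvent> history, Instant cutoff) {
        long balance = 0;
        for (LedgerEvent e : history) {
            if (e.occurredAt().isAfter(cutoff)) {
                break; // ordered history: every later event is also past the cutoff
            }
            switch (e.type()) {
                case "FUNDS_DEPOSITED" -> balance += e.amountCents();
                case "FUNDS_WITHDRAWN" -> balance -= e.amountCents();
                default -> { /* other event types do not change the balance */ }
            }
        }
        return balance;
    }
}
```

Because the events are immutable, calling balanceAt with the same history and cutoff always yields the same answer, which is what makes the ledger usable for audits.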

Non-Functional Requirements

  • Write Path SLA: Appending an event to the Event Store must maintain a P99 latency of less than 15 milliseconds.
  • Read Path SLA: Fetching the account balance projection must return a response in less than 5 milliseconds.
  • Consistency Window: The lag between an event commit and its read-side projection update must be under 100 milliseconds.
  • Optimistic Concurrency: Concurrent commands against the same aggregate must not corrupt its event sequence.

API Design and Interface Contracts

To segregate writes and reads, the system exposes decoupled command and query interfaces. Below is a structured JSON API payload representing a Command write request, followed by the corresponding Query read contract:

Command Request Payload (Client to Write API)

  • Endpoint: POST /v1/commands/accounts/deposit

Request Payload (JSON):

{
  "command_id": "cmd-880099-deposit",
  "command_type": "DEPOSIT_FUNDS",
  "aggregate_id": "acc-998811",
  "payload": {
    "amount_cents": 50000,
    "currency": "USD",
    "reference": "Payroll deposit"
  },
  "created_at": "2026-05-23T10:00:00.123Z"
}

Query API Response (Read API to Client)

  • Endpoint: GET /v1/queries/accounts/acc-998811/balance

Response Payload (JSON - 200 OK):

{
  "aggregate_id": "acc-998811",
  "current_balance_cents": 150000,
  "currency": "USD",
  "last_updated_sequence": 14,
  "last_updated_at": "2026-05-23T10:00:00.223Z"
}

High-Level Architecture

CQRS and Event Sourcing decouple operations by splitting datastores into Write Models (Event Stores) and Read Models (Projections).

1. Command Write Pipeline

The client submits a command. The Command Handler fetches past events to rebuild the aggregate's current state, validates the business logic, and appends new events atomically to the Event Store. These events are published to a Message Bus.

graph TD
    Client[Client Command] -->|POST /v1/commands/accounts/deposit| Handler[Command Handler]
    Handler -->|1. Fetch past events| Store[(Event Store)]
    Handler -->|2. Validate & Append| Store
    Store -->|3. Publish Event| Bus[Kafka Event Bus]
    
    %% Style annotations
    classDef writeColor fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    class Handler,Store writeColor;
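
Steps 1 and 2 of the write pipeline can be sketched with an in-memory stand-in for the Event Store. Everything named here (InMemoryEventStore, WithdrawHandler, the Event record, the FUNDS_WITHDRAWN type) is hypothetical and exists only for illustration. Publishing to the Message Bus and the concurrency check are deliberately left out to keep the flow visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

record Event(String aggregateId, int sequence, String type, long amountCents) {}

// In-memory stand-in for the Event Store; a real store would be a database table.
final class InMemoryEventStore {
    private final Map<String, List<Event>> streams = new HashMap<>();

    List<Event> load(String aggregateId) {
        return List.copyOf(streams.getOrDefault(aggregateId, List.of()));
    }

    void append(Event event) {
        streams.computeIfAbsent(event.aggregateId(), k -> new ArrayList<>()).add(event);
    }
}

final class WithdrawHandler {
    private final InMemoryEventStore store;

    WithdrawHandler(InMemoryEventStore store) {
        this.store = store;
    }

    /** Fetches history, rebuilds the balance, validates the command, then appends. */
    Event handle(String aggregateId, long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("Amount must be positive");
        }
        // 1. Fetch past events and fold them into the current balance.
        List<Event> history = store.load(aggregateId);
        long balance = 0;
        for (Event e : history) {
            if ("FUNDS_DEPOSITED".equals(e.type())) {
                balance += e.amountCents();
            } else if ("FUNDS_WITHDRAWN".equals(e.type())) {
                balance -= e.amountCents();
            }
        }
        // 2. Validate the business rule before anything is written.
        if (amountCents > balance) {
            throw new IllegalStateException("Insufficient funds");
        }
        Event event = new Event(aggregateId, history.size() + 1, "FUNDS_WITHDRAWN", amountCents);
        store.append(event);
        return event;
    }
}
```

A rejected command appends nothing, so the stream only ever contains facts that passed validation.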

2. Read-Side Projection Pipeline

The Projection Processor consumes events from the Message Bus and writes updates to a read-optimized View Database (such as Elasticsearch or Redis). The Query Service serves client reads directly from this View DB with zero SQL joins.

graph TD
    Bus[Kafka Event Bus] -->|Consume| Projection[Projection Processor]
    Projection -->|1. Write optimized update| ViewDB[(Read-Optimized View DB)]
    ClientQuery[Client Query] -->|GET /v1/queries/accounts/acc-998811/balance| QueryService[Query Service]
    QueryService -->|2. Fetch| ViewDB
    
    %% Style annotations
    classDef readColor fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    class Projection,ViewDB readColor;
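
The read-side pipeline can be sketched as a projector that folds each consumed event into one precomputed row per account. This is an illustrative sketch only: a ConcurrentHashMap stands in for the View DB, and the BalanceEvent, BalanceView, and BalanceProjection names are assumptions, not part of any real client library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

record BalanceEvent(String aggregateId, int sequence, String type, long amountCents) {}
record BalanceView(long balanceCents, int lastSequence) {}

final class BalanceProjection {
    // Stand-in for the View DB: one precomputed row per account.
    private final Map<String, BalanceView> views = new ConcurrentHashMap<>();

    /** Called once per consumed event; events that were already applied are skipped. */
    void on(BalanceEvent e) {
        views.compute(e.aggregateId(), (id, current) -> {
            BalanceView view = (current == null) ? new BalanceView(0, 0) : current;
            if (e.sequence() <= view.lastSequence()) {
                return view; // redelivered event: already reflected in the view
            }
            long delta = switch (e.type()) {
                case "FUNDS_DEPOSITED" -> e.amountCents();
                case "FUNDS_WITHDRAWN" -> -e.amountCents();
                default -> 0L;
            };
            return new BalanceView(view.balanceCents() + delta, e.sequence());
        });
    }

    /** Query path: a single key lookup, with no joins and no replay. */
    BalanceView query(String aggregateId) {
        return views.getOrDefault(aggregateId, new BalanceView(0, 0));
    }
}
```

Storing lastSequence next to the balance is what lets the Query Service return a last_updated_sequence field like the one in the response contract.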

Low-Level Design and Schema

Below is a compilable Java class modeling a minimal event-sourced aggregate. It supports deposits only and handles event replay, state mutation, and command execution; new events are collected in the changes list until the caller persists them:

package com.codesprintpro.eventdriven;

import java.util.ArrayList;
import java.util.List;

public class EventSourcedAccount {
    private final String aggregateId;
    private int sequence = 0;
    private long balanceCents = 0;
    private final List<AccountEvent> changes = new ArrayList<>();

    public EventSourcedAccount(String aggregateId) {
        this.aggregateId = aggregateId;
    }

    /**
     * Rebuilds aggregate state by replaying a sequence of past events.
     */
    public void replay(List<AccountEvent> history) {
        for (AccountEvent event : history) {
            apply(event, false);
        }
    }

    /**
     * Executes a deposit command. Validates parameters and records
     * a corresponding event in the pending changes list.
     */
    public void deposit(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("Amount must be positive");
        }
        AccountEvent event = new AccountEvent(
                this.aggregateId, ++this.sequence, "FUNDS_DEPOSITED", amountCents
        );
        apply(event, true);
    }

    private void apply(AccountEvent event, boolean isNew) {
        this.sequence = event.getSequence();
        if ("FUNDS_DEPOSITED".equals(event.getEventType())) {
            this.balanceCents += event.getPayload();
        }
        if (isNew) {
            this.changes.add(event);
        }
    }

    public List<AccountEvent> getChanges() {
        return this.changes;
    }

    public long getBalanceCents() {
        return this.balanceCents;
    }

    public static class AccountEvent {
        private final String aggregateId;
        private final int sequence;
        private final String eventType;
        private final long payload;

        public AccountEvent(String aggregateId, int seq, String type, long val) {
            this.aggregateId = aggregateId;
            this.sequence = seq;
            this.eventType = type;
            this.payload = val;
        }

        public String getAggregateId() { return aggregateId; }
        public int getSequence() { return sequence; }
        public String getEventType() { return eventType; }
        public long getPayload() { return payload; }
    }
}

The write-side event table uses a composite primary key on (aggregate_id, sequence), so no two events for the same aggregate can share a sequence number. This keeps each aggregate's stream strictly ordered and is the constraint that optimistic concurrency control relies on.


Scaling Challenges and Capacity Estimation

Operating event-sourced architectures at scale exposes distinct low-level system constraints:

1. Long Replay Latency (Large Event Logs)

If an account has been active for 5 years, its stream can contain 100,000 events. Because the Command Handler rebuilds the aggregate before validating each command, loading and replaying 100,000 events per command is too slow and degrades write throughput.

  • Mitigation: Implement Snapshots. Every 100 events, the system serializes the current aggregate state (balance snapshot) and writes it to a snapshot table. When rebuilding state, the system loads the latest snapshot and only replays subsequent events.
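
The snapshot mitigation can be sketched as follows. This is a simplified, deposit-only illustration: the AccountSnapshot and DepositEvent records and the SnapshotLoader class are hypothetical names. A real store would query only the events with a sequence greater than the snapshot's, rather than filtering in memory as done here.

```java
import java.util.List;

record AccountSnapshot(int sequence, long balanceCents) {}
record DepositEvent(int sequence, long amountCents) {}

final class SnapshotLoader {
    static final int SNAPSHOT_INTERVAL = 100;

    /** Rebuilds the balance from a snapshot plus only the events recorded after it. */
    static long rebuild(AccountSnapshot snapshot, List<DepositEvent> history) {
        long balance = snapshot.balanceCents();
        for (DepositEvent e : history) {
            if (e.sequence() > snapshot.sequence()) {
                balance += e.amountCents(); // only the tail after the snapshot is replayed
            }
        }
        return balance;
    }

    /** True when the event at this sequence should trigger a new snapshot. */
    static boolean shouldSnapshot(int sequence) {
        return sequence % SNAPSHOT_INTERVAL == 0;
    }
}
```

Starting from an empty snapshot (sequence 0, balance 0) must give the same result as starting from any later snapshot, which is a useful invariant to assert in tests.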

2. Sizing Snapshots and Event Growth

Consider a capacity estimate for a system with 10 million active accounts:

  • Events Per Day: Assume 5 Million transactions per day.
  • Raw Event Storage Size: Assume each event payload is 300 bytes. $$\text{Daily Event Storage} = 5{,}000{,}000 \text{ events} \times 300 \text{ bytes} = 1.5 \text{ GB per day}.$$
  • Snapshot Storage Size: A snapshot is written every 100 events, and each account state snapshot is 200 bytes. $$\text{Daily Snapshots} = (5{,}000{,}000 / 100) \times 200 \text{ bytes} = 10 \text{ MB per day}.$$ Snapshots therefore add less than 1% to daily storage while capping the replay needed to rebuild an aggregate at fewer than 100 events after the latest snapshot.
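
The same arithmetic can be kept as a small calculator so the inputs are easy to vary. The constants are the figures assumed above; the yearly figure is a straight extrapolation over 365 days using decimal gigabytes. It covers raw payload bytes only, not indexes or replication overhead.

```java
final class CapacityEstimate {
    static final long EVENTS_PER_DAY = 5_000_000L;
    static final long EVENT_BYTES = 300L;
    static final long SNAPSHOT_INTERVAL = 100L;
    static final long SNAPSHOT_BYTES = 200L;

    /** Raw event bytes written per day. */
    static long dailyEventBytes() {
        return EVENTS_PER_DAY * EVENT_BYTES;
    }

    /** Snapshot bytes written per day, at one snapshot per SNAPSHOT_INTERVAL events. */
    static long dailySnapshotBytes() {
        return EVENTS_PER_DAY / SNAPSHOT_INTERVAL * SNAPSHOT_BYTES;
    }

    /** Raw event storage accumulated over 365 days, in decimal gigabytes. */
    static double yearlyEventGigabytes() {
        return dailyEventBytes() * 365 / 1e9;
    }
}
```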

3. Projection Sync Lag (Eventual Consistency)

Under write bursts, Kafka brokers can slow down and projection consumers can fall behind, which widens the window during which the read-optimized views are stale relative to the Event Store.

  • Mitigation: Implement Session Consistency Tokens. Return the event sequence number (last_updated_sequence) in the write API response. The client passes this token on reads, and the Query Service waits, up to a bounded timeout, until the read view has caught up to that sequence.
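
One way to sketch the waiting side of a session consistency token is a tracker that the projection processor updates and the Query Service waits on. The ViewSequenceTracker class and its method names are assumptions made for this example, and the timeout is added so that a stalled projection cannot block readers indefinitely.

```java
import java.util.HashMap;
import java.util.Map;

final class ViewSequenceTracker {
    private final Map<String, Integer> applied = new HashMap<>();

    /** Called by the projection processor after it applies an event to the view. */
    synchronized void markApplied(String aggregateId, int sequence) {
        applied.merge(aggregateId, sequence, Integer::max);
        notifyAll(); // wake readers that are waiting on a token
    }

    /**
     * Waits until the view has applied at least minSequence for the aggregate.
     * Returns false if the timeout elapses or the thread is interrupted first.
     */
    synchronized boolean awaitSequence(String aggregateId, int minSequence, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (applied.getOrDefault(aggregateId, 0) < minSequence) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false; // caller decides: return an error or flag the read as stale
            }
            try {
                wait(remainingMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A read handler would call awaitSequence with the client's token before querying the view, and treat a false result as a signal to respond with an error or a stale-read indicator.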

Failure Scenarios and Resilience

Because the write and read sides are separate stores, the system needs explicit recovery procedures for when they fail or drift apart:

Scenario A: Projection Database Corruption

If the Elasticsearch read-side cluster crashes or corrupts its indices, all query views will become unavailable or return incorrect state.

  • Resiliency Mitigation: Execute Re-projection. Since the Event Store is the absolute, immutable source of truth, you can discard the corrupted Elasticsearch index, create a new, empty index, and replay the entire event stream from offset 0 to rebuild the views from scratch.
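
Re-projection can be sketched as a rebuild into a fresh structure that replaces the live view in a single step. The in-memory map stands in for the read index, and the ReprojectingView class, the StoredEvent record, and the swap-on-completion approach are assumptions for illustration, not a description of any Elasticsearch API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

record StoredEvent(String aggregateId, int sequence, String type, long amountCents) {}

final class ReprojectingView {
    // The live view is replaced wholesale, so readers never see a half-built index.
    private volatile Map<String, Long> live = Map.of();

    /** Replays the full event log from offset 0 into a fresh map, then swaps it in. */
    void rebuildFrom(List<StoredEvent> fullLog) {
        Map<String, Long> fresh = new HashMap<>();
        for (StoredEvent e : fullLog) {
            long delta = "FUNDS_DEPOSITED".equals(e.type()) ? e.amountCents()
                       : "FUNDS_WITHDRAWN".equals(e.type()) ? -e.amountCents()
                       : 0L;
            fresh.merge(e.aggregateId(), delta, Long::sum);
        }
        live = Map.copyOf(fresh);
    }

    long balanceOf(String aggregateId) {
        return live.getOrDefault(aggregateId, 0L);
    }
}
```

Rebuilding twice from the same log produces the same view, which is the property that makes discarding a corrupted index safe.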

Scenario B: Concurrent Command Races (Optimistic Locking Failures)

If two commands attempt to mutate the same account concurrently, they will read the same sequence and attempt to commit events with the same sequence number, causing duplicate key conflicts in the Event Store.

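  • Resiliency Mitigation: Enforce a unique constraint (or composite primary key) on (aggregate_id, sequence) in the event table. The second write transaction then fails with a duplicate key error; the command handler reloads the aggregate, re-validates the command against the new state, and retries.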
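
The unique-key behavior and the retry loop can be sketched in memory, with putIfAbsent standing in for the database's duplicate key check. The UniqueKeyEventStore, EventKey, and RetryingDeposit names are hypothetical. A real handler would also re-validate business rules on each attempt, which this deposit-only sketch does not need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Mirrors the (aggregate_id, sequence) key of the event table.
record EventKey(String aggregateId, int sequence) {}

final class UniqueKeyEventStore {
    private final Map<EventKey, Long> rows = new ConcurrentHashMap<>();

    /** Returns false if another writer has already claimed this sequence number. */
    boolean tryAppend(String aggregateId, int sequence, long amountCents) {
        return rows.putIfAbsent(new EventKey(aggregateId, sequence), amountCents) == null;
    }

    int latestSequence(String aggregateId) {
        return rows.keySet().stream()
                .filter(k -> k.aggregateId().equals(aggregateId))
                .mapToInt(EventKey::sequence)
                .max()
                .orElse(0);
    }
}

final class RetryingDeposit {
    /** Appends a deposit, re-reading the latest sequence after each conflict. */
    static int deposit(UniqueKeyEventStore store, String aggregateId, long amountCents, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int next = store.latestSequence(aggregateId) + 1; // reload on every attempt
            if (store.tryAppend(aggregateId, next, amountCents)) {
                return next;
            }
        }
        throw new IllegalStateException("Too many concurrent writers for " + aggregateId);
    }
}
```

Capping the attempts keeps a hot aggregate from spinning forever; the caller can surface the final failure as a retryable error.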

Scenario C: Out-Of-Order Event Ingestion on Read-Side

Under network partitions, the projection processor may ingest events out of order (e.g. sequence 15 arrives before sequence 14).

  • Resiliency Mitigation: The projection processor checks the incoming event's sequence. If it is greater than the read model's current sequence plus one, the event has arrived early: the processor buffers it in a local queue and waits for the missing sequence to arrive. If it is less than or equal to the current sequence, the event is a duplicate and can be skipped.
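
The buffering rule can be sketched for a single aggregate with a sorted map of pending events. The SequenceReorderBuffer class is a hypothetical name, and the sketch handles deposits only. It has no gap timeout, so a production version would also need a way to detect a sequence that never arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class SequenceReorderBuffer {
    private int lastApplied = 0;
    private long balanceCents = 0;
    // Early arrivals, keyed by sequence so the lowest pending one is checked first.
    private final TreeMap<Integer, Long> pending = new TreeMap<>();

    /** Accepts one deposit event; returns the sequences applied as a result, in order. */
    List<Integer> accept(int sequence, long amountCents) {
        List<Integer> applied = new ArrayList<>();
        if (sequence <= lastApplied) {
            return applied; // duplicate: already reflected in the read model
        }
        pending.put(sequence, amountCents);
        // Drain the buffer for as long as the next expected sequence is available.
        while (!pending.isEmpty() && pending.firstKey() == lastApplied + 1) {
            var entry = pending.pollFirstEntry();
            balanceCents += entry.getValue();
            lastApplied = entry.getKey();
            applied.add(lastApplied);
        }
        return applied;
    }

    long balanceCents() {
        return balanceCents;
    }
}
```

When sequence 15 arrives before 14, the first call applies nothing, and the later arrival of 14 releases both events in order.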

Architectural Trade-offs

The table below compares the four combinations on write speed, read speed, audit capability, and implementation complexity:

| Strategy | Write Speed | Read Speed | Audit Capability | Implementation Complexity |
| --- | --- | --- | --- | --- |
| Standard CRUD | Medium (Requires lock checks) | Low (Complex joins) | None | Low |
| CQRS (Segregated DBs) | Medium | High (Read-optimized Views) | None | Medium |
| Event Sourcing (No CQRS) | High (Append-only) | Extremely Low (Slow replay) | Absolute | High |
| CQRS + Event Sourcing | High (Append-only) | High (Read-optimized Views) | Absolute (Complete ledger history) | Extremely High |

Selecting the combined CQRS + Event Sourcing pattern yields the best write throughput, read latency, and audit capability in this comparison, but it introduces distributed coordination, eventual consistency lag, and code complexity.


Staff Engineer Perspective


Verbal Script

Interviewer: "What are the benefits of CQRS and Event Sourcing, and how do they differ from a standard database design?"

Candidate: "A standard database design uses a single, mutable record for both reads and writes. Under high traffic, this creates lock contention and degrades performance, because the same tables must absorb writes, index maintenance, and complex queries. CQRS segregates these paths: commands execute writes on a write-optimized database, while queries read from separate, read-optimized views. Event Sourcing takes this further: instead of storing the current mutable state, it records every mutation as an immutable event. State is rebuilt dynamically by replaying the sequence of events. This gives a complete, audit-ready ledger, and writes become appends guarded by optimistic concurrency rather than updates that lock a shared row."

Interviewer: "Excellent. How does this segregation affect the consistency model of the application?"

Candidate: "It introduces Eventual Consistency. Because the write-optimized Event Store and the read-optimized views are separate databases, there is a delay while events are processed and written to the query views. This means a user might submit a write command, query the view immediately after, and observe stale data. To mitigate this and provide a clean user experience, we implement session consistency tokens. The write API returns the committed event sequence number, and the read path checks if the query index has caught up to that sequence before returning the balance."


Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
