The Saga Pattern: Consistency Without Distributed Locks
In monolithic architectures, maintaining data integrity is comparatively straightforward. The database engine coordinates multi-table updates inside a single database transaction governed by the ACID (Atomicity, Consistency, Isolation, Durability) properties. If any statement fails, the engine rolls the transaction back, restoring the database to its state before the transaction began.
In microservices architectures where each service owns its database (a practice known as the Database-per-Service pattern), a business transaction must span multiple network boundaries and storage engines (often a mix of SQL and NoSQL). Traditional distributed transaction protocols, like Two-Phase Commit (2PC) or Three-Phase Commit (3PC), hold locks across network round trips, so latency grows with every participant, and 2PC in particular blocks if the coordinator fails mid-protocol. In CAP terms these protocols choose consistency over availability: during a partition or a coordinator failure, participants cannot make progress.
To achieve resilient, highly scalable transactions across independently owned data stores (such as Apache Cassandra, Amazon DynamoDB, or MongoDB), we instead accept Eventual Consistency and use the Saga Pattern.
1. System Requirements & Goals
Let’s formulate our design goals using a typical e-commerce order checkout pipeline involving three independent services:
- Order Service: Creates an order in a PENDING state.
- Payment Service: Processes the user's credit card charges.
- Inventory Service: Reserves stock for the items purchased.
Functional Requirements
- Complete a multi-step checkout pipeline comprising order creation, payment authorization, and inventory reservation.
- Automatically roll back the entire system state if a downstream step fails (e.g., payment is declined or stock is unavailable).
- Ensure that users can query their order state throughout the entire transaction.
Non-Functional Requirements
- High Availability: The checkout flow must not block if a single service encounters transient latency or crash loops.
- Low Latency: The client request-response cycle must complete within 200ms, offloading long-running transaction steps asynchronously.
- Resiliency: Ensure "at-least-once" message delivery, guaranteeing that rollbacks or completions eventually execute.
2. High-Level Design (HLD): API Endpoints & Request Flow
A Saga is a sequence of local transactions. Each service executes its database updates locally and emits an event or message that triggers the next service in line. If any step fails, the Saga executes a reverse chain of Compensating Transactions to undo the changes.
There are two primary patterns to implement a Saga: Choreography and Orchestration.
Choreography (Decoupled, Event-Driven)
In Choreography, there is no central controller. Services coordinate by subscribing to message broker channels (e.g., Apache Kafka or RabbitMQ) and executing local actions based on incoming events.
sequenceDiagram
autonumber
participant Client
participant OrderService as Order Service
participant PaymentService as Payment Service
participant InventoryService as Inventory Service
participant Broker as Message Broker (Kafka)
Client->>OrderService: POST /api/v1/orders
OrderService->>OrderService: Create Local Order (PENDING)
OrderService-->>Client: 202 Accepted (OrderID)
OrderService->>Broker: Emit Event: OrderCreated
Broker-->>PaymentService: Consume: OrderCreated
PaymentService->>PaymentService: Authorize Credit Card
alt Payment Succeeded
PaymentService->>Broker: Emit Event: PaymentAuthorized
Broker-->>InventoryService: Consume: PaymentAuthorized
InventoryService->>InventoryService: Reserve Stock
InventoryService->>Broker: Emit Event: InventoryReserved
Broker-->>OrderService: Consume: InventoryReserved
OrderService->>OrderService: Set Order Status to APPROVED
else Payment Declined
PaymentService->>Broker: Emit Event: PaymentFailed
Broker-->>OrderService: Consume: PaymentFailed
OrderService->>OrderService: Set Order Status to CANCELLED
end
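The choreography flow in the diagram above can be sketched in a few lines. This is a minimal, in-memory illustration, not a Kafka client: InMemoryBroker, wireCheckout, placeOrder, and the cardLimit parameter are hypothetical names introduced here, and delivery is synchronous, whereas a real broker delivers asynchronously and at least once.

```typescript
// Event names follow the diagram; the payload fields are assumptions.
type SagaEvent =
  | { type: "OrderCreated"; orderId: string; amount: number }
  | { type: "PaymentAuthorized"; orderId: string }
  | { type: "PaymentFailed"; orderId: string; reason: string }
  | { type: "InventoryReserved"; orderId: string };

type Handler = (event: SagaEvent) => void;
type OrderStatus = "PENDING" | "APPROVED" | "CANCELLED";
type OrderTable = { [orderId: string]: OrderStatus };

// Hypothetical stand-in for a message broker: synchronous, in-process delivery.
class InMemoryBroker {
  private handlers: { [type: string]: Handler[] } = {};

  subscribe(type: SagaEvent["type"], handler: Handler): void {
    (this.handlers[type] = this.handlers[type] || []).push(handler);
  }

  publish(event: SagaEvent): void {
    for (const handler of this.handlers[event.type] || []) handler(event);
  }
}

// Each service only subscribes and publishes; none of them calls another directly.
function wireCheckout(broker: InMemoryBroker, cardLimit: number): OrderTable {
  const orders: OrderTable = {};

  // Payment Service: authorizes or declines when an order is created.
  broker.subscribe("OrderCreated", (e) => {
    if (e.type !== "OrderCreated") return;
    if (e.amount <= cardLimit) {
      broker.publish({ type: "PaymentAuthorized", orderId: e.orderId });
    } else {
      broker.publish({ type: "PaymentFailed", orderId: e.orderId, reason: "declined" });
    }
  });

  // Inventory Service: reserves stock once payment is authorized.
  broker.subscribe("PaymentAuthorized", (e) => {
    broker.publish({ type: "InventoryReserved", orderId: e.orderId });
  });

  // Order Service: closes the saga in either direction.
  broker.subscribe("InventoryReserved", (e) => { orders[e.orderId] = "APPROVED"; });
  broker.subscribe("PaymentFailed", (e) => { orders[e.orderId] = "CANCELLED"; });

  return orders;
}

// Order Service entry point: local write first, then the event.
function placeOrder(broker: InMemoryBroker, orders: OrderTable, orderId: string, amount: number): void {
  orders[orderId] = "PENDING";
  broker.publish({ type: "OrderCreated", orderId, amount });
}

const demoBroker = new InMemoryBroker();
const demoOrders = wireCheckout(demoBroker, 100);
placeOrder(demoBroker, demoOrders, "o-1", 40);  // within the limit: ends APPROVED
placeOrder(demoBroker, demoOrders, "o-2", 500); // over the limit: ends CANCELLED
```

Note that the whole transaction path exists only as the sum of the subscriptions, which is why choreography is hard to follow without tracing once the number of services grows.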
Orchestration (Command-Based Controller)
In Orchestration, a central state machine (the Saga Orchestrator) instructs each participant service to execute specific commands, receives their response payloads, and decides the next step or triggers compensation workflows.
sequenceDiagram
autonumber
participant Client
participant Orchestrator as Saga Orchestrator
participant Order as Order Service
participant Payment as Payment Service
participant Inventory as Inventory Service
Client->>Orchestrator: Start Checkout Saga
Orchestrator->>Order: Command: CreateOrder
Order-->>Orchestrator: Success (OrderID)
Orchestrator->>Payment: Command: ProcessPayment
alt Payment Fails (Declined)
Payment-->>Orchestrator: Failure (Insufficient Funds)
Orchestrator->>Order: Compensate: CancelOrder
Order-->>Orchestrator: Confirmed
Orchestrator-->>Client: Error: Checkout Failed (Payment Declined)
else Payment Succeeds
Payment-->>Orchestrator: Success (TxID)
Orchestrator->>Inventory: Command: ReserveInventory
alt Inventory Fails (Out of Stock)
Inventory-->>Orchestrator: Failure (No Stock)
Orchestrator->>Payment: Compensate: RefundPayment
Payment-->>Orchestrator: Confirmed
Orchestrator->>Order: Compensate: CancelOrder
Order-->>Orchestrator: Confirmed
Orchestrator-->>Client: Error: Checkout Failed (Out of Stock)
else Inventory Succeeds
Inventory-->>Orchestrator: Success (Reserved)
Orchestrator->>Order: Command: FinalizeOrder (APPROVED)
Order-->>Orchestrator: Success
Orchestrator-->>Client: Success: Order Completed
end
end
3. Low-Level Design (LLD) & Data Models
Because no transaction or lock spans services (and many NoSQL stores offer only single-row or single-partition atomicity), every local step in a Saga must commit its own state atomically within its service's database.
Database Schema (PostgreSQL DDL)
To track the state of the Saga Orchestrator in an orchestration-based saga, we design a durable state store made of two tables. This store must be transactional to prevent double-execution of steps:
-- DDL for the Saga State Store (Orchestrator Database)
CREATE TABLE saga_instances (
saga_id UUID PRIMARY KEY,
type VARCHAR(100) NOT NULL,
current_status VARCHAR(50) NOT NULL, -- STARTED, PAYMENT_SUCCESS, COMPENSATING, FAILED, COMPLETED
payload JSONB NOT NULL,
retry_count INT DEFAULT 0,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE saga_steps (
step_id UUID PRIMARY KEY,
saga_id UUID REFERENCES saga_instances(saga_id) ON DELETE CASCADE,
step_name VARCHAR(100) NOT NULL, -- CREATE_ORDER, AUTHORIZE_PAYMENT, RESERVE_INVENTORY
status VARCHAR(50) NOT NULL, -- PENDING, SUCCESS, FAILED, COMPENSATED
action_payload JSONB,
compensation_payload JSONB,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
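One way to use the transactional state store against double-execution is a compare-and-set on current_status: a worker may advance a saga only if the row is still in the status it expects. The sketch below models that in memory. InMemorySagaStore and the ALLOWED transition table are assumptions made for illustration; the DDL above lists the status values but does not define which transitions are legal.

```typescript
type SagaStatus = "STARTED" | "PAYMENT_SUCCESS" | "COMPENSATING" | "FAILED" | "COMPLETED";

// Assumed transition table (not specified by the schema above).
const ALLOWED: Record<SagaStatus, SagaStatus[]> = {
  STARTED: ["PAYMENT_SUCCESS", "COMPENSATING"],
  PAYMENT_SUCCESS: ["COMPLETED", "COMPENSATING"],
  COMPENSATING: ["FAILED"],
  FAILED: [],
  COMPLETED: [],
};

interface SagaInstance {
  sagaId: string;
  currentStatus: SagaStatus;
}

// Hypothetical in-memory stand-in for the saga_instances table.
class InMemorySagaStore {
  private rows: { [sagaId: string]: SagaInstance } = {};

  insert(sagaId: string): void {
    this.rows[sagaId] = { sagaId, currentStatus: "STARTED" };
  }

  // Mirrors a conditional update such as:
  //   UPDATE saga_instances SET current_status = :next
  //   WHERE saga_id = :id AND current_status = :expected
  // Returns true only for the caller whose expectation still matched.
  transition(sagaId: string, expected: SagaStatus, next: SagaStatus): boolean {
    const row = this.rows[sagaId];
    if (!row || row.currentStatus !== expected) return false;
    if (ALLOWED[expected].indexOf(next) === -1) return false;
    row.currentStatus = next;
    return true;
  }

  statusOf(sagaId: string): SagaStatus | null {
    const row = this.rows[sagaId];
    return row ? row.currentStatus : null;
  }
}

const sagaStore = new InMemorySagaStore();
sagaStore.insert("saga-1");
// Two workers race to record the same step; only the first one wins.
const firstWorkerWon = sagaStore.transition("saga-1", "STARTED", "PAYMENT_SUCCESS");  // true
const secondWorkerWon = sagaStore.transition("saga-1", "STARTED", "PAYMENT_SUCCESS"); // false
```

The losing worker learns from the false return value that the step was already recorded and can skip re-executing it.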
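Compilable TypeScript Saga Orchestrator State Machine
Here is a compilable, in-memory TypeScript sketch of a Command/Compensation Saga Coordinator using a simplified state-machine interface. It keeps its progress in process memory only; a production orchestrator would persist each transition to the saga state store above so that a crashed saga can be resumed: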
type StepResult = { success: boolean; data?: any; error?: string };
interface SagaStep {
name: string;
execute: (payload: any) => Promise<StepResult>;
compensate: (payload: any) => Promise<StepResult>;
}
export class SagaOrchestrator {
private steps: SagaStep[] = [];
private completedSteps: SagaStep[] = [];
constructor(steps: SagaStep[]) {
this.steps = steps;
}
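public async run(initialPayload: any): Promise<{ success: boolean; history: string[] }> {
// Reset per-run state so a reused orchestrator never compensates steps from an earlier run.
this.completedSteps = [];
const history: string[] = [];
let currentPayload = { ...initialPayload };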
for (const step of this.steps) {
console.log(`[SAGA] Executing Step: ${step.name}`);
history.push(`EXECUTE:${step.name}`);
try {
const result = await step.execute(currentPayload);
if (result.success) {
this.completedSteps.push(step);
currentPayload = { ...currentPayload, ...result.data };
history.push(`SUCCESS:${step.name}`);
} else {
console.error(`[SAGA] Step ${step.name} failed. Initiating rollbacks.`);
history.push(`FAIL:${step.name} - ${result.error}`);
await this.rollback(currentPayload, history);
return { success: false, history };
}
} catch (err: any) {
console.error(`[SAGA] Critical exception in step ${step.name}:`, err.message);
history.push(`EXCEPTION:${step.name}`);
await this.rollback(currentPayload, history);
return { success: false, history };
}
}
console.log("[SAGA] Saga completed successfully!");
return { success: true, history };
}
private async rollback(payload: any, history: string[]): Promise<void> {
// Execute compensating transactions in reverse order
for (let i = this.completedSteps.length - 1; i >= 0; i--) {
const step = this.completedSteps[i];
console.log(`[SAGA] [ROLLBACK] Compensating: ${step.name}`);
history.push(`COMPENSATE:${step.name}`);
let retries = 3;
while (retries > 0) {
try {
const compResult = await step.compensate(payload);
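if (compResult.success) {
history.push(`COMPENSATED:${step.name}`);
break;
}
// A compensation that returns success: false is retried just like one that throws.
console.error(`[SAGA] [ALERT] Compensation ${step.name} reported failure: ${compResult.error}. Retries remaining: ${retries - 1}`);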
} catch (err: any) {
console.error(`[SAGA] [ALERT] Compensation ${step.name} failed. Retries remaining: ${retries - 1}`);
}
retries--;
if (retries === 0) {
// Placeholder: a real system would publish to a Dead-Letter Queue (DLQ) here for manual operations review
console.error(`[SAGA] [CRITICAL] Compensation failed for ${step.name}. Pushed to DLQ.`);
history.push(`DLQ_TRIGGERED:${step.name}`);
}
}
}
}
}
4. Scaling Challenges & Estimations
When designing Sagas at high throughput (e.g., 50,000 requests per second), back-of-the-envelope estimations dictate network capacity and queue partitions.
Back-of-the-Envelope Math
- Throughput: $50,000 \text{ checkout requests/sec}$.
- Payload size: Average message envelope is $1 \text{ KB}$.
- Message amplification:
- Under Choreography, we assume roughly 6 broker messages per checkout: the 3 happy-path events in the diagram above (OrderCreated $\to$ PaymentAuthorized $\to$ InventoryReserved) plus an allowance for completion and confirmation events.
- $50,000 \times 6 = 300,000 \text{ events/sec}$.
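- Bandwidth requirement: $300,000 \text{ events/sec} \times 1 \text{ KB} = 300,000 \text{ KB/sec} \approx 300 \text{ MB/sec}$, or $2.4 \text{ Gbps}$ of ingress into the Kafka cluster. Egress is at least as large, since every event is consumed at least once, and replication traffic comes on top.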
- Storage requirements: If logging Saga steps for auditability, assuming one 1 KB record per Saga:
- $50,000 \text{ Sagas/sec} \times 1 \text{ KB} = 50 \text{ MB/sec}$.
- $50 \text{ MB/sec} \times 86,400 \text{ sec/day} \approx 4.32 \text{ TB of state logs per day}$.
- Cassandra Sharding Strategy: We partition the saga_steps table on the saga_id hash ring to ensure linear scale without hotspots.
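The arithmetic above is easy to re-run with different inputs. The helper below is a small sketch that reproduces the same figures; estimateSagaLoad and its field names are hypothetical, and it uses decimal units (1 MB = 1,000 KB) as the estimates above do.

```typescript
interface SagaLoad {
  checkoutsPerSec: number;
  eventsPerCheckout: number; // message amplification factor
  messageKB: number;         // average envelope size
}

interface SagaEstimate {
  eventsPerSec: number;
  bandwidthMBps: number;
  bandwidthGbps: number;
  logMBps: number;
  logTBPerDay: number;
}

function estimateSagaLoad(load: SagaLoad): SagaEstimate {
  const eventsPerSec = load.checkoutsPerSec * load.eventsPerCheckout;
  const bandwidthMBps = (eventsPerSec * load.messageKB) / 1000;
  const bandwidthGbps = (bandwidthMBps * 8) / 1000; // bytes to bits, MB to Gb
  // Audit log: one record of messageKB per saga, as assumed in the text.
  const logMBps = (load.checkoutsPerSec * load.messageKB) / 1000;
  const logTBPerDay = (logMBps * 86400) / 1000000;
  return { eventsPerSec, bandwidthMBps, bandwidthGbps, logMBps, logTBPerDay };
}

const checkoutEstimate = estimateSagaLoad({
  checkoutsPerSec: 50000,
  eventsPerCheckout: 6,
  messageKB: 1,
});
// eventsPerSec: 300000, bandwidthMBps: 300, bandwidthGbps: 2.4,
// logMBps: 50, logTBPerDay: 4.32
```

Doubling the amplification factor (for example, to account for compensation events on failed checkouts) doubles the broker bandwidth but leaves the audit-log volume unchanged under the one-record-per-saga assumption.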
5. Architectural Trade-offs: Choreography vs. Orchestration
| Dimension | Choreography (Event-Driven) | Orchestration (Command-State) |
|---|---|---|
| Coupling | Extremely Low: Services only know about the event broker and their local schemas. | Medium: Services must expose command routes/listeners defined by the orchestrator. |
| Complexity | High at scale: Hard to visualize the entire transaction path without distributed tracing tools. | Low: The central orchestrator logs the state machine trace explicitly. |
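| Bottlenecks | Low: Scales out via message partitions, although the broker itself must be sized for the event volume. | Medium to High: A centralized orchestrator can become a CPU bottleneck or single point of failure unless it is replicated over a durable state store. |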
| Cycle Risk | High: Cyclic dependencies can emerge if service A emits an event that service B uses to trigger service A. | None: The execution path is explicitly controlled. |
6. Failure Scenarios, Resiliency & Mitigations
Sagas do not support Isolation (the "I" in ACID) because local transactions commit immediately. Together with asynchronous, at-least-once messaging, this creates three critical failure vectors:
A. Lost Updates & Dirty Reads
Saga $T_1$ updates data and commits locally. Saga $T_2$ then reads that data, or overwrites it, before $T_1$ has finished. If $T_1$ subsequently fails and runs its compensations, $T_2$'s calculations are based on data that was never valid from the business point of view.
- Mitigation (Semantic Locks): Add a status field to the database rows (e.g., status: 'PENDING_RESERVATION'). Downstream queries must filter out or block operations on records with pending locks until the Saga resolves.
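A semantic lock is just application-level state on the row, so it can be sketched without any database. In the in-memory illustration below, SemanticLockTable, the lockedBy field, and the AVAILABLE and RESERVED statuses are assumptions added for the example; only PENDING_RESERVATION comes from the text above.

```typescript
type RowStatus = "AVAILABLE" | "PENDING_RESERVATION" | "RESERVED";

interface StockRow {
  itemId: string;
  status: RowStatus;
  lockedBy: string | null; // saga that currently holds the semantic lock
}

class SemanticLockTable {
  private rows: StockRow[] = [];

  constructor(itemIds: string[]) {
    for (const itemId of itemIds) {
      this.rows.push({ itemId, status: "AVAILABLE", lockedBy: null });
    }
  }

  private find(itemId: string): StockRow | null {
    for (const row of this.rows) if (row.itemId === itemId) return row;
    return null;
  }

  // Saga step: take the lock. Fails fast instead of blocking a thread.
  acquire(itemId: string, sagaId: string): boolean {
    const row = this.find(itemId);
    if (!row || row.status !== "AVAILABLE") return false;
    row.status = "PENDING_RESERVATION";
    row.lockedBy = sagaId;
    return true;
  }

  // Saga succeeded: the pending state becomes permanent.
  commit(itemId: string, sagaId: string): boolean {
    const row = this.find(itemId);
    if (!row || row.status !== "PENDING_RESERVATION" || row.lockedBy !== sagaId) return false;
    row.status = "RESERVED";
    row.lockedBy = null;
    return true;
  }

  // Compensation: undo the pending state. Only the lock owner may release.
  release(itemId: string, sagaId: string): boolean {
    const row = this.find(itemId);
    if (!row || row.status !== "PENDING_RESERVATION" || row.lockedBy !== sagaId) return false;
    row.status = "AVAILABLE";
    row.lockedBy = null;
    return true;
  }

  // Readers filter out rows that an unfinished saga is still working on.
  listAvailable(): string[] {
    return this.rows.filter((r) => r.status === "AVAILABLE").map((r) => r.itemId);
  }
}

const lockTable = new SemanticLockTable(["sku-1", "sku-2"]);
const sagaAGotLock = lockTable.acquire("sku-1", "saga-A"); // true
const sagaBGotLock = lockTable.acquire("sku-1", "saga-B"); // false: saga-A is still pending
const visibleDuringSaga = lockTable.listAvailable();       // ["sku-2"]
const sagaAReleased = lockTable.release("sku-1", "saga-A"); // true: compensation ran
const visibleAfterRollback = lockTable.listAvailable();     // ["sku-1", "sku-2"]
```

Because saga-B is rejected rather than made to wait, the caller decides whether to retry later or fail the checkout, and no database lock is held while saga-A is in flight.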
B. Compensation Failures
What happens if the inventory reservation succeeds, payment succeeds, but the final shipping coordinator fails, and during rollback, the RefundPayment compensation fails?
- Mitigation (Idempotency and DLQs): Every compensating service must be strictly idempotent (a retried request must have no additional side effects beyond the first successful execution). If a compensation cannot succeed after $N$ backoff retries, the orchestrator halts execution, generates alerts, and moves the Saga trace to a Dead Letter Queue (DLQ) for manual operator remediation.
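The mitigation above combines two pieces: a handler that deduplicates on an idempotency key, and a bounded retry loop that hands off to a dead-letter sink when it gives up. The sketch below shows both in memory. PaymentLedger, compensateWithRetry, and the key format are hypothetical, and backoff delays are omitted to keep the example synchronous.

```typescript
interface RefundCommand {
  idempotencyKey: string; // e.g. derived from the saga id and step name (assumption)
  paymentId: string;
  amount: number;
}

class PaymentLedger {
  private processed: { [key: string]: boolean } = {};
  refundedTotal = 0;

  // Applying the same command twice has the same effect as applying it once.
  refund(cmd: RefundCommand): "APPLIED" | "DUPLICATE" {
    if (this.processed[cmd.idempotencyKey]) return "DUPLICATE";
    this.processed[cmd.idempotencyKey] = true;
    this.refundedTotal += cmd.amount;
    return "APPLIED";
  }
}

// Retries up to maxAttempts times, then reports the last error to the dead-letter sink.
function compensateWithRetry(
  attempt: () => void,
  maxAttempts: number,
  deadLetter: (lastError: string) => void
): boolean {
  let lastError = "";
  for (let i = 0; i < maxAttempts; i++) {
    try {
      attempt();
      return true;
    } catch (err) {
      lastError = String(err);
    }
  }
  deadLetter(lastError);
  return false;
}

const ledger = new PaymentLedger();
const refundCmd: RefundCommand = { idempotencyKey: "saga-1:refund", paymentId: "pay-9", amount: 25 };
const deadLetters: string[] = [];
let refundCalls = 0;

// First call applies the refund but the acknowledgement is "lost", forcing a retry.
const refundSucceeded = compensateWithRetry(() => {
  refundCalls++;
  ledger.refund(refundCmd);
  if (refundCalls === 1) throw new Error("ack lost");
}, 3, (e) => deadLetters.push(e));
// refundSucceeded is true after 2 calls, and ledger.refundedTotal is 25, not 50.

// A compensation that never succeeds ends up in the dead-letter sink.
const hopelessSucceeded = compensateWithRetry(() => {
  throw new Error("gateway down");
}, 3, (e) => deadLetters.push(e));
```

Without the idempotency key, the retry after the lost acknowledgement would refund the customer twice, which is exactly the failure that at-least-once delivery makes likely.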
C. Out-of-Order Events
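Kafka preserves order only within a single partition. If related events travel on different topics or partitions, or a consumer falls behind after a rebalance, the Order Service may see a PaymentFailed event before its own OrderCreated processing is complete.
- Mitigation: Carry a per-Saga sequence number (or version) on every event and record the last applied sequence in the saga instance. Drop events that are older than the recorded sequence, and park or delay events that arrive ahead of it until the gap is filled.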
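Sequence tracking of this kind can be sketched as a small buffer that applies events strictly in order per saga. InOrderApplier and the seq field are hypothetical names, and the example assumes sequence numbers start at 1 and increase by 1 within each saga.

```typescript
interface SequencedEvent {
  sagaId: string;
  seq: number;  // assumed to start at 1 and increase by 1 per saga
  type: string;
}

class InOrderApplier {
  private nextSeq: { [sagaId: string]: number } = {};
  private parked: { [sagaId: string]: SequencedEvent[] } = {};
  applied: string[] = [];

  receive(event: SequencedEvent): void {
    const expected = this.nextSeq[event.sagaId] || 1;
    if (event.seq < expected) return; // duplicate or stale: drop
    if (event.seq > expected) {
      // Arrived early: park until the gap is filled.
      (this.parked[event.sagaId] = this.parked[event.sagaId] || []).push(event);
      return;
    }
    this.apply(event);
    this.drain(event.sagaId);
  }

  private apply(event: SequencedEvent): void {
    this.applied.push(event.type);
    this.nextSeq[event.sagaId] = event.seq + 1;
  }

  // After applying an event, release any parked events that are now in order.
  private drain(sagaId: string): void {
    const waiting = this.parked[sagaId] || [];
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (let i = 0; i < waiting.length; i++) {
        const candidate = waiting[i];
        if (candidate && candidate.seq === this.nextSeq[sagaId]) {
          this.apply(candidate);
          waiting.splice(i, 1);
          progressed = true;
          break;
        }
      }
    }
  }
}

const applier = new InOrderApplier();
applier.receive({ sagaId: "saga-1", seq: 2, type: "PaymentFailed" }); // early: parked
applier.receive({ sagaId: "saga-1", seq: 1, type: "OrderCreated" });  // applies 1, then 2
applier.receive({ sagaId: "saga-1", seq: 1, type: "OrderCreated" });  // redelivery: dropped
// applier.applied is ["OrderCreated", "PaymentFailed"]
```

A production version would also bound how long an event may stay parked, since a gap that is never filled otherwise stalls the saga silently.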
7. Staff Engineer Perspective: Production Trade-offs
Commutative Update Strategy
To approximate isolation without holding locks for the lifetime of a Saga, Staff Engineers favor Commutative Operations. If your transaction is additive (like adding money to a ledger or deducting from an inventory count), express the database update as a relative change (UPDATE inventory SET stock = stock - 5 WHERE item_id = X AND stock >= 5) rather than setting an absolute value. The database still serializes each statement on the row, but no Saga has to read a value, compute on it, and write it back, so concurrent Sagas cannot overwrite each other's changes and need no application-level lock.
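The difference between a relative and an absolute update can be shown with plain numbers. In the sketch below, StockCounter.tryDecrement models the guarded UPDATE statement above in memory; the class and the applyAll helper are hypothetical names for illustration.

```typescript
class StockCounter {
  constructor(private stock: number) {}

  // Mirrors: UPDATE inventory SET stock = stock - :qty
  //          WHERE item_id = :id AND stock >= :qty
  // Returns false (zero rows updated) when the guard fails.
  tryDecrement(qty: number): boolean {
    if (qty <= 0 || this.stock < qty) return false;
    this.stock -= qty;
    return true;
  }

  remaining(): number {
    return this.stock;
  }
}

// Applies a batch of decrements in the given order and returns the final stock.
function applyAll(initial: number, quantities: number[]): number {
  const counter = new StockCounter(initial);
  for (const qty of quantities) counter.tryDecrement(qty);
  return counter.remaining();
}

// Relative updates commute: any interleaving of the three sagas gives the same result.
const finalStockOrderA = applyAll(20, [5, 3, 7]); // 5
const finalStockOrderB = applyAll(20, [7, 5, 3]); // 5

// Absolute updates do not: both sagas read 20, then each writes its own result.
const staleRead = 20;
const sagaOneWrites = staleRead - 5;   // 15
const sagaTwoWrites = staleRead - 3;   // 17
const lastWriteWins = sagaTwoWrites;   // 17, but the correct stock is 12

// The guard rejects a decrement that would oversell.
const nearlyEmpty = new StockCounter(4);
const oversellAccepted = nearlyEmpty.tryDecrement(5); // false, stock stays at 4
```

Commutativity holds only while the stock >= qty guard passes for every Saga. Once stock runs short, arrival order decides which Saga is rejected, and the rejected Saga must run its compensations.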
8. Verbatim Mock Interview Script
Interviewer Dialogue
- Interviewer: "You've designed a Saga pattern for e-commerce checkouts. Sagas don't have isolation. How do you prevent a customer from buying inventory that another customer has reserved but hasn't paid for yet?"
- Candidate: "This is a classic 'dirty read' problem caused by Sagas lacking isolation. Sagas commit local database updates immediately. If Customer A reserves 5 items and Customer B searches the stock, Customer B might see those 5 items as available, or see them disappear, only for Customer A's checkout to fail and the inventory to roll back.
To solve this at high throughput, I avoid pessimistic database locks. Instead, I implement Semantic Locks using explicit reservation state on our inventory rows. The reserveInventory step decrements the available_stock column and increments a pending_reservations_stock column in a single atomic local SQL update. Customer searches treat only available_stock as buyable, so units held in pending_reservations_stock are invisible to other shoppers.
If Customer A's payment succeeds, the orchestrator issues a finalize command that decrements pending_reservations_stock and records the items as permanently sold. If payment fails, the compensating transaction decrements pending_reservations_stock and adds the count back to available_stock. This ensures no other customer can claim those 5 items during the 30-second checkout window, without holding database locks or blocking threads for the duration of the Saga."
- Interviewer: "What happens if the message broker goes down right after the order service updates the state, but before emitting the message to the payment service?"
- Candidate: "We must avoid the dual-write problem, where the database commit succeeds but the event is never published, by using the Transactional Outbox Pattern. The Order Service does not write to the DB and then publish to Kafka as two separate steps in application code. Instead, we write the Order record and an OutboxEvent record into our database in a single local database transaction.
We then run a CDC connector, such as Debezium, which continuously tails the database's transaction log (the binlog in MySQL, the write-ahead log in PostgreSQL). Shortly after the transaction commits, Debezium streams the outbox row from the log to Kafka. If Kafka is down, the event remains safely stored in the outbox table and the transaction log, and is sent once Kafka recovers, giving 'at-least-once' delivery. Because an event can therefore be delivered more than once, consumers must be idempotent."
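The outbox flow the candidate describes can be sketched in memory. This illustration replaces the CDC connector with a simple relay that scans an outbox array, so it shows the ordering and retry behavior rather than Debezium itself. OrderDatabase, OutboxRelay, and the failBeforeCommit flag are hypothetical.

```typescript
interface OrderRow { orderId: string; status: string; }
interface OutboxRow { eventId: number; type: string; orderId: string; published: boolean; }

class OrderDatabase {
  orders: OrderRow[] = [];
  outbox: OutboxRow[] = [];
  private nextEventId = 1;

  // Models one local transaction: both rows are written, or neither is.
  createOrder(orderId: string, failBeforeCommit: boolean = false): void {
    const stagedOrder: OrderRow = { orderId, status: "PENDING" };
    const stagedEvent: OutboxRow = {
      eventId: this.nextEventId, type: "OrderCreated", orderId, published: false,
    };
    if (failBeforeCommit) throw new Error("transaction aborted");
    this.orders.push(stagedOrder);
    this.outbox.push(stagedEvent);
    this.nextEventId++;
  }
}

// Stand-in for the CDC connector or a polling publisher.
class OutboxRelay {
  constructor(private db: OrderDatabase, private send: (row: OutboxRow) => boolean) {}

  // Publishes unpublished rows in order and stops at the first failure,
  // so a broker outage delays events but never reorders or loses them.
  pump(): number {
    let publishedCount = 0;
    for (const row of this.db.outbox) {
      if (row.published) continue;
      if (!this.send(row)) break;
      row.published = true;
      publishedCount++;
    }
    return publishedCount;
  }
}

const orderDb = new OrderDatabase();
const delivered: string[] = [];
let brokerUp = false;
const relay = new OutboxRelay(orderDb, (row) => {
  if (!brokerUp) return false;
  delivered.push(row.type + ":" + row.orderId);
  return true;
});

orderDb.createOrder("o-1");
const publishedWhileDown = relay.pump();    // 0: broker is down, event waits in the outbox
brokerUp = true;
const publishedAfterRecovery = relay.pump(); // 1: event is sent once the broker is back
// delivered is ["OrderCreated:o-1"]
```

If the relay crashed after a successful send but before marking the row as published, the next pump would send the same event again, which is the at-least-once behavior that requires idempotent consumers.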