The Saga Pattern: Managing Distributed Transactions in NoSQL

The Saga Pattern: Consistency Without Distributed Locks

In monolithic architectures, maintaining data integrity is straightforward. The database engine coordinates multi-table updates using a single database transaction governed by ACID (Atomicity, Consistency, Isolation, Durability) properties. If any single query fails, the engine issues a rollback, restoring the entire system state.

In microservices architectures where each service owns its database (a practice known as the Database-per-Service pattern), transactions must span multiple physical network boundaries and storage engines (often a mix of SQL and NoSQL). Traditional distributed transaction protocols, like Two-Phase Commit (2PC) or Three-Phase Commit (3PC), are highly blocking, suffer from network latency degradation, and introduce severe availability bottlenecks (violating the CAP theorem by favoring CP at the cost of high latency).

To achieve resilient, highly scalable transactions across masterless NoSQL databases (like Apache Cassandra, Amazon DynamoDB, or MongoDB), we must rely on Eventual Consistency and utilize the Saga Pattern.

1. System Requirements & Goals

Let’s formulate our design goals using a typical e-commerce order checkout pipeline involving three independent services:

Order Service: Creates an order in a PENDING state.
Payment Service: Processes the user's credit card charges.
Inventory Service: Reserves stock for the items purchased.

Functional Requirements

Complete a multi-step checkout pipeline comprising order creation, payment authorization, and inventory reservation.
Automatically rollback the entire system state if a downstream step fails (e.g., payment is declined or stock is unavailable).
Ensure that users can query their order state throughout the entire transaction.

Non-Functional Requirements

High Availability: The checkout flow must not block if a single service encounters transient latency or crash loops.
Low Latency: The client request-response cycle must complete within 200ms, offloading long-running transaction steps asynchronously.
Resiliency: Ensure "at-least-once" message delivery, guaranteeing that rollbacks or completions eventually execute.

2. HLD & API Endpoints Request Flow

A Saga is a sequence of local transactions. Each service executes its database updates locally and emits an event or message that triggers the next service in line. If any step fails, the Saga executes a reverse chain of Compensating Transactions to undo the changes.

There are two primary patterns to implement a Saga: Choreography and Orchestration.

Choreography (Decoupled, Event-Driven)

In Choreography, there is no central controller. Services coordinate by subscribing to message broker channels (e.g., Apache Kafka or RabbitMQ) and executing local actions based on incoming events.

sequenceDiagram
    autonumber
    participant Client
    participant OrderService as Order Service
    participant PaymentService as Payment Service
    participant InventoryService as Inventory Service
    participant Broker as Message Broker (Kafka)

    Client->>OrderService: POST /api/v1/orders
    OrderService->>OrderService: Create Local Order (PENDING)
    OrderService-->>Client: 202 Accepted (OrderID)
    OrderService->>Broker: Emit Event: OrderCreated
    Broker-->>PaymentService: Consume: OrderCreated
    PaymentService->>PaymentService: Authorize Credit Card
    alt Payment Succeeded
        PaymentService->>Broker: Emit Event: PaymentAuthorized
        Broker-->>InventoryService: Consume: PaymentAuthorized
        InventoryService->>InventoryService: Reserve Stock
        InventoryService->>Broker: Emit Event: InventoryReserved
        Broker-->>OrderService: Consume: InventoryReserved
        OrderService->>OrderService: Set Order Status to APPROVED
    else Payment Declined
        PaymentService->>Broker: Emit Event: PaymentFailed
        Broker-->>OrderService: Consume: PaymentFailed
        OrderService->>OrderService: Set Order Status to CANCELLED
    end

Orchestration (Command-Based Controller)

In Orchestration, a central state machine (the Saga Orchestrator) instructs each participant service to execute specific commands, receives their response payloads, and decides the next step or triggers compensation workflows.

sequenceDiagram
    autonumber
    participant Client
    participant Orchestrator as Saga Orchestrator
    participant Order as Order Service
    participant Payment as Payment Service
    participant Inventory as Inventory Service

    Client->>Orchestrator: Start Checkout Saga
    Orchestrator->>Order: Command: CreateOrder
    Order-->>Orchestrator: Success (OrderID)
    Orchestrator->>Payment: Command: ProcessPayment
    alt Payment Fails (Declined)
        Payment-->>Orchestrator: Failure (Insufficent Funds)
        Orchestrator->>Order: Compensate: CancelOrder
        Order-->>Orchestrator: Confirmed
        Orchestrator-->>Client: Error: Checkout Failed (Payment Declined)
    else Payment Succeeds
        Payment-->>Orchestrator: Success (TxID)
        Orchestrator->>Inventory: Command: ReserveInventory
        alt Inventory Fails (Out of Stock)
            Inventory-->>Orchestrator: Failure (No Stock)
            Orchestrator->>Payment: Compensate: RefundPayment
            Payment-->>Orchestrator: Confirmed
            Orchestrator->>Order: Compensate: CancelOrder
            Order-->>Orchestrator: Confirmed
            Orchestrator-->>Client: Error: Checkout Failed (Out of Stock)
        else Inventory Succeeds
            Inventory-->>Orchestrator: Success (Reserved)
            Orchestrator->>Order: Command: FinalizeOrder (APPROVED)
            Order-->>Orchestrator: Success
            Orchestrator-->>Client: Success: Order Completed
        end
    end

3. Low-Level Design (LLD) & Data Models

Because NoSQL databases lack cross-node transactional locks, every local step in a Saga must write its state transactionally inside its local boundaries.

Database Schema (PostgreSQL/NoSQL DDL)

To track the state of the Saga Orchestrator in an orchestration-based saga, we design a resilient, state-store table. This store must be transactional to prevent double-execution of steps:

-- DDL for the Saga State Store (Orchestrator Database)
CREATE TABLE saga_instances (
    saga_id UUID PRIMARY KEY,
    type VARCHAR(100) NOT NULL,
    current_status VARCHAR(50) NOT NULL, -- STARTED, PAYMENT_SUCCESS, COMPENSATING, FAILED, COMPLETED
    payload JSONB NOT NULL,
    retry_count INT DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE saga_steps (
    step_id UUID PRIMARY KEY,
    saga_id UUID REFERENCES saga_instances(saga_id) ON DELETE CASCADE,
    step_name VARCHAR(100) NOT NULL, -- CREATE_ORDER, AUTHORIZE_PAYMENT, RESERVE_INVENTORY
    status VARCHAR(50) NOT NULL, -- PENDING, SUCCESS, FAILED, COMPENSATED
    action_payload JSONB,
    compensation_payload JSONB,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

Compilable TypeScript Saga Orchestrator State Machine

Here is a complete, compilable TypeScript implementation of a resilient Command/Compensation Saga Coordinator using a simplified state-machine interface:

type StepResult = { success: boolean; data?: any; error?: string };

interface SagaStep {
  name: string;
  execute: (payload: any) => Promise<StepResult>;
  compensate: (payload: any) => Promise<StepResult>;
}

export class SagaOrchestrator {
  private steps: SagaStep[] = [];
  private completedSteps: SagaStep[] = [];

  constructor(steps: SagaStep[]) {
    this.steps = steps;
  }

  public async run(initialPayload: any): Promise<{ success: boolean; history: string[] }> {
    const history: string[] = [];
    let currentPayload = { ...initialPayload };

    for (const step of this.steps) {
      console.log(`[SAGA] Executing Step: ${step.name}`);
      history.push(`EXECUTE:${step.name}`);

      try {
        const result = await step.execute(currentPayload);
        if (result.success) {
          this.completedSteps.push(step);
          currentPayload = { ...currentPayload, ...result.data };
          history.push(`SUCCESS:${step.name}`);
        } else {
          console.error(`[SAGA] Step ${step.name} failed. Initiating rollbacks.`);
          history.push(`FAIL:${step.name} - ${result.error}`);
          await this.rollback(currentPayload, history);
          return { success: false, history };
        }
      } catch (err: any) {
        console.error(`[SAGA] Critical exception in step ${step.name}:`, err.message);
        history.push(`EXCEPTION:${step.name}`);
        await this.rollback(currentPayload, history);
        return { success: false, history };
      }
    }

    console.log("[SAGA] Saga completed successfully!");
    return { success: true, history };
  }

  private async rollback(payload: any, history: string[]): Promise<void> {
    // Execute compensating transactions in reverse order
    for (let i = this.completedSteps.length - 1; i >= 0; i--) {
      const step = this.completedSteps[i];
      console.log(`[SAGA] [ROLLBACK] Compensating: ${step.name}`);
      history.push(`COMPENSATE:${step.name}`);

      let retries = 3;
      while (retries > 0) {
        try {
          const compResult = await step.compensate(payload);
          if (compResult.success) {
            history.push(`COMPENSATED:${step.name}`);
            break;
          }
        } catch (err: any) {
          console.error(`[SAGA] [ALERT] Compensation ${step.name} failed. Retries remaining: ${retries - 1}`);
        }
        retries--;
        if (retries === 0) {
          // Send to Dead-Letter Queue (DLQ) for manual operations review
          console.error(`[SAGA] [CRITICAL] Compensation failed for ${step.name}. Pushed to DLQ.`);
          history.push(`DLQ_TRIGGERED:${step.name}`);
        }
      }
    }
  }
}

4. Scaling Challenges & Estimations

When designing Sagas at high throughput (e.g., 50,000 requests per second), back-of-the-envelope estimations dictate network capacity and queue partitions.

Back-of-the-Envelope Math

Throughput: $50,000 \text{ checkout requests/sec}$.
Payload size: Average message envelope is $1 \text{ KB}$.
Message amplification:
- Under Choreography, each checkout requires 6 events (Order Created $\to$ Payment Authorized $\to$ Stock Reserved $\to$ Completed $\to$ plus confirmations).
- $50,000 \times 6 = 300,000 \text{ events/sec}$.
- Bandwidth requirement: $300,000 \text{ events/sec} \times 1 \text{ KB} = 300,000 \text{ KB/sec} \approx 300 \text{ MB/s}$ or $2.4 \text{ Gbps}$ egress/ingress across Kafka clusters.
Storage requirements: If logging Saga steps for auditability:
- $50,000 \text{ Sagas/sec} \times 1 \text{ KB} = 50 \text{ MB/sec}$.
- $50 \text{ MB/sec} \times 86,400 \text{ sec/day} \approx 4.32 \text{ TB of state logs per day}$.
- Cassandra Sharding Strategy: We partition saga_steps table on saga_id hash ring to ensure linear scale without hot hotspots.

5. Architectural Trade-offs: Choreography vs. Orchestration

Dimension	Choreography (Event-Driven)	Orchestration (Command-State)
Coupling	Extremely Low: Services only know about the event broker and their local schemas.	Medium: Services must expose command routes/listeners defined by the orchestrator.
Complexity	High at scale: Hard to visualize the entire transaction path without distributed tracing tools.	Low: The central orchestrator logs the state machine trace explicitly.
Bottlenecks	None: Scaled out easily via message partitions.	High: Centralized orchestrator can create CPU bottlenecks or single points of failure.
Cycle Risk	High: Cyclic dependencies can emerge if service A emits an event that service B uses to trigger service A.	None: The execution path is explicitly controlled.

6. Failure Scenarios, Resiliency & Mitigations

Sagas do not support Isolation (the "I" in ACID) because local transactions commit immediately. This creates three critical failure vectors:

A. Lost Updates & Dirty Reads

Saga $T_1$ updates data. Saga $T_2$ reads the uncommitted data, or overwrites it. If $T_1$ subsequently fails and issues rollbacks, $T_2$'s calculations will be based on dirty data.

Mitigation (Semantic Locks): Add a status field to the database rows (e.g., status: 'PENDING_RESERVATION'). Downstream queries must filter out or block operations on records with pending locks until the Saga resolves.

B. Compensation Failures

What happens if the inventory reservation succeeds, payment succeeds, but the final shipping coordinator fails, and during rollback, the RefundPayment compensation fails?

Mitigation (Idempotency and DLQs): Every compensating service must be strictly idempotent (handling identical retries with no side-effects). If a compensation cannot succeed after $N$ backoff retries, the orchestrator halts execution, generates alerts, and moves the Saga trace to a Dead Letter Queue (DLQ) for manual operator remediation.

C. Out-of-Order Events

Under heavy load, a Kafka cluster partition recovery might deliver a PaymentFailed event before the OrderCreated processing is complete.

Mitigation: Implement event sequence tracking versions inside the saga instance payload. Drop or delay events whose prior sequence numbers are not yet accounted for in the local database stage.

7. Staff Engineer Perspective: Production Trade-offs

Communal Lock Strategy

To maintain the illusion of isolation without blocking threads, Staff Engineers design Commutative Operations. If your transaction is additive (like adding money to a ledger or deducting inventory count), design the database updates as decrement operations (UPDATE inventory SET stock = stock - 5 WHERE item_id = X AND stock >= 5) rather than setting an absolute value. This removes the need for locking, allowing multiple Sagas to proceed concurrently.

8. Verbatim Mock Interview Script

Interviewer Dialogue

Interviewer: "You've designed a Saga pattern for e-commerce checkouts. Sagas don't have isolation. How do you prevent a customer from buying inventory that another customer has reserved but hasn't paid for yet?"
Candidate: "This is a classic 'dirty read' problem caused by Sagas lacking isolation. Sagas commit local database updates immediately. If Customer A reserves 5 items and Customer B searches the stock, Customer B might see those 5 items as available or see them disappear, only for Customer A's checkout to fail and inventory to rollback.

To solve this at high throughput, I avoid pessimistic database locks. Instead, I implement Semantic Locks utilizing an explicit field state on our inventories. The reserveInventory step decrements the available_stock column but increments a pending_reservations_stock column inside a local atomic SQL update. Customer searches only calculate buyable stock using the formula available_stock - pending_reservations_stock.

If Customer A's payment succeeds, the orchestrator issues a finalize command that decrements the pending_reservations_stock and sets the items as permanently sold. If payment fails, the compensating transaction runs, decrementing pending_reservations_stock and adding the count back. This ensures no other customer can claim those 5 items during the 30-second checkout window, while preventing physical thread blocks."

Interviewer: "What happens if the message broker goes down right after the order service updates the state, but before emitting the message to the payment service?"
Candidate: "We must prevent 'lost updates' by enforcing the Transactional Outbox Pattern. The Order Service does not write to the DB and publish to Kafka sequentially in application memory. Instead, we write the Order record and an OutboxEvent record into our database under a single local database transaction.

We then run a lightweight CDC daemon, such as Debezium, which continuously monitors our database binlog. The moment the transaction commits, Debezium streams the event out of the transaction log directly to Kafka. If Kafka goes down, the event remains safely stored in the database's write-ahead log (WAL) and outbox table, waiting to be sent the moment Kafka recovers, guaranteeing 'at-least-once' delivery."