
Microservices Are Overrated for Most Startups

A contrarian but technically grounded case for starting with a well-structured monolith. Distributed transaction costs, network latency math, observability overhead, and when to actually break services apart.

Key Takeaways

  • **Tracing (OpenTelemetry)**: Track a single request across 50 microservices.
  • **Metrics (Prometheus)**: Monitor Heap usage, Thread saturation, and P99 latencies.
  • **Structured Logging (ELK/Splunk)**: Never log raw strings; use JSON so you can query logs like a da
Recommended Prerequisites

  • System Design Interview Framework
  • API Design & Rate Limiting

Mental Model

A monolith call is an in-process function call (budgeted here at ~0.1ms, and usually far cheaper); a microservice call is a network round trip (~5ms). Breaking apart a monolith before you understand your domain boundaries is a fast track to distributed spaghetti. The primary scaling axis is organizational (team size and ownership), not purely technical.


Requirements and System Goals

When building an early-stage production platform (e.g., an e-commerce backend), we must establish clear requirements before arguing over modularity boundaries.

1. Functional Requirements

  • User Management: Authentication, profile management, and registration.
  • Order Processing: Creation, state progression (Created, Paid, Shipped), and history retrieval.
  • Payment Processing: Multi-gateway routing, security audits, and settlement checks.

2. Non-Functional Requirements & Performance Budgets

  • Availability: 99.99% availability ("Four Nines"), translating to no more than 52.56 minutes of unscheduled downtime per year.
  • Latency Budget:
    • Public client requests: P99 latency < 200ms.
    • Inter-service (East-West) latency budget: < 10ms.
  • Consistency Constraints: Financial ledgers and payment statuses require strict transactional Strong Consistency (ACID). Catalog indexing can tolerate Eventual Consistency (P99 delay < 2 seconds).
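
The downtime figure follows directly from the availability percentage. A minimal sketch of that arithmetic, assuming a 365-day year; the class and method names are illustrative and not part of the lesson:

```java
import java.time.Duration;

final class AvailabilityBudget {
    /** Allowed downtime per 365-day year for a given availability percentage. */
    static Duration yearlyDowntime(double availabilityPercent) {
        double downFraction = 1.0 - availabilityPercent / 100.0;
        long secondsPerYear = 365L * 24 * 60 * 60;
        // Round to whole milliseconds to absorb floating-point noise.
        return Duration.ofMillis(Math.round(downFraction * secondsPerYear * 1000));
    }

    public static void main(String[] args) {
        for (double availability : new double[] {99.0, 99.9, 99.99}) {
            Duration budget = yearlyDowntime(availability);
            System.out.printf("%.2f%% -> %.2f minutes/year%n",
                    availability, budget.toMillis() / 60000.0);
        }
    }
}
```

Running the same function for 99.9% shows how much stricter each extra nine makes the budget.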

API Interfaces and Service Contracts

To clearly outline service boundaries, we define the exact RESTful API contracts. In a monolith, these are handled by a single unified API controller. In a decoupled microservice setup, these are exposed via an API Gateway.

1. Order Creation Endpoint

POST /api/v1/orders

Request Payload:

{
  "customerId": "usr_8923a10f",
  "items": [
    {
      "productId": "prod_7721",
      "quantity": 2
    }
  ],
  "paymentMethod": "pm_visa_9832"
}

Response Payload (201 Created):

{
  "orderId": "ord_99018274a",
  "status": "CREATED",
  "totalAmount": 149.98,
  "createdAt": "2026-05-29T13:07:00Z"
}
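
One way to pin this contract down in code is with Java records. The sketch below is an assumption about how the payloads might be modelled, not the lesson's implementation: the type names are hypothetical, the validation rules are illustrative, and `BigDecimal` is chosen for the amount because binary floating point cannot represent most decimal currency values exactly.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.util.List;

final class OrderContracts {
    /** One line item of the request; rejects obviously invalid input. */
    record OrderItem(String productId, int quantity) {
        OrderItem {
            if (productId == null || productId.isBlank()) {
                throw new IllegalArgumentException("productId is required");
            }
            if (quantity <= 0) {
                throw new IllegalArgumentException("quantity must be positive");
            }
        }
    }

    /** Mirrors the POST /api/v1/orders request payload. */
    record CreateOrderRequest(String customerId, List<OrderItem> items, String paymentMethod) {
        CreateOrderRequest {
            if (items == null || items.isEmpty()) {
                throw new IllegalArgumentException("at least one item is required");
            }
            items = List.copyOf(items); // defensive, immutable copy
        }
    }

    /** Mirrors the 201 Created response payload. */
    record CreateOrderResponse(String orderId, String status, BigDecimal totalAmount, Instant createdAt) {}

    public static void main(String[] args) {
        CreateOrderRequest request = new CreateOrderRequest(
                "usr_8923a10f", List.of(new OrderItem("prod_7721", 2)), "pm_visa_9832");
        CreateOrderResponse response = new CreateOrderResponse(
                "ord_99018274a", "CREATED", new BigDecimal("149.98"),
                Instant.parse("2026-05-29T13:07:00Z"));
        System.out.println(request);
        System.out.println(response);
    }
}
```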

2. Payment Callback Contract

POST /api/v1/payments/webhook

Request Payload:

{
  "paymentId": "pay_8830172d",
  "orderId": "ord_99018274a",
  "amount": 149.98,
  "status": "SUCCESS",
  "timestamp": "2026-05-29T13:07:05Z"
}

High-Level Design and Visualizations

Tracing the same request through both architectures makes the "Complexity Tax" visible.

1. The Monolithic Architecture

All functional modules run within the same application process and share a single relational database. Context boundaries are enforced through Java package visibility rather than physical network boundaries.

graph TD
    Client((Client)) --> LB[Load Balancer]
    LB --> Monolith

    subgraph Monolith ["Unified JVM Process (Monolith Application Boundary)"]
        Auth[Auth Module]
        Order[Order Module]
        Pay[Payment Module]
    end

    Monolith --> DB[(Single PostgreSQL Instance)]

2. The Distributed Microservices Architecture

Moving to microservices physically separates the modules. Each service now owns its own database, requiring network requests for cross-domain actions.

graph TD
    Client((Client)) --> Gateway[API Gateway]

    subgraph Services [Distributed Boundaries]
        AuthS[Auth Service]
        OrderS[Order Service]
        PayS[Payment Service]
    end

    Gateway --> AuthS
    Gateway --> OrderS
    Gateway --> PayS

    AuthS --> AuthDB[(Auth DB)]
    OrderS --> OrderDB[(Order DB)]
    PayS --> PayDB[(Payment DB)]

    OrderS -- HTTP/gRPC --> PayS

Low-Level Design and Schema Strategies

The primary failure point of early microservice migrations is database sharing. Let's analyze the schemas for both patterns.

1. The Monolith Database Schema (Unified Schema)

In a monolith, we can easily enforce referential integrity across modules using foreign keys and perform transactional joins.

-- Foreign keys enforce referential integrity; all tables share one ACID transaction scope
CREATE TABLE users (
    id VARCHAR(50) PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    id VARCHAR(50) PRIMARY KEY,
    user_id VARCHAR(50) REFERENCES users(id),
    total_amount NUMERIC(12, 2) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'CREATED',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE payments (
    id VARCHAR(50) PRIMARY KEY,
    order_id VARCHAR(50) REFERENCES orders(id),
    amount NUMERIC(12, 2) NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

2. Database-per-Service Schema (Decoupled Services)

Once split, the orders service can no longer check the users table via SQL referential integrity. Foreign key constraints are physically severed.

-- Order Service Database (order_db)
CREATE TABLE orders (
    id VARCHAR(50) PRIMARY KEY,
    user_id VARCHAR(50) NOT NULL, -- Logical reference only! No physical FK.
    total_amount NUMERIC(12, 2) NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Payment Service Database (payment_db)
CREATE TABLE payments (
    id VARCHAR(50) PRIMARY KEY,
    order_id VARCHAR(50) NOT NULL, -- Logical reference only!
    amount NUMERIC(12, 2) NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
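
With the foreign key gone, the check the database used to perform has to move into application code. A minimal sketch of that idea, assuming a hypothetical `UserDirectory` client (in practice an HTTP or gRPC call to the user service) and an in-memory map standing in for `order_db`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class LogicalReferenceCheck {
    /** Hypothetical client for the user service; a remote call in a real deployment. */
    interface UserDirectory {
        boolean exists(String userId);
    }

    static final class OrderService {
        private final UserDirectory users;
        private final Map<String, String> userIdByOrderId = new HashMap<>(); // stands in for order_db

        OrderService(UserDirectory users) {
            this.users = users;
        }

        void createOrder(String orderId, String userId) {
            // The database can no longer reject an unknown user_id, so the service must.
            if (!users.exists(userId)) {
                throw new IllegalArgumentException("Unknown user: " + userId);
            }
            userIdByOrderId.put(orderId, userId);
        }

        int orderCount() {
            return userIdByOrderId.size();
        }
    }

    public static void main(String[] args) {
        Set<String> knownUsers = Set.of("usr_8923a10f");
        OrderService service = new OrderService(knownUsers::contains);
        service.createOrder("ord_1", "usr_8923a10f");
        try {
            service.createOrder("ord_2", "usr_missing");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Unlike a foreign key, this check is not atomic with the insert: the user can be deleted between the lookup and the write, which is exactly the kind of consistency gap the split introduces.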

Scaling and Operational Challenges

1. The Network Hop Latency Budget (Back-of-the-Envelope Math)

Let's estimate the latency of a single checkout request that requires authorization, item catalog retrieval, shipping rate calculation, and payment processing.

In a Monolith:

$$\text{Latency} = T_{\text{auth}} + T_{\text{catalog}} + T_{\text{shipping}} + T_{\text{payment}} + \text{DB Local I/O}$$

$$\text{Latency} \approx 0.1\text{ms} + 0.1\text{ms} + 0.1\text{ms} + 0.1\text{ms} + 2\text{ms} = \mathbf{2.4\text{ms}}$$

In a Microservices environment (assuming standard RPC round-trips of $5\text{ms}$):

$$\text{Latency} = \sum (\text{Network RTT}) + \sum (\text{Compute})$$

$$\text{Latency} \approx (4 \times 5\text{ms}) + 2.4\text{ms} = \mathbf{22.4\text{ms}}$$

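This is a 9.33x latency tax, spent on network round trips plus serializing and deserializing payloads rather than on business logic. With nested synchronous calls (Service A calls B, B calls C, C calls D), each extra level adds another full round trip, so latency grows with chain depth, and every hop is another chance to hit a slow tail or a timeout.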
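
The estimate is easy to reproduce and to rerun with your own numbers. A minimal sketch, in which the class and method names are illustrative and the 0.1 ms, 2 ms, and 5 ms inputs are the lesson's assumed budgets rather than measurements:

```java
final class LatencyBudget {
    /** In-process calls plus one local database round trip. */
    static double monolithMs(int inProcessCalls, double callMs, double dbMs) {
        return inProcessCalls * callMs + dbMs;
    }

    /** Sequential network hops plus the same compute and database work. */
    static double microservicesMs(int hops, double rttMs, double computeMs) {
        return hops * rttMs + computeMs;
    }

    public static void main(String[] args) {
        double monolith = monolithMs(4, 0.1, 2.0);
        double distributed = microservicesMs(4, 5.0, monolith);
        System.out.printf("monolith:      %.1f ms%n", monolith);
        System.out.printf("microservices: %.1f ms%n", distributed);
        System.out.printf("ratio:         %.2fx%n", distributed / monolith);
    }
}
```

Changing `hops` shows that a sequential call chain adds one round trip per level.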

2. Distributed Database Connection Exhaustion

In a monolith, a single connection pool (like HikariCP) is shared by every module in the process. In microservices, every container maintains its own pool. If you scale your Order Service to 100 Kubernetes pods, and each pod holds a minimum of 10 connections to your PostgreSQL instance, that is 1,000 open connections. PostgreSQL's default max_connections is typically 100, and each connection is backed by its own server process, so the database starts refusing connections or runs short of memory well before file descriptors become the limit, and connection timeout failures spread across the entire cluster.
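
The pool arithmetic is worth checking before every scale-out. A minimal sketch, with illustrative names; the limit of 100 and the 3 reserved slots used in `main` are PostgreSQL's usual defaults for `max_connections` and `superuser_reserved_connections`, so substitute the values from your own configuration:

```java
final class ConnectionBudget {
    /** Connections opened when every pod fills its pool to the configured size. */
    static int totalConnections(int pods, int poolSizePerPod) {
        return Math.multiplyExact(pods, poolSizePerPod);
    }

    /** Largest pod count that stays under the server limit after reserved slots. */
    static int maxPodsWithinLimit(int maxConnections, int reservedConnections, int poolSizePerPod) {
        return Math.max(0, (maxConnections - reservedConnections) / poolSizePerPod);
    }

    public static void main(String[] args) {
        int demand = totalConnections(100, 10);
        int safePods = maxPodsWithinLimit(100, 3, 10);
        System.out.println("connections requested: " + demand);
        System.out.println("pods that fit under the limit: " + safePods);
    }
}
```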


Architectural Trade-offs and Consistency Decisions

Deciding when to transition is an exercise in managing organizational scale against operational overhead.

| Architectural Dimension | Structured Monolith (Modulith) | Fully Distributed Microservices |
| --- | --- | --- |
| Transaction Model | ACID (local transaction) | Eventual consistency (Saga), or 2PC at a high coordination cost |
| Deployment Complexity | Low (single CI/CD artifact) | High (Kubernetes, service mesh, gateway) |
| P99 Latency Profile | Extremely low (in-memory hops) | High (multiple RPC network hops) |
| Refactoring Speed | Fast (IDE rename, local type safety) | Slow (API deprecation, schema versions) |
| Team Scaling Limits | Poor (constant merge conflicts for 50+ devs) | Excellent (complete domain ownership) |
| Resource Isolation | Shared resources (CPU, JVM heap memory) | Fine-grained (scale CPU/RAM per service) |

Failure Modes and Fault Tolerance Strategies

In a monolith, failure is binary: the app is either up or down. In microservices, failure is partial, which is far more complex.

1. Cascading Failures and the Retry Storm

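If the Payment Service runs slow, the Order Service thread pool saturates waiting for HTTP responses. Once saturated, the Order Service stops accepting new checkouts, and every caller that blocks on it backs up in turn, which is how one slow dependency can take down pages as unrelated as the catalog. Callers that retry on timeout make this worse by multiplying load on the struggling service (the retry storm). To prevent this, we must configure timeouts and strict circuit breakers.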

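Here is a Resilience4j circuit breaker configuration in Java to protect our services (paymentClient and orderRequest are assumed to exist in the surrounding code):

import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.time.Duration;

// Configure failure-rate and slow-call thresholds for the breaker
CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
    .failureRateThreshold(50.0f) // Open circuit if 50% of recorded calls fail
    .slowCallRateThreshold(75.0f) // Open if 75% of calls are slower than the budget
    .slowCallDurationThreshold(Duration.ofMillis(2000)) // Calls over 2s count as slow (they are not cancelled)
    .minimumNumberOfCalls(10) // Evaluate the rates only after 10 recorded calls
    .waitDurationInOpenState(Duration.ofSeconds(10)) // Stay open for 10s, then allow trial calls (half-open)
    .build();

CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(circuitBreakerConfig);
CircuitBreaker paymentBreaker = registry.circuitBreaker("paymentService");

// Execute the call through the breaker; fall back when the circuit is open
String response;
try {
    response = paymentBreaker.executeSupplier(() -> paymentClient.charge(orderRequest));
} catch (CallNotPermittedException circuitOpen) {
    response = "PAYMENT_PENDING"; // Illustrative fallback value: fail fast instead of blocking a thread
}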

2. Distributed Transactions: Saga Pattern vs. Dual-Writes

When database-per-service is active, you cannot perform an ACID transaction to update an order and a payment simultaneously.

  • The Pitfall of Dual-Writes: Writing to your local SQL DB and then publishing an event to Kafka in the same block is unsafe. If Kafka is down, the database commit succeeds, but the message is lost forever.
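  • The Solution: Use the Transactional Outbox Pattern. Write the event to an outbox table in the same local transaction as the state change, then let a relay such as Debezium tail the database's write-ahead log (WAL) and publish those rows to Kafka. The state change and the event record commit atomically; delivery is at-least-once, so consumers must be idempotent.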
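
The mechanics of the outbox can be shown without a real database or broker. The following is an in-memory sketch under stated assumptions: `Database` stands in for a single relational database whose `markPaid` method plays the role of one transaction, `relay` plays the role a tool like Debezium would fill, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class OutboxSketch {
    record OutboxRow(long id, String topic, String payload) {}

    /** Stands in for one database: state and outbox change together or not at all. */
    static final class Database {
        private final Map<String, String> orderStatus = new HashMap<>();
        private final List<OutboxRow> outbox = new ArrayList<>();
        private long nextId = 1;

        /** One "transaction": the status update and the outbox row commit atomically. */
        synchronized void markPaid(String orderId) {
            orderStatus.put(orderId, "PAID");
            outbox.add(new OutboxRow(nextId++, "order-events",
                    "{\"orderId\":\"" + orderId + "\",\"status\":\"PAID\"}"));
        }

        synchronized List<OutboxRow> pending() { return List.copyOf(outbox); }
        synchronized void delete(long id) { outbox.removeIf(row -> row.id() == id); }
        synchronized String status(String orderId) { return orderStatus.get(orderId); }
    }

    /** Publishes pending rows in order and removes each one after a successful publish. */
    static int relay(Database db, Consumer<OutboxRow> broker) {
        int published = 0;
        for (OutboxRow row : db.pending()) {
            try {
                broker.accept(row); // throws if the broker is unavailable
            } catch (RuntimeException brokerDown) {
                break; // the row stays in the outbox and is retried on the next pass
            }
            db.delete(row.id());
            published++;
        }
        return published;
    }

    public static void main(String[] args) {
        Database db = new Database();
        db.markPaid("ord_99018274a");
        int first = relay(db, row -> { throw new IllegalStateException("broker down"); });
        List<OutboxRow> delivered = new ArrayList<>();
        int second = relay(db, delivered::add);
        System.out.println(first + " published while down, " + second + " after recovery");
    }
}
```

A crash between the publish and the delete would send the same row twice on the next pass, which is why the pattern gives at-least-once delivery and relies on idempotent consumers.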

Saga Orchestration vs. Choreography

When implementing the Saga pattern, you must choose between two primary operational patterns:

  1. Choreography: Each service publishes an event to Kafka after completing its transaction, and other services listen to those events and perform their steps. While simple to implement for 2 or 3 services, it becomes extremely difficult to trace and debug as the system scales, leading to "spaghetti events" where no single developer understands the complete state-machine flow.
  2. Orchestration: A centralized orchestrator class (or service) explicitly defines the state-machine workflow. It issues commands to the individual services and listens for execution responses. If any step fails (e.g., the billing transaction is declined), the orchestrator is responsible for executing Compensating Transactions (e.g., calling cancelOrder or refundWallet) in the reverse order of execution to restore system consistency.
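
The orchestration flow can be sketched as a list of steps, each paired with its compensation. This is a minimal in-process illustration, not a production orchestrator: the `Step` record and `run` method are hypothetical names, the step names mirror the examples above, and a real implementation would also persist saga state so it can resume after a crash.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SagaSketch {
    /** One saga step: a forward action plus the compensation that undoes it. */
    record Step(String name, Runnable action, Runnable compensation) {}

    /**
     * Runs steps in order. If one fails, compensates the steps that already
     * completed, most recent first, and reports failure.
     */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean succeeded = run(List.of(
                new Step("createOrder", () -> log.add("createOrder"), () -> log.add("cancelOrder")),
                new Step("debitWallet", () -> log.add("debitWallet"), () -> log.add("refundWallet")),
                new Step("chargeCard", () -> { throw new IllegalStateException("card declined"); }, () -> {})
        ));
        System.out.println(succeeded + " " + log);
    }
}
```

Because completed steps are kept on a stack, the wallet is refunded before the order is cancelled, matching the reverse-order rule described above.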

Staff Engineer Perspective

The Modulith Alternative

Before breaking your monolith apart, build a Modulith. Enforce strict package visibility rules in Java. Use ArchUnit tests to prevent the Payment package from directly importing internal classes from the Order package. Ensure all communication between packages goes through clean interfaces. If you can't build clean boundaries in a single repository, you will build a distributed monolith in microservices—which is the absolute worst architectural state.
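
A module boundary of this kind can be illustrated in a single file. The sketch below is an assumption about shape rather than a prescribed layout: all names are hypothetical, and the private nested class stands in for what would be a package-private implementation in its own package, guarded by an ArchUnit rule in a real codebase.

```java
final class ModulithSketch {
    /** Public API of the payment module: the only type other modules may reference. */
    interface PaymentApi {
        String charge(String orderId, long amountCents);
    }

    /** Entry point of the payment module; the implementation stays hidden behind it. */
    static final class PaymentModule {
        private PaymentModule() {}

        static PaymentApi create() {
            return new DefaultPayments();
        }

        // Invisible to other modules, so they cannot couple to its internals.
        private static final class DefaultPayments implements PaymentApi {
            @Override
            public String charge(String orderId, long amountCents) {
                if (amountCents <= 0) {
                    throw new IllegalArgumentException("amount must be positive");
                }
                return "pay_for_" + orderId;
            }
        }
    }

    /** The order module depends on the interface only. */
    static final class OrderModule {
        private final PaymentApi payments;

        OrderModule(PaymentApi payments) {
            this.payments = payments;
        }

        String checkout(String orderId, long amountCents) {
            return payments.charge(orderId, amountCents);
        }
    }

    public static void main(String[] args) {
        OrderModule orders = new OrderModule(PaymentModule.create());
        System.out.println(orders.checkout("ord_99018274a", 14998));
    }
}
```

If the payment module is extracted later, `PaymentApi` can be backed by a remote client without touching `OrderModule`, which is the practical payoff of drawing the boundary early.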


Production Readiness Checklist

Before signing off on a microservices migration, verify:

  • Distributed Tracing: OpenTelemetry traces are active on all incoming and internal RPC calls.
  • Independent Deployability: Services can be built, tested, and deployed to Kubernetes without releasing other services simultaneously.
  • Contract Versioning: Semantic versioning is enforced on all API schemas; backward-compatible fallback paths are active.
  • Outbox Pattern Active: No dual-writes are used; all message publication is atomic with the primary DB transaction.
  • Circuit Breakers Configured: Timeout thresholds are strictly enforced using Resilience4j or an Envoy sidecar.


Verbal Script

Interviewer: "A startup wants to build their MVP using a microservices architecture to ensure scalability from Day 1. What is your advice?"

Candidate: "I would strongly advise against this. Building an MVP is about validating product-market fit, which requires high agility and constant refactoring of domain boundaries. In a microservices architecture, changing a domain boundary requires migrating multiple distinct databases, modifying API contracts across several services, and managing backward compatibility. This creates massive friction.

Technically, a microservices setup introduces a heavy 'Complexity Tax.' Instead of simple in-memory function calls, you have network serialization costs, distributed transaction problems that require complex Saga orchestrations, and massive observability overhead.

Instead, I would recommend starting with a Structured Monolith (or Modulith). You get the operational simplicity of a single deployable unit and the transactional integrity of a single relational database. By enforcing strict module boundaries in code using package visibility, you keep the architecture clean. Once the startup reaches massive scale and organizational challenges arise—such as having 50+ developers overlapping on commits—only then would I decouple high-volume boundaries into microservices."
