API Design: REST vs GraphQL vs gRPC — When to Use Each

Mental Model

Choosing an API protocol shapes the network efficiency and runtime boundaries of your system. REST exposes standardized, stateless resources for public clients; GraphQL lets frontend clients request exactly the nested fields they need and avoid over-fetching; gRPC uses binary serialization over multiplexed HTTP/2 streams for low-latency, high-throughput communication between services.

Requirements and System Goals

When designing an enterprise integration architecture, we must define specific operational requirements and budgets for network communication protocols.

1. Functional Requirements

Standardized Public Integration: Expose public APIs that are universally compatible with third-party web browsers and external partners without requiring specialized SDKs.
Dynamic Frontend Data Querying: Allow frontend applications (web, iOS, Android) to fetch custom structured hierarchies (e.g. user profile and user posts) in a single request.
Low-Latency Microservice Communication: Drive high-performance RPC calls between backend internal microservices with strict schema guarantees.

2. Non-Functional Requirements & Performance Budgets

Ultra-Low Inter-Service Latency: Microservice RPC calls must complete within 10 ms at P99.
Minimal Serialization Overhead: Payload compression and binary translation must be optimized to prevent CPU bottlenecks at peak throughput.
Network Bandwidth Conservation: Reduce payload size over WAN links to prevent network saturation and lower egress costs.

API Interfaces and Service Contracts

We define equivalent contracts for retrieving a catalog of products under all three paradigms.

1. REST API Contract

GET /api/v1/products?category=electronics

Response Payload (200 OK):

{
  "products": [
    {
      "productId": "prod_1002",
      "name": "Noise-Cancelling Headphones",
      "price": 199.99,
      "inStock": true
    }
  ]
}
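
As a client-side illustration of this contract, the following minimal sketch builds the request URL and maps the response payload onto typed records. The host `api.example.com` and the `Product` dataclass are assumptions for illustration; the source specifies only the path, the query string, and the JSON shape. No network call is made.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode

# Hypothetical base URL; the contract above only fixes the path and query string.
BASE_URL = "https://api.example.com"


@dataclass(frozen=True)
class Product:
    product_id: str
    name: str
    price: float
    in_stock: bool


def build_products_url(category: str) -> str:
    """Build the GET URL for the products collection filtered by category."""
    return f"{BASE_URL}/api/v1/products?{urlencode({'category': category})}"


def parse_products(body: str) -> list[Product]:
    """Map the JSON response body onto typed records."""
    payload = json.loads(body)
    return [
        Product(p["productId"], p["name"], p["price"], p["inStock"])
        for p in payload["products"]
    ]
```

Note that the client receives every field of each product whether it needs it or not; that fixed shape is the over-fetching that GraphQL addresses.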

2. GraphQL Schema Query

POST /graphql

Request Payload:

{
  "query": "query GetProductCatalog { products(category: \"electronics\") { name price } }"
}

Response Payload (200 OK):

{
  "data": {
    "products": [
      {
        "name": "Noise-Cancelling Headphones",
        "price": 199.99
      }
    ]
  }
}
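
To make the field selection concrete, here is a small sketch that builds the request body shown above and uses a toy `project` function as a stand-in for server-side field resolution. Both helper names are hypothetical, and a real client would pass `category` as a GraphQL variable rather than interpolating it into the query string.

```python
import json

# Full record as the backend stores it (same fields as the REST payload).
FULL_RECORD = {
    "productId": "prod_1002",
    "name": "Noise-Cancelling Headphones",
    "price": 199.99,
    "inStock": True,
}


def build_graphql_body(category: str, fields: list[str]) -> str:
    """Serialize a GetProductCatalog query that selects only `fields`."""
    selection = " ".join(fields)
    query = (
        f'query GetProductCatalog {{ products(category: "{category}") '
        f"{{ {selection} }} }}"
    )
    return json.dumps({"query": query})


def project(record: dict, fields: list[str]) -> dict:
    """Toy stand-in for field resolution: keep only the selected fields."""
    return {name: record[name] for name in fields}
```

Calling `project(FULL_RECORD, ["name", "price"])` returns only the two requested fields, mirroring the trimmed response payload.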

3. gRPC Protobuf Contract

We declare the Proto3 schema defining our request and response structures.

syntax = "proto3";

package product.v1;

service ProductService {
  rpc GetProducts (GetProductsRequest) returns (GetProductsResponse);
}

message GetProductsRequest {
  string category = 1;
}

message Product {
  string product_id = 1;
  string name = 2;
  double price = 3;
  bool in_stock = 4;
}

message GetProductsResponse {
  repeated Product products = 1;
}

High-Level Design and Visualizations

Each protocol uses network connections and request routing differently, largely because of the HTTP version it typically runs on.

graph TD
    subgraph REST_Pattern ["HTTP/1.1 REST Resource"]
        Client_A[Client Browser] -->|GET /products/1| RB_1[REST Router]
        RB_1 -->|Fetch JSON Payload| DB_1[(Database)]
    end

    subgraph GraphQL_Pattern ["HTTP/1.1 GraphQL Unified Entry"]
        Client_B[Mobile Client] -->|POST /graphql Query| GW[GraphQL Gateway]
        GW -->|"Parse AST & Resolve"| DB_2[(Database)]
    end

    subgraph gRPC_Pattern ["HTTP/2 Multiplexed RPC Stream"]
        Client_C[Backend Microservice] -->|Binary Multiplexed Stream| gRPC_Svc[gRPC Server]
        gRPC_Svc -->|Resolve Protobuf| DB_3[(Database)]
    end

HTTP/2 Multiplexed TCP Pipeline

Under HTTP/1.1, which REST and GraphQL deployments commonly use, a connection carries one request at a time, so concurrent requests need several TCP connections or must wait their turn. (Both styles can also run over HTTP/2.) gRPC uses HTTP/2 multiplexing, which interleaves many concurrent requests as independent streams on a single TCP connection.

sequenceDiagram
    autonumber
    participant Client as gRPC Client
    participant TCP as Single Shared TCP Socket
    participant Server as gRPC Server

    Client->>TCP: Stream 1 (Header: Request /products)
    Client->>TCP: Stream 3 (Header: Request /users)
    Client->>TCP: Stream 1 (Data: Frame chunk)
    Client->>TCP: Stream 3 (Data: Frame chunk)
    TCP->>Server: Multiplexed binary frames processed
    Server-->>TCP: Stream 1 Response (Finalized payload)
    Server-->>TCP: Stream 3 Response (Finalized payload)
    TCP-->>Client: Streams resolved concurrently
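
The interleaving in the diagram above can be modelled in a few lines. This is a toy sketch, not an HTTP/2 implementation: the frame tuples, payloads, and the `demultiplex` helper are illustrative assumptions. It shows only that frames from different streams share one ordered byte pipe and are regrouped by stream ID at the receiver.

```python
from collections import defaultdict

# Each frame is (stream_id, kind, payload). Values are illustrative only.
FRAMES = [
    (1, "HEADERS", b"/products"),
    (3, "HEADERS", b"/users"),
    (1, "DATA", b"chunk-a"),
    (3, "DATA", b"chunk-b"),
]


def demultiplex(frames):
    """Group interleaved frames by stream ID, preserving per-stream order."""
    streams = defaultdict(list)
    for stream_id, kind, payload in frames:
        streams[stream_id].append((kind, payload))
    return dict(streams)
```

Client-initiated HTTP/2 streams use odd IDs, which is why the example uses streams 1 and 3.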

Low-Level Design and Schema Strategies

To understand why gRPC binary serialization is highly efficient, we analyze the low-level byte representation of a product payload compared to JSON text format.

1. Serialized Payload Layout: JSON vs. Protobuf

Assume a single record: {"productId": "prod_1002", "price": 199.99}.

A. JSON Representation (Text-based)

JSON raw string: {"productId":"prod_1002","price":199.99}
Storage cost: Each ASCII character occupies $1$ byte. The string has $40$ characters, so the payload is $40$ bytes, of which only $15$ bytes (the values prod_1002 and 199.99) are actual data; the rest is keys and punctuation. The keys ("productId", "price") are repeated in every array entry.

B. Protobuf Representation (Binary-based)

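Protobuf uses a tag-length-value layout: each field is identified by a compact numeric tag (field number plus wire type) instead of its name, so key text disappears from the payload.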
Layout:
- Field Tag $1$ (for product_id): 0x0A (1 byte)
- Value length: 0x09 (1 byte)
- Value ("prod_1002"): 70 72 6F 64 5F 31 30 30 32 (9 bytes)
- Field Tag $3$ (for price): 0x19 (1 byte)
- Value (199.99 as a little-endian IEEE 754 double): 48 E1 7A 14 AE FF 68 40 (8 bytes)
Total size: $20$ bytes.
Compression Efficiency: Protobuf cuts this payload roughly in half, mainly by replacing repeated key strings with one-byte field tags.
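
The byte layout above can be checked by hand-encoding the record. The sketch below uses only the standard library rather than the real protobuf runtime; `encode_product` is a hypothetical helper that covers just this two-field case (field 1 as a string, field 3 as a double) and assumes a string shorter than 128 bytes.

```python
import json
import struct


def encode_product(product_id: str, price: float) -> bytes:
    """Hand-encode fields 1 (string) and 3 (double) in protobuf wire format."""
    raw_id = product_id.encode("utf-8")
    assert len(raw_id) < 128, "sketch only handles single-byte length prefixes"
    out = bytes([(1 << 3) | 2])           # tag: field 1, wire type 2 (length-delimited)
    out += bytes([len(raw_id)]) + raw_id  # length prefix, then UTF-8 bytes
    out += bytes([(3 << 3) | 1])          # tag: field 3, wire type 1 (64-bit)
    out += struct.pack("<d", price)       # IEEE 754 double, little-endian
    return out


record = {"productId": "prod_1002", "price": 199.99}
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")
proto_bytes = encode_product(record["productId"], record["price"])

print(len(json_bytes), len(proto_bytes))  # 40 20
print(proto_bytes.hex(" "))               # space-separated hex dump of the message
```

The saving comes almost entirely from the keys. The double itself costs 8 bytes in Protobuf versus 6 characters in JSON, so numeric-heavy payloads with short keys benefit less.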

C. Protocol Buffer Varint (Variable-Width Integer) Encoding Mechanics

To achieve extreme serialization density, Protocol Buffers represent numerical integers dynamically using Varints.

The Sizing Problem: A fixed-width encoding allocates $4$ bytes ($32$ bits) for a standard integer field. Even a small value (e.g., status_code = 3) still sends $3$ bytes of zeros: 00000000 00000000 00000000 00000011.
The Varint Solution: A varint uses the most significant bit (MSB) of each byte as a continuation bit that signals whether another byte of the same number follows. Each byte therefore carries only $7$ bits of numerical payload. For values below $128$, the continuation bit is 0 and the whole number fits in exactly $1$ byte instead of $4$, a saving of up to $75\%$ per integer field for typical small values. The trade-off is that large values cost more: a full $32$-bit value needs $5$ bytes.
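
A minimal encoder and decoder make the continuation-bit rule concrete. This sketch covers unsigned values only (signed fields need extra handling that the text above does not discuss), and the function names are illustrative.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("sketch covers unsigned values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # continuation bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of `data`."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")
```

For example, `encode_varint(3)` yields the single byte `03`, while `encode_varint(300)` yields the two bytes `AC 02`.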

Scaling and Operational Challenges

1. Sequential Serialization Bandwidth Calculations

To demonstrate the network savings of gRPC binary payloads over WAN connections, we compare egress bandwidth requirements at scale.

Volume: A catalog service serves peak traffic of $10,000$ queries/second. The response sizes below are per query, which implies each response carries a small page of product records rather than thousands.
Payload Sizing:
- Average REST JSON catalog response size = 1.5 Kilobytes (KB).
- Average gRPC Protobuf catalog response size = 0.3 Kilobytes (KB).
Egress Bandwidth Calculations:
- REST JSON Bandwidth: $$10,000 \text{ queries/s} \times 1.5 \text{ KB} = 15,000 \text{ KB/s} = 15 \text{ MB/s}$$ $$\text{Convert to network bits (Mbps): } 15 \text{ MB/s} \times 8 \text{ bits/byte} \approx 120 \text{ Mbps}$$
- gRPC Protobuf Bandwidth: $$10,000 \text{ queries/s} \times 0.3 \text{ KB} = 3,000 \text{ KB/s} = 3 \text{ MB/s}$$ $$\text{Convert to network bits (Mbps): } 3 \text{ MB/s} \times 8 \text{ bits/byte} \approx 24 \text{ Mbps}$$
- The Mathematical Benefit: gRPC binary serialization saves $96 \text{ Mbps}$ of network bandwidth at peak, an $80\%$ reduction in WAN egress volume (and in egress cost where it is billed per byte). Smaller payloads also tend to reduce the CPU time spent on serialization.
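
The same arithmetic as a short script, useful for plugging in your own traffic numbers. It assumes decimal units (1 MB = 1,000 KB), as the calculation above does; `egress_mbps` is an illustrative helper name.

```python
def egress_mbps(queries_per_second: int, response_kb: float) -> float:
    """Egress in Mbps, using decimal units (1 MB = 1,000 KB)."""
    megabytes_per_second = queries_per_second * response_kb / 1_000
    return megabytes_per_second * 8


rest = egress_mbps(10_000, 1.5)   # REST JSON responses
grpc = egress_mbps(10_000, 0.3)   # gRPC Protobuf responses
saved = rest - grpc
reduction = saved / rest

print(f"REST {rest:.0f} Mbps, gRPC {grpc:.0f} Mbps, "
      f"saved {saved:.0f} Mbps ({reduction:.0%})")
```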

2. HTTP/2 Multiplexing and Head-of-Line Blocking

Although HTTP/2 multiplexing allows concurrent request pipelines over a single TCP connection, it introduces a new bottleneck.

The Challenge: If a single TCP packet is lost in the network, TCP halts all streams on that connection while waiting for the missing packet to be re-transmitted. This means all multiplexed gRPC requests on that socket suffer a sudden latency spike (TCP Head-of-Line blocking).
Staff Solution: Spread services across logical connection pools so that one lossy connection does not stall every call, and monitor packet loss thresholds. For extremely high-throughput paths, evaluate gRPC over HTTP/3 (QUIC) where your gRPC implementation supports it; QUIC runs over UDP and recovers lost packets per stream, which avoids TCP head-of-line blocking.
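
The difference between the two transports can be illustrated with a toy delivery model. This is a simplification under stated assumptions, not a network simulation: the packet list and timings are made up, and `delivery_times` simply applies "in-order across the whole connection" versus "in-order per stream".

```python
# (sequence number, stream id, arrival time in ms). Packet 2 was lost and its
# retransmission arrives late. All numbers are made up for illustration.
PACKETS = [(1, 1, 10), (2, 1, 60), (3, 3, 12), (4, 3, 13)]


def delivery_times(packets, per_stream_ordering: bool):
    """Return {seq: time the payload becomes readable by the application}."""
    delivered = {}
    for seq, stream, _arrival in packets:
        # A packet is readable only once every earlier packet it must be
        # ordered after has arrived.
        blockers = [
            arrival
            for other_seq, other_stream, arrival in packets
            if other_seq <= seq
            and (not per_stream_ordering or other_stream == stream)
        ]
        delivered[seq] = max(blockers)
    return delivered


tcp_like = delivery_times(PACKETS, per_stream_ordering=False)
quic_like = delivery_times(PACKETS, per_stream_ordering=True)
```

In the TCP-like case, packets 3 and 4 on stream 3 are held until 60 ms even though they arrived at 12 ms and 13 ms. In the QUIC-like case, only stream 1 waits for the retransmission.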

Architectural Trade-offs and Protocol Decisions

Selecting the right protocol is a balance between client flexibility, transport efficiency, and developer integration costs.

Operational Dimension	REST (JSON / HTTP/1.1)	GraphQL (JSON / HTTP/1.1)	gRPC (ProtoBuf / HTTP/2)
Data Format	Text-based JSON	Text-based JSON	Binary (Protocol Buffers)
Query Flexibility	Fixed resources	High (Client defines fields)	Fixed RPC methods
Multiplexed Streams	No (Requires connection pools)	No (Requires connection pools)	Yes (Single TCP Connection)
Public Browser Compatibility	Excellent (Universal native support)	Excellent (Universal native support)	Poor (Requires gateway proxies like gRPC-Web)
Schema Enforcement	Optional (Convention, or OpenAPI if adopted)	Strong (Queries validated against the schema)	Strict (Code generated from .proto at build time)

Failure Modes and Fault Tolerance Strategies

1. gRPC Connection Pool Starvation

Because gRPC calls are multiplexed as streams over a single TCP connection, long-running streaming calls or slow requests can exhaust the HTTP/2 maximum concurrent streams setting (commonly $100$ per connection).

The Failure: Once the limit is reached, new client calls queue behind in-flight streams until a slot frees up or their deadline expires, or they are refused with an HTTP/2 stream error.
Staff Mitigation: Configure the gRPC client (or client-side proxy) to support dynamic connection pooling. Once active concurrent streams reach $80\%$ of the limit, open an additional TCP connection and spread new requests across the pool.
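
The threshold rule above can be sketched as a small bookkeeping class. This is a toy model under stated assumptions, not a real gRPC channel: `StreamPool` is a hypothetical name, connections are represented only by their active stream counts, and appending to the list stands in for dialing a new TCP connection.

```python
class StreamPool:
    """Toy model of a client that opens extra connections near the stream limit."""

    def __init__(self, max_streams: int = 100, threshold: float = 0.8):
        self.max_streams = max_streams
        self.threshold = threshold
        self.active = [0]  # active stream count per connection

    def acquire(self) -> int:
        """Reserve a stream and return the index of the connection used."""
        limit = self.max_streams * self.threshold
        # Prefer the least-loaded existing connection.
        index = min(range(len(self.active)), key=self.active.__getitem__)
        if self.active[index] >= limit:
            self.active.append(0)  # stands in for dialing a new TCP connection
            index = len(self.active) - 1
        self.active[index] += 1
        return index

    def release(self, index: int) -> None:
        """Mark one stream on the given connection as finished."""
        self.active[index] -= 1
```

With the defaults, the first 80 calls share connection 0 and the 81st triggers a second connection, leaving headroom below the hard limit of 100 for bursts.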

2. GraphQL Cascading Memory Starvation

GraphQL allows clients to request massive nested joins (e.g. fetching users, their posts, their comments, and those comments' authors in one query).

The Danger: When resolved, the combined JSON payload can grow to megabytes of text. Resolving and serializing this single massive response can block the event loop in Node.js, tie up worker threads in Java, exhaust server memory, and crash the process.
Mitigation: Implement Query Cost Limits and strict timeout boundaries at the gateway. Assign cost scores to fields, reject queries exceeding limits, and implement Cursor-Based Pagination strictly for all array resources.
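
A query cost limit can be sketched as a recursive walk over the requested selection. Everything specific here is an assumption for illustration: the selection is modelled as a nested dict rather than a parsed GraphQL AST, and the field weights, fan-out factors, and `MAX_COST` are made-up values.

```python
# Hypothetical per-field weights; list-valued fields multiply their children.
FIELD_COST = {"users": 1, "posts": 1, "comments": 1, "author": 1, "name": 1}
LIST_FANOUT = {"users": 10, "posts": 10, "comments": 10}  # assumed page sizes
MAX_COST = 500


def query_cost(selection: dict) -> int:
    """Estimate cost of a nested selection given as {field: sub-selection}."""
    total = 0
    for field, children in selection.items():
        fanout = LIST_FANOUT.get(field, 1)
        total += FIELD_COST.get(field, 1) + fanout * query_cost(children)
    return total


def check_query(selection: dict) -> None:
    """Reject a query before execution if its estimated cost is too high."""
    cost = query_cost(selection)
    if cost > MAX_COST:
        raise ValueError(f"query cost {cost} exceeds limit {MAX_COST}")


shallow = {"users": {"name": {}}}
deep = {"users": {"posts": {"comments": {"author": {"name": {}}}}}}
```

Here `query_cost(shallow)` is 11 and passes, while `query_cost(deep)` is 2111 and is rejected before any resolver runs. The multiplication by fan-out is what makes each extra level of list nesting expensive.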

Staff Engineer Perspective

Production Readiness Checklist

Before moving an API architecture to production:

Protobuf schema compiled: Downstream client packages use compiled static code to prevent runtime string parsing overhead.
HTTP/2 stream limits configured: Gateway servers set maximum concurrent stream thresholds to at least $100$ per connection.
gRPC-Web proxy active for UI: If frontend web browsers consume gRPC, Envoy proxy mappings are configured.
Cursor-based Pagination enforced: Public GraphQL and REST collections reject unpaginated bulk lookups.
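
The pagination item in the checklist above can be sketched as follows. This is a minimal in-memory illustration: the cursor format (base64-wrapped JSON), the cap of 100 items, and the helper names are assumptions, and a real service would apply the same `id > after` filter in its database query.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: str) -> str:
    """Wrap the last-seen ID in an opaque, URL-safe token."""
    raw = json.dumps({"after": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> str:
    """Recover the last-seen ID from a cursor token."""
    return json.loads(base64.urlsafe_b64decode(cursor))["after"]


def page(items: list[dict], limit: int, cursor: Optional[str] = None):
    """Return (page, next_cursor) from items sorted by their unique 'id'."""
    if not 1 <= limit <= 100:  # hypothetical cap; rejects unbounded lookups
        raise ValueError("limit must be between 1 and 100")
    after = decode_cursor(cursor) if cursor else None
    remaining = [i for i in items if after is None or i["id"] > after]
    chunk = remaining[:limit]
    has_more = len(remaining) > limit
    return chunk, (encode_cursor(chunk[-1]["id"]) if has_more else None)
```

Because the cursor encodes a position in the sort order rather than an offset, pages stay stable when rows are inserted or deleted between requests.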

Verbal Script

Interviewer: "If you were designing a global catalog system with mobile frontend clients and a dense mesh of backend microservices, how would you choose between REST, GraphQL, and gRPC?"

Candidate: "To design a resilient, high-performance global catalog system, I would adopt a Hybrid Protocol Strategy, matching each protocol to the specific boundaries of the system architecture.

For our public frontend mobile and web clients, I would implement GraphQL as the edge coordination layer. Mobile clients operate under highly unpredictable cellular networks. GraphQL solves the classic over-fetching problem by allowing the client to request precisely the product catalog fields they need. This keeps mobile payloads tiny, conserving device battery and mobile bandwidth.

However, for our dense backend microservice-to-microservice mesh, I would completely avoid GraphQL and REST, and standardize strictly on gRPC.

Internal microservices demand ultra-low latency. gRPC uses binary Protocol Buffers, which serialize data into compact byte sequences and drop the repeated key strings that JSON carries in every record. Our byte-level comparison shows Protobuf roughly halving even a two-field record, and our capacity estimate shows an 80% saving in WAN egress bandwidth at peak.

Furthermore, gRPC communicates over HTTP/2 multiplexed streams. This allows many concurrent backend requests, up to the configured stream limit, to share a single TCP connection, avoiding the repeated latency cost of setting up and tearing down TCP connections.

Finally, gRPC enforces compile-time schema contracts. We compile our Protobuf files directly into static client SDKs. This catches contract mismatches at build time rather than in production, which improves integration stability across our microservice teams."