Pagination at Scale
Mental Model
Pagination looks simple until your table has tens of millions of rows, users sort by multiple fields, and product asks for infinite scroll with strict latency SLOs.
Many APIs start with `LIMIT ... OFFSET ...` because it is easy to implement:
- page 1 -> `LIMIT 20 OFFSET 0`
- page 2 -> `LIMIT 20 OFFSET 20`
- page 5000 -> `LIMIT 20 OFFSET 99980`
This works for small datasets, but offset-based pagination becomes expensive and unstable at scale.
Why OFFSET gets slower over time
The database still needs to walk past skipped rows before returning the page.
Even with an index, OFFSET 100000 often means:
- scan index entries for 100,020 rows
- discard the first 100,000
- return only 20
That extra work creates:
- higher CPU and I/O on DB nodes
- bigger latency tail (`p95` and `p99`)
- noisy-neighbor risk for other workloads
In other words, deep pagination is not "20 rows of work"; it is often "100,020 rows of work".
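To make that cost concrete, here is a small illustrative sketch (plain Python, not a real query planner) of the work a deep offset implies:

```python
# Illustrative sketch: what OFFSET effectively costs the database.
# The engine walks OFFSET + LIMIT index entries, then throws most away.

def offset_page(sorted_rows, offset, limit):
    scanned = 0
    page = []
    for row in sorted_rows:          # walk index entries in order
        scanned += 1
        if scanned > offset:         # everything before the offset is discarded
            page.append(row)
        if len(page) == limit:
            break
    return page, scanned

rows = list(range(1, 200_001))       # pretend index over a 200k-row table
page, scanned = offset_page(rows, offset=100_000, limit=20)
print(scanned)                       # 100020 entries touched to return 20 rows
```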
Correctness issue with OFFSET: drifting pages
Offset pages can duplicate or miss records if new rows are inserted between requests.
Example:
- User loads page 1 sorted by `created_at DESC`
- New rows arrive
- User requests page 2 with `OFFSET 20`
Now some items that were at positions 19-25 shifted, causing overlap or gaps. For user-facing feeds, this feels broken.
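The drift is easy to reproduce. This sketch uses an in-memory SQLite table purely for illustration and shows page 2 re-serving items the user already saw on page 1 after two inserts:

```python
import sqlite3

# Demonstrates page drift: inserts between requests shift OFFSET pages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 41)])

def page(offset, limit=5):
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id DESC LIMIT ? OFFSET ?", (limit, offset)
    ).fetchall()
    return [r[0] for r in rows]

page1 = page(offset=0)                   # [40, 39, 38, 37, 36]

# Two new items arrive before the user asks for page 2.
conn.executemany("INSERT INTO items VALUES (?)", [(41,), (42,)])

page2 = page(offset=5)                   # [37, 36, 35, 34, 33]
print(sorted(set(page1) & set(page2)))   # [36, 37] are shown twice
```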
Keyset pagination (cursor pagination)
Keyset pagination uses the last seen sort key as the anchor for the next query, instead of counting rows to skip.
If sorted by (created_at DESC, id DESC):
- first page: `WHERE true ORDER BY created_at DESC, id DESC LIMIT 20`
- next page: `WHERE (created_at, id) < (:last_created_at, :last_id) ORDER BY created_at DESC, id DESC LIMIT 20`
The DB can jump directly to the correct range through the index.
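A runnable sketch of the idea, again using an in-memory SQLite table for illustration; the expanded `OR` predicate is equivalent to the row-value comparison above:

```python
import sqlite3

# Keyset pagination sketch: anchor each page on the last seen (created_at, id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, 1000 + i // 2) for i in range(1, 21)],  # duplicate timestamps on purpose
)

def next_page(last_created_at=None, last_id=None, limit=5):
    if last_id is None:  # first page: no anchor yet
        sql = ("SELECT created_at, id FROM items "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        args = (limit,)
    else:
        # Expanded form of (created_at, id) < (:last_created_at, :last_id)
        sql = ("SELECT created_at, id FROM items "
               "WHERE created_at < ? OR (created_at = ? AND id < ?) "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        args = (last_created_at, last_created_at, last_id, limit)
    return conn.execute(sql, args).fetchall()

page1 = next_page()
page2 = next_page(*page1[-1])            # continue from the last seen row
print(page1[-1], page2[0])               # (1008, 16) (1007, 15) -- no overlap
```

Note the duplicate `created_at` values: without `id` as a tiebreaker in both the `ORDER BY` and the predicate, rows sharing a timestamp could be skipped or repeated.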
Benefits of keyset pagination
- Near-constant query cost across pages
- Better cache and index locality
- More stable user experience under concurrent writes
- Lower latency variance for very large datasets
Choosing stable sort keys
A keyset strategy is only as good as your ordering key.
Good rules:
- Use immutable or rarely changed columns for cursor position
- Make ordering deterministic with a tiebreaker (usually `id`)
- Match index order to query order exactly
Bad example:
- sorting only by `updated_at` (not unique) without a tiebreaker
Good example:
- sorting by `updated_at DESC, id DESC`
- cursor carries both values
Index design for keyset queries
If query is:
`WHERE tenant_id = ? AND (created_at, id) < (?, ?) ORDER BY created_at DESC, id DESC LIMIT 50`
Then index should typically be:
`(tenant_id, created_at DESC, id DESC)`
Without index alignment, keyset pagination loses most of its performance advantage.
Cursor format and API contract
Do not expose raw DB internals directly. Use an opaque cursor token:
- JSON payload with sort keys + filter hash + optional expiry
- base64url encode it
- sign it (HMAC) to prevent tampering
A response format:
- `items`: current page
- `nextCursor`: token for next page
- `hasMore`: boolean
For reverse navigation, add prevCursor with inverted query direction.
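One possible shape for such a token; the signing key, payload fields, and truncated filter checksum below are illustrative choices, not a standard:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"rotate-me"  # hypothetical server-side signing key

def encode_cursor(payload: dict) -> str:
    # Canonical JSON body + 32-byte HMAC-SHA256 tag, base64url-encoded together.
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode()
    sig = hmac.new(SECRET, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(body + sig).decode()

def decode_cursor(token: str) -> dict:
    raw = base64.urlsafe_b64decode(token.encode())
    body, sig = raw[:-32], raw[-32:]          # tag is a fixed 32 bytes
    expected = hmac.new(SECRET, body, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid cursor")    # tampered or wrong key
    return json.loads(body)

cursor = encode_cursor({
    "created_at": "2024-05-01T12:00:00Z",
    "id": 9174,
    "filters": hashlib.sha256(b"status=paid").hexdigest()[:16],
})
print(decode_cursor(cursor)["id"])            # 9174
```

Because the `filters` checksum travels inside the signed body, a cursor minted for one filter combination cannot silently be replayed against another.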
Handling filters and multi-tenant data
Cursors must be bound to filter context. If filter changes, old cursor should be invalid.
Include in cursor metadata:
- tenant/account scope
- search/filter params checksum
- sort mode
If mismatch is detected, return a clear validation error and ask client to restart from first page.
Keyset drawbacks and trade-offs
Keyset is not perfect:
- No direct "jump to page 237" semantics
- Cursor complexity is higher than offset
- Frontend needs to treat pages as a sequence, not absolute page numbers
For admin panels that need random page jumps over small datasets, offset can still be acceptable.
Hybrid strategy used in production
A practical approach:
- Offset for shallow pages (e.g., first 5 pages) in low-volume backoffice UIs
- Keyset for user feeds, transaction history, logs, and high-volume APIs
For analytics-like experiences that need random access, consider a search engine or materialized snapshots instead of deep offset scanning.
Migration plan: OFFSET to keyset
- Keep existing endpoint behavior unchanged
- Add optional cursor parameters (`cursor`, `limit`)
- Return both `nextCursor` and legacy pagination metadata temporarily
- Migrate clients incrementally
- Remove deep offset usage after adoption
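A hypothetical transitional handler might look like this; the parameter names, unsigned demo cursor, and in-memory data are assumptions for illustration:

```python
import base64
import json

# Pre-sorted id DESC; stands in for an indexed table during the sketch.
DATA = [{"id": i} for i in range(100, 0, -1)]

def list_items(cursor=None, page=None, limit=20):
    if cursor is not None:                     # new keyset path
        last_id = json.loads(base64.urlsafe_b64decode(cursor))["id"]
        items = [r for r in DATA if r["id"] < last_id][:limit]
    else:                                      # legacy offset path, kept during migration
        offset = ((page or 1) - 1) * limit
        items = DATA[offset:offset + limit]
    next_cursor = None
    if items and items[-1]["id"] > DATA[-1]["id"]:
        next_cursor = base64.urlsafe_b64encode(
            json.dumps({"id": items[-1]["id"]}).encode()).decode()
    # Legacy metadata ("page") and new metadata ("nextCursor") coexist for now.
    return {"items": items, "nextCursor": next_cursor,
            "page": page or 1, "hasMore": next_cursor is not None}

first = list_items(page=1)                     # old client, offset style
second = list_items(cursor=first["nextCursor"])  # migrated client, cursor style
print(second["items"][0]["id"])                # 80
```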
Measure:
- query plan changes (`EXPLAIN ANALYZE`)
- `p95`/`p99` latency
- DB CPU and buffer cache hit ratio
Failure modes to guard against
- Non-deterministic ordering: duplicates across pages
- Mutable sort fields: item jumps between pages
- Unsigned cursors: tampering and data leakage
- Cursor not tied to filters: inconsistent results
- Missing tiebreaker: skipped rows
Practical decision rule
Use offset when:
- dataset is small
- traffic is low
- random page jumps are required
Use keyset when:
- data volume or write rate is high
- feed/history experience matters
- strict latency SLOs exist
At scale, pagination is not just API design; it is a database cost-control mechanism.
Engineering Standard: The "Staff" Perspective
In high-throughput distributed systems, the code we write is often the easiest part. The difficulty lies in how that code interacts with other components in the stack.
1. Data Integrity and The "P" in CAP
Whenever you are dealing with state (Databases, Caches, or In-memory stores), you must account for Network Partitions. In a standard Java microservice, we often choose Availability (AP) by using Eventual Consistency patterns. However, for financial ledgers, we must enforce Strong Consistency (CP), which usually involves distributed locks (Redis Redlock or Zookeeper) or a strictly linearizable sequence.
2. The Observability Pillar
Writing logic without observability is like flying a plane without a dashboard. Every production service must implement:
- Tracing (OpenTelemetry): Track a single request across 50 microservices.
- Metrics (Prometheus): Monitor Heap usage, Thread saturation, and P99 latencies.
- Structured Logging (ELK/Splunk): Never log raw strings; use JSON so you can query logs like a database.
3. Production Incident Prevention
To survive a 3:00 AM incident, we use:
- Circuit Breakers: Stop the bleeding if a downstream service is down.
- Bulkheads: Isolate thread pools so one failing endpoint doesn't crash the entire app.
- Retries with Exponential Backoff: Avoid the "Thundering Herd" problem when a service comes back online.
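A minimal sketch of retry backoff with "full jitter"; the base, cap, and seeded RNG are illustrative, and delays are computed rather than slept so the schedule is easy to inspect:

```python
import random

def backoff_delays(base=0.1, cap=5.0, attempts=6, rng=random.Random(42)):
    """Exponential backoff with full jitter: uniform(0, min(cap, base * 2^n))."""
    delays = []
    for attempt in range(attempts):
        exp = min(cap, base * (2 ** attempt))  # exponential growth, capped
        delays.append(rng.uniform(0, exp))     # jitter spreads retries apart,
    return delays                              # avoiding a thundering herd

delays = backoff_delays()
print(len(delays), all(d <= 5.0 for d in delays))  # 6 True
```

The jitter is what prevents thousands of clients, all retrying on the same schedule, from hammering a recovering service in lockstep.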
Critical Interview Nuance
When an interviewer asks you about this topic, don't just explain the code. Explain the Trade-offs. A Staff Engineer is someone who knows that every architectural decision is a choice between two "bad" outcomes. You are picking the one that aligns with the business goal.
Performance Checklist for High-Load Systems:
- Minimize Object Creation: Use primitive arrays and reusable buffers.
- Batching: Group 1,000 small writes into 1 large batch to save I/O cycles.
- Async Processing: If the user doesn't need the result immediately, move it to a Message Queue (Kafka/SQS).
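The batching point can be sketched as follows; `flush_fn`, the batch size, and the record shape are placeholders for whatever your storage layer expects:

```python
# Sketch of write batching: buffer small writes and flush in groups,
# trading a little latency for far fewer I/O round-trips.
class BatchWriter:
    def __init__(self, flush_fn, batch_size=1000):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buf = []

    def write(self, record):
        self.buf.append(record)
        if len(self.buf) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buf:
            self.flush_fn(self.buf)   # one call carries many records
            self.buf = []

calls = []
writer = BatchWriter(calls.append, batch_size=1000)
for i in range(2500):
    writer.write(i)
writer.flush()                         # don't forget the partial tail
print(len(calls), [len(b) for b in calls])  # 3 [1000, 1000, 500]
```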
Technical Trade-offs: Messaging Systems
| Pattern | Ordering | Durability | Throughput | Complexity |
|---|---|---|---|---|
| Log-based (Kafka) | Strict (per partition) | High | Very High | High |
| Memory-based (Redis Pub/Sub) | None | Low | High | Very Low |
| Push-based (RabbitMQ) | Fair | Medium | Medium | Medium |
Key Takeaways
- Offset pagination does work proportional to the offset; deep pages get slow and drift under concurrent writes.
- Keyset pagination anchors on the last seen sort key, giving near-constant cost per page.
- Always add a unique tiebreaker (usually `id`) and align the index with the sort order.
- Expose opaque, signed cursors bound to the filter context, never raw DB values.
Read Next
- System Design: Designing an Ad Click Aggregator
- Service Mesh Internals: How Envoy and Istio Manage the Mesh
- Distributed Garbage Collection: Managing References Across Networks
Verbal Interview Script
Interviewer: "How would you ensure high availability and fault tolerance for this specific architecture?"
Candidate: "To achieve 'Five Nines' (99.999%) availability, we must eliminate all Single Points of Failure (SPOF). I would deploy the API Gateway and stateless microservices across multiple Availability Zones (AZs) behind an active-active load balancer. For the data layer, I would use asynchronous replication to a read-replica in a different region for disaster recovery. Furthermore, it's not enough to just deploy redundantly; we must protect the system from cascading failures. I would implement strict timeouts, retry mechanisms with exponential backoff and jitter, and Circuit Breakers (using a library like Resilience4j) on all synchronous network calls between microservices."