Pagination looks simple until your database table has tens of millions of rows, users sort by multiple non-unique columns concurrently, and the product team demands a seamless infinite scroll with strict sub-$50\text{ ms}$ latency objectives.
Many APIs start with Offset-Based Pagination because it is easy to implement and fits standard SQL logic perfectly. However, offset pagination becomes expensive, unstable, and operationally dangerous at scale. In this article, we analyze why deep offset scans saturate database CPU and IOPS, deconstruct the mechanics of Keyset-Based Pagination (Cursor Pagination), design secure opaque cursor contracts, and implement a production-ready signed cursor engine in Java.
System Requirements and Goals
Designing pagination mechanics for massive, high-throughput datasets requires defining strict operational bounds.
1. Functional Requirements
- Consistent Deep Traversal: Users must be able to scroll infinitely through millions of activity log lines, product catalogs, or news feeds without performance degradation.
- Deterministic Results (Drift Resiliency): Prevent duplicates or skipped records when items are concurrently inserted or deleted at the head of the list during traversal.
- Secure Tokenization: Exclude database details (like primary keys or raw timestamps) from public exposure, protecting internal schema layouts.
2. Non-Functional Requirements
- Predictable Execution Time: Query latencies must remain constant ($O(1)$ or $O(\log N)$) across all pages, whether loading the first page or the $10,000$th page.
- Database Efficiency: Eliminate full table scans and primary key index dereferences, protecting buffer pools from page pollution under deep scrolls.
- Tamper Proofing: Cursors must be cryptographically signed, preventing users from altering page states or scraping data volumes.
High-Level Design Architecture
To understand the core database bottleneck, we must look at how the storage engine (such as InnoDB) handles page traversals.
1. The Bottleneck: Offset-Based Deep Scan
When running LIMIT 20 OFFSET 100000, the database cannot simply jump to the $100,000$th row. It must scan through $100,020$ index entries, perform primary key page lookups on disk to fetch the data rows, and then discard the first $100,000$ rows, returning only $20$:
graph TD
%% Define Nodes
subgraph "Offset Scan Flow (O(N) Complexity)"
StartOffset[Client requests OFFSET 100000 LIMIT 20]
IndexScanOffset[Database reads 100020 index nodes]
DiskLookupOffset[Fetch 100020 records from physical disk]
DiscardOffset[Discard first 100000 records in memory]
ReturnOffset[Return remaining 20 records to Client]
StartOffset --> IndexScanOffset
IndexScanOffset --> DiskLookupOffset
DiskLookupOffset --> DiscardOffset
DiscardOffset --> ReturnOffset
end
%% Styling
classDef offset fill:#e74c3c,stroke:#fff,stroke-width:1px,color:#fff;
class IndexScanOffset,DiskLookupOffset,DiscardOffset offset;
2. The Solution: Keyset-Based Index Jump
In Keyset Pagination, the client passes a secure cursor token carrying the sort values of the last seen item. The database uses these values as an anchor, executing an index-only range search to jump directly to the target rows:
graph TD
%% Define Nodes
subgraph "Keyset Scan Flow (O(log N) Complexity)"
StartKeyset[Client requests page with Cursor: JSON]
AppServer[App Server Decrypts & Decodes Cursor]
IndexSearch[Database executes Index-Range query: WHERE created_at < anchor]
IndexJump[Jump directly to target index node via B+Tree]
DiskLookupKeyset[Fetch exactly 20 records from physical disk]
ReturnKeyset[Return 20 records + signed nextCursor token]
StartKeyset --> AppServer
AppServer --> IndexSearch
IndexSearch --> IndexJump
IndexJump --> DiskLookupKeyset
DiskLookupKeyset --> ReturnKeyset
end
%% Styling
classDef keyset fill:#2ecc71,stroke:#fff,stroke-width:1px,color:#fff;
class IndexJump,DiskLookupKeyset keyset;
API Design and Interface Contracts
To avoid leaking database architecture patterns, public API contracts must expose cursors as opaque, encrypted strings.
1. Opaque Cursor Payload Structure (Decrypted Internal JSON)
Internally, the cursor token encapsulates the precise database keys needed to anchor the next SQL query:
{
"last_value": 1713549200000,
"last_id": 902819401,
"filter_hash": "a4f21b90",
"direction": "next",
"expires_at": 1713635600000
}
2. Paginated Response JSON Schema (GET /v1/feed)
The client receives a clean list of items along with the next opaque cursor string:
{
"items": [
{
"id": 902819401,
"title": "Scaling System Design",
"createdAt": "2026-05-23T02:30:00.000Z"
}
],
"pagination": {
"nextCursor": "eyJ1cGRhdGVkX2F0IjoxNzEzNTQ5MjAwMDAwLCJpZCI6OTA4MTJ9.a1b2c3d4e5f6...",
"hasMore": true
}
}
Low-Level Design & Component Mechanics
1. The Clustered Primary Index Lookup Penalty
In engines like MySQL's InnoDB, secondary indexes (e.g., INDEX (created_at)) do not store the physical disk address of the row. Instead, they store the Primary Key value.
When executing a query:
SELECT * FROM orders ORDER BY created_at LIMIT 20 OFFSET 100000;
For every single row scanned, the database must:
- Traverse the secondary index B+Tree to locate the
created_atvalue. - Retrieve the Primary Key value (e.g.,
id). - Traverse the clustered primary index B+Tree to fetch the complete data row from disk (a process called clustered index lookups).
- Discard the row.
At OFFSET 100000, this forces $100,000$ random disk reads, saturating storage IOPS and creating a severe performance bottleneck.
2. Signed Opaque Cursor Engine in Java
Below is a highly secure, compilable Java class that serializes cursor states to JSON, signs them using HMAC-SHA256 to prevent client tampering, and encodes them using Base64URL.
package com.codesprintpro.pagination;
import com.fasterxml.jackson.databind.ObjectMapper;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
public class SecureCursorEngine {
private static final String HMAC_ALGORITHM = "HmacSHA256";
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
private final byte[] secretKeyBytes;
public SecureCursorEngine(String secretKey) {
this.secretKeyBytes = secretKey.getBytes(StandardCharsets.UTF_8);
}
/**
* Serializes and signs database cursor values into a secure opaque string.
*/
public String generateCursor(long lastValue, long lastId, String filterHash) throws Exception {
Map<String, Object> payload = new HashMap<>();
payload.put("lv", lastValue);
payload.put("li", lastId);
payload.put("fh", filterHash);
payload.put("exp", System.currentTimeMillis() + (86400 * 1000)); // 24hr expiry
String jsonPayload = OBJECT_MAPPER.writeValueAsString(payload);
String encodedPayload = Base64.getUrlEncoder().withoutPadding().encodeToString(jsonPayload.getBytes(StandardCharsets.UTF_8));
String signature = calculateHmac(encodedPayload);
// Append signature to payload
return encodedPayload + "." + signature;
}
/**
* Parses, validates, and decodes an incoming opaque cursor token.
*/
public Map<String, Object> parseCursor(String cursorToken, String currentFilterHash) throws Exception {
String[] parts = cursorToken.split("\\.");
if (parts.length != 2) {
throw new IllegalArgumentException("Invalid cursor format");
}
String encodedPayload = parts[0];
String providedSignature = parts[1];
// 1. Verify Signature to prevent user tampering
String calculatedSignature = calculateHmac(encodedPayload);
if (!calculatedSignature.equals(providedSignature)) {
throw new SecurityException("Cursor signature verification failed: Tampering detected");
}
// 2. Decode payload
byte[] decodedBytes = Base64.getUrlDecoder().decode(encodedPayload);
Map<String, Object> payload = OBJECT_MAPPER.readValue(decodedBytes, Map.class);
// 3. Verify expiration
long expiresAt = ((Number) payload.get("exp")).longValue();
if (System.currentTimeMillis() > expiresAt) {
throw new IllegalArgumentException("Cursor token has expired");
}
// 4. Verify filter hash compatibility
String cursorFilterHash = (String) payload.get("fh");
if (!cursorFilterHash.equals(currentFilterHash)) {
throw new IllegalArgumentException("Cursor filter scope mismatch: filters have changed");
}
return payload;
}
private String calculateHmac(String data) throws NoSuchAlgorithmException, InvalidKeyException {
Mac mac = Mac.getInstance(HMAC_ALGORITHM);
SecretKeySpec secretKeySpec = new SecretKeySpec(secretKeyBytes, HMAC_ALGORITHM);
mac.init(secretKeySpec);
byte[] rawHmac = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
return Base64.getUrlEncoder().withoutPadding().encodeToString(rawHmac);
}
}
Scaling Challenges & Production Bottlenecks
1. The Drifting Page Concurrency Issue
In high-throughput apps like social media feeds or transaction ledgers, new items are continuously written to the head of the database table.
- The Problem: If a user is paginating through offset-based pages:
- User loads Page 1 (
LIMIT 20 OFFSET 0). - $5$ new items are inserted into the database.
- User loads Page 2 (
LIMIT 20 OFFSET 20). - Because of the $5$ new inserts, the last $5$ items from Page 1 are pushed down, appearing again at the top of Page 2. The user sees duplicate records.
- User loads Page 1 (
- The Keyset Resolution: Keyset pagination is drift-resilient. Because the query is anchored to a specific value (e.g.,
WHERE (created_at, id) < ('2026-05-23 02:30:00', 90281)), new inserts at the head of the table are ignored, ensuring a seamless user experience.
2. Multi-Column Sorting and Tie-Breaker Ordering
If a user sorts by a non-unique column (like first_name), keyset pagination can skip rows during index scans. If ten users share the name "John", and you paginate to the next page using WHERE first_name > 'John', the index scan will skip all other "John" records that were not returned on the first page.
The LLD Solution:
Always introduce an immutable tie-breaker column (typically the primary key id) to guarantee strict, deterministic sorting. Your query index must use a composite structure matching this order:
-- Query using multi-column composite keyset pagination
SELECT * FROM users
WHERE (first_name, id) > ('John', 88201)
ORDER BY first_name ASC, id ASC
LIMIT 20;
- Required Index:
(first_name ASC, id ASC)to enable fast index-only range scans.
Technical Trade-offs & Strategic Compromises
Relational database query costs represent a direct trade-off between random-access capability and read performance at scale.
| Strategy | Pros | Cons | Performance / Complexity Matrix |
|---|---|---|---|
| Offset-Based Pagination | * Supports arbitrary "Jump to Page X" navigation. * Extremely simple to implement across SQL backends. |
* Query latencies degrade linearly ($O(N)$) as pages go deeper. * Highly vulnerable to data drift duplicates. |
* Latency (Shallow): Low * Latency (Deep): Extremely High * Complexity: Zero |
| Keyset-Based Pagination | * Constant-time performance ($O(\log N)$) across deep scans. * Immune to data drift duplicates during active writes. |
* Does not support jumping to arbitrary page numbers (e.g., "Go to page 45"). * High development cost for managing signed cursor states. |
* Latency (Shallow): Low * Latency (Deep): Low (Constant) * Complexity: High |
| Hybrid Strategy | * Low-latency keyset reads for high-volume customer feeds. * Clean offset jumps for admin dashboards. |
* Requires maintaining dual endpoints and codebase paths. | * User Experience: Excellent * Maintenance Cost: Medium-High |
Failure Scenarios and Fault Tolerance
1. Invalidation of Schema/Datatype Cursors
If a database migration changes the data type of the sorting column (e.g., migrating id from INT to BIGINT), cached client-side cursors will fail to parse, throwing exception loops.
Fault-Tolerance Mitigation:
- Wrap cursor parsing in clean exception boundaries. If a decryption or type-conversion error occurs, reject the cursor and fall back to returning Page 1.
- Embed a version signature (
"v": 2) inside the cursor payload. If the application server parses an outdated cursor version, it triggers a graceful reset, guiding the client back to the start of the feed.
2. Scraper Data Leakage (The German Tank Problem)
If cursors expose sequential primary keys (id: 104209), competitors can scrape sequential pages, read the primary IDs, and calculate the exact transaction volume of your platform.
Fault-Tolerance Mitigation:
Always sign and encrypt cursor payloads using symmetric encryption (AES-GCM-256) or verify them via HMAC signatures. This renders the public token string indistinguishable from random characters, securing internal transaction figures.
Staff Engineer Perspective
[!TIP] Apply Cursors to Microservice APIs: Avoid using offset parameters on HTTP calls between internal microservices. If Service A queries Service B using offsets under heavy loads, thread pools will block waiting for deep db scans. Enforce signed, opaque cursors on all internal system integrations to ensure microservice network paths remain fast.
Verbal Script & Mock Interview
Verbal Script: Diagnosing and Fixing Deep Page Latency
Interviewer: "Our high-traffic infinite scroll feed is slowing down. As users scroll deeper, p99 latencies jump from 20ms to over 3 seconds, and database CPU is hitting 98%. How do you diagnose and permanently resolve this problem?"
Candidate: "The symptoms described—namely, query latencies degrading from $20\text{ ms}$ to over $3$ seconds as users scroll deeper—confirm we have a classic database offset pagination bottleneck.
To diagnose this under the hood, I would run EXPLAIN ANALYZE on the query. We will observe that the database is executing an $O(N)$ linear index scan. When the client requests deep offsets (e.g., LIMIT 20 OFFSET 100000), the InnoDB storage engine must traverse $100,020$ index entries, perform $100,020$ random clustered primary index lookups on disk to fetch the full rows, and then discard the first $100,000$ in memory, returning only $20$. This massive I/O overhead saturates the database CPU and pollutes the buffer cache.
To resolve this permanently, I would migrate the API to Keyset-Based Pagination (Cursor Pagination).
Instead of counting rows to skip, the query will use the last seen sorting attributes of the previous page as an anchor. For a feed sorted by date, we establish a deterministic, composite order using a unique tie-breaker column, typically (created_at DESC, id DESC). The SQL query jumps directly to the target rows via a B+Tree index range search in $O(\log N)$ constant time:
SELECT * FROM feed_items
WHERE (created_at, id) < (?, ?)
ORDER BY created_at DESC, id DESC
LIMIT 20;
To support this query efficiently, I would create a composite index on (created_at DESC, id DESC).
On the API layer, I would enforce strict encapsulation. Instead of exposing database columns or raw timestamps to the client—which leaks schema internals and risks competitor scraping—I would implement an Opaque Signed Cursor Engine.
The cursor encapsulates a JSON payload containing the sorting anchor keys, an expiration timestamp, and a cryptographic hash of the query filters. This payload is encrypted and signed using HMAC-SHA256 and Base64URL-encoded before being returned as a nextCursor token.
When the client requests the next page, the server decrypts the cursor, verifies the HMAC signature to prevent client-side tampering, validates that the query parameters match the cursor's filter hash to prevent scope drift, and binds the anchor keys directly to the database query parameters.
This architecture guarantees constant-time sub-50ms database fetches, remains resilient to duplicate items during active writes, and maintains data security under deep scrolls."