Caching Strategies: Redis, CDN, and Invalidation
Mental Model
A low-latency memory buffer protecting your primary data source from read spikes.
Caching is one of the most effective ways to improve the performance and scalability of a distributed system. By storing frequently accessed data in a fast storage layer (typically RAM), we cut latency and reduce load on the primary database.
1. Why Cache?
graph LR
    Client[Client] -->|1. Read request| App[Application]
    App -->|2. Check cache| Cache[(Redis)]
    App -->|3. On miss, read| DB[(Primary DB)]
    App -->|4. Populate| Cache
- Latency: RAM access is orders of magnitude faster than SSD reads (roughly hundreds of nanoseconds versus tens to hundreds of microseconds).
- Throughput: One Redis node can handle 100k+ requests per second.
- Cost: Reducing database hits can significantly lower infrastructure bills.
2. Types of Caching
- Client-side: Browser-level caches (HTTP cache, localStorage) on the user's device.
- CDN: Content Delivery Networks (Edge caching) for static assets.
- Application-side: In-memory caches like Caffeine or Guava.
- Distributed Cache: External stores like Redis or Memcached shared by all instances.
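For the application-side tier, here is a minimal sketch using Caffeine (the size and TTL values are illustrative, and productRepository is assumed from the later examples):
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
// Per-instance in-memory cache: the fastest tier, but not shared across instances.
Cache<String, Product> localCache = Caffeine.newBuilder()
        .maximumSize(10_000)                      // bound memory usage
        .expireAfterWrite(Duration.ofMinutes(5))  // crude staleness bound
        .build();
// Loads from the database only on a miss; subsequent reads stay in memory.
Product product = localCache.get(productId, id -> productRepository.findById(id));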
3. Cache Invalidation Strategies
Cache invalidation is famously one of the two hard things in computer science (the other, per Phil Karlton's quip, being naming things).
- Cache Aside (Lazy Loading): Application checks cache first. If it's a miss, read from DB and write to cache.
- Write-Through: Application writes to cache, and cache synchronously writes to DB.
- Write-Back (Write-Behind): Application writes to cache, and cache asynchronously writes to DB (risk of data loss if the cache node fails before flushing).
- TTL (Time to Live): Every key has an expiration time.
4. Common Pitfalls
- Cache Stampede (Thundering Herd): When a popular key expires and thousands of requests hit the DB simultaneously.
- Big Key Problem: Storing a massive JSON object that blocks the Redis event loop.
Final Takeaway
Caching is not a "magic button." It adds complexity and potential staleness. Use it where the performance gains outweigh the consistency cost.
The real question is not "how do we make cache perfectly consistent?" The better question is: how stale can this data be, and what happens if it is wrong?
Different data needs different cache strategies. A product description can be stale for minutes. A bank balance cannot. A feature flag might tolerate seconds of staleness. A permission change may need immediate invalidation.
Cache-Aside
Cache-aside is the most common pattern:
public Product getProduct(String productId) {
    String key = "product:" + productId;
    // 1. Check the cache first.
    Product cached = redis.get(key, Product.class);
    if (cached != null) {
        return cached;
    }
    // 2. On a miss, read from the source of truth.
    Product product = productRepository.findById(productId);
    // 3. Populate the cache only if the lookup succeeded, then return.
    if (product != null) {
        redis.set(key, product, Duration.ofMinutes(10));
    }
    return product;
}
The application owns cache population. On a miss, read from the database and put the result into Redis.
Pros:
- simple
- works with any database
- easy to apply per endpoint
Cons:
- first request after expiry is slow
- stale data exists until TTL or explicit eviction
- cache stampede risk on hot keys
TTL-Based Invalidation
TTL is the simplest invalidation strategy:
redis.set("product:" + productId, product, Duration.ofMinutes(10));
Use TTL when:
- data changes infrequently
- some staleness is acceptable
- correctness is not safety-critical
Avoid long TTLs for user-specific permissions, pricing, inventory, or account state unless you have another invalidation mechanism.
TTL should be based on business tolerance, not guesswork:
| Data | Example TTL |
|---|---|
| Static catalog metadata | 30-60 minutes |
| Product price | 1-5 minutes |
| User profile | 5-15 minutes |
| Permissions | seconds or explicit invalidation |
| Account balance | usually avoid caching or use strict invalidation |
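One way to make these tolerances explicit in code, rather than scattering magic numbers, is a small per-type TTL policy (a sketch; the names and values are illustrative):
import java.time.Duration;
import java.util.Map;
// Central place to encode business staleness tolerance per data type.
Map<String, Duration> ttlPolicy = Map.of(
        "catalog", Duration.ofMinutes(45),
        "price", Duration.ofMinutes(2),
        "profile", Duration.ofMinutes(10),
        "permission", Duration.ofSeconds(10));
redis.set("price:" + productId, price, ttlPolicy.get("price"));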
Explicit Eviction on Write
When the source of truth changes, delete the cache:
@Transactional
public void updateProduct(String productId, UpdateProductRequest request) {
    productRepository.update(productId, request);
    // Note: this delete runs inside the transaction; see the caveat below.
    redis.delete("product:" + productId);
}
This is better than relying on TTL alone, but there is a subtle issue: if the transaction rolls back after the delete, the cache is cleared even though the database did not change. That is usually acceptable, because the next read simply repopulates the old value. The more dangerous case is deleting before the database commit: a concurrent reader can miss, read the old row, and repopulate the cache with stale data that then outlives the commit.
Prefer evicting after commit:
@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
public void onProductUpdated(ProductUpdatedEvent event) {
    redis.delete("product:" + event.productId());
}
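For that listener to fire, the write path must publish the event inside the transaction. A sketch assuming an injected Spring ApplicationEventPublisher (the ProductUpdatedEvent record is hypothetical):
@Transactional
public void updateProduct(String productId, UpdateProductRequest request) {
    productRepository.update(productId, request);
    // Published now, but the AFTER_COMMIT listener only runs if the commit succeeds.
    eventPublisher.publishEvent(new ProductUpdatedEvent(productId));
}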
Write-Through Cache
In write-through, every write updates both the cache and the database on the same path. In its simplest form, the application performs both updates itself:
public void updateProduct(Product product) {
    // Database first: if this write fails, the cache is never touched.
    productRepository.save(product);
    redis.set("product:" + product.id(), product, Duration.ofMinutes(10));
}
This reduces stale reads after writes, but it still has failure windows. If the database write succeeds and Redis write fails, the cache may remain stale. You still need TTL as a fallback.
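One common mitigation, sketched here with the same hypothetical client, is to fall back to deleting the key when the cache write fails, so the next read repopulates from the database:
public void updateProduct(Product product) {
    productRepository.save(product);
    String key = "product:" + product.id();
    try {
        redis.set(key, product, Duration.ofMinutes(10));
    } catch (RuntimeException e) {
        // Best effort: if Redis is fully down this may fail too, and TTL
        // remains the last line of defense against the stale entry.
        redis.delete(key);
    }
}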
Write-through works best when:
- writes are not too frequent
- read-after-write consistency matters
- the write path is centralized
It gets messy when multiple services can update the same entity.
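When the write path can be centralized, one wrapper can own both updates so no caller touches the cache directly. A sketch (the class and client types are illustrative):
import java.time.Duration;
// Single choke point for product writes.
public class ProductStore {
    private final ProductRepository repository;
    private final RedisClient redis;
    public ProductStore(ProductRepository repository, RedisClient redis) {
        this.repository = repository;
        this.redis = redis;
    }
    public void save(Product product) {
        repository.save(product);
        redis.set("product:" + product.id(), product, Duration.ofMinutes(10));
    }
}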
Event-Driven Invalidation
For distributed systems, publish an event when data changes:
{
    "eventType": "PRODUCT_UPDATED",
    "productId": "p123",
    "changedAt": "2025-07-22T12:00:00Z"
}
Consumers evict relevant keys:
@KafkaListener(topics = "product-events")
public void handle(ProductUpdated event) {
    redis.delete("product:" + event.productId());
    redis.delete("product-summary:" + event.productId());
}
This is powerful when multiple services cache the same data. The product service does not need to know every cache key in every downstream service. It publishes a domain event, and each service invalidates its own caches.
Failure mode: consumers can lag. Keep TTLs even with event invalidation so stale data eventually disappears.
Versioned Cache Keys
Versioned keys avoid delete races:
String key = "product:" + productId + ":v" + product.getVersion();
If the product version increments on update, old cache entries become unreachable:
product:p123:v41
product:p123:v42
This is useful when you can cheaply know the current version. The old keys expire naturally via TTL.
Versioned keys are excellent for immutable or semi-immutable objects, but can create many keys if updates are frequent.
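A read-path sketch, assuming the current version is cheap to obtain (currentVersion here is a hypothetical fast lookup, e.g. a small indexed column):
long version = productRepository.currentVersion(productId); // hypothetical cheap lookup
String key = "product:" + productId + ":v" + version;
Product product = redis.get(key, Product.class);
if (product == null) {
    product = productRepository.findById(productId);
    // Old versions are never overwritten; they simply age out via TTL.
    redis.set(key, product, Duration.ofMinutes(10));
}
return product;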
Cache Stampede Prevention
When a hot key expires, many requests can miss at once and hit the database together.
Use a short lock:
Product cached = redis.get(key, Product.class);
if (cached != null) return cached;
String lockKey = "lock:" + key;
// Only one caller wins the lock and refreshes the cache; the TTL bounds lock leakage.
boolean locked = redis.setIfAbsent(lockKey, "1", Duration.ofSeconds(5));
if (locked) {
    try {
        Product product = productRepository.findById(productId);
        if (product != null) {
            redis.set(key, product, Duration.ofMinutes(10));
        }
        return product;
    } finally {
        redis.delete(lockKey);
    }
}
// Losers wait briefly, then re-check the cache.
try {
    Thread.sleep(50);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
cached = redis.get(key, Product.class);
// Fall back to the database if the winner has not repopulated yet.
return cached != null ? cached : productRepository.findById(productId);
In high-traffic systems, also add TTL jitter:
Duration ttl = Duration.ofMinutes(10)
        .plusSeconds(ThreadLocalRandom.current().nextInt(0, 60));
Jitter prevents many keys from expiring at exactly the same time.
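Wrapping the jitter in a helper makes it the default for every cache write (a sketch; the 10% bonus is an arbitrary choice):
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
// Adds up to ~10% random extra TTL so keys written together do not expire together.
static Duration jittered(Duration base) {
    long maxBonus = Math.max(1, base.toSeconds() / 10);
    return base.plusSeconds(ThreadLocalRandom.current().nextLong(0, maxBonus));
}
// Usage: redis.set(key, product, jittered(Duration.ofMinutes(10)));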
Negative Caching
If missing data is requested frequently, cache the miss:
if (product == null) {
    // Sentinel value: readers must check for it before deserializing (see below).
    redis.set("product:" + productId, "NOT_FOUND", Duration.ofMinutes(1));
    throw new NotFoundException();
}
Use a short TTL. Negative caching can hide newly created data if the TTL is too long.
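The read path must then distinguish the sentinel from a real value, as referenced in the snippet above (same hypothetical client):
Object cached = redis.get(key, Object.class);
if ("NOT_FOUND".equals(cached)) {
    // A recently cached miss: fail fast without touching the database.
    throw new NotFoundException();
}
if (cached != null) {
    return (Product) cached;
}
// Otherwise fall through to the normal cache-aside load.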
Production Checklist
- Define acceptable staleness per data type
- Use TTL even with explicit/event invalidation
- Evict after database commit
- Use event-driven invalidation across services
- Add TTL jitter for hot keys
- Prevent stampedes on expensive cache misses
- Use versioned keys when object versions are easy to access
- Avoid caching highly sensitive correctness-critical state unless the invalidation story is strong
- Monitor hit rate, miss rate, evictions, Redis latency, and database load after cache expiry
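For the last checklist item, the hit rate can be derived from Redis's own INFO counters; a sketch using the Jedis client:
import redis.clients.jedis.Jedis;
// keyspace_hits and keyspace_misses come from the "stats" section of INFO.
static double hitRate(Jedis jedis) {
    long hits = 0, misses = 0;
    for (String line : jedis.info("stats").split("\r\n")) {
        if (line.startsWith("keyspace_hits:")) hits = Long.parseLong(line.substring(14));
        if (line.startsWith("keyspace_misses:")) misses = Long.parseLong(line.substring(16));
    }
    long total = hits + misses;
    return total == 0 ? 0.0 : (double) hits / total;
}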
Caching is not just a performance optimization. It is a consistency design decision. The best cache strategy is the one that makes staleness explicit and survivable.
Technical Trade-offs: Transports for Invalidation Events
Event-driven invalidation needs a transport, and the common options trade off differently:
| Pattern | Ordering | Durability | Throughput | Complexity |
|---|---|---|---|---|
| Log-based (Kafka) | Strict (per partition) | High | Very high | High |
| Memory-based (Redis Pub/Sub) | Best-effort, no replay | None (fire-and-forget) | High | Very low |
| Push-based (RabbitMQ) | Per-queue, weakened by retries | Medium (configurable) | Medium | Medium |
Key Takeaways
- Match the invalidation strategy to the data's staleness tolerance: TTL for tolerant data, explicit or event-driven eviction where correctness matters.
- Keep a TTL safety net even when you also evict explicitly or via events.
- Evict after the database commit, and protect hot keys with stampede locks and TTL jitter.
Read Next
- Redis Beyond Cache: Sorted Sets, Streams, and Pub/Sub Patterns
- Database Indexing Deep Dive: B-Trees, Hash Indexes, and Query Planning
- PostgreSQL Performance Tuning: From Slow Queries to Sub-Millisecond Reads
Verbal Interview Script
Interviewer: "What happens to this database architecture if we experience a sudden 10x spike in write traffic?"
Candidate: "A 10x spike in write traffic would immediately bottleneck a traditional relational database due to row-level locking and the overhead of maintaining ACID guarantees, specifically the Write-Ahead Log (WAL) and B-Tree index updates. To handle this, we have a few options. If strict ACID compliance is required, we would implement database sharding, distributing the write load across multiple primary nodes using a consistent hashing ring. If eventual consistency is acceptable, I would decouple ingestion by placing Kafka in front of the database as a shock absorber, smoothing the write spikes into a manageable stream for background workers to process."