1. Problem Statement
Design a data structure that follows the constraints of a Least Recently Used (LRU) cache.
Implement the LRUCache class:
- `LRUCache(int capacity)`: Initialize the cache with positive size `capacity`.
- `int get(int key)`: Return the value of the key if it exists; otherwise return -1.
- `void put(int key, int value)`: Update the value of the key if it exists. Otherwise, add the key-value pair to the cache. If the number of keys exceeds the capacity, evict the least recently used key.
Constraint: All operations must be in $O(1)$ average time complexity.
2. The Mental Model: The "Hybrid" Architecture
In a standard Java HashMap, you get $O(1)$ lookup, but you lose the "Order" of arrival. In a LinkedList, you get $O(1)$ re-ordering (if you have the pointer), but finding the element is $O(N)$.
To build an LRU cache, we must combine these two worlds.
- HashMap: Stores `Key -> ListNode`. This allows us to jump to any node in the list in $O(1)$.
- Doubly Linked List: Stores the "Recency" order.
- Head: Most Recently Used (MRU). Every time a key is accessed, we move it here.
- Tail: Least Recently Used (LRU). When the cache is full, we delete from here.
3. Visual Execution (Hybrid Structure)
```mermaid
graph LR
    subgraph "HashMap (O(1) Jump)"
        Map[Map] --> K1[Key 1]
        Map --> K2[Key 2]
        Map --> K3[Key 3]
    end
    subgraph "Doubly Linked List (O(1) Move)"
        H[Head / MRU] <--> N1[Node 1]
        N1 <--> N2[Node 2]
        N2 <--> N3[Node 3]
        N3 <--> T[Tail / LRU]
    end
    K1 -. pointer .-> N1
    K2 -. pointer .-> N2
    K3 -. pointer .-> N3
```
4. Java Implementation
```java
import java.util.HashMap;
import java.util.Map;

class LRUCache {

    class Node {
        int key, val;
        Node prev, next;
        Node(int k, int v) { key = k; val = v; }
    }

    private final int capacity;
    private final Map<Integer, Node> map;
    private final Node head, tail;

    public LRUCache(int capacity) {
        this.capacity = capacity;
        this.map = new HashMap<>();
        // 1. The Sentinel Node Pattern: head and tail are permanent "goal posts"
        head = new Node(0, 0);
        tail = new Node(0, 0);
        head.next = tail;
        tail.prev = head;
    }

    public int get(int key) {
        if (!map.containsKey(key)) return -1;
        Node node = map.get(key);
        remove(node); // Extract from current position
        insert(node); // Move to MRU position (head.next)
        return node.val;
    }

    public void put(int key, int value) {
        if (map.containsKey(key)) {
            remove(map.get(key));
        } else if (map.size() == capacity) {
            // 2. Evict from LRU position (tail.prev)
            map.remove(tail.prev.key);
            remove(tail.prev);
        }
        insert(new Node(key, value));
    }

    // Helper: Unlink node from its current spot in the chain
    private void remove(Node node) {
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    // Helper: Insert node at the MRU position (just after head)
    private void insert(Node node) {
        map.put(node.key, node);
        node.next = head.next;
        node.next.prev = node;
        head.next = node;
        node.prev = head;
    }
}
```
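To sanity-check the pointer surgery, here is a hypothetical driver tracing the classic capacity-2 eviction sequence. The class is repeated in condensed form (behavior identical to the listing above) so the snippet compiles on its own; the `Main` class name is arbitrary:

```java
import java.util.HashMap;
import java.util.Map;

// Condensed copy of the LRUCache above so this walkthrough runs standalone.
class LRUCache {
    private static class Node {
        int key, val;
        Node prev, next;
        Node(int k, int v) { key = k; val = v; }
    }

    private final int capacity;
    private final Map<Integer, Node> map = new HashMap<>();
    private final Node head = new Node(0, 0), tail = new Node(0, 0);

    public LRUCache(int capacity) {
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    public int get(int key) {
        if (!map.containsKey(key)) return -1;
        Node node = map.get(key);
        remove(node);
        insert(node);
        return node.val;
    }

    public void put(int key, int value) {
        if (map.containsKey(key)) remove(map.get(key));
        else if (map.size() == capacity) { map.remove(tail.prev.key); remove(tail.prev); }
        insert(new Node(key, value));
    }

    private void remove(Node n) { n.prev.next = n.next; n.next.prev = n.prev; }

    private void insert(Node n) {
        map.put(n.key, n);
        n.next = head.next;
        n.next.prev = n;
        head.next = n;
        n.prev = head;
    }
}

public class Main {
    public static void main(String[] args) {
        LRUCache cache = new LRUCache(2);
        cache.put(1, 1);
        cache.put(2, 2);
        System.out.println(cache.get(1)); // 1 — key 1 becomes MRU
        cache.put(3, 3);                  // capacity exceeded: evicts key 2 (LRU)
        System.out.println(cache.get(2)); // -1
        cache.put(4, 4);                  // evicts key 1
        System.out.println(cache.get(3)); // 3
    }
}
```

Note how `get(1)` before `put(3, 3)` changes which key gets evicted: key 2, not key 1, is at `tail.prev` when the capacity check fires.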
5. Verbal Interview Script (Staff Tier)
Interviewer: "How would you handle thread safety in this LRU cache?"
You: "In a production environment, this implementation is not thread-safe because multiple threads could be updating the pointers of the Doubly Linked List simultaneously, leading to a race condition or a corrupted structure. To fix this, I would use ReentrantLock or wrap the methods in synchronized blocks. However, a more scalable approach for high-concurrency Java would be to use a ConcurrentHashMap for the lookup and a ConcurrentLinkedQueue for the recency tracking, although maintaining strict LRU order with concurrent structures is complex. I might also look at Caffeine or Ehcache, which use advanced 'Window TinyLFU' algorithms to balance hit rate and concurrency overhead."
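The `ReentrantLock` approach mentioned in the script can be sketched as follows. This is a minimal illustration, not a production design: the class name `ConcurrentLRUCache` is hypothetical, the cache is backed by an access-ordered `LinkedHashMap` for brevity, and every operation is serialized behind a single lock, which trades throughput for simplicity:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one coarse-grained lock guarding every cache operation.
class ConcurrentLRUCache {
    private final ReentrantLock lock = new ReentrantLock();
    private final LinkedHashMap<Integer, Integer> map;

    ConcurrentLRUCache(int capacity) {
        // accessOrder = true: iteration order tracks recency, eldest entry is the LRU
        this.map = new LinkedHashMap<Integer, Integer>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > capacity;
            }
        };
    }

    public int get(int key) {
        lock.lock();
        try {
            return map.getOrDefault(key, -1); // also records the access for recency
        } finally {
            lock.unlock();
        }
    }

    public void put(int key, int value) {
        lock.lock();
        try {
            map.put(key, value); // may trigger eviction via removeEldestEntry
        } finally {
            lock.unlock();
        }
    }
}
```

Under high contention this single lock becomes the bottleneck, which is exactly why libraries like Caffeine use lock-free reads with buffered, batched recency updates instead.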
6. The "Staff-Tier" Pivot: System Design
Interviewer: "What if the cache is too big for a single machine's RAM?"
You: "That is when we transition to Distributed Caching. I would use a system like Redis. Redis implements LRU logic internally but uses a more memory-efficient 'Approximated LRU' algorithm. Instead of a full DLL, it samples a few keys and evicts the oldest from that sample. To scale horizontally, I would use Consistent Hashing to shard keys across multiple Redis nodes, ensuring that if one node goes down, we only lose a portion of our cache."
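The Consistent Hashing idea from the answer above can be sketched with a `TreeMap`-based ring. This is a simplified illustration (the `HashRing` class and its method names are hypothetical); real systems use many virtual replicas per node and a stronger hash such as MurmurHash rather than `String.hashCode()`:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a consistent-hashing ring mapping cache keys to node names.
class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node, int virtualReplicas) {
        // Each node occupies several points on the ring to smooth the key distribution.
        for (int i = 0; i < virtualReplicas; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node, int virtualReplicas) {
        for (int i = 0; i < virtualReplicas; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // The first node clockwise from the key's hash owns the key.
    String nodeFor(String key) {
        if (ring.isEmpty()) return null;
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return e != null ? e.getValue() : ring.firstEntry().getValue();
    }

    private int hash(String s) {
        // Real deployments use a well-distributed hash; hashCode() keeps the sketch short.
        return s.hashCode() & 0x7fffffff;
    }
}
```

The payoff is the eviction-adjacent property the answer mentions: removing one node only remaps the keys that node owned, leaving the rest of the cache warm.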
7. Performance Nuances (Staff Level)
- Memory Overhead: Each `Node` object in the DLL carries significant overhead (32-40 bytes for object headers and pointers). For primitive-heavy caches, consider using a `long` array as an ID-to-recency map to reduce GC pressure.
- LinkedHashMap: In Java, you can implement this in a handful of lines using `LinkedHashMap` by overriding `removeEldestEntry`. In an interview, always implement it manually with the DLL first to show you understand the pointers, then mention the built-in Java class as a production alternative.
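That `LinkedHashMap` shortcut looks roughly like this (the class name `LinkedHashLRU` is hypothetical). Passing `accessOrder = true` to the three-argument constructor makes iteration order follow recency, and overriding `removeEldestEntry` makes `put` evict automatically:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the production shortcut: LinkedHashMap in access order + eviction hook.
class LinkedHashLRU<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LinkedHashLRU(int capacity) {
        super(capacity, 0.75f, true); // accessOrder = true: get() reorders entries
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // called after every put; true evicts the LRU entry
    }
}
```

One behavioral difference from the hand-rolled class: `get` on a missing key returns `null` rather than -1, so callers need `getOrDefault` if they want the LeetCode contract.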
8. Performance Nuances (The Java Perspective)
- Autoboxing Overhead: With `HashMap<Integer, Node>`, Java autoboxes every `int` key into an `Integer` object on the heap, creating thousands of short-lived objects under load. In a performance-critical system, I would use a primitive-specialized library like fastutil or Trove (e.g. an `Int2ObjectMap<Node>` for the key lookup), significantly reducing GC pauses.
Key Takeaways
- Neither structure achieves $O(1)$ alone: the HashMap supplies $O(1)$ lookup, the Doubly Linked List supplies $O(1)$ reordering, and only the hybrid delivers both.
- Sentinel head and tail nodes eliminate every null check in the `remove`/`insert` pointer surgery.
- In interviews, build the DLL version by hand to prove you understand the pointers; in production Java, reach for `LinkedHashMap` with `removeEldestEntry`, or a library like Caffeine.
- At distributed scale, Redis uses a memory-efficient approximated LRU via sampling, and Consistent Hashing shards the keyspace so a node failure loses only part of the cache.