1. Problem Statement
Mental Model
Reducing a two-dimensional eviction policy (frequency, then recency) to a handful of $O(1)$ hash-map operations.
Design and implement a data structure for a Least Frequently Used (LFU) cache.
- `LFUCache(int capacity)`: Initializes the object with the capacity of the data structure.
- `int get(int key)`: Gets the value of the key if the key exists in the cache. Otherwise, returns -1.
- `void put(int key, int value)`: Updates the value of the key if present, or inserts the key if not already present. When the cache reaches its capacity, it should invalidate and remove the least frequently used key before inserting a new item. For this problem, when there is a tie (i.e., two or more keys with the same frequency), the least recently used key is invalidated.
Constraint: All operations must be in $O(1)$ time complexity.
2. Approach: Frequency-grouped Linked Lists
LRU is simple because you only track one order (recency). LFU is harder because you track two dimensions: Frequency and Recency.
The "Aha!" Moment
We maintain a `Map<Integer, LinkedHashSet<Integer>> freqMap` where:
- The key is the frequency (1, 2, 3...).
- The value is a `LinkedHashSet` (which maintains insertion order) of the keys having that frequency.
- The `LinkedHashSet` handles the "LRU tie-breaker" for us automatically.
The State
- `cache`: `Map<Key, Value>`
- `keyFreq`: `Map<Key, Frequency>`
- `freqMap`: `Map<Frequency, LinkedHashSet<Key>>`
- `minFreq`: tracks the global minimum frequency to find the eviction candidate in $O(1)$.
3. Java Implementation
```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

class LFUCache {
    private int capacity, minFreq;
    private Map<Integer, Integer> cache;    // key -> value
    private Map<Integer, Integer> keyFreq;  // key -> access frequency
    private Map<Integer, LinkedHashSet<Integer>> freqMap; // frequency -> keys, in LRU order

    public LFUCache(int capacity) {
        this.capacity = capacity;
        this.minFreq = 0;
        this.cache = new HashMap<>();
        this.keyFreq = new HashMap<>();
        this.freqMap = new HashMap<>();
    }

    public int get(int key) {
        if (!cache.containsKey(key)) return -1;
        updateFrequency(key);
        return cache.get(key);
    }

    public void put(int key, int value) {
        if (capacity <= 0) return;
        if (cache.containsKey(key)) {
            cache.put(key, value);
            updateFrequency(key);
            return;
        }
        if (cache.size() >= capacity) {
            // First element of the minFreq bucket is the LRU key at that frequency.
            int evict = freqMap.get(minFreq).iterator().next();
            freqMap.get(minFreq).remove(evict);
            cache.remove(evict);
            keyFreq.remove(evict);
        }
        cache.put(key, value);
        keyFreq.put(key, 1);
        minFreq = 1; // a brand-new key always enters at frequency 1
        freqMap.computeIfAbsent(1, k -> new LinkedHashSet<>()).add(key);
    }

    private void updateFrequency(int key) {
        int freq = keyFreq.get(key);
        keyFreq.put(key, freq + 1);
        freqMap.get(freq).remove(key);
        // If we just drained the minimum-frequency bucket, the minimum moves up by one.
        if (freq == minFreq && freqMap.get(freq).isEmpty()) minFreq++;
        freqMap.computeIfAbsent(freq + 1, k -> new LinkedHashSet<>()).add(key);
    }
}
```
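To sanity-check the behavior end to end, here is a short driver running the classic capacity-2 sequence. The `Main` class is a hypothetical harness, not part of the original problem statement:

```java
public class Main {
    public static void main(String[] args) {
        LFUCache cache = new LFUCache(2);
        cache.put(1, 1);                  // freq(1) = 1
        cache.put(2, 2);                  // freq(2) = 1
        System.out.println(cache.get(1)); // 1; freq(1) -> 2
        cache.put(3, 3);                  // evicts key 2 (lowest frequency)
        System.out.println(cache.get(2)); // -1 (evicted)
        System.out.println(cache.get(3)); // 3; freq(3) -> 2
        cache.put(4, 4);                  // tie at freq 2 between keys 1 and 3 -> evicts key 1 (LRU)
        System.out.println(cache.get(1)); // -1 (evicted)
        System.out.println(cache.get(3)); // 3
        System.out.println(cache.get(4)); // 4
    }
}
```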
4. 5-Minute "Video-Style" Walkthrough
- The Double-Map Hack: Why two maps? `keyFreq` tells us "How many times has key X been seen?" while `freqMap` tells us "Which keys have been seen exactly Y times?"
- The MinFreq Pointer: This is the secret to $O(1)$. When we need to evict, we don't scan all frequencies; we just look at `freqMap.get(minFreq)`. Since it's a `LinkedHashSet`, the first element is the oldest (LRU).
- The Frequency Promotion: When you `get(key)`, its frequency increases. If that key was the last element of the `minFreq` bucket, we increment `minFreq`. A state trace follows below.
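To make the promotion concrete, here is a hand-worked trace of the internal state for a capacity-2 cache, following the code in Section 3 (the `LFUCache` class is assumed to be in scope):

```java
LFUCache cache = new LFUCache(2);
cache.put(1, 1); // freqMap = {1: [1]},            minFreq = 1
cache.put(2, 2); // freqMap = {1: [1, 2]},         minFreq = 1
cache.get(1);    // freqMap = {1: [2], 2: [1]},    minFreq = 1  (key 1 promoted)
cache.get(2);    // freqMap = {1: [], 2: [1, 2]},  minFreq = 2  (bucket 1 drained, so minFreq++)
cache.put(3, 3); // key 1 evicted: first in bucket 2, i.e. the LRU key at minFreq;
                 // then freqMap = {1: [3], 2: [2]}, minFreq = 1
```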
5. Interview Discussion
- Interviewer: "Why not use a Priority Queue?"
- You: "A Priority Queue would give us $O(\log N)$ for updates. To achieve $O(1)$, we must use a bucket-based approach where each bucket is a DLL or LinkedHashSet."
- Interviewer: "Is this thread-safe?"
- You: "No, for production use we would need
ReadWriteLocksor aConcurrentHashMapwith synchronized bucket updates."
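To make the Priority Queue contrast concrete, here is a minimal sketch of the ordered-structure alternative. The class name `TreeLFUCache` is hypothetical, and a `TreeSet` stands in for a literal `PriorityQueue` (whose `remove(Object)` is $O(N)$); every update still costs $O(\log N)$, which is exactly the gap the bucket approach closes:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical O(log N) alternative, for comparison only (not the recommended design).
class TreeLFUCache {
    private record Entry(int key, int freq, long lastUsed) {}

    private final int capacity;
    private long tick = 0; // logical clock used as the LRU tie-breaker
    private final Map<Integer, Integer> values = new HashMap<>();
    private final Map<Integer, Entry> entries = new HashMap<>();
    // Ordered by (frequency, recency): first() is always the eviction candidate.
    private final TreeSet<Entry> order = new TreeSet<>(
            Comparator.comparingInt(Entry::freq).thenComparingLong(Entry::lastUsed));

    TreeLFUCache(int capacity) { this.capacity = capacity; }

    public int get(int key) {
        if (!values.containsKey(key)) return -1;
        touch(key); // O(log N): remove + re-insert into the tree
        return values.get(key);
    }

    public void put(int key, int value) {
        if (capacity <= 0) return;
        if (values.containsKey(key)) {
            values.put(key, value);
            touch(key);
            return;
        }
        if (values.size() >= capacity) {
            Entry evict = order.pollFirst(); // lowest freq, then oldest: O(log N)
            values.remove(evict.key());
            entries.remove(evict.key());
        }
        Entry e = new Entry(key, 1, tick++);
        values.put(key, value);
        entries.put(key, e);
        order.add(e);
    }

    private void touch(int key) {
        Entry old = entries.get(key);
        order.remove(old); // O(log N)
        Entry updated = new Entry(key, old.freq() + 1, tick++);
        entries.put(key, updated);
        order.add(updated); // O(log N)
    }
}
```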
6. Verbal Interview Script (Staff Tier)
Interviewer: "Walk me through your optimization strategy for this problem."
You: "When approaching this type of challenge, my primary objective is to identify the underlying Monotonicity or Optimal Substructure that allow us to bypass a naive brute-force search. In my implementation of 'MANG Problem #16: LFU Cache Design (Hard)', I focused on reducing the time complexity by leveraging a HashMap-based lookup. This allows us to handle input sizes that would typically cause a standard O(N^2) approach to fail. Furthermore, I prioritized memory efficiency by using in-place modifications. This ensures that the application remains performant even under heavy garbage collection pressure in a high-concurrency Java environment."
7. Staff-Level Interview Follow-Ups
Once you provide the optimized solution, a senior interviewer at Google or Meta will likely push you further. Here is how to handle the most common follow-ups:
Follow-up 1: "How does this scale to a Distributed System?"
If the working set is too large to fit on a single machine (e.g., billions of keys), we would move from a single-node cache to a sharded one: partition the key space using a consistent hash of the keys, so that each node runs its own local LFU policy and eviction decisions never require cross-node coordination. Consistent hashing also keeps rebalancing cheap when nodes join or leave the cluster. A single-process sketch of this sharding idea follows.
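As a rough illustration, here is a single-process model of that sharding idea, assuming the `LFUCache` class from Section 3. The class name `ShardedLFUCache` is hypothetical, and plain modulo hashing stands in for a real consistent-hash ring:

```java
// Hypothetical single-process model of key-sharded caching: each shard owns an
// independent LFU policy, so eviction stays "node-local".
class ShardedLFUCache {
    private final LFUCache[] shards;

    ShardedLFUCache(int shardCount, int capacityPerShard) {
        shards = new LFUCache[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new LFUCache(capacityPerShard);
        }
    }

    // In a real cluster this routing step would be a consistent-hash ring over
    // server nodes; plain modulo stands in for it here.
    private LFUCache shardFor(int key) {
        return shards[Math.floorMod(Integer.hashCode(key), shards.length)];
    }

    public int get(int key) { return shardFor(key).get(key); }
    public void put(int key, int value) { shardFor(key).put(key, value); }
}
```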
Follow-up 2: "What are the Concurrency implications?"
In a multi-threaded Java environment, we must ensure that our shared state (the value map, the frequency map, and `minFreq`) stays consistent. `synchronized` blocks are the simplest correct option; for higher throughput, we could split the cache into independently locked segments, the same lock-striping idea behind `ConcurrentHashMap`, so that threads touching different keys rarely contend. Atomic variables alone are not enough here, because a single `get` must update several structures together. A minimal lock-based wrapper is sketched below.
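Here is a minimal lock-based sketch, assuming the `LFUCache` class from Section 3; the wrapper class name `SynchronizedLFUCache` is hypothetical:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper: because get() mutates frequency state, even reads
// need exclusive access, so a ReadWriteLock's read lock would not be safe here.
class SynchronizedLFUCache {
    private final LFUCache delegate;
    private final ReentrantLock lock = new ReentrantLock();

    SynchronizedLFUCache(int capacity) { this.delegate = new LFUCache(capacity); }

    public int get(int key) {
        lock.lock();
        try { return delegate.get(key); } finally { lock.unlock(); }
    }

    public void put(int key, int value) {
        lock.lock();
        try { delegate.put(key, value); } finally { lock.unlock(); }
    }
}
```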
8. Performance Nuances (The Java Perspective)
- Autoboxing Overhead: When using `HashMap<Integer, Integer>`, Java autoboxes every key and value, creating thousands of `Integer` objects on the heap. In a performance-critical system, I would use a primitive-specialized library like fastutil or Trove (e.g., `Int2IntMap`), significantly reducing GC pauses; see the sketch after this list.
- Recursion Depth: This solution is fully iterative, but recursive solutions in general are elegant yet risky for deep inputs. I always ensure the recursion depth is bounded, or I rewrite the logic iteratively with an explicit stack on the heap to avoid `StackOverflowError`.
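As a rough sketch of the primitive-map swap, assuming the fastutil dependency (`it.unimi.dsi.fastutil`) is on the classpath; the demo class name is hypothetical:

```java
import it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap;

// Hypothetical demo: Int2IntOpenHashMap stores primitive ints directly,
// avoiding an Integer allocation on every get/put.
public class PrimitiveFreqDemo {
    public static void main(String[] args) {
        Int2IntOpenHashMap keyFreq = new Int2IntOpenHashMap();
        keyFreq.defaultReturnValue(-1);      // sentinel returned for missing keys
        keyFreq.put(42, 1);                  // primitive int in, no boxing
        keyFreq.addTo(42, 1);                // increment the frequency in place
        System.out.println(keyFreq.get(42)); // 2
        System.out.println(keyFreq.get(7));  // -1 (missing)
    }
}
```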
Key Takeaways
- Two maps plus a `minFreq` pointer give $O(1)$ `get` and `put`: `keyFreq` answers "how often has this key been used?", while `freqMap` groups keys by frequency.
- `LinkedHashSet` buckets provide the LRU tie-breaker for free via insertion order.
- On eviction, remove the first element of `freqMap.get(minFreq)`; on inserting a new key, reset `minFreq` to 1.