Lesson 47 of 70 · 6 min

MANG Problem #39: Design Search Autocomplete System (Hard)

Learn how to design an efficient search autocomplete system using a Trie combined with caching of top historical results.


1. Problem Statement

Mental Model

Break a complex system-design problem down to its core algorithmic primitive: here, a Trie augmented with precomputed results.

Design a search autocomplete system for a search engine. Users input a sentence of at least one word, ending with the special character '#'.

You are given a string array sentences and an integer array times representing the historical frequencies.

Implement the AutocompleteSystem class:

  • AutocompleteSystem(String[] sentences, int[] times) Initializes the object with the historical data.
  • List<String> input(char c) The user types a character c. If c == '#', the user has finished typing. Otherwise, return the top 3 historical hot sentences that have the same prefix as the part of the sentence already typed.
  • Return the top 3 sentences sorted by frequency (descending), breaking ties by ASCII-code (ascending).

2. Approach: Trie + Precomputed Top 3

If we just use a standard Trie, finding the top 3 sentences requires a full DFS of the subtree at every keystroke, taking $O(V)$ time, where $V$ is the number of nodes in that subtree. This is too slow for real-time autocomplete.
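For contrast, here is a minimal sketch of what the naive approach pays per keystroke (the `Node` class and helper names below are hypothetical, not part of the final solution): collect every sentence under the current node via DFS, then sort the whole list just to keep three entries.

```java
import java.util.*;

// Naive per-keystroke query: DFS the entire subtree, then sort everything.
// This is the O(V) cost the precomputed-top3 design avoids.
class NaiveQuery {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        String sentence; // non-null only at terminal nodes
    }

    static Node insert(Node root, String s) {
        Node n = root;
        for (char c : s.toCharArray()) n = n.children.computeIfAbsent(c, k -> new Node());
        n.sentence = s;
        return root;
    }

    // Collect every sentence in the subtree rooted at `node`.
    static void dfs(Node node, List<String> out) {
        if (node == null) return;
        if (node.sentence != null) out.add(node.sentence);
        for (Node child : node.children.values()) dfs(child, out);
    }

    static List<String> top3(Node root, String prefix, Map<String, Integer> counts) {
        Node n = root;
        for (char c : prefix.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return new ArrayList<>();
        }
        List<String> all = new ArrayList<>();
        dfs(n, all); // O(subtree size) on EVERY keystroke
        all.sort((a, b) -> counts.get(a).equals(counts.get(b))
                ? a.compareTo(b) : counts.get(b) - counts.get(a));
        return all.subList(0, Math.min(3, all.size()));
    }
}
```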

The "Aha!" Moment

Instead of calculating the top 3 at query time, pre-calculate and store them in the TrieNode itself. Every node in the Trie will hold a small list of the top 3 sentences that pass through it.

  1. State:
    • Trie: Stores characters. Each node has a List<String> top3.
    • Map<String, Integer> counts: Stores the global frequency of all sentences.
    • StringBuilder currentPrefix: Tracks what the user is currently typing.
    • TrieNode currNode: Tracks the current position in the Trie to avoid searching from the root on every keystroke.
  2. Input('#'): Save the currentPrefix to the Map, increment its count, and re-insert it into the Trie to update the top3 lists along its path. Reset currentPrefix and currNode.

3. Java Implementation

class AutocompleteSystem {
    class TrieNode {
        Map<Character, TrieNode> children = new HashMap<>();
        List<String> top3 = new ArrayList<>(); // precomputed hot sentences passing through this node
    }

    private TrieNode root;
    private TrieNode currNode;           // current position in the Trie; null once the prefix is dead
    private StringBuilder currSentence;  // what the user has typed so far
    private Map<String, Integer> counts; // global frequency of every sentence

    public AutocompleteSystem(String[] sentences, int[] times) {
        root = new TrieNode();
        counts = new HashMap<>();
        for (int i = 0; i < sentences.length; i++) {
            counts.put(sentences[i], times[i]);
            insert(sentences[i]);
        }
        currNode = root;
        currSentence = new StringBuilder();
    }

    // Walk the sentence's path, refreshing the top3 cache at every node along the way.
    private void insert(String sentence) {
        TrieNode node = root;
        for (char c : sentence.toCharArray()) {
            node.children.putIfAbsent(c, new TrieNode());
            node = node.children.get(c);
            updateTop3(node, sentence);
        }
    }

    private void updateTop3(TrieNode node, String sentence) {
        if (!node.top3.contains(sentence)) {
            node.top3.add(sentence);
        }
        // Sort by frequency descending, breaking ties by ASCII order ascending.
        node.top3.sort((a, b) -> {
            int countA = counts.get(a);
            int countB = counts.get(b);
            if (countA == countB) return a.compareTo(b);
            return Integer.compare(countB, countA); // avoids overflow on large counts
        });
        if (node.top3.size() > 3) {
            node.top3.remove(3); // keep only the top 3
        }
    }

    public List<String> input(char c) {
        if (c == '#') {
            // Sentence finished: bump its count, refresh the caches along its path, reset state.
            String sentence = currSentence.toString();
            counts.put(sentence, counts.getOrDefault(sentence, 0) + 1);
            insert(sentence);
            currSentence = new StringBuilder();
            currNode = root;
            return new ArrayList<>();
        }

        currSentence.append(c);
        if (currNode != null) {
            currNode = currNode.children.get(c); // one O(1) hop; no search from the root
        }

        // Defensive copy so callers cannot mutate the cached list.
        return currNode == null ? new ArrayList<>() : new ArrayList<>(currNode.top3);
    }
}

4. 5-Minute "Video-Style" Walkthrough

  1. The Read/Write Tradeoff: In Autocomplete, reads (typing a character) happen 100x more often than writes (submitting a sentence). By doing the heavy lifting (sorting and maintaining the top 3) during insert(), our input() method becomes $O(1)$!
  2. The Pointer Trick: Notice the currNode variable. When a user types "a", we go to the 'a' node. When they type "p", we don't start from the root to find "ap". We just look at currNode.children.get('p'). This stateful tracking is crucial for latency.
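The pointer trick can be shown in isolation. This is a minimal sketch with a hypothetical stripped-down Node class: each keystroke advances a single pointer one level, rather than re-walking the whole prefix from the root.

```java
import java.util.*;

// Minimal sketch of stateful prefix tracking: each keystroke advances a single
// pointer one level down instead of re-walking the prefix from the root.
class PointerTrickDemo {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
    }

    static Node insert(Node root, String word) {
        Node n = root;
        for (char c : word.toCharArray()) n = n.children.computeIfAbsent(c, k -> new Node());
        return root;
    }

    // One expected-O(1) map lookup per keystroke; a dead prefix stays null.
    static Node advance(Node curr, char c) {
        return curr == null ? null : curr.children.get(c);
    }
}
```

Note how a dead prefix stays dead: once `advance` returns null, every later keystroke also returns null, which is exactly the behavior of `currNode` in the full solution.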

5. Interview Discussion

  • Interviewer: "What happens if a sentence gets very popular and its rank changes?"
  • You: "Our updateTop3 logic handles this because when # is typed, we re-run insert(), which re-sorts the top3 list at every node along the path."
  • Interviewer: "How would you scale this for Google?"
  • You: "We would shard the Trie based on the first character (or prefix) and use a distributed cache (Redis). The Trie would be built offline via MapReduce and pushed to read-replicas."
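At the routing layer, first-character sharding might look like the following sketch (the class name, shard count, and hashing scheme are illustrative assumptions, not a production design):

```java
// Hypothetical routing sketch: send a query prefix to one of N Trie shards
// keyed by its first character, so each shard owns a disjoint slice of prefixes.
class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) { this.numShards = numShards; }

    int route(String prefix) {
        if (prefix.isEmpty()) return 0; // designated default shard for empty input
        return Math.floorMod(prefix.charAt(0), numShards);
    }
}
```

All prefixes starting with the same character land on the same shard, so a user's keystroke stream for one sentence never crosses shard boundaries.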

6. Verbal Interview Script (Staff Tier)

Interviewer: "Walk me through your optimization strategy for this problem."

You: "The naive design re-scans an entire Trie subtree on every keystroke, which is far too slow for real-time autocomplete. My core optimization in 'MANG Problem #39: Design Search Autocomplete System (Hard)' is to shift work from the read path to the write path: every TrieNode caches a precomputed top-3 list, so a keystroke costs one child lookup plus a constant-size list return. The write path (the '#' event) pays for this by re-sorting the small cached lists along a single root-to-leaf path, bounded by the sentence length. I also keep a stateful currNode pointer so we never re-walk the prefix from the root, and a HashMap of global frequencies so rank updates are O(1) lookups. Since reads vastly outnumber writes in autocomplete, this tradeoff keeps the hot path effectively constant-time, even under heavy load in a high-concurrency Java environment."

7. Staff-Level Interview Follow-Ups

Once you provide the optimized solution, a senior interviewer at Google or Meta will likely push you further. Here is how to handle the most common follow-ups:

Follow-up 1: "How does this scale to a Distributed System?"

If the input data is too large to fit on a single machine (e.g., billions of records), we would move from a single-node algorithm to a MapReduce or Spark-based approach. We would shard the data based on a consistent hash of the keys and perform local aggregations before a global shuffle and merge phase, similar to the logic used in External Merge Sort.
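The aggregate-then-merge phase can be sketched in a single process (the helper names are hypothetical; a real deployment would run the local maps on separate workers before the merge):

```java
import java.util.*;

// Single-process sketch of the MapReduce-style flow: each "worker" produces a
// local frequency map, and a merge phase sums them into the global counts.
class CountMerge {
    static Map<String, Integer> merge(List<Map<String, Integer>> locals) {
        Map<String, Integer> global = new HashMap<>();
        for (Map<String, Integer> local : locals) {
            // Map.merge sums counts for sentences seen by multiple workers.
            local.forEach((sentence, count) -> global.merge(sentence, count, Integer::sum));
        }
        return global;
    }
}
```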

Follow-up 2: "What are the Concurrency implications?"

In a multi-threaded Java environment, we must ensure that our shared state (the frequency map and the per-node top3 lists) is thread-safe. While we could use synchronized blocks, a higher-performance approach is a ConcurrentHashMap for the counts, combined with atomic classes such as AtomicLong for individual counters. For work over shared arrays, I would consider a work-stealing pattern where each thread processes an independent segment of the data to minimize lock contention.
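The lock-free counting path can be sketched with ConcurrentHashMap.merge, which applies its remapping function atomically per key (the thread and iteration counts below are illustrative):

```java
import java.util.concurrent.*;

// ConcurrentHashMap.merge applies the remapping function atomically per key,
// so concurrent increments never lose updates.
class ConcurrentCountDemo {
    static long countConcurrently(String key, int threads, int perThread) {
        ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) counts.merge(key, 1L, Long::sum);
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS); // wait for all increments
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts.get(key);
    }
}
```

With plain HashMap the same workload would drop updates under contention; merge on ConcurrentHashMap guarantees every increment lands.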

8. Performance Nuances (The Java Perspective)

  1. Autoboxing Overhead: Maps like HashMap<Character, TrieNode> and HashMap<String, Integer> autobox their char keys and int values, creating many short-lived objects on the heap. In a performance-critical system, I would replace the children map with a plain array indexed by character, or use a primitive-specialized library like fastutil or Trove (e.g., Char2ObjectMap), significantly reducing GC pauses.
  2. Recursion Depth: Recursive Trie traversals are elegant but risky for very long sentences. I always ensure the recursion depth is bounded, or I rewrite the logic to be iterative using an explicit stack on the heap to avoid StackOverflowError; the input() path in the solution above is already fully iterative.
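The recursion-to-iteration rewrite for a Trie traversal can be sketched as follows (the Node class is a hypothetical minimal stand-in): an explicit Deque on the heap replaces the call stack, so depth is bounded by available memory rather than thread stack size.

```java
import java.util.*;

// Iterative DFS over a trie using an explicit heap-allocated stack, immune to
// StackOverflowError no matter how long the stored sentences are.
class IterativeDfs {
    static class Node {
        Map<Character, Node> children = new TreeMap<>();
        String sentence; // non-null only at terminal nodes
    }

    static Node insert(Node root, String s) {
        Node n = root;
        for (char c : s.toCharArray()) n = n.children.computeIfAbsent(c, k -> new Node());
        n.sentence = s;
        return root;
    }

    static List<String> collect(Node start) {
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.sentence != null) out.add(n.sentence);
            for (Node child : n.children.values()) stack.push(child);
        }
        return out;
    }
}
```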

Key Takeaways

  • Precompute at write time: caching a top3 list in every TrieNode turns each keystroke query into a constant-time lookup.
  • Track state across keystrokes: the currNode pointer advances one child per character instead of re-walking the prefix from the root.
  • Exploit the read/write skew: reads vastly outnumber writes, so paying a re-sort cost when '#' is typed is the right tradeoff.
  • Order results by frequency descending, breaking ties by ASCII code ascending.
