Lesson 35 of 70 5 min

MANG Problem #32: Word Break II (Hard)

Master the combination of Dynamic Programming and Backtracking to generate all possible sentence constructions.

Reading Mode

Hide the curriculum rail and keep the lesson centered for focused reading.

1. Problem Statement

Mental Model

Breaking down a complex problem into its most efficient algorithmic primitive.

Given a string s and a dictionary of strings wordDict, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences in any order.

Input: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
Output: ["cats and dog","cat sand dog"]

2. Approach: DFS with Memoization

A pure backtracking approach will Time Out ($O(2^N)$). We need to memoize the results of suffixes.

  1. State: map.get(s) returns the list of valid sentences that can be formed from string s.
  2. Recursion:
    • For a string s, check if it starts with any word in the dictionary.
    • If it does, recursively call the function on the remainder of the string.
    • Append the current word to all sentences returned by the recursive call.
  3. Memoization: Store the result for s in a HashMap.

3. Java Implementation

public List<String> wordBreak(String s, List<String> wordDict) {
    return dfs(s, new HashSet<>(wordDict), new HashMap<String, List<String>>());
}

private List<String> dfs(String s, Set<String> wordDict, Map<String, List<String>> map) {
    if (map.containsKey(s)) return map.get(s);
    
    List<String> res = new ArrayList<>();
    if (s.isEmpty()) {
        res.add("");
        return res;
    }
    
    for (String word : wordDict) {
        if (s.startsWith(word)) {
            String sub = s.substring(word.length());
            List<String> subList = dfs(sub, wordDict, map);
            for (String subSentence : subList) {
                res.add(word + (subSentence.isEmpty() ? "" : " ") + subSentence);
            }
        }
    }
    map.put(s, res);
    return res;
}

4. 5-Minute "Video-Style" Walkthrough

  1. The "Aha!" Moment: If "sanddog" can be broken into ["sand dog"], then when processing "cat", we just prepend "cat " to all the results of "sanddog".
  2. The Memoization Map: The map stores String -> List<String>. If we've already figured out how to break "dog", we don't compute it again. We just return ["dog"].
  3. The Empty String Base Case: When s.isEmpty(), returning [""] is critical. It allows the final word in the string to append itself without an extra trailing space.

5. Interview Discussion

  • Interviewer: "What is the time complexity?"
  • You: "In the worst case (e.g., s = aaaaab, dict = [a, aa, aaa]), it is $O(2^N)$ because there are exponential valid sentences. But memoization drastically prunes branches that lead to invalid words."
  • Interviewer: "Can we use a Trie?"
  • You: "Yes! Instead of s.startsWith(word), we can traverse a Trie built from wordDict. This avoids creating multiple substrings and is faster when the dictionary is large."

5. Verbal Interview Script (Staff Tier)

Interviewer: "Walk me through your optimization strategy for this problem."

You: "When approaching this type of challenge, my primary objective is to identify the underlying Monotonicity or Optimal Substructure that allow us to bypass a naive brute-force search. In my implementation of 'MANG Problem #32: Word Break II (Hard)', I focused on reducing the time complexity by leveraging a HashMap-based lookup. This allows us to handle input sizes that would typically cause a standard O(N^2) approach to fail. Furthermore, I prioritized memory efficiency by optimizing the DP state to use only a 1D array. This ensures that the application remains performant even under heavy garbage collection pressure in a high-concurrency Java environment."

6. Staff-Level Interview Follow-Ups

Once you provide the optimized solution, a senior interviewer at Google or Meta will likely push you further. Here is how to handle the most common follow-ups:

Follow-up 1: "How does this scale to a Distributed System?"

If the input data is too large to fit on a single machine (e.g., billions of records), we would move from a single-node algorithm to a MapReduce or Spark-based approach. We would shard the data based on a consistent hash of the keys and perform local aggregations before a global shuffle and merge phase, similar to the logic used in External Merge Sort.

Follow-up 2: "What are the Concurrency implications?"

In a multi-threaded Java environment, we must ensure that our state (e.g., the DP table or the frequency map) is thread-safe. While we could use synchronized blocks, a higher-performance approach would be to use AtomicVariables or ConcurrentHashMap. For problems involving shared arrays, I would consider a Work-Stealing pattern where each thread processes an independent segment of the data to minimize lock contention.

7. Performance Nuances (The Java Perspective)

  1. Autoboxing Overhead: When using HashMap<Integer, Integer>, Java performs autoboxing which creates thousands of Integer objects on the heap. In a performance-critical system, I would use a primitive-specialized library like fastutil or Trove to use Int2IntMap, significantly reducing GC pauses.
  2. Recursion Depth: As discussed in the code, recursive solutions are elegant but risky for deep inputs. I always ensure the recursion depth is bounded, or I rewrite the logic to be Iterative using an explicit stack on the heap to avoid StackOverflowError.

Key Takeaways

  • For a string s, check if it starts with any word in the dictionary.
  • If it does, recursively call the function on the remainder of the string.
  • Append the current word to all sentences returned by the recursive call.

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.