MANG Problem #13: Edit Distance (Hard)

1. Problem Statement

Mental Model

Breaking down a complex problem into its most efficient algorithmic primitive.

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

Insert a character
Delete a character
Replace a character

Input: word1 = "horse", word2 = "ros"
Output: 3 (horse -> rorse -> rose -> ros)

2. Approach: 2D Bottom-Up DP

graph TD
    Table[DP Table: word1 x word2]
    Match[Match: dp[i-1][j-1]]
    Replace[Replace: dp[i-1][j-1] + 1]
    Delete[Delete: dp[i-1][j] + 1]
    Insert[Insert: dp[i][j-1] + 1]

    Table --> If{word1[i] == word2[j]?}
    If -- Yes --> Match
    If -- No --> Min{Min of...}
    Min --> Replace
    Min --> Delete
    Min --> Insert

We want to find the min distance between every prefix of word1 and word2.

State: dp[i][j] is the edit distance between word1[0...i] and word2[0...j].
Base Case:
- dp[i][0] = i: Converting word of length i to empty string requires i deletions.
- dp[0][j] = j: Converting empty string to word of length j requires j insertions.
Transition:
- If word1[i-1] == word2[j-1]: The characters match! dp[i][j] = dp[i-1][j-1] (No new operation).
- Else: We take the minimum of 3 possibilities + 1:
  - dp[i-1][j] (Delete)
  - dp[i][j-1] (Insert)
  - dp[i-1][j-1] (Replace)

3. Java Implementation

public int minDistance(String word1, String word2) {
    int m = word1.length(), n = word2.length();
    int[][] dp = new int[m + 1][n + 1];

    for (int i = 0; i <= m; i++) dp[i][0] = i;
    for (int j = 0; j <= n; j++) dp[0][j] = j;

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = 1 + Math.min(dp[i - 1][j - 1], // Replace
                           Math.min(dp[i - 1][j],       // Delete
                                    dp[i][j - 1]));      // Insert
            }
        }
    }
    return dp[m][n];
}

4. 5-Minute "Video-Style" Walkthrough

The "Aha!" Moment: Think of the 2D grid as a map. Each cell (i, j) represents the cost to get to that state. Matching characters let you walk diagonally for free.
The Operations:
- Moving Down is a Deletion from word1.
- Moving Right is an Insertion into word1.
- Moving Diagonally (mismatch) is a Replacement.
The Grid: The result is at the bottom-right corner of the matrix, representing the cost to transform the full length of both strings.

5. Interview Discussion

Interviewer: "What is the time complexity?"
You: "O(M * N) where M and N are the lengths of the words. We fill every cell in the matrix once."
Interviewer: "How can we optimize space?"
You: "Since we only ever need the previous row (or even just the previous few cells), we can reduce the space complexity from O(M * N) to O(N) by using a single 1D array to store the current row's results."

5. Verbal Interview Script (Staff Tier)

Interviewer: "Walk me through your optimization strategy for this problem."

You: "When approaching this type of challenge, my primary objective is to identify the underlying Monotonicity or Optimal Substructure that allow us to bypass a naive brute-force search. In my implementation of 'MANG Problem #13: Edit Distance (Hard)', I focused on reducing the time complexity by leveraging a Dynamic Programming state transition. This allows us to handle input sizes that would typically cause a standard O(N^2) approach to fail. Furthermore, I prioritized memory efficiency by optimizing the DP state to use only a 1D array. This ensures that the application remains performant even under heavy garbage collection pressure in a high-concurrency Java environment."

6. Staff-Level Interview Follow-Ups

Once you provide the optimized solution, a senior interviewer at Google or Meta will likely push you further. Here is how to handle the most common follow-ups:

Follow-up 1: "How does this scale to a Distributed System?"

If the input data is too large to fit on a single machine (e.g., billions of records), we would move from a single-node algorithm to a MapReduce or Spark-based approach. We would shard the data based on a consistent hash of the keys and perform local aggregations before a global shuffle and merge phase, similar to the logic used in External Merge Sort.

Follow-up 2: "What are the Concurrency implications?"

In a multi-threaded Java environment, we must ensure that our state (e.g., the DP table or the frequency map) is thread-safe. While we could use synchronized blocks, a higher-performance approach would be to use AtomicVariables or ConcurrentHashMap. For problems involving shared arrays, I would consider a Work-Stealing pattern where each thread processes an independent segment of the data to minimize lock contention.

7. Performance Nuances (The Java Perspective)

Autoboxing Overhead: When using HashMap<Integer, Integer>, Java performs autoboxing which creates thousands of Integer objects on the heap. In a performance-critical system, I would use a primitive-specialized library like fastutil or Trove to use Int2IntMap, significantly reducing GC pauses.
Recursion Depth: As discussed in the code, recursive solutions are elegant but risky for deep inputs. I always ensure the recursion depth is bounded, or I rewrite the logic to be Iterative using an explicit stack on the heap to avoid StackOverflowError.

Key Takeaways

dp[i][0] = i: Converting word of length i to empty string requires i deletions.
dp[0][j] = j: Converting empty string to word of length j requires j insertions.
If word1[i-1] == word2[j-1]: The characters match! dp[i][j] = dp[i-1][j-1] (No new operation).

A curated practice bank for turning patterns into repetition.