1. Problem Statement
Mental Model
Breaking down a complex problem into its most efficient algorithmic primitive.
Given an input string s and a pattern p, implement regular expression matching with support for '.' and '*' where:
'.'Matches any single character.'*'Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
Input: s = "aab", p = "c*a*b"
Output: true
2. The Mental Model: The "Non-Deterministic State Machine"
Imagine the pattern p as a set of instructions for a machine.
- A character like
'a'means: "Consume one 'a'." - A dot
'.'means: "Consume any single character." - A star
'*'is the complex one. It means: "Look at the previous character. You can either delete it (zero occurrences) or keep it and stay in this state (multiple occurrences)."
Because of the '*', a single state can lead to multiple future states. This is why we use Dynamic Programming to track all possible "Matches" simultaneously.
3. Visual Execution (The DP Transition)
graph TD
Table[DP Table: s[i] x p[j]]
Table --> CharMatch{p[j] == s[i] or '.'?}
CharMatch -- Yes --> Diagonal[dp[i][j] = dp[i-1][j-1]]
Table --> StarMatch{p[j] == '*'}
StarMatch -- Case 1 --> ZeroOccur[dp[i][j] = dp[i][j-2]]
StarMatch -- Case 2 --> MultiOccur[If p[j-1] matches s[i]: dp[i][j] = dp[i-1][j]]
4. Java Implementation (Bottom-Up DP)
public boolean isMatch(String s, String p) {
int m = s.length();
int n = p.length();
// 1. Initialize (m+1) x (n+1) table
// dp[i][j] means s[0..i-1] matches p[0..j-1]
boolean[][] dp = new boolean[m + 1][n + 1];
// 2. Base Case: Empty string matches empty pattern
dp[0][0] = true;
// 3. Handle patterns like a* or a*b* matching empty string
for (int j = 2; j <= n; j++) {
if (p.charAt(j - 1) == '*') {
dp[0][j] = dp[0][j - 2];
}
}
// 4. Fill the DP Table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
char sc = s.charAt(i - 1);
char pc = p.charAt(j - 1);
if (pc == sc || pc == '.') {
// Diagonal Match
dp[i][j] = dp[i - 1][j - 1];
} else if (pc == '*') {
// Case 1: Match 0 of the preceding character (Skip the star and its prefix)
dp[i][j] = dp[i][j - 2];
// Case 2: Match 1 or more of the preceding character
char preChar = p.charAt(j - 2);
if (preChar == sc || preChar == '.') {
dp[i][j] = dp[i][j] || dp[i - 1][j];
}
}
}
}
return dp[m][n];
}
5. Verbal Interview Script (Staff Tier)
Interviewer: "Walk me through the logic of the '' character in your DP table."*
You: "The Kleene star is the most complex state transition because it allows the pattern to effectively 'branch' into two possibilities. In my implementation, when p[j] is a star, I evaluate both paths. First, the Zero-Occurrence path: I check dp[i][j-2], which effectively asks: 'Does the string match the pattern if we delete the character before the star?' Second, the Multi-Occurrence path: if the character before the star matches the current character in the string, I check dp[i-1][j]. This is the 'Aha!' moment—by looking at the previous row in the same column, we are checking if the string ending one character earlier also matched the same star pattern. This allows the star to greedily consume as many matching characters as needed without creating an exponential recursion tree."
6. Staff-Level Follow-Ups
Follow-up 1: "Can we use Recursion with Memoization?"
- The Answer: "Yes. A Top-Down approach with a
Boolean[][] memowould be equally efficient in terms of time complexity. It is often easier to write during a 45-minute interview, but it carries the $O(S+P)$ stack overhead. The Bottom-Up Tabulation I implemented is generally more memory-efficient and predictable for production compilers."
Follow-up 2: "How do you handle regex with '?' or '+'?"
- The Answer: "A
+is just one mandatory character followed by a*(e.g.,a+isaa*). A?is just a choice betweendp[i-1][j-1](match) ordp[i][j-1](skip). The DP framework I built is highly extensible to any standard regular expression operator."
7. Performance Nuances (The Java Perspective)
- Boolean Table Overhead: A
boolean[1000][1000]matrix in Java consumes roughly 1MB of heap space. For extremely long patterns, we can reduce this to $O(P)$ space by using only the current and previous rows, asdp[i][j]only ever references rowiori-1. - String.charAt(): For large loops, I'd cache the string lengths and potentially convert the strings to
char[]once to minimize the overhead of multiplecharAt()calls and their associated bounds checks.
Key Takeaways
'.'Matches any single character.'*'Matches zero or more of the preceding element.- A character like
'a'means: "Consume one 'a'."