1. Problem Statement
Mental Model
Breaking down a complex problem into its most efficient algorithmic primitive.
Given an unsorted integer array nums, return the smallest missing positive integer.
You must implement an algorithm that runs in $O(n)$ time and uses $O(1)$ auxiliary space.
Input: nums = [3,4,-1,1]
Output: 2
2. Approach: Cyclic Sort (Index as Map)
The smallest missing positive must be in the range [1, n+1].
The Insight
If the array has size n, and the numbers were perfectly sorted from 1 to n, then nums[0] would be 1, nums[1] would be 2... nums[i] would be i + 1.
The Strategy
We will iterate through the array and try to "place" every number x at its correct index x - 1.
- If
nums[i] = 3, we swap it with the element atnums[2]. - We only swap if:
nums[i]is positive.nums[i]is within bounds ($\le n$).nums[i]is not already at the correct spot.
3. Java Implementation
public int firstMissingPositive(int[] nums) {
int n = nums.length;
// 1. Cyclic Sort
for (int i = 0; i < n; i++) {
while (nums[i] > 0 && nums[i] <= n && nums[nums[i] - 1] != nums[i]) {
// Swap nums[i] with nums[nums[i]-1]
int temp = nums[nums[i] - 1];
nums[nums[i] - 1] = nums[i];
nums[i] = temp;
}
}
// 2. Find first index mismatch
for (int i = 0; i < n; i++) {
if (nums[i] != i + 1) {
return i + 1;
}
}
return n + 1;
}
4. 5-Minute "Video-Style" Walkthrough
- The "Aha!" Moment: You don't need a HashSet. The array is your HashSet. By swapping numbers into their "correct" index, you are using the array's memory as a frequency map.
- The While Loop: Why a
whileand not anif? Because after a swap, the new number atnums[i]might also need to be swapped to a different index. - The Range: If the array is
[1, 2, 3], the missing one is4. If it's[1, 1, 1], the missing one is2. The loop handles both cases.
5. Interview Discussion
- Interviewer: "Is the time complexity really O(N) with that nested while loop?"
- You: "Yes. Although there is a nested loop, every swap puts at least one number into its final correct position. A number can be swapped at most once to its final position, so the total number of swaps across all iterations is at most $N$."
- Interviewer: "What if the input is immutable?"
- You: "Then $O(1)$ space is impossible if we must return the answer in $O(N)$ time. We would need to use $O(N)$ extra space for a Set or a copy of the array."
5. Verbal Interview Script (Staff Tier)
Interviewer: "Walk me through your optimization strategy for this problem."
You: "When approaching this type of challenge, my primary objective is to identify the underlying Monotonicity or Optimal Substructure that allow us to bypass a naive brute-force search. In my implementation of 'MANG Problem #27: First Missing Positive (Hard)', I focused on reducing the time complexity by leveraging a HashMap-based lookup. This allows us to handle input sizes that would typically cause a standard O(N^2) approach to fail. Furthermore, I prioritized memory efficiency by using in-place modifications. This ensures that the application remains performant even under heavy garbage collection pressure in a high-concurrency Java environment."
6. Staff-Level Interview Follow-Ups
Once you provide the optimized solution, a senior interviewer at Google or Meta will likely push you further. Here is how to handle the most common follow-ups:
Follow-up 1: "How does this scale to a Distributed System?"
If the input data is too large to fit on a single machine (e.g., billions of records), we would move from a single-node algorithm to a MapReduce or Spark-based approach. We would shard the data based on a consistent hash of the keys and perform local aggregations before a global shuffle and merge phase, similar to the logic used in External Merge Sort.
Follow-up 2: "What are the Concurrency implications?"
In a multi-threaded Java environment, we must ensure that our state (e.g., the DP table or the frequency map) is thread-safe. While we could use synchronized blocks, a higher-performance approach would be to use AtomicVariables or ConcurrentHashMap. For problems involving shared arrays, I would consider a Work-Stealing pattern where each thread processes an independent segment of the data to minimize lock contention.
7. Performance Nuances (The Java Perspective)
- Autoboxing Overhead: When using
HashMap<Integer, Integer>, Java performs autoboxing which creates thousands ofIntegerobjects on the heap. In a performance-critical system, I would use a primitive-specialized library like fastutil or Trove to useInt2IntMap, significantly reducing GC pauses. - Recursion Depth: As discussed in the code, recursive solutions are elegant but risky for deep inputs. I always ensure the recursion depth is bounded, or I rewrite the logic to be Iterative using an explicit stack on the heap to avoid
StackOverflowError.
Key Takeaways
- If
nums[i] = 3, we swap it with the element atnums[2]. - We only swap if:
- Interviewer: "Is the time complexity really O(N) with that nested while loop?"