Lesson 55 of 70 5 min

MANG Problem #27: First Missing Positive (Hard)

Learn the "Index as Hash" trick to find the smallest missing positive integer in linear time with zero extra space.

Reading Mode

Hide the curriculum rail and keep the lesson centered for focused reading.

1. Problem Statement

Mental Model

Breaking down a complex problem into its most efficient algorithmic primitive.

Given an unsorted integer array nums, return the smallest missing positive integer.

You must implement an algorithm that runs in $O(n)$ time and uses $O(1)$ auxiliary space.

Input: nums = [3,4,-1,1]
Output: 2

2. Approach: Cyclic Sort (Index as Map)

The smallest missing positive must be in the range [1, n+1].

The Insight

If the array has size n, and the numbers were perfectly sorted from 1 to n, then nums[0] would be 1, nums[1] would be 2... nums[i] would be i + 1.

The Strategy

We will iterate through the array and try to "place" every number x at its correct index x - 1.

  • If nums[i] = 3, we swap it with the element at nums[2].
  • We only swap if:
    1. nums[i] is positive.
    2. nums[i] is within bounds ($\le n$).
    3. nums[i] is not already at the correct spot.

3. Java Implementation

public int firstMissingPositive(int[] nums) {
    int n = nums.length;
    
    // 1. Cyclic Sort
    for (int i = 0; i < n; i++) {
        while (nums[i] > 0 && nums[i] <= n && nums[nums[i] - 1] != nums[i]) {
            // Swap nums[i] with nums[nums[i]-1]
            int temp = nums[nums[i] - 1];
            nums[nums[i] - 1] = nums[i];
            nums[i] = temp;
        }
    }
    
    // 2. Find first index mismatch
    for (int i = 0; i < n; i++) {
        if (nums[i] != i + 1) {
            return i + 1;
        }
    }
    
    return n + 1;
}

4. 5-Minute "Video-Style" Walkthrough

  1. The "Aha!" Moment: You don't need a HashSet. The array is your HashSet. By swapping numbers into their "correct" index, you are using the array's memory as a frequency map.
  2. The While Loop: Why a while and not an if? Because after a swap, the new number at nums[i] might also need to be swapped to a different index.
  3. The Range: If the array is [1, 2, 3], the missing one is 4. If it's [1, 1, 1], the missing one is 2. The loop handles both cases.

5. Interview Discussion

  • Interviewer: "Is the time complexity really O(N) with that nested while loop?"
  • You: "Yes. Although there is a nested loop, every swap puts at least one number into its final correct position. A number can be swapped at most once to its final position, so the total number of swaps across all iterations is at most $N$."
  • Interviewer: "What if the input is immutable?"
  • You: "Then $O(1)$ space is impossible if we must return the answer in $O(N)$ time. We would need to use $O(N)$ extra space for a Set or a copy of the array."

5. Verbal Interview Script (Staff Tier)

Interviewer: "Walk me through your optimization strategy for this problem."

You: "When approaching this type of challenge, my primary objective is to identify the underlying Monotonicity or Optimal Substructure that allow us to bypass a naive brute-force search. In my implementation of 'MANG Problem #27: First Missing Positive (Hard)', I focused on reducing the time complexity by leveraging a HashMap-based lookup. This allows us to handle input sizes that would typically cause a standard O(N^2) approach to fail. Furthermore, I prioritized memory efficiency by using in-place modifications. This ensures that the application remains performant even under heavy garbage collection pressure in a high-concurrency Java environment."

6. Staff-Level Interview Follow-Ups

Once you provide the optimized solution, a senior interviewer at Google or Meta will likely push you further. Here is how to handle the most common follow-ups:

Follow-up 1: "How does this scale to a Distributed System?"

If the input data is too large to fit on a single machine (e.g., billions of records), we would move from a single-node algorithm to a MapReduce or Spark-based approach. We would shard the data based on a consistent hash of the keys and perform local aggregations before a global shuffle and merge phase, similar to the logic used in External Merge Sort.

Follow-up 2: "What are the Concurrency implications?"

In a multi-threaded Java environment, we must ensure that our state (e.g., the DP table or the frequency map) is thread-safe. While we could use synchronized blocks, a higher-performance approach would be to use AtomicVariables or ConcurrentHashMap. For problems involving shared arrays, I would consider a Work-Stealing pattern where each thread processes an independent segment of the data to minimize lock contention.

7. Performance Nuances (The Java Perspective)

  1. Autoboxing Overhead: When using HashMap<Integer, Integer>, Java performs autoboxing which creates thousands of Integer objects on the heap. In a performance-critical system, I would use a primitive-specialized library like fastutil or Trove to use Int2IntMap, significantly reducing GC pauses.
  2. Recursion Depth: As discussed in the code, recursive solutions are elegant but risky for deep inputs. I always ensure the recursion depth is bounded, or I rewrite the logic to be Iterative using an explicit stack on the heap to avoid StackOverflowError.

Key Takeaways

  • If nums[i] = 3, we swap it with the element at nums[2].
  • We only swap if:
  • Interviewer: "Is the time complexity really O(N) with that nested while loop?"

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.