1. Problem Statement
Mental Model: imposed order shrinks the search space; in a sorted array, every duplicate must sit next to its twin.
Given an integer array nums sorted in non-decreasing order, remove the duplicates in-place such that each unique element appears only once. The relative order of the elements should be kept the same.
Return k after placing the final result in the first k slots of nums.
Input: nums = [1,1,2]
Output: 2, nums = [1,2,_]
2. The Mental Model: The "Read & Write" Pointers
In most Two Pointer problems, the pointers move towards each other. However, in In-Place Modification problems, the pointers usually move in the same direction.
- The Reader (i): Scans every single element in the array looking for a "new" value.
- The Writer (k): Only moves when we find a value that is different from the last one we wrote. It marks the boundary of our "Clean" array.
Why does this work? Because the array is sorted. If a duplicate exists, it must be adjacent to its twin.
3. Visual Execution
```mermaid
flowchart LR
    Start["nums = [1, 1, 2]"] --> P1["R=1, W=1: nums[1]=1 matches nums[0], skip"]
    P1 --> P2["R=2, W=1: nums[2]=2 is new, write it at index 1, W=2"]
    P2 --> Done["Result: k = 2, nums = [1, 2, _]"]
```
4. Java Implementation
```java
public int removeDuplicates(int[] nums) {
    if (nums.length == 0) return 0;

    int k = 1; // The Write pointer (index 0 is always unique)
    for (int i = 1; i < nums.length; i++) {
        // The current element differs from the previous one: a new unique value
        if (nums[i] != nums[i - 1]) {
            nums[k] = nums[i]; // Write it to the k-th position
            k++;               // Move the write boundary
        }
    }
    return k;
}
```
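A quick way to sanity-check the method is a small driver. The wrapper class name `Main` is just for this demo; only the first `k` slots of the array are meaningful afterwards.

```java
import java.util.Arrays;

public class Main {
    // Same two-pointer logic as the solution above, wrapped for a runnable demo.
    static int removeDuplicates(int[] nums) {
        if (nums.length == 0) return 0;
        int k = 1;
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] != nums[i - 1]) {
                nums[k] = nums[i];
                k++;
            }
        }
        return k;
    }

    public static void main(String[] args) {
        int[] nums = {0, 0, 1, 1, 1, 2, 2, 3, 3, 4};
        int k = removeDuplicates(nums);
        // The tail beyond index k-1 is unspecified, so print only the prefix.
        System.out.println(k + " " + Arrays.toString(Arrays.copyOf(nums, k)));
        // prints "5 [0, 1, 2, 3, 4]"
    }
}
```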
5. Verbal Interview Script (The Staff Way)
Interviewer: "How do you remove duplicates in-place without using extra space?"
You: "Since the array is sorted, duplicates are guaranteed to be adjacent. I'll use a Two-Pointer strategy with a 'Read' pointer and a 'Write' pointer. The Read pointer will iterate through the entire array. The Write pointer will track the position where the next unique element should be placed. Whenever the Read pointer encounters a value that is different from the one immediately before it, we know we've found a new unique element. We then overwrite the value at the Write pointer and increment it. This allows us to process the array in $O(N)$ time with $O(1)$ auxiliary space."
6. Common Pitfalls & Edge Cases
- Empty Array: Many candidates forget the `if (nums.length == 0)` check. Without it, this implementation quietly returns 1 for an empty array instead of 0 (the loop body never runs, but `k` starts at 1).
- K-th element vs. k-length: The problem asks for the count of unique elements, but also requires the array to be modified. Ensure you return the length of the unique segment, not the last index.
- Returning the entire array: Don't return the array itself. Return the integer `k`.
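The empty-array pitfall is easy to reproduce. This hypothetical variant (name is mine) drops the guard to show the failure mode:

```java
public class GuardDemo {
    // Variant with the empty-array guard removed: k starts at 1, the loop
    // never runs for an empty input, and 1 is returned even though there
    // are zero unique elements.
    static int removeDuplicatesNoGuard(int[] nums) {
        int k = 1;
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] != nums[i - 1]) {
                nums[k++] = nums[i];
            }
        }
        return k; // wrong answer (1) for an empty input
    }
}
```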
7. Comparative Analysis
| Approach | Time | Space | Note |
|---|---|---|---|
| HashSet | $O(N)$ | $O(N)$ | Simple, but fails the "In-Place" constraint. |
| Two Pointers | $O(N)$ | $O(1)$ | Optimal. Leverages sorted property. |
| Nested Loop | $O(N^2)$ | $O(1)$ | Shifting elements back manually is too slow. |
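For contrast, the HashSet row of the table can be sketched as follows (class and method names are mine). It produces the correct prefix but allocates $O(N)$ extra space, which fails the in-place constraint:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SetApproach {
    // O(N) time but O(N) extra space: every unique value is boxed and stored
    // in the set before being copied back into the array's prefix.
    static int removeDuplicatesWithSet(int[] nums) {
        Set<Integer> seen = new LinkedHashSet<>(); // keeps first-seen order
        for (int x : nums) {
            seen.add(x);
        }
        int k = 0;
        for (int x : seen) {
            nums[k++] = x;
        }
        return k;
    }
}
```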
8. Staff-Level Interview Follow-Ups
Once you provide the optimized solution, a senior interviewer at Google or Meta will likely push you further. Here is how to handle the most common follow-ups:
Follow-up 1: "How does this scale to a Distributed System?"
If the input data is too large to fit on a single machine (e.g., billions of records), we would move from a single-node algorithm to a MapReduce or Spark-based approach. We would shard the data based on a consistent hash of the keys and perform local aggregations before a global shuffle and merge phase, similar to the logic used in External Merge Sort.
Follow-up 2: "What are the Concurrency implications?"
In a multi-threaded Java environment, we must ensure that any shared state (e.g., a shared result buffer or frequency map) is thread-safe. While we could use synchronized blocks, a higher-performance approach is to use atomic classes such as AtomicInteger and LongAdder, or a ConcurrentHashMap. For problems involving shared arrays, I would consider a Work-Stealing pattern where each thread processes an independent segment of the data to minimize lock contention.
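The thread-safe frequency map mentioned above might look like this minimal sketch (class and method names are assumptions for the example):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class ConcurrentCount {
    // computeIfAbsent is atomic on ConcurrentHashMap, and LongAdder absorbs
    // contended increments, so no explicit locks are needed here.
    static Map<Integer, LongAdder> countParallel(int[] nums) {
        Map<Integer, LongAdder> freq = new ConcurrentHashMap<>();
        IntStream.of(nums).parallel()
                 .forEach(x -> freq.computeIfAbsent(x, key -> new LongAdder()).increment());
        return freq;
    }
}
```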
9. Performance Nuances (The Java Perspective)
- Autoboxing Overhead: When using `HashMap<Integer, Integer>`, Java performs autoboxing, which creates thousands of `Integer` objects on the heap. In a performance-critical system, I would use a primitive-specialized library like fastutil or Trove (e.g., `Int2IntMap`), significantly reducing GC pauses.
- Recursion Depth: Recursive solutions are elegant but risky for deep inputs. I always ensure the recursion depth is bounded, or I rewrite the logic to be iterative using an explicit stack on the heap to avoid a `StackOverflowError`.
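fastutil and Trove are third-party dependencies. When the keys are a small range of non-negative ints, a plain `int[]` table gives the same boxing-free behavior with no dependencies at all; the sketch below (names are mine) contrasts the two styles:

```java
import java.util.HashMap;
import java.util.Map;

public class BoxingContrast {
    // Boxed version: every merge boxes ints into Integer objects on the heap.
    static Map<Integer, Integer> boxedCount(int[] nums) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int x : nums) {
            freq.merge(x, 1, Integer::sum);
        }
        return freq;
    }

    // Primitive version: zero boxing when values fit in [0, maxValue].
    static int[] primitiveCount(int[] nums, int maxValue) {
        int[] freq = new int[maxValue + 1];
        for (int x : nums) {
            freq[x]++;
        }
        return freq;
    }
}
```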
Key Takeaways
- **The Reader (i):** Scans every single element in the array looking for a "new" value.
- **The Writer (k):** Only moves when we find a value that is different from the last one we wrote. It marks the boundary of our "clean" prefix.
- Lesson: Arrays and Memory Management