1. Problem Statement
Mental Model
Break a complex problem down to its most natural algorithmic primitive: here, a file system maps directly onto an n-ary tree.
Design a data structure that simulates an in-memory file system.
Implement the FileSystem class:
- List<String> ls(String path)
- void mkdir(String path)
- void addContentToFile(String filePath, String content)
- String readContentFromFile(String filePath)
Constraints:
- ls should return a list of file/directory names in the given directory, in lexicographic order. If the path points to a file, return a list containing only that file's name.
2. Approach: N-ary Tree (Trie-like Structure)
A file system is inherently a Tree. We can model it using a Trie-like Node structure where each node represents a directory or a file.
- Node Structure:
  - boolean isFile: differentiates files from directories.
  - Map<String, Node> children: maps a directory/file name to its Node.
  - StringBuilder content: stores data if isFile is true.
- Path Parsing: the core utility is splitting the path "/a/b/c" by "/" and traversing the tree layer by layer.
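Before writing the full class, it's worth pinning down exactly what `String.split` produces for an absolute path, since the leading "/" yields an empty first token. A minimal check:

```java
import java.util.Arrays;

// Demonstrates how String.split parses an absolute path.
// Note the empty string produced before the leading "/".
public class PathSplitDemo {
    public static String[] parse(String path) {
        return path.split("/");
    }

    public static void main(String[] args) {
        String[] parts = parse("/a/b/c");
        System.out.println(Arrays.toString(parts)); // [, a, b, c]
        // The root path "/" splits to an empty array (trailing empty
        // strings are removed), so traversal naturally returns the root.
        System.out.println(parse("/").length); // 0
    }
}
```

This is why the traversal below starts at index 1 (or skips empty tokens).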
3. Java Implementation
```java
import java.util.*;

class FileSystem {

    static class Node {
        boolean isFile = false;
        Map<String, Node> children = new HashMap<>();
        StringBuilder content = new StringBuilder();
    }

    private final Node root;

    public FileSystem() {
        root = new Node();
    }

    public List<String> ls(String path) {
        Node node = traverse(path);
        List<String> res = new ArrayList<>();
        if (node.isFile) {
            String[] parts = path.split("/");
            res.add(parts[parts.length - 1]); // A file path lists only the file's name
        } else {
            res.addAll(node.children.keySet());
            Collections.sort(res); // Return names in lexicographic order
        }
        return res;
    }

    public void mkdir(String path) {
        traverse(path); // Creates any missing intermediate directories
    }

    public void addContentToFile(String filePath, String content) {
        Node node = traverse(filePath);
        node.isFile = true;
        node.content.append(content);
    }

    public String readContentFromFile(String filePath) {
        return traverse(filePath).content.toString();
    }

    // Helper: walks the path, creating missing nodes along the way
    private Node traverse(String path) {
        Node curr = root;
        String[] parts = path.split("/");
        for (int i = 1; i < parts.length; i++) { // Index 0 is the empty token before the leading "/"
            curr.children.putIfAbsent(parts[i], new Node());
            curr = curr.children.get(parts[i]);
        }
        return curr;
    }
}
```
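To see the four operations working together, here is a self-contained driver (it restates the same class so the snippet compiles on its own):

```java
import java.util.*;

// Self-contained demo: the same FileSystem design as above, exercised end-to-end.
class FileSystem {
    static class Node {
        boolean isFile = false;
        Map<String, Node> children = new HashMap<>();
        StringBuilder content = new StringBuilder();
    }

    private final Node root = new Node();

    public List<String> ls(String path) {
        Node node = traverse(path);
        List<String> res = new ArrayList<>();
        if (node.isFile) {
            String[] parts = path.split("/");
            res.add(parts[parts.length - 1]); // File path: just the file's name
        } else {
            res.addAll(node.children.keySet());
            Collections.sort(res); // Lexicographic order
        }
        return res;
    }

    public void mkdir(String path) {
        traverse(path);
    }

    public void addContentToFile(String filePath, String content) {
        Node node = traverse(filePath);
        node.isFile = true;
        node.content.append(content);
    }

    public String readContentFromFile(String filePath) {
        return traverse(filePath).content.toString();
    }

    private Node traverse(String path) {
        Node curr = root;
        for (String p : path.split("/")) {
            if (p.isEmpty()) continue; // Skip the empty token from the leading "/"
            curr.children.putIfAbsent(p, new Node());
            curr = curr.children.get(p);
        }
        return curr;
    }
}

public class FileSystemDemo {
    public static void main(String[] args) {
        FileSystem fs = new FileSystem();
        fs.mkdir("/a/b/c");
        fs.addContentToFile("/a/b/c/d", "hello");
        System.out.println(fs.ls("/"));        // [a]
        System.out.println(fs.ls("/a/b/c"));   // [d]
        System.out.println(fs.ls("/a/b/c/d")); // [d]  (file path lists itself)
        System.out.println(fs.readContentFromFile("/a/b/c/d")); // hello
    }
}
```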
4. 5-Minute "Video-Style" Walkthrough
- The Unified Traverse: the traverse method is the secret sauce. By using putIfAbsent, it acts as both a search function and a creation function. If we call mkdir("/a/b"), it effortlessly creates a and b. If we call addContentToFile("/a/c", ...), it traverses a and creates c.
- The Trie Comparison: this is exactly a Trie, but instead of characters driving the edges (Node[26]), full string tokens (directory names) drive the edges via a HashMap.
- The ls Logic: if the traversed node is a file, we extract its name from the path string. If it's a directory, we dump the HashMap keys into a list and sort them.
5. Interview Discussion
- Interviewer: "What is the time complexity of ls?"
- You: "Traversal takes $O(P)$ where $P$ is the path length. Sorting the directory contents takes $O(K \log K)$ where $K$ is the number of items in that directory."
- Interviewer: "How would you optimize ls if it's called very frequently?"
- You: "Instead of a HashMap, we could use a TreeMap in the Node. That maintains lexicographic order automatically during insertions, making ls an $O(K)$ operation to just read the keys, at the cost of $O(\log K)$ insertions."
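A minimal sketch of that trade-off (SortedNode is a hypothetical name for illustration): TreeMap keeps children sorted at all times, so ls can read the key set directly with no sort.

```java
import java.util.*;

// Sketch: swapping HashMap for TreeMap keeps children sorted at all times,
// so ls() can return the key view directly without calling Collections.sort.
class SortedNode {
    boolean isFile = false;
    TreeMap<String, SortedNode> children = new TreeMap<>();
}

public class TreeMapLsDemo {
    public static void main(String[] args) {
        SortedNode dir = new SortedNode();
        dir.children.put("zeta", new SortedNode());
        dir.children.put("alpha", new SortedNode());
        dir.children.put("mid", new SortedNode());
        // Keys come back in lexicographic order: O(K) to read, no sort needed.
        System.out.println(new ArrayList<>(dir.children.keySet())); // [alpha, mid, zeta]
    }
}
```

Each insertion now pays $O(\log K)$, which is the right trade when reads dominate writes.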
6. Verbal Interview Script (Staff Tier)
Interviewer: "Walk me through your design strategy for this problem."
You: "My primary objective was to pick a structure that mirrors the domain: a file system is a tree, so I modeled it as an n-ary tree with HashMap-backed edges, which is effectively a Trie keyed on path tokens instead of characters. That gives $O(P)$ path resolution, where $P$ is the number of path components, independent of the total number of files. I unified mkdir, addContentToFile, and lookup into a single traverse helper that creates missing nodes with putIfAbsent, which keeps each public method to a few lines and eliminates duplicated parsing logic. I also used a StringBuilder for file content so repeated appends stay amortized $O(1)$ instead of re-copying the whole string on every write."
7. Staff-Level Interview Follow-Ups
Once you provide the optimized solution, a senior interviewer at Google or Meta will likely push you further. Here is how to handle the most common follow-ups:
Follow-up 1: "How does this scale to a Distributed System?"
If the namespace is too large to fit on a single machine (e.g., billions of files), we would move from a single-node tree to a partitioned metadata service. We could shard the tree by a consistent hash of the parent-directory path, so that all children of a directory land on the same shard and ls stays a single-shard operation, with a coordination layer handling cross-shard operations such as renames. This mirrors how distributed file systems like HDFS separate metadata management from data storage.
Follow-up 2: "What are the Concurrency implications?"
In a multi-threaded Java environment, we must ensure that our shared state (the directory tree and file contents) is thread-safe. While we could wrap every operation in synchronized blocks, a higher-performance approach is to back each node's children with a ConcurrentHashMap and create missing nodes via computeIfAbsent, which is atomic per key, so racing mkdir calls converge on the same Node. File content still needs its own per-node lock (or a ReadWriteLock) so that concurrent appends to the same file don't interleave, while writers to different files never contend.
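A minimal sketch of the thread-safe creation path, assuming per-node ConcurrentHashMap children (ConcurrentNode is a hypothetical name for illustration):

```java
import java.util.concurrent.*;

// Sketch: thread-safe directory creation with ConcurrentHashMap.
// computeIfAbsent is atomic per key, so two threads racing on the same
// path segment observe the same Node rather than clobbering each other.
class ConcurrentNode {
    volatile boolean isFile = false;
    final ConcurrentHashMap<String, ConcurrentNode> children = new ConcurrentHashMap<>();
}

public class ConcurrentMkdirDemo {
    static final ConcurrentNode root = new ConcurrentNode();

    static ConcurrentNode traverse(String path) {
        ConcurrentNode curr = root;
        for (String p : path.split("/")) {
            if (p.isEmpty()) continue; // Skip empty token from leading "/"
            curr = curr.children.computeIfAbsent(p, k -> new ConcurrentNode());
        }
        return curr;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> traverse("/a/b"));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // Despite 100 racing calls, exactly one "a" node exists.
        System.out.println(root.children.size()); // 1
    }
}
```

Note this only makes node creation safe; concurrent content appends would still need per-file locking as discussed above.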
8. Performance Nuances (The Java Perspective)
- Autoboxing Overhead: when using HashMap<Integer, Integer>, Java performs autoboxing, which creates thousands of Integer objects on the heap. In a performance-critical system, I would use a primitive-specialized library like fastutil or Trove (e.g., Int2IntMap) to significantly reduce GC pressure. (Here the keys are Strings, so this concern applies more to adjacent counting problems than to this one.)
- Recursion Depth: recursive tree solutions are elegant but risky for deep inputs. The traverse method above is already iterative; where recursion is unavoidable, I ensure the depth is bounded or rewrite the logic iteratively with an explicit stack on the heap to avoid StackOverflowError.
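To make the recursion-depth point concrete, here is a sketch (hypothetical helper, not part of the solution above) of walking an arbitrarily deep directory tree with an explicit Deque instead of recursive calls, so depth is bounded only by heap size:

```java
import java.util.*;

// Sketch: replacing recursion with an explicit Deque-based stack, so tree
// depth is bounded by heap size rather than the JVM thread stack.
public class IterativeWalkDemo {
    static class Node {
        Map<String, Node> children = new HashMap<>();
    }

    // Collects every path in the tree without any recursive calls.
    static List<String> allPaths(Node root) {
        List<String> out = new ArrayList<>();
        Deque<Map.Entry<String, Node>> stack = new ArrayDeque<>();
        stack.push(Map.entry("", root));
        while (!stack.isEmpty()) {
            Map.Entry<String, Node> e = stack.pop();
            for (Map.Entry<String, Node> child : e.getValue().children.entrySet()) {
                String path = e.getKey() + "/" + child.getKey();
                out.add(path);
                stack.push(Map.entry(path, child.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Build a chain deep enough to overflow a typical recursive walk.
        Node root = new Node();
        Node curr = root;
        for (int i = 0; i < 100_000; i++) {
            Node next = new Node();
            curr.children.put("d" + i, next);
            curr = next;
        }
        System.out.println(allPaths(root).size()); // 100000, no StackOverflowError
    }
}
```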
Key Takeaways
- List<String> ls(String path)
- void mkdir(String path)
- void addContentToFile(String filePath, String content)
- String readContentFromFile(String filePath)