Recursion on a Tree

Recursion is often used in "tree-like" structures. For example:

  • Nested dictionaries

  • File systems

  • HTML documents

  • JSON objects

That's because trees can have unknown depth. It's hard to write a series of loops because you don't know how many levels deep the tree goes.

for entry_i in directory:
    if entry_i.is_dir:
        for entry_j in entry_i:
            if entry_j.is_dir:
                for entry_k in entry_j:
                    ...

Assignment

You're responsible for a module in Doc2Doc that can scan a file system (represented in our code as nested dictionaries) and create a list of the filenames.

Complete the recursive list_files function. It accepts two arguments:

  • parent_directory: A dictionary of dictionaries representing the current directory. A child directory's value is a dictionary and a file's value is None.

  • current_filepath: A string representing the current path (e.g. /dir1/dir2/filename.txt)

It should return a list of all filepaths in the parent_directory.

Steps

Example parent_directory:

{
    "Documents": {
        "Proposal.docx": None,
        "Receipts": {
            "January": {
                "receipt1.txt": None,
                "receipt2.txt": None
            },
            "February": {
                "receipt3.txt": None
            }
        }
    },
}

Resulting list of file paths:

[
    "/Documents/Proposal.docx",
    "/Documents/Receipts/January/receipt1.txt",
    "/Documents/Receipts/January/receipt2.txt",
    "/Documents/Receipts/February/receipt3.txt"
]

Tips

  • What's the base case? It looks a bit different than before, but it's just when a nested node's value is None because in that case we don't have any more directories to explore.

  • You might be wondering why we used a loop! Loops are nice when we know how many times we need to iterate (the number of keys in the dictionary). Recursion is nice when we don't know how many times we need to iterate (the number of nested dictionaries).

  • Use .extend() to add a list's items to another list:

items = ["six-string lute", "petrified dragon egg", "the only ring"]
items.extend(["philospher's stone", "invisibility cloak", "moistened scimitar"])
# ["six-string lute", "petrified dragon egg", "the only ring", "philospher's stone", "invisibility cloak", "moistened scimitar"]

Solution

def list_files(parent_directory, current_filepath=""):
    file_list = []
    for key in parent_directory:
        new_filepath = current_filepath + "/" + key
        val = parent_directory[key]
        if val is None:
            file_list.append(new_filepath)
        else:
            file_list.extend(list_files(val, new_filepath))
    return file_list    

Explanation

  • Purpose of the Function

    • Overall Goal:

      • The function list_files recursively scans a nested dictionary representing a file system and returns a flat list of complete file paths for all files.

    • How the Structure is Represented:

      • Each key in the dictionary is a file or directory name.

      • If the value is None, it represents a file.

      • If the value is another dictionary, it represents a subdirectory.


  • Step 1: Initializing the File List

    • Snippet:

      file_list = []
    • Explanation:

      • The function starts by creating an empty list file_list that will eventually hold all the file paths.

    • Callout: This list is used to accumulate results as the function processes each level of the directory structure.


  • Step 2: Looping Through the Directory

    • Snippet:

      for key in parent_directory:
    • Explanation:

      • The function iterates over each key in the current dictionary (parent_directory).

    • Callout: Each key represents either a file or a subdirectory.


  • Step 3: Building the New File Path

    • Snippet:

      new_filepath = current_filepath + "/" + key
    • Explanation:

      • This line creates a new path by appending "/" + key to the existing current_filepath.

      • Example: If current_filepath is "/dir1" and key is "file.txt", then new_filepath becomes "/dir1/file.txt".

    • Callout: This ensures that every file or directory has a full path relative to the root.


  • Step 4: Determining File vs. Directory

    • Snippet:

      val = parent_directory[key]
      if val is None:
          file_list.append(new_filepath)
    • Explanation:

      • The function checks the value associated with the current key.

      • If val is None, it means this key is a file.

      • Example: For an entry like "file.txt": None, the function adds "/dir1/file.txt" to file_list.

    • Callout: Files are indicated by a None value in this representation.


  • Step 5: Handling Subdirectories with Recursion

    • Snippet:

      else:
          file_list.extend(list_files(val, new_filepath))
    • Explanation:

      • If the value is not None, it must be a dictionary representing a subdirectory.

      • The function then calls itself recursively, passing in:

        • val (the subdirectory dictionary)

        • new_filepath (the updated path for this subdirectory)

      • The recursive call returns a list of file paths from that subdirectory, which are then added to file_list using .extend().

      • Example: For an entry like "subdir": { "inner.txt": None }:

        • If current_filepath is "/dir1", new_filepath becomes "/dir1/subdir".

        • The recursive call will process the subdirectory and eventually find "/dir1/subdir/inner.txt".

    • Callout: Recursion allows the function to handle nested directories no matter how deep they go.


  • Step 6: Returning the Final List

    • Snippet:

      return file_list
    • Explanation:

      • After iterating through all keys in the current directory and processing files and subdirectories, the function returns the accumulated list of file paths.

    • Callout: This final return provides a complete, flat list of file paths from the entire nested structure.


  • Tying It All Together

    • The function begins at a root directory and builds file paths by adding each directory or file name to the current path.

    • For files, it simply records the complete path.

    • For directories, it calls itself to explore deeper into the nested structure, ensuring every file is eventually recorded.

    • The end result is a list of strings, where each string is the full path to a file in the nested directory structure.

Last updated