Recursion on a Tree
Recursion is often used in "tree-like" structures. For example:
Nested dictionaries
File systems
HTML documents
JSON objects
That's because trees can have unknown depth. It's hard to write a series of loops because you don't know how many levels deep the tree goes.
Assignment
You're responsible for a module in Doc2Doc that can scan a file system (represented in our code as nested dictionaries) and create a list of the filenames.
Complete the recursive list_files
function. It accepts two arguments:
parent_directory
: A dictionary of dictionaries representing the current directory. A child directory's value is a dictionary and a file's value isNone
.current_filepath
: A string representing the current path (e.g./dir1/dir2/filename.txt
)
It should return a list of all filepaths in the parent_directory
.
Steps
Return the list of file paths.
Example parent_directory
:
Resulting list of file paths:
Tips
What's the base case? It looks a bit different than before, but it's just when a nested node's value is
None
because in that case we don't have any more directories to explore.You might be wondering why we used a loop! Loops are nice when we know how many times we need to iterate (the number of keys in the dictionary). Recursion is nice when we don't know how many times we need to iterate (the number of nested dictionaries).
Use
.extend()
to add a list's items to another list:
Solution
Explanation
Purpose of the Function
Overall Goal:
The function
list_files
recursively scans a nested dictionary representing a file system and returns a flat list of complete file paths for all files.
How the Structure is Represented:
Each key in the dictionary is a file or directory name.
If the value is
None
, it represents a file.If the value is another dictionary, it represents a subdirectory.
Step 1: Initializing the File List
Snippet:
Explanation:
The function starts by creating an empty list
file_list
that will eventually hold all the file paths.
Callout: This list is used to accumulate results as the function processes each level of the directory structure.
Step 2: Looping Through the Directory
Snippet:
Explanation:
The function iterates over each key in the current dictionary (
parent_directory
).
Callout: Each key represents either a file or a subdirectory.
Step 3: Building the New File Path
Snippet:
Explanation:
This line creates a new path by appending
"/" + key
to the existingcurrent_filepath
.Example: If
current_filepath
is"/dir1"
andkey
is"file.txt"
, thennew_filepath
becomes"/dir1/file.txt"
.
Callout: This ensures that every file or directory has a full path relative to the root.
Step 4: Determining File vs. Directory
Snippet:
Explanation:
The function checks the value associated with the current key.
If
val
isNone
, it means this key is a file.Example: For an entry like
"file.txt": None
, the function adds"/dir1/file.txt"
tofile_list
.
Callout: Files are indicated by a
None
value in this representation.
Step 5: Handling Subdirectories with Recursion
Snippet:
Explanation:
If the value is not
None
, it must be a dictionary representing a subdirectory.The function then calls itself recursively, passing in:
val
(the subdirectory dictionary)new_filepath
(the updated path for this subdirectory)
The recursive call returns a list of file paths from that subdirectory, which are then added to
file_list
using.extend()
.Example: For an entry like
"subdir": { "inner.txt": None }
:If
current_filepath
is"/dir1"
,new_filepath
becomes"/dir1/subdir"
.The recursive call will process the subdirectory and eventually find
"/dir1/subdir/inner.txt"
.
Callout: Recursion allows the function to handle nested directories no matter how deep they go.
Step 6: Returning the Final List
Snippet:
Explanation:
After iterating through all keys in the current directory and processing files and subdirectories, the function returns the accumulated list of file paths.
Callout: This final return provides a complete, flat list of file paths from the entire nested structure.
Tying It All Together
The function begins at a root directory and builds file paths by adding each directory or file name to the current path.
For files, it simply records the complete path.
For directories, it calls itself to explore deeper into the nested structure, ensuring every file is eventually recorded.
The end result is a list of strings, where each string is the full path to a file in the nested directory structure.
Last updated