Usage
The os.walk() function in the os module recursively traverses a directory tree. For each directory it encounters, including the starting directory itself, os.walk() yields a 3-tuple containing:
dirpath: A string giving the path of the directory currently being processed.
dirnames: A list of the names of the subdirectories in dirpath.
filenames: A list of the names of the non-directory files in dirpath.
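The names in dirnames and filenames are bare names, not full paths, so a common pattern is to join each one with dirpath. The following is a minimal sketch of that pattern; the helper name list_files and the throwaway directory layout are illustrative assumptions, not part of the os module.

```python
import os
import tempfile


def list_files(top):
    """Return the full path of every non-directory entry under top."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # filenames holds bare names, so join each one with dirpath.
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


if __name__ == "__main__":
    # Build a small throwaway tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "css"))
        for rel in ("index.html", os.path.join("css", "style.css")):
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("")
        for path in list_files(root):
            print(os.path.relpath(path, root))
```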
Syntax
import os
entries = os.walk(top, topdown=True, onerror=None, followlinks=False)
Parameters
Parameter | Condition | Description |
top | Required | The path to the top-level directory from which you want to start traversing |
topdown | Optional | If True, directories are scanned from top-down (parent directories before their children). If False, directories are scanned from bottom-up. Default is True. |
onerror | Optional | A function called with the OSError instance if listing a directory fails. Default is None, which means such errors are ignored. |
followlinks | Optional | Whether to follow symbolic links or not. Default is False. |
How os.walk() Works
You give os.walk() a starting directory, also known as the ‘top’ directory. It lists the entries in that directory and yields the tuple (dirpath, dirnames, filenames).
The dirnames list holds the names of the subdirectories of the current directory. os.walk() then descends into each of those subdirectories in turn, treating each one as a new starting point.
This repeats until every directory below the starting point has been visited, so the walk covers the whole tree.
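The traversal described above can be sketched as a short recursive generator. This is a simplified illustration only, not the actual implementation of os.walk(): the name simple_walk is hypothetical, and the sketch leaves out error handling, bottom-up order, and following symbolic links.

```python
import os


def simple_walk(top):
    """Simplified top-down walk; a sketch, not the real os.walk() code."""
    dirnames, filenames = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # Sort each entry into the directory list or the file list.
            if entry.is_dir():
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    # Report the current directory first (top-down order).
    yield top, dirnames, filenames
    for name in dirnames:
        path = os.path.join(top, name)
        # Links to directories are listed above but never entered.
        if not os.path.islink(path):
            yield from simple_walk(path)
```

For a tree without unreadable directories, iterating over simple_walk(top) should visit the same directories as os.walk(top), although the order of names within each list depends on the file system.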
Basic Example
Let’s consider a sample directory tree for this example:
Example-site
├── index.html
├── css/
│   └── style.css
└── images/
    └── logo.png
In this structure, Example-site contains two folders (css and images), each containing a file (style.css and logo.png, respectively), and there’s an additional file (index.html) directly under Example-site.
Here’s how you can use os.walk() to navigate through this directory tree, printing the names of the directories and files:
import os

for dirpath, dirnames, filenames in os.walk(r"C:\Example-site"):
    print("Current Path:", dirpath)
    print("Directories:", dirnames)
    print("Files:", filenames)
    print()  # To separate outputs for each visited directory
# Output:
# Current Path: C:\Example-site
# Directories: ['css', 'images']
# Files: ['index.html']
# Current Path: C:\Example-site\css
# Directories: []
# Files: ['style.css']
# Current Path: C:\Example-site\images
# Directories: []
# Files: ['logo.png']
Topdown vs. Bottom-up Order
The topdown parameter controls the order in which directories are visited. If topdown is True or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). Conversely, if topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up).
This can be useful for operations that require processing children before their parents, such as deleting directories.
Here’s an example of how to use os.walk() with topdown=False to list the contents of a directory tree in bottom-up order:
import os

for dirpath, dirnames, filenames in os.walk(r"C:\Example-site", topdown=False):
    print("Current Path:", dirpath)
    print("Directories:", dirnames)
    print("Files:", filenames)
    print()  # To separate outputs for each visited directory
# Output:
# Current Path: C:\Example-site\css
# Directories: []
# Files: ['style.css']
# Current Path: C:\Example-site\images
# Directories: []
# Files: ['logo.png']
# Current Path: C:\Example-site
# Directories: ['css', 'images']
# Files: ['index.html']
As you can see, because topdown=False, the contents of the ‘css’ and ‘images’ subdirectories are listed before the contents of the parent directory ‘Example-site’.
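Bottom-up order is what makes deletion work, because a directory can only be removed once it is empty. The sketch below assumes a hypothetical helper named remove_tree and runs against a throwaway temporary tree. It permanently deletes whatever it is pointed at, so treat it as an illustration; the standard library's shutil.rmtree() is the ready-made tool for this job.

```python
import os
import tempfile


def remove_tree(top):
    """Delete top and everything below it, children before parents."""
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # A link to a directory is removed as a link, not entered.
                os.remove(path)
            else:
                # Already emptied, because children are visited first.
                os.rmdir(path)
    os.rmdir(top)


if __name__ == "__main__":
    # Demonstrate on a throwaway tree rather than on real data.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "css"))
    with open(os.path.join(root, "css", "style.css"), "w") as fh:
        fh.write("body {}")
    remove_tree(root)
    print(os.path.exists(root))
```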
Handling Errors
By default, os.walk() silently ignores errors raised while it tries to list a directory. The onerror parameter lets you control how these errors are handled.
It takes a function that is called with the exception, an OSError instance, whenever one occurs during the walk. Common OSError subclasses include PermissionError, raised when your program lacks permission to read a directory, and FileNotFoundError, raised when a directory path no longer exists.
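Here’s how you can define an error handler function and pass it to os.walk():
import os

def log_error(err):
    print(f"Error accessing directory: {err.filename}, {err.strerror}")

# os.walk() returns a generator, so the handler only runs once you iterate over it.
for dirpath, dirnames, filenames in os.walk('/path', onerror=log_error):
    print(dirpath)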
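A handler can either record the error and let the walk continue, or raise the exception to abort the walk. The sketch below shows both options against a path that is deliberately never created; the names collect_errors and raise_error are hypothetical helpers chosen for illustration.

```python
import os
import tempfile


def collect_errors(top):
    """Walk top and return (visited directories, errors reported)."""
    errors = []
    # list.append works as a handler: it receives the OSError instance.
    visited = [dirpath for dirpath, _, _ in os.walk(top, onerror=errors.append)]
    return visited, errors


def raise_error(err):
    """Handler that aborts the walk by re-raising the exception."""
    raise err


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        missing = os.path.join(root, "missing")  # never created
        visited, errors = collect_errors(missing)
        print(visited, [type(e).__name__ for e in errors])
        try:
            list(os.walk(missing, onerror=raise_error))
        except FileNotFoundError as err:
            print("walk aborted:", err.filename)
```

Collecting errors in a list keeps the walk going and leaves the decision about what to do with them to the caller, while re-raising suits scripts that should stop at the first unreadable directory.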
Following Symbolic Links
By default, os.walk() does not descend into symbolic links that point to directories, on systems that support them. To visit the directories such links point to, set the optional followlinks argument to True.
Do this with caution, however: os.walk() does not keep track of the directories it has already visited, so following links can lead to infinite recursion if a symbolic link points to a parent directory of itself.
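One way to guard against such loops is to remember the resolved path of every directory already entered and skip any that come up again. The sketch below is one possible approach, not a built-in feature: walk_following_links is a hypothetical helper, and it relies on the documented behavior that editing dirnames in place during a top-down walk controls which subdirectories os.walk() enters.

```python
import os


def walk_following_links(top):
    """Like os.walk(top, followlinks=True), but enter each real directory once."""
    seen = {os.path.realpath(top)}
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        keep = []
        for name in dirnames:
            # Resolve links so the same directory is recognized under any name.
            real = os.path.realpath(os.path.join(dirpath, name))
            if real not in seen:
                seen.add(real)
                keep.append(name)
        # In top-down mode, editing dirnames in place prunes the walk.
        dirnames[:] = keep
        yield dirpath, dirnames, filenames
```

With this guard, a link that points back to one of its own ancestors is dropped from dirnames instead of being entered again, so the walk terminates.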