Python os.walk() Method

Usage

The os.walk() method in the os module is designed to recursively traverse a directory tree. For each directory it encounters, os.walk() yields a 3-tuple containing:

  • dirpath: A string representing the current directory path being processed.
  • dirnames: A list of subdirectory names within the dirpath.
  • filenames: A list of non-directory file names within the dirpath.
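Because dirnames and filenames hold bare names with no path components, a common pattern is to join each name with dirpath to get a usable path. The sketch below builds a small throwaway tree with the standard tempfile module (the file and folder names are made up for illustration) and collects the full path of every file in it.

```python
import os
import tempfile

# Build a small, hypothetical tree in a temporary directory:
#   <tmp>/notes.txt and <tmp>/sub/data.csv
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ("notes.txt", os.path.join("sub", "data.csv")):
    with open(os.path.join(top, rel), "w") as f:
        f.write("example")

all_files = []
for dirpath, dirnames, filenames in os.walk(top):
    for name in filenames:
        # filenames contains names only, so join each one with dirpath
        all_files.append(os.path.join(dirpath, name))

print(sorted(all_files))
```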

Syntax

import os
entries = os.walk(top, topdown=True, onerror=None, followlinks=False)

Parameters

Parameter   | Condition | Description
top         | Required  | The path to the top-level directory from which you want to start traversing.
topdown     | Optional  | If True, directories are scanned top-down (parent directories before their children). If False, directories are scanned bottom-up. Default is True.
onerror     | Optional  | A function called with the OSError instance if an error occurs while listing a directory. Default is None (errors are ignored).
followlinks | Optional  | Whether to follow symbolic links to directories. Default is False.

How Does os.walk() Work?

You give os.walk() a starting directory, also known as the ‘top’ directory. It lists the entries of that directory and yields a tuple (dirpath, dirnames, filenames).

The dirnames part holds the names of the subdirectories within the current directory. os.walk() then descends into each of those subdirectories, treating each one as a new starting point.

This process repeats until every directory in the tree has been visited, so all directories and files below the top directory are covered.
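To make the description above concrete, here is a heavily simplified generator. simple_walk is a hypothetical helper, not part of the os module, and CPython's real implementation differs (it also handles errors, bottom-up order and followlinks). It lists each directory once with os.scandir(), yields the triple, and only then descends into the names still left in dirnames.

```python
import os
import tempfile

def simple_walk(top):
    """Simplified top-down walk; a hypothetical helper, not os.walk() itself."""
    dirnames, filenames = [], []
    # List the current directory once, splitting entries into dirs and files.
    with os.scandir(top) as it:
        for entry in it:
            if entry.is_dir():
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    # Yield before descending, so the caller can still edit dirnames in place.
    yield top, dirnames, filenames
    # Each remaining subdirectory becomes a new starting point.
    for name in dirnames:
        path = os.path.join(top, name)
        # Like os.walk() with followlinks=False: linked dirs are listed, not entered.
        if not os.path.islink(path):
            yield from simple_walk(path)

# Compare against os.walk() on a small temporary tree (names are illustrative).
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "css"))
os.makedirs(os.path.join(top, "images", "icons"))
open(os.path.join(top, "index.html"), "w").close()
open(os.path.join(top, "css", "style.css"), "w").close()

def normalize(triples):
    # Listing order is not guaranteed, so sort before comparing.
    return sorted((d, sorted(dn), sorted(fn)) for d, dn, fn in triples)

same = normalize(simple_walk(top)) == normalize(os.walk(top))
print(same)
```

Because the triple is yielded before the loop over dirnames runs, removing a name from dirnames inside the caller's loop stops the descent into that subdirectory; the os.walk() documentation describes the same behavior for topdown=True.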

Basic Example

Let’s consider a sample directory tree for this example:

Example-site
├── index.html
├── css/
│   └── style.css
└── images/
    └── logo.png

In this structure, Example-site contains two folders (css and images), each containing a file (style.css and logo.png, respectively), and there’s an additional file (index.html) directly under Example-site.

Here’s how you can use os.walk() to navigate through this directory tree, printing the names of the directories and files:

import os

for dirpath, dirnames, filenames in os.walk(r"C:\Example-site"):
    print("Current Path:", dirpath)
    print("Directories:", dirnames)
    print("Files:", filenames)
    print()  # To separate outputs for each visited directory

# Output:
# Current Path: C:\Example-site
# Directories: ['css', 'images']
# Files: ['index.html']

# Current Path: C:\Example-site\css
# Directories: []
# Files: ['style.css']

# Current Path: C:\Example-site\images
# Directories: []
# Files: ['logo.png']
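The order of names in dirnames and filenames is whatever the operating system reports (os.walk() does not sort them), so the output above may come out in a different order on your machine. With the default topdown=True, you can sort dirnames in place to fix the visiting order, and remove names from it to skip subdirectories entirely. The sketch below assumes a hypothetical tree in which a folder named build should be skipped.

```python
import os
import tempfile

# Hypothetical tree: <tmp>/b/, <tmp>/a/ and <tmp>/build/ (to be skipped)
top = tempfile.mkdtemp()
for name in ("b", "a", "build"):
    os.mkdir(os.path.join(top, name))
    open(os.path.join(top, name, "file.txt"), "w").close()

visited = []
for dirpath, dirnames, filenames in os.walk(top):
    # Modify dirnames in place (slice assignment), not by rebinding the name,
    # so that os.walk() sees the change: drop "build", then sort the rest.
    dirnames[:] = sorted(d for d in dirnames if d != "build")
    filenames.sort()
    visited.append(os.path.relpath(dirpath, top))

print(visited)  # → ['.', 'a', 'b']
```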

Topdown vs. Bottom-up Order

The topdown parameter affects the order in which directories are visited. If topdown is True or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). Conversely, if topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up).

This can be useful for operations that require processing children before their parents, such as deleting directories.

Here’s an example of how to use os.walk() with topdown=False to list the contents of a directory tree in bottom-up order:

import os

for dirpath, dirnames, filenames in os.walk(r"C:\Example-site", topdown=False):
    print("Current Path:", dirpath)
    print("Directories:", dirnames)
    print("Files:", filenames)
    print()  # To separate outputs for each visited directory

# Output:
# Current Path: C:\Example-site\css
# Directories: []
# Files: ['style.css']

# Current Path: C:\Example-site\images
# Directories: []
# Files: ['logo.png']

# Current Path: C:\Example-site
# Directories: ['css', 'images']
# Files: ['index.html']

As you can see, because topdown=False, the files within the ‘css’ and ‘images’ subdirectories are listed before the contents of the parent directory ‘Example-site’.
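Bottom-up order is what makes the directory-deletion use case mentioned above work: by the time os.walk() yields a directory, everything beneath it has already been handled, so each subdirectory is empty when os.rmdir() is called on it. The sketch below empties a throwaway temporary tree and assumes the tree contains no symbolic links (a symlink to a directory would appear in dirnames, and os.rmdir() would fail on it). In real code, shutil.rmtree() already does this job.

```python
import os
import tempfile

# Throwaway tree: <tmp>/css/style.css and <tmp>/images/logo.png
top = tempfile.mkdtemp()
for sub, name in (("css", "style.css"), ("images", "logo.png")):
    os.mkdir(os.path.join(top, sub))
    open(os.path.join(top, sub, name), "w").close()

# topdown=False: children are yielded before their parents.
# Assumes there are no symbolic links inside top.
for dirpath, dirnames, filenames in os.walk(top, topdown=False):
    for name in filenames:
        os.remove(os.path.join(dirpath, name))
    for name in dirnames:
        # Emptied in an earlier iteration, so rmdir succeeds.
        os.rmdir(os.path.join(dirpath, name))

os.rmdir(top)  # top itself never appears in any dirnames list
print(os.path.exists(top))  # → False
```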

Handling Errors

By default, os.walk() silently ignores errors that might arise while trying to list subdirectories. The onerror parameter gives you control over how to deal with these potential errors.

It lets you specify a function that is called with the exception object, an OSError instance, whenever an error occurs while os.walk() lists a directory. Common cases are PermissionError, raised when your program tries to list a directory it does not have permission to read, and FileNotFoundError, raised when a directory path no longer exists.

Here’s how you can define an error handler function and pass it to os.walk():

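import os

def log_error(err):
    print(f"Error accessing directory: {err.filename}, {err.strerror}")

# os.walk() is a generator, so errors (and onerror calls) only occur while iterating
for dirpath, dirnames, filenames in os.walk('/path', onerror=log_error):
    pass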
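Because the handler receives the exception object itself, it can do more than print: a handler that returns normally lets the walk continue, while one that raises stops it. The sketch below (the handler names collect and stop are hypothetical) walks a path that does not exist, which makes the initial directory listing fail with FileNotFoundError.

```python
import os
import tempfile

# A path that is guaranteed not to exist (inside a fresh temp dir).
missing = os.path.join(tempfile.mkdtemp(), "no-such-dir")

# Option 1: record the error and let the walk carry on.
errors = []
def collect(err):
    errors.append(err)

results = list(os.walk(missing, onerror=collect))
print(results)                        # → []
print(type(errors[0]).__name__)       # → FileNotFoundError
print(errors[0].filename == missing)  # → True

# Option 2: re-raise to abort the whole walk.
def stop(err):
    raise err

try:
    for _ in os.walk(missing, onerror=stop):
        pass
except FileNotFoundError as exc:
    print("walk aborted:", exc.filename == missing)  # → walk aborted: True
```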

Following Symbolic Links

By default, os.walk() does not follow symbolic links to subdirectories on systems that support them. To enable this, set the optional argument followlinks to True.

However, setting followlinks to True should be done with caution: it can lead to infinite recursion if a link points to a parent directory of itself, because os.walk() does not keep track of the directories it has already visited.
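If you do need followlinks=True, one way to guard against such loops is to remember each directory's identity, the (st_dev, st_ino) pair returned by os.stat(), and prune any directory already seen. This is a sketch, not something os.walk() does for you; the helper name walk_once and the self-referencing link in the demo are hypothetical, and creating symbolic links may require extra privileges on Windows.

```python
import os
import tempfile

def walk_once(top):
    """os.walk(followlinks=True), skipping directories already visited."""
    seen = set()
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        st = os.stat(dirpath)  # follows symlinks to the real directory
        key = (st.st_dev, st.st_ino)
        if key in seen:
            dirnames[:] = []  # in place: tell os.walk() not to descend further
            continue
        seen.add(key)
        yield dirpath, dirnames, filenames

# Demo: <tmp>/sub/loop is a symlink back to <tmp>, forming a cycle.
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
os.symlink(top, os.path.join(top, "sub", "loop"), target_is_directory=True)

visited = [os.path.relpath(d, top) for d, _, _ in walk_once(top)]
print(visited)  # → ['.', 'sub']
```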