Listing Files in a Directory with Python

To list files in a directory using Python, you have multiple options, each with its strengths. Consider these factors when choosing the best approach:

Simple Listing: If you need a basic list of files and directories in a specific directory, os.listdir() is sufficient.
File Metadata: If you also need per-entry information such as the file type, os.scandir() is more efficient and provides richer details.
Recursive Traversal: To process files throughout an entire directory tree, os.walk() is the appropriate choice.
Pattern-Based Filtering: For matching files based on names or patterns, glob.glob() provides an easy solution.
Object-Oriented Approach: For a modern, object-oriented style and cross-platform compatibility, use pathlib.Path.

Let’s explore each method in more detail:

Using os.listdir()

os.listdir() is a widely used function in Python’s os module for listing the contents of a directory.

It returns a list containing the names of all entries (files and directories) in the specified directory. If no directory path is specified, it defaults to the current working directory.
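According to the Python documentation, the list comes back in arbitrary order and never includes the special entries '.' and '..'. If you need a predictable order (for display, or to compare two runs), a minimal sketch is to sort the names yourself. The helper name sorted_entries is hypothetical:

```python
import os


def sorted_entries(path="."):
    """Return the entry names in *path* sorted alphabetically (hypothetical helper)."""
    # os.listdir() makes no ordering guarantee, so sort explicitly
    return sorted(os.listdir(path))


if __name__ == "__main__":
    print(sorted_entries())
```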

Syntax

import os
entries = os.listdir(path='.')

Python os.listdir() method parameters
Parameter	Condition	Description
path	Optional	A string (or path-like object) representing the path of the directory whose contents you want to list. Default value is '.' (current working directory)

Examples

Here’s a simple example of using os.listdir() to list the contents of a directory:

import os

# Specify the directory path
path = '/path/to/directory'

# List all entries in the specified directory
entries = os.listdir(path)

print(entries)
# Output: ['file1.txt', 'folder1', 'image.jpg', ...]

When you call os.listdir() without an argument, it lists the current working directory:

import os

# List all entries in the current directory
entries = os.listdir()

print(entries)
# Output: ['file1.py', 'file.txt']

You can also pass '.' explicitly to list the current working directory, or '..' to list its parent directory. Both are relative paths resolved against the current working directory, which makes them a convenient way to move around the file system hierarchy.

import os

# List all entries in the parent directory
entries = os.listdir('..')

print(entries)
# Output: ['Scripts', 'Packages']

As you can see, os.listdir() does not distinguish files from directories. It returns the names of both files and directories, mixed in a single list.

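To tell files and directories apart, check each entry with os.path.isfile() or os.path.isdir(). Because os.listdir() returns bare names, join each name with the directory path first; otherwise the check is made relative to the current working directory and may give the wrong answer.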

import os

# Specify the directory path
path = '/path/to/directory'

# List all entries in the directory
entries = os.listdir(path)

for entry in entries:
    # Join the path and entry to get the full path
    full_path = os.path.join(path, entry)
    
    # Check if it's a file or directory
    if os.path.isfile(full_path):
        print(f"{entry} is a file.")
    elif os.path.isdir(full_path):
        print(f"{entry} is a directory.")

# Output:
# file1.txt is a file.
# folder1 is a directory.
# image.jpg is a file.
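When you only want the files, the same check fits in a list comprehension. This is a sketch with a hypothetical helper name, list_files; entries that are neither regular files nor directories (a broken symbolic link, for example) are skipped because os.path.isfile() returns False for them:

```python
import os


def list_files(path):
    """Return the names of regular files directly inside *path* (hypothetical helper)."""
    return [
        name
        for name in os.listdir(path)
        # Join with the directory; a bare name would be checked against the CWD
        if os.path.isfile(os.path.join(path, name))
    ]
```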

os.listdir() can raise exceptions, most commonly FileNotFoundError if the specified path does not exist, PermissionError if you lack permission to list the directory’s contents, or OSError for other file system-related errors. FileNotFoundError and PermissionError are both subclasses of OSError, so catch them first and keep OSError last as a general fallback. It’s good practice to wrap the call in a try-except block to deal with these potential errors.

import os

# Specify the directory path
path = '/path/to/directory'

try:
    # Attempt to list the contents of the specified directory
    entries = os.listdir(path)
    print(entries)

except FileNotFoundError:
    # Handle the case where the specified directory does not exist
    print(f"Error: The directory '{path}' does not exist.")

except PermissionError:
    # Handle the case where permission is denied
    print(f"Error: Permission denied to access the directory '{path}'.")

except OSError as error:
    # Handle other OS-related errors
    print(f"Error: An OS error occurred: {error}")

Using os.scandir()

While os.listdir() is sufficient for simple tasks like obtaining a list of names in a directory, it becomes inefficient for large directories, especially when you need to know the entry’s metadata, such as whether it is a directory or a file. This is where the more advanced and memory-efficient os.scandir() function (introduced in Python 3.5) comes in.

Instead of plain filenames, os.scandir() returns an iterator of os.DirEntry objects, one for each entry in the directory. These objects carry not only the filename but also information about the entry’s type (file, directory, etc.). Because the operating system usually supplies this type information while the directory is being read, os.scandir() can often answer is_file() and is_dir() without additional system calls. This can make it significantly faster than calling os.listdir() followed by os.path.isfile() or os.stat() on every name.

Syntax

import os
entries = os.scandir(path='.')

Python os.scandir() method parameters
Parameter	Condition	Description
path	Optional	A string (or path-like object) representing the path of the directory whose contents you want to list. Default value is '.' (current working directory)

Return Value

os.scandir() returns an iterator of os.DirEntry objects for each entry (file or directory) in the specified directory.

os.DirEntry Object

os.DirEntry objects expose the following key attributes and methods, which let you inspect each entry, in many cases without additional system calls:

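name: The entry’s file or directory name, without the directory part.
path: The entry’s path, equivalent to os.path.join(scandir_path, entry.name), where scandir_path is the argument passed to os.scandir(). It is absolute only if that argument was absolute.
is_file(): Returns True if the entry is a file or a symbolic link pointing to a file, otherwise False.
is_dir(): Returns True if the entry is a directory or a symbolic link pointing to a directory, otherwise False.
Other members include is_symlink(), inode(), and stat(), which returns an os.stat_result with the file’s size, timestamps, and other metadata. Pass follow_symlinks=False to is_file(), is_dir(), or stat() to examine a symbolic link itself rather than its target.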
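As a sketch of stat() in use, the hypothetical helper file_sizes below reports the size of each regular file. It passes follow_symlinks=False so that a symbolic link is not reported as the file it points to; the stat() result is cached on the DirEntry object after the first call:

```python
import os


def file_sizes(path):
    """Map each regular file in *path* to its size in bytes (hypothetical helper)."""
    sizes = {}
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False: don't treat a symlink as the file it targets
            if entry.is_file(follow_symlinks=False):
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes


if __name__ == "__main__":
    for name, size in sorted(file_sizes(".").items()):
        print(f"{name}: {size} bytes")
```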

Examples

Here’s an example of using os.scandir() to list the contents of a directory:

import os

# Specify the directory path
path = '/path/to/directory'

# List all entries in the specified directory
entries = os.scandir(path)

# Print the entry names
for entry in entries:
    print(entry.name)

# Output:
# file1.txt
# folder1
# image.jpg

To distinguish between files and directories, you can use the is_file() and is_dir() methods provided by the os.DirEntry objects. These methods return True or False, depending on whether the entry is a file or a directory.

Here’s a simple example of how to use os.scandir() to iterate through all entries in a specified directory and distinguish between files and directories:

import os

# Specify the directory path
path = '/path/to/your/directory/'

# List all entries in the specified directory
entries = os.scandir(path)

for entry in entries:
    # Check if it's a file or directory
    if entry.is_file():
        print(f"{entry.name} is a file.")
    elif entry.is_dir():
        print(f"{entry.name} is a directory.")

# Output:
# file1.txt is a file.
# folder1 is a directory.
# image.jpg is a file.

os.scandir() also works as a context manager (since Python 3.6). Using it in a with statement guarantees that the directory handle is closed when the block ends, even if you leave the loop early or an exception is raised, so resources are freed as soon as the listing is no longer needed.

import os

# Using with statement with os.scandir() for efficient resource management
with os.scandir('/path/to/your/directory/') as entries:
    for entry in entries:
        if entry.is_file():
            print(f"{entry.name} is a file.")
        elif entry.is_dir():
            print(f"{entry.name} is a directory.")

# Output:
# file1.txt is a file.
# folder1 is a directory.
# image.jpg is a file.
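The with block matters most when you stop iterating early. In the sketch below (the name find_first_with_suffix is hypothetical), the function returns as soon as it finds a match, and the with statement still closes the iterator. A bare os.scandir() iterator abandoned half-way would stay open until it is garbage-collected, and Python may emit a ResourceWarning:

```python
import os


def find_first_with_suffix(path, suffix):
    """Return the name of the first file in *path* ending with *suffix*, or None.

    Hypothetical helper; "first" means first in the (arbitrary) scan order.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(suffix):
                # Returning exits the with block, which closes the iterator
                return entry.name
    return None
```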

Similar to os.listdir(), os.scandir() can raise exceptions like FileNotFoundError if the directory does not exist and PermissionError if access to the directory is denied.

import os

# Specify the directory path
path = '/path/to/directory'

try:
    # Using with statement with os.scandir() for efficient resource management
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                print(f"{entry.name} is a file.")
            elif entry.is_dir():
                print(f"{entry.name} is a directory.")

except FileNotFoundError:
    # Handle the case where the specified directory does not exist
    print(f"Error: The directory '{path}' does not exist.")

except PermissionError:
    # Handle the case where permission is denied
    print(f"Error: Permission denied to access the directory '{path}'.")

except OSError as error:
    # Handle other OS-related errors
    print(f"Error: An OS error occurred: {error}")

Using os.walk()

Both os.listdir() and os.scandir() provide non-recursive directory listings. This means they only list the contents of the directory specified in their argument, without descending into subdirectories to list their contents.

If you need to list files and directories recursively, including all subdirectories and their contents, you should use os.walk(), which is specifically designed for this task.

os.walk() returns a generator that yields one 3-tuple (dirpath, dirnames, filenames) for each directory it visits:

dirpath: A string representing the current directory path being processed.
dirnames: A list of subdirectory names within the dirpath.
filenames: A list of non-directory file names within the dirpath.
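Note that dirnames and filenames hold bare names, not paths. To get a usable path, join each name with dirpath. A minimal sketch, using a hypothetical helper name, all_file_paths:

```python
import os


def all_file_paths(top):
    """Return the full path of every file under *top*, recursively (hypothetical helper)."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # filenames holds bare names, so prefix each with its directory
            paths.append(os.path.join(dirpath, name))
    return paths
```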

How os.walk() Works

You provide os.walk() a starting directory, known as the top directory. It scans that directory and yields a tuple (dirpath, dirnames, filenames) for it.

The dirnames list holds the names of the subdirectories within the current directory. os.walk() then descends into each of them in turn, treating each one as a new starting point.

This process repeats until it has traversed the entire directory structure, ensuring a thorough exploration of all directories and files from top to bottom.
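Because os.walk() only descends into the names left in dirnames, the Python documentation notes that with topdown=True (the default) you can modify that list in place to skip subdirectories. The sketch below prunes any directory whose name is in a hypothetical SKIP set. Note the slice assignment dirnames[:] = ...; rebinding dirnames to a new list would have no effect on the walk:

```python
import os

# Hypothetical set of directory names to leave out of the walk
SKIP = {".git", "__pycache__", "node_modules"}


def walk_pruned(top):
    """Yield (dirpath, filenames) for each directory under *top* except skipped ones."""
    for dirpath, dirnames, filenames in os.walk(top):  # topdown=True by default
        # Modify the list in place so os.walk() never descends into skipped dirs
        dirnames[:] = [d for d in dirnames if d not in SKIP]
        yield dirpath, filenames
```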

Syntax

import os
entries = os.walk(top, topdown=True, onerror=None, followlinks=False)

Python os.walk() method parameters
Parameter	Condition	Description
top	Required	The path to the top-level directory from which you want to start traversing.
topdown	Optional	If True, directories are scanned top-down (parent directories before their children). If False, directories are scanned bottom-up. Default is True.
onerror	Optional	A function to call, with the OSError instance as its argument, if an error occurs while listing a directory. Default is None, which ignores errors.
followlinks	Optional	Whether to descend into directories pointed to by symbolic links. Default is False.

Examples

Let’s consider a sample directory tree for this example:

Example-site
├── index.html
├── css/
│   └── style.css
└── images/
    └── logo.png

In this structure, Example-site contains two folders (css and images), each containing a file (style.css and logo.png, respectively), and there’s an additional file (index.html) directly under Example-site.

Here’s how you can use os.walk() to navigate through this directory tree, printing the names of the directories and files:

import os

for dirpath, dirnames, filenames in os.walk(r"C:\Example-site"):
    print("Current Path:", dirpath)
    print("Directories:", dirnames)
    print("Files:", filenames)
    print()  # To separate outputs for each visited directory

# Output:
# Current Path: C:\Example-site
# Directories: ['css', 'images']
# Files: ['index.html']

# Current Path: C:\Example-site\css
# Directories: []
# Files: ['style.css']

# Current Path: C:\Example-site\images
# Directories: []
# Files: ['logo.png']

Topdown vs. Bottom-up

The topdown parameter affects the order in which directories are visited. If topdown is True or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). Conversely, if topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up).

This can be useful for operations that require processing children before their parents, such as deleting directories.

Here’s an example of how to use os.walk() with topdown=False to list the contents of a directory tree in bottom-up order:

import os

for dirpath, dirnames, filenames in os.walk(r"C:\Example-site", topdown=False):
    print("Current Path:", dirpath)
    print("Directories:", dirnames)
    print("Files:", filenames)
    print()  # To separate outputs for each visited directory

# Output:
# Current Path: C:\Example-site\css
# Directories: []
# Files: ['style.css']

# Current Path: C:\Example-site\images
# Directories: []
# Files: ['logo.png']

# Current Path: C:\Example-site
# Directories: ['css', 'images']
# Files: ['index.html']

As you can see, because topdown=False, the files within the ‘css’ and ‘images’ subdirectories are listed before the contents of the parent directory ‘Example-site’.
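Bottom-up order is what makes deletion practical. As a sketch, the hypothetical helper remove_empty_dirs below removes empty directories under a tree while keeping the top directory itself. By the time a parent is visited, its empty children have already been deleted, so the parent may now be empty too. It re-checks emptiness with os.listdir() because dirnames and filenames were read before the children were processed:

```python
import os


def remove_empty_dirs(top):
    """Delete every empty directory below *top* and return their paths (hypothetical helper)."""
    top = os.fspath(top)  # accept Path objects; os.walk() yields the top path as given
    removed = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        # dirnames/filenames are stale once children are removed, so re-list
        if dirpath != top and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```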

Handling Errors

By default, os.walk() silently ignores errors raised while listing a directory, including the top directory itself: if the starting path does not exist, the loop simply runs zero times. The onerror parameter gives you control over how to deal with these potential errors.

It allows you to specify a function that will be called whenever an exception, typically an OSError, occurs during the directory walk. Common OSErrors include a PermissionError, which occurs when your program attempts to access a directory for which it does not have the necessary permissions, and a FileNotFoundError, which occurs when a directory path no longer exists.

Here’s how you can define an error handler function and pass it to os.walk():

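import os

def log_error(err):
    print(f"Error accessing directory: {err.filename}, {err.strerror}")

# os.walk() is lazy: the handler only runs while the generator is being consumed
for dirpath, dirnames, filenames in os.walk('/path', onerror=log_error):
    print(dirpath)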
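To see the difference a handler makes, walk a directory that does not exist: without onerror nothing happens, while with a handler the error is reported. The documentation also allows the handler to re-raise the exception to abort the walk. In this sketch the handler simply collects the errors in a list; the name walk_with_errors is hypothetical:

```python
import os


def walk_with_errors(top):
    """Walk *top*; return (visited dirpaths, OSErrors encountered). Hypothetical helper."""
    errors = []
    # list.append works as the handler: it receives each OSError instance
    visited = [dirpath for dirpath, _, _ in os.walk(top, onerror=errors.append)]
    return visited, errors


if __name__ == "__main__":
    visited, errors = walk_with_errors("/no/such/directory")
    print(visited)  # []
    for err in errors:
        print(type(err).__name__, err.filename)
```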

Following Symbolic Links

By default, os.walk() does not descend into symbolic links that point to directories, on systems that support them. To enable this, set the optional argument followlinks to True.

However, set followlinks to True with caution: os.walk() does not keep track of the directories it has already visited, so it can recurse infinitely if a symbolic link points to a parent directory of itself.
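If you do need followlinks=True, one common safeguard (a sketch, not something os.walk() provides) is to record the real path of every directory reached and prune any subdirectory that resolves to one already seen. The helper name walk_links_safely is hypothetical; os.path.realpath() resolves symbolic links:

```python
import os


def walk_links_safely(top):
    """Like os.walk(top, followlinks=True), but visits each real directory only once.

    Hypothetical helper: symlink loops are pruned using os.path.realpath().
    """
    seen = {os.path.realpath(top)}
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        keep = []
        for name in dirnames:
            real = os.path.realpath(os.path.join(dirpath, name))
            if real not in seen:  # first time this real directory is reached
                seen.add(real)
                keep.append(name)
        dirnames[:] = keep  # in place, so os.walk() skips the pruned entries
        yield dirpath, dirnames, filenames
```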

Using glob.glob()

The glob module in Python includes a function glob.glob(), which is used to retrieve files matching a specified pattern, eliminating the need for manual filtering of directory listing results.

The pattern matching follows the rules of Unix shell-style wildcards (such as *, ?, and [characters]), which resemble regular expressions but are much simpler. Whether the results come back sorted depends on the file system, so sort them yourself if order matters.

Syntax

import glob
files = glob.glob(pathname, recursive=False)

Python glob.glob() method parameters
Parameter	Condition	Description
pathname	Required	A string defining the path and pattern to match files or directories against.
recursive	Optional	A keyword-only boolean flag. If True, the pattern ** matches any files and zero or more directories and subdirectories, so a pattern containing ** searches recursively. If False (default), ** behaves like * and matching stays within the directory levels spelled out in the pattern.

Pattern Rules

Here are some pattern rules you can use with glob.glob():

Pattern Rule	Description	Examples
Literal	Matches the character exactly	data.txt matches only the file “data.txt”
*	Matches zero or more characters	*.jpg matches “image.jpg”, “my_photo.jpg”, etc.
?	Matches exactly one character	data?.csv matches “data1.csv”, “dataA.csv”, etc.
[abc]	Matches a single character from the set within the brackets	report-[abc].txt matches “report-a.txt”, “report-b.txt”, etc.
[0-9]	Matches a single character from the range 0 to 9	image[0-5].jpg matches “image1.jpg”, “image2.jpg”, etc.
[!abc]	Matches any single character not within the brackets	[!0-9]file.txt matches “afile.txt” but not “1file.txt”
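One rule the table does not show: per the glob documentation, * and ? do not match names that begin with a dot, so hidden files such as .env are skipped unless the pattern itself starts with a dot (Python 3.11 also added an include_hidden=True option). A short sketch with hypothetical file names:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files: one ordinary, one "hidden" (name starts with a dot)
    for name in ("notes.txt", ".env"):
        open(os.path.join(folder, name), "w").close()

    visible = [os.path.basename(p) for p in glob.glob(os.path.join(folder, "*"))]
    hidden = [os.path.basename(p) for p in glob.glob(os.path.join(folder, ".*"))]

    print(visible)  # ['notes.txt']
    print(hidden)   # ['.env']
```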

Additional Notes:

Recursive Search: When recursive is set to True, ** will match any files and zero or more directories, subdirectories, and symbolic links to directories. For example, glob.glob("**/*.txt", recursive=True) matches .txt files in the current directory and all of its subdirectories.
Combining Patterns: You can combine patterns for more complex matching. For example, [A-Z]*.pdf matches names that start with an uppercase letter and end with .pdf.
Escaping Special Characters: Backslashes do not escape special characters in glob patterns (on Windows they are path separators). To match a literal *, ?, or [, wrap it in brackets, for example [?], or pass the literal part of the pattern through glob.escape().
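As a sketch of the bracket technique, suppose a file is literally named data[1].txt (a hypothetical name). Used directly as a pattern, [1] is a character class, so it matches data1.txt instead. glob.escape() wraps the special characters in brackets so the name is matched literally:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    for name in ("data1.txt", "data[1].txt"):
        open(os.path.join(folder, name), "w").close()

    # Unescaped: "[1]" is a character class matching the single character "1"
    raw = glob.glob(os.path.join(folder, "data[1].txt"))
    # Escaped: glob.escape("data[1].txt") returns "data[[]1].txt"
    literal = glob.glob(os.path.join(folder, glob.escape("data[1].txt")))

    print([os.path.basename(p) for p in raw])      # ['data1.txt']
    print([os.path.basename(p) for p in literal])  # ['data[1].txt']
```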

Examples

Let’s consider the following sample directory tree for the examples below:

sample_directory/
├── data.txt
├── report-1.txt
├── report-a.txt
├── report-b.txt
├── my_photo.jpg
├── image1.jpg
├── image2.jpg
├── folder1/
│   ├── a1.txt
│   └── b2.pdf
└── folder2/
    └── c3.txt

Example 1. Literal (Match Characters Exactly)

import glob

# Matches only the path that is exactly data.txt (if it exists)
for name in glob.glob(r'C:\sample_directory\data.txt'):
    print(name)

# Output: C:\sample_directory\data.txt

Example 2. Asterisk * (Match Zero or More Characters)

This pattern matches any number of characters in a file or directory name.

import glob

# Matches all .jpg files in the specified directory
for name in glob.glob(r'C:\sample_directory\*.jpg'):
    print(name)

# Output:
# C:\sample_directory\image1.jpg
# C:\sample_directory\image2.jpg
# C:\sample_directory\my_photo.jpg

Example 3. Question Mark ? (Match Any Single Character)

This pattern matches exactly one character.

import glob

# Matches files starting with report-, followed by any single character, and ending with .txt
for name in glob.glob(r'C:\sample_directory\report-?.txt'):
    print(name)

# Output:
# C:\sample_directory\report-1.txt
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt

Example 4. Square Brackets [] (Match Any Character in Set)

This pattern matches any of the enclosed characters.

import glob

# Matches files starting with report-, followed by a, b, or c, and ending with .txt
for name in glob.glob(r'C:\sample_directory\report-[abc].txt'):
    print(name)

# Output:
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt

# Matches files starting with image, followed by a single digit from 0 to 5, and ending with .jpg
for name in glob.glob(r'C:\sample_directory\image[0-5].jpg'):
    print(name)

# Output:
# C:\sample_directory\image1.jpg
# C:\sample_directory\image2.jpg

Example 5. Negated Square Brackets [!] (Match Any Character Not in Set)

This pattern matches any one character not enclosed in the brackets.

import glob

# Matches any .txt file whose name does not start with r
for name in glob.glob(r'C:\sample_directory\[!r]*.txt'):
    print(name)

# Output: C:\sample_directory\data.txt

Example 6. Recursive ** (Match Directories Recursively)

By default, glob.glob() only matches within the specified directory. To search subdirectories recursively, include ** in the pattern and set recursive=True.

When recursive searching is enabled, ** will match any files and zero or more directories, subdirectories, and symbolic links to directories.

import glob

# Recursively matches all .txt files in the directory and subdirectories
for name in glob.glob(r'C:\sample_directory\**\*.txt', recursive=True):
    print(name)

# Output:
# C:\sample_directory\data.txt
# C:\sample_directory\report-1.txt
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt
# C:\sample_directory\folder1\a1.txt
# C:\sample_directory\folder2\c3.txt
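glob.glob() builds the whole result list before returning. The glob module also provides glob.iglob(), which takes the same arguments and returns an iterator that yields the same values without storing them all at once. A sketch that stops after the first few matches (the helper name first_matches is hypothetical):

```python
import glob
import itertools


def first_matches(pattern, limit, recursive=False):
    """Return at most *limit* paths matching *pattern*, produced lazily by iglob()."""
    matches = glob.iglob(pattern, recursive=recursive)
    # islice stops consuming the iterator once *limit* matches are collected
    return list(itertools.islice(matches, limit))


if __name__ == "__main__":
    print(first_matches("**/*.txt", 5, recursive=True))
```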

Using pathlib.Path

The pathlib module, introduced in Python 3.4, includes the Path class, which is designed to provide an object-oriented interface to the filesystem. The Path class encapsulates filesystem paths as objects and provides methods for common pathname manipulation, file operations, and directory traversal. This object-oriented approach makes managing files and directories easier.

When it comes to listing files in a directory with pathlib.Path, you will mainly use three methods: .iterdir(), .glob(), and .rglob(). Here’s how each of them works:

.iterdir()

The .iterdir() method returns an iterator of Path objects for each entry (file or directory) in the specified directory (not recursive).

from pathlib import Path

path = Path('/path/to/directory')
for entry in path.iterdir():
    print(entry.name)  # Output: the name of each entry

# Output:
# file1.txt
# folder1
# image.jpg

To filter for files only, use the .is_file() method on each entry. Similarly, to filter for directories only, use the .is_dir() method.

from pathlib import Path

path = Path('/path/to/directory')
for entry in path.iterdir():
    if entry.is_file():
        print(f"{entry.name} is a file.")
    elif entry.is_dir():
        print(f"{entry.name} is a directory.")

# Output:
# file1.txt is a file.
# folder1 is a directory.
# image.jpg is a file.
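Because each entry is a Path object, filtering on attributes such as .suffix reads naturally. A sketch that returns the sorted names of files with a given extension (the helper name files_with_suffix is hypothetical):

```python
from pathlib import Path


def files_with_suffix(directory, suffix):
    """Return sorted names of files in *directory* whose suffix equals *suffix*."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        # .suffix is the final extension including the dot, e.g. ".txt"
        if entry.is_file() and entry.suffix == suffix
    )
```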

.glob(pattern)

The .glob(pattern) method returns an iterator of Path objects matching the given pattern (similar to glob.glob()).

The pattern can include wildcards, such as * for any number of characters, ? for a single character, and [seq] for any character in seq.

from pathlib import Path

path = Path('/path/to/directory')
for file in path.glob('*.txt'):
    print(file.name)  # Output: the name of each .txt file in the directory

# Output: file1.txt

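With a pattern such as '*.txt', .glob() does not search recursively: it matches only entries directly inside the directory the Path refers to. A pattern that contains **, such as '**/*.txt', does descend into subdirectories.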

.rglob(pattern)

.rglob() works similarly to .glob(), but it searches recursively through all subdirectories for files that match the pattern.

from pathlib import Path

path = Path('/path/to/directory')
for file in path.rglob('*.txt'):
    print(file.name)  # Output: names of all .txt files in the directory and its subdirectories

# Output:
# file1.txt
# file2.txt
# file3.txt
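The Python documentation describes .rglob(pattern) as equivalent to calling .glob() with '**/' added in front of the pattern. A quick sketch that compares the two on a small temporary tree (the file names are hypothetical):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical tree: one .txt file at the top, one in a subdirectory
    (root / "folder1").mkdir()
    (root / "file1.txt").touch()
    (root / "folder1" / "file2.txt").touch()

    via_rglob = sorted(p.relative_to(root) for p in root.rglob("*.txt"))
    via_glob = sorted(p.relative_to(root) for p in root.glob("**/*.txt"))

    print([str(p) for p in via_rglob])
    print(via_rglob == via_glob)  # True
```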