Python glob.glob() Function

The glob module in Python includes a function glob.glob(), which is used to retrieve files matching a specified pattern, eliminating the need for manual filtering of directory listing results.

The pattern matching is based on the rules of Unix globbing (wildcards such as *, ?, [characters], and so on), which are similar to regular expressions but much simpler.
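
As a rough illustration of the "manual filtering" that glob.glob() replaces, the sketch below compares it with listing a directory and filtering the names by hand. The temporary directory and the file names (a.txt, b.txt, c.jpg) are made up for this example.

```python
import fnmatch
import glob
import os
import tempfile

# Throwaway directory with three hypothetical files.
root = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.jpg"):
    open(os.path.join(root, name), "w").close()

# Manual approach: list the directory, then filter the names against the pattern.
manual = sorted(
    os.path.join(root, name)
    for name in fnmatch.filter(os.listdir(root), "*.txt")
)

# glob.glob() does the listing and the filtering in one call.
globbed = sorted(glob.glob(os.path.join(root, "*.txt")))

print(manual == globbed)
```

Both lists are sorted before comparing because neither os.listdir() nor glob.glob() promises a particular order.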

Syntax

import glob

files = glob.glob(pathname, *, recursive=False)

Parameters

Parameter | Condition | Description
pathname | Required | A string defining the path and pattern to match files or directories against.
recursive | Optional | A boolean flag that controls how the ** pattern is interpreted. If False (default): ** behaves like *, so it matches names within a single directory level only. If True: ** matches any files and zero or more directories and subdirectories.
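
To make the effect of the recursive flag concrete, here is a minimal sketch that runs the same ** pattern with both settings. The tree it builds (notes.txt and sub/deep.txt in a temporary directory) is hypothetical and exists only for this demonstration.

```python
import glob
import os
import tempfile

# Throwaway tree: notes.txt at the top level, sub/deep.txt one level down.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("notes.txt", os.path.join("sub", "deep.txt")):
    open(os.path.join(root, rel), "w").close()

pattern = os.path.join(root, "**", "*.txt")

# recursive=False: ** behaves like *, so it stands for exactly one directory level.
flat = sorted(os.path.relpath(p, root) for p in glob.glob(pattern))

# recursive=True: ** stands for zero or more directory levels.
deep = sorted(os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True))

print(flat)  # only the file inside sub
print(deep)  # the top-level file and the file inside sub
```

Without recursive=True the top-level notes.txt is missed, because ** must then match one directory name.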

Pattern Rules

Here are some pattern rules you can use with glob.glob():

Pattern Rule | Description | Examples
Literal | Matches the character exactly | data.txt matches only the file “data.txt”
* | Matches zero or more characters within a single file or directory name | *.jpg matches “image.jpg”, “my_photo.jpg”, etc.
? | Matches exactly one character | data?.csv matches “data1.csv”, “dataA.csv”, etc.
[abc] | Matches a single character from the set within the brackets | report-[abc].txt matches “report-a.txt”, “report-b.txt”, etc.
[0-9] | Matches a single character from the given range | image[0-5].jpg matches “image1.jpg”, “image2.jpg”, etc.
[!abc] | Matches any single character not within the brackets | [!0-9]file.txt matches “afile.txt” but not “1file.txt”
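
The rules above can be checked without touching the file system. The sketch below uses fnmatch.fnmatchcase() from the standard library, which applies the same wildcard rules to a single name. One difference to keep in mind: glob.glob() skips names that start with a dot unless the pattern does too, while fnmatch does not. The candidate names are invented for illustration.

```python
from fnmatch import fnmatchcase

# (pattern, candidate name, expected result) for each rule in the table.
checks = [
    ("data.txt", "data.txt", True),
    ("*.jpg", "my_photo.jpg", True),
    ("*.jpg", "my_photo.png", False),
    ("data?.csv", "data1.csv", True),
    ("data?.csv", "data10.csv", False),   # ? stands for exactly one character
    ("report-[abc].txt", "report-b.txt", True),
    ("report-[abc].txt", "report-d.txt", False),
    ("image[0-5].jpg", "image3.jpg", True),
    ("image[0-5].jpg", "image7.jpg", False),
    ("[!0-9]file.txt", "xfile.txt", True),
    ("[!0-9]file.txt", "1file.txt", False),
]

# fnmatchcase(name, pattern) is case-sensitive on every platform.
results = [fnmatchcase(name, pattern) for pattern, name, _ in checks]

for (pattern, name, _), matched in zip(checks, results):
    print(f"{pattern!r} vs {name!r}: {matched}")
```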

Additional Notes:

  • Recursive Search: When recursive is set to True, ** will match any files and zero or more directories, subdirectories, and symbolic links to directories. For example, glob.glob("**/*.txt", recursive=True) will match .txt files in the current directory and in all of its subdirectories.
  • Combining Patterns: You can combine patterns for more complex matching. For example, [A-Z]*.pdf will match files whose names start with an uppercase letter and end with .pdf.
  • Escape Characters: glob does not treat a backslash as an escape character. To match a literal *, ?, or [, wrap the character in brackets (for example, [?] matches a literal question mark), or pass the name through glob.escape(), which does this for you.
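
Matching special characters literally is easy to get wrong, so here is a small sketch of the difference between an unescaped pattern and one passed through glob.escape(). The two file names, data[1].txt and data1.txt, are hypothetical.

```python
import glob
import os
import tempfile

# Throwaway directory with one name that contains brackets and one that does not.
root = tempfile.mkdtemp()
for name in ("data[1].txt", "data1.txt"):
    open(os.path.join(root, name), "w").close()

# Unescaped: [1] is read as a character set, so only data1.txt matches.
unescaped = [os.path.basename(p)
             for p in glob.glob(os.path.join(root, "data[1].txt"))]

# glob.escape() wraps each special character in brackets: data[[]1].txt
escaped_name = glob.escape("data[1].txt")
escaped = [os.path.basename(p)
           for p in glob.glob(os.path.join(root, escaped_name))]

print(unescaped)     # ['data1.txt']
print(escaped_name)  # data[[]1].txt
print(escaped)       # ['data[1].txt']
```

Only the file name is escaped here. The temporary directory path is assumed to contain no special characters; if it might, escape it as well.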

Examples

Let’s consider a sample directory tree for the following examples:

sample_directory/
├── data.txt
├── report-1.txt
├── report-a.txt
├── report-b.txt
├── my_photo.jpg
├── image1.jpg
├── image2.jpg
├── folder1/
│   ├── a1.txt
│   └── b2.pdf
└── folder2/
    └── c3.txt
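
If you want to try the examples without creating these files by hand, a sketch along the following lines can build the tree in a temporary location. The helper name build_sample_tree and the use of a temporary directory are assumptions made for this illustration; the examples themselves use C:\sample_directory.

```python
import os
import tempfile

# Relative paths of every file in the sample tree.
SAMPLE_FILES = [
    "data.txt", "report-1.txt", "report-a.txt", "report-b.txt",
    "my_photo.jpg", "image1.jpg", "image2.jpg",
    "folder1/a1.txt", "folder1/b2.pdf", "folder2/c3.txt",
]

def build_sample_tree(base):
    """Create sample_directory under base and return its path."""
    root = os.path.join(base, "sample_directory")
    for rel in SAMPLE_FILES:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # empty placeholder file
    return root

root = build_sample_tree(tempfile.mkdtemp())
print(sorted(os.listdir(root)))
```

To run an example against this copy, replace the C:\sample_directory prefix in its pattern with os.path.join(root, ...).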

Example 1. Literal (Match characters exactly)

This pattern matches an exact file or directory in the specified path.

import glob

# Match data.txt in C:\sample_directory
for name in glob.glob(r'C:\sample_directory\data.txt'):
    print(name)

# Output: C:\sample_directory\data.txt

Example 2. Asterisk * (Match Zero or More Characters)

This pattern matches any number of characters in a file or directory name.

import glob

# Matches all .jpg files in the specified directory
for name in glob.glob(r'C:\sample_directory\*.jpg'):
    print(name)

# Output:
# C:\sample_directory\image1.jpg
# C:\sample_directory\image2.jpg
# C:\sample_directory\my_photo.jpg

Example 3. Question Mark ? (Match Any Single Character)

This pattern matches exactly one character.

import glob

# Matches any file starting with report-, followed by any single character, and ending with .txt
for name in glob.glob(r'C:\sample_directory\report-?.txt'):
    print(name)

# Output:
# C:\sample_directory\report-1.txt
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt

Example 4. Square Brackets [] (Match Any Character in Set)

This pattern matches any one of the enclosed characters.

import glob

# Matches any file starting with report-, followed by a, b, or c, and ending with .txt
for name in glob.glob(r'C:\sample_directory\report-[abc].txt'):
    print(name)

# Output:
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt

# Matches any file starting with image, followed by a digit from 0 to 5, and ending with .jpg
for name in glob.glob(r'C:\sample_directory\image[0-5].jpg'):
    print(name)

# Output:
# C:\sample_directory\image1.jpg
# C:\sample_directory\image2.jpg

Example 5. Negated Square Brackets [!] (Match Any Character Not in Set)

This pattern matches any one character not enclosed in the brackets.

import glob

# Matches any .txt file whose name does not start with r
for name in glob.glob(r'C:\sample_directory\[!r]*.txt'):
    print(name)

# Output: C:\sample_directory\data.txt

Example 6. Recursive ** (Match Directories Recursively)

When recursive=True is set, ** matches zero or more directory levels, so the pattern below finds files in the top-level directory as well as in its subdirectories.

import glob

# Recursively matches all .txt files in the directory and subdirectories
for name in glob.glob(r'C:\sample_directory\**\*.txt', recursive=True):
    print(name)

# Output:
# C:\sample_directory\data.txt
# C:\sample_directory\report-1.txt
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt
# C:\sample_directory\folder1\a1.txt
# C:\sample_directory\folder2\c3.txt

Example 7. Combining Patterns

You can also combine these patterns to create more complex matching criteria. Note that glob.glob() does not support shell-style brace expansion such as {txt,md}; braces are matched as literal characters. To match any .txt or .md files that start with a letter from a to e, run one pattern per extension.

import glob

# glob has no brace expansion, so build one pattern per extension
for ext in ('txt', 'md'):
    # Matches files that start with a letter from a to e and end with the given extension
    for name in glob.glob(f'/path/to/directory/[a-e]*.{ext}'):
        print(name)

These examples demonstrate the flexibility of glob.glob() for matching filenames and directory names based on various pattern rules, making it a powerful tool for file system navigation and manipulation in Python.