The glob module in Python includes a function, glob.glob(), which is used to retrieve files matching a specified pattern, eliminating the need for manual filtering of directory listing results.
The pattern matching is based on the rules of Unix shell globbing (wildcards such as *, ?, [characters], and so on), which are similar to regular expressions but much simpler.
Syntax
import glob
files = glob.glob(pathname, *, recursive=False)
Parameters
Parameter | Condition | Description |
pathname | Required | A string defining the path and pattern to match files or directories against. |
recursive | Optional | A boolean flag that controls how the ** pattern is treated. If False (default): ** behaves like * and matches within a single directory level. If True: ** matches any files and zero or more directories and subdirectories. |
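The effect of the recursive flag can be sketched with a throwaway temporary directory. The layout here (top.txt and sub/nested.txt) is made up for illustration and is not part of the original examples:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one file at the top level, one inside a subdirectory
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("top.txt", os.path.join("sub", "nested.txt")):
        open(os.path.join(root, rel), "w").close()

    # glob.escape() guards against metacharacters in the temporary path itself
    pattern = os.path.join(glob.escape(root), "**", "*.txt")

    # Default (recursive=False): "**" behaves like "*", i.e. exactly one directory level
    flat = sorted(os.path.relpath(p, root) for p in glob.glob(pattern))

    # recursive=True: "**" spans zero or more directory levels
    deep = sorted(os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True))

print(flat)
print(deep)
```

With this layout, flat should contain only the nested file, while deep should also include top.txt, because ** is then allowed to match zero directories.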
Pattern Rules
Here are some pattern rules you can use with glob.glob():
Pattern Rule | Description | Examples |
Literal | Matches the character exactly | data.txt matches only the file “data.txt” |
* | Matches zero or more characters | *.jpg matches “image.jpg”, “my_photo.jpg”, etc. |
? | Matches exactly one character | data?.csv matches “data1.csv”, “dataA.csv”, etc. |
[abc] | Matches a single character from the set within the brackets | report-[abc].txt matches “report-a.txt”, “report-b.txt”, etc. |
[0-9] | Matches a single character from the range 0 to 9 | image[0-5].jpg matches “image1.jpg”, “image2.jpg”, etc. |
[!abc] | Matches any single character not within the brackets | [!0-9]file.txt matches “afile.txt” and “_file.txt”, but not “1file.txt” |
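The glob module applies these rules through the standard fnmatch module, so individual rules can be checked against plain strings without touching the file system. The following is a small sketch with illustrative file names. Note that fnmatch alone does not reproduce glob's special handling of names that begin with a dot.

```python
import fnmatch

# (pattern, candidate name, expected result)
checks = [
    ("*.jpg", "my_photo.jpg", True),
    ("data?.csv", "data1.csv", True),
    ("data?.csv", "data10.csv", False),      # "?" matches exactly one character
    ("report-[abc].txt", "report-b.txt", True),
    ("image[0-5].jpg", "image7.jpg", False),  # 7 is outside the range 0-5
    ("[!0-9]file.txt", "1file.txt", False),   # first character is a digit
    ("[!0-9]file.txt", "afile.txt", True),
]

# fnmatchcase() compares case-sensitively on every platform
results = [fnmatch.fnmatchcase(name, pattern) for pattern, name, _ in checks]

for (pattern, name, _), matched in zip(checks, results):
    print(f"{pattern!r} vs {name!r}: {matched}")
```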
Additional Notes:
- Recursive Search: When recursive is set to True, ** will match any files and zero or more directories, subdirectories, and symbolic links to directories. For example, glob.glob("**/*.txt", recursive=True) will match .txt files in the current directory and in all subdirectories.
- Combining Patterns: You can combine patterns for more complex matching. For example, [A-Z]*.pdf will match files whose names start with an uppercase letter and end with .pdf.
- Escape Characters: glob.glob() does not treat a backslash as an escape character. To match a literal *, ?, or [, wrap the character in square brackets (for example, [?] matches a literal question mark), or pass the literal part of the path through glob.escape().
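Python's glob has no backslash escape. Literal metacharacters are matched by wrapping them in brackets, which glob.escape() does automatically. A minimal sketch, using a hypothetical file name that contains such characters:

```python
import fnmatch
import glob

# Hypothetical file name containing glob metacharacters
literal_name = "notes[draft]?.txt"

# glob.escape() wraps each of *, ? and [ in brackets so that it matches itself
escaped = glob.escape(literal_name)
print(escaped)

# The escaped pattern matches only the literal name
matches_itself = fnmatch.fnmatchcase(literal_name, escaped)
matches_other = fnmatch.fnmatchcase("notesdX.txt", escaped)
print(matches_itself, matches_other)
```

The escaped string can then be joined with a real pattern, for example when a user-supplied directory name is combined with "*.txt".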
Examples
Let’s consider a sample directory tree for the following examples:
sample_directory/
├── data.txt
├── report-1.txt
├── report-a.txt
├── report-b.txt
├── my_photo.jpg
├── image1.jpg
├── image2.jpg
├── folder1/
│   ├── a1.txt
│   └── b2.pdf
└── folder2/
    └── c3.txt
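To try the examples without creating files by hand, the tree above can be reproduced in a temporary location. This is only a sketch: the helper name make_sample_tree and the use of a temporary directory instead of C:\sample_directory are assumptions made for illustration.

```python
import glob
import os
import tempfile

# Relative paths mirroring the sample tree
FILES = [
    "data.txt", "report-1.txt", "report-a.txt", "report-b.txt",
    "my_photo.jpg", "image1.jpg", "image2.jpg",
    os.path.join("folder1", "a1.txt"),
    os.path.join("folder1", "b2.pdf"),
    os.path.join("folder2", "c3.txt"),
]

def make_sample_tree(base):
    """Create empty files mirroring the sample tree under base/sample_directory."""
    root = os.path.join(base, "sample_directory")
    for rel in FILES:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    return root

with tempfile.TemporaryDirectory() as base:
    root = make_sample_tree(base)
    # glob.glob() returns results in arbitrary order, so sort for stable output
    pattern = os.path.join(glob.escape(root), "*.jpg")
    jpgs = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(jpgs)
```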
Example 1. Literal (Match characters exactly)
This pattern matches an exact file or directory in the specified path.
import glob
# Match data.txt in C:\sample_directory
for name in glob.glob(r'C:\sample_directory\data.txt'):
    print(name)
# Output: C:\sample_directory\data.txt
Example 2. Asterisk * (Match Zero or More Characters)
This pattern matches any number of characters in a file or directory name.
import glob
# Matches all .jpg files in the specified directory
for name in glob.glob(r'C:\sample_directory\*.jpg'):
    print(name)
# Output:
# C:\sample_directory\image1.jpg
# C:\sample_directory\image2.jpg
# C:\sample_directory\my_photo.jpg
Example 3. Question Mark ? (Match Any Single Character)
This pattern matches exactly one character.
import glob
# Matches any file starting with report-, followed by any single character, and ends with .txt
for name in glob.glob(r'C:\sample_directory\report-?.txt'):
    print(name)
# Output:
# C:\sample_directory\report-1.txt
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt
Example 4. Square Brackets [] (Match Any Character in Set)
This pattern matches any one of the enclosed characters.
import glob
# Matches any file starting with report-, followed by a, b, or c, and ends with .txt
for name in glob.glob(r'C:\sample_directory\report-[abc].txt'):
    print(name)
# Output:
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt
# Matches any file starting with image, followed by any number 0 to 5, and ends with .jpg
for name in glob.glob(r'C:\sample_directory\image[0-5].jpg'):
    print(name)
# Output:
# C:\sample_directory\image1.jpg
# C:\sample_directory\image2.jpg
Example 5. Negated Square Brackets [!] (Match Any Character Not in Set)
This pattern matches any one character not enclosed in the brackets.
import glob
# Matches any .txt file not starting with r
for name in glob.glob(r'C:\sample_directory\[!r]*.txt'):
    print(name)
# Output: C:\sample_directory\data.txt
Example 6. Recursive ** (Match Directories Recursively)
When recursive=True is set, the ** pattern matches directories recursively.
import glob
# Recursively matches all .txt files in the directory and subdirectories
for name in glob.glob(r'C:\sample_directory\**\*.txt', recursive=True):
    print(name)
# Output:
# C:\sample_directory\data.txt
# C:\sample_directory\report-1.txt
# C:\sample_directory\report-a.txt
# C:\sample_directory\report-b.txt
# C:\sample_directory\folder1\a1.txt
# C:\sample_directory\folder2\c3.txt
Example 7. Combining Patterns
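You can also combine these patterns to create more complex matching criteria. Note that glob.glob() does not support shell-style brace expansion such as {txt,md}, so matching several extensions takes one call per extension. For example, the code below matches any .txt or .md files that start with a letter from a to e in the directory.
import glob
# Matches any .txt or .md files that start with a letter from a to e in the directory
# (one glob call per extension, because {txt,md} is not expanded by glob)
for ext in ('txt', 'md'):
    for name in glob.glob(f'/path/to/directory/[a-e]*.{ext}'):
        print(name)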
These examples demonstrate the flexibility of glob.glob() for matching filenames and directory names based on various pattern rules, making it a powerful tool for file system navigation and manipulation in Python.