How to recursively find files in Python

python
pathlib
os-walk
file-search
by Nikita Barsukov · Aug 12, 2024
TLDR

Glob for files at any depth using glob.glob() with recursive=True (available in Python 3.5 and later):

    import glob

    # ** matches any number of nested directories when recursive=True
    files = glob.glob('start_dir/**/*.ext', recursive=True)
    print(files)

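Simply replace 'start_dir' with your directory and '**/*.ext' with your pattern (like **/*.txt for text files). This prints the path of every matched file, each one starting with the start_dir you passed in, so the paths are absolute only if start_dir is absolute.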
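For large trees, glob.iglob() yields matches one at a time instead of building the whole list. Below is a minimal sketch of the TLDR pattern wrapped in a helper; the name find_by_extension and the throwaway directory layout are assumptions made for illustration, not part of the original snippet.

```python
import glob
import os
import tempfile


def find_by_extension(start_dir, ext):
    """Return sorted paths under start_dir ending in .ext (hypothetical helper)."""
    # glob.escape() protects against wildcard characters in start_dir itself.
    pattern = os.path.join(glob.escape(start_dir), "**", "*." + ext)
    # iglob yields matches lazily; recursive=True lets ** span any depth.
    return sorted(glob.iglob(pattern, recursive=True))


# Build a throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for rel in ("top.txt", os.path.join("a", "mid.txt"), os.path.join("a", "b", "deep.txt")):
        open(os.path.join(tmp, rel), "w").close()
    for path in find_by_extension(tmp, "txt"):
        print(os.path.relpath(path, tmp))
```

Note that ** also matches zero directories, so a file sitting directly in start_dir is found as well.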

Harnessing pathlib for an object-oriented approach

In Python 3.4 and later, pathlib lets us do an object-oriented file search:

    from pathlib import Path

    # rglob() searches start_dir and all of its subdirectories
    files = Path('start_dir').rglob('*.c')
    for path in files:
        print(path)  # gg ez .c files

rglob() returns a generator of Path objects, which are handy because file properties are plain attributes: path.name for the filename, path.suffix for the extension, and path.parent for the containing directory.
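To make those attributes concrete, here is a small sketch that lists them for each match. The helper name describe and the temporary directory layout are made up for the example.

```python
from pathlib import Path
import tempfile


def describe(path):
    """Return (name, stem, suffix, parent directory name) for a Path (hypothetical helper)."""
    return path.name, path.stem, path.suffix, path.parent.name


# Build a throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib").mkdir()
    (root / "main.c").touch()
    (root / "lib" / "util.c").touch()
    for path in sorted(root.rglob("*.c")):
        # relative_to() strips the temporary prefix for readable output
        print(path.relative_to(root), describe(path))
```

Because each result is a Path, follow-up calls such as path.stat() or path.read_text() are available without any string juggling.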

Delving with os.walk

When a tailored file search is needed, os.walk() becomes our best friend:

    import os
    import fnmatch

    for root, dirs, files in os.walk('start_dir'):
        # keep only the filenames in this directory that match the pattern
        for name in fnmatch.filter(files, '*.c'):
            print(os.path.join(root, name))  # print me like one of your French files

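This walks every directory under start_dir and leaves the filtering to us. Unlike glob, which skips names starting with a dot by default, os.walk() and fnmatch treat hidden files like any other, so both visible and hidden files show up in the results.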
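The difference in hidden-file handling is easy to check. The sketch below runs the same '*.c' pattern through both approaches on a throwaway directory; walk_matches and glob_matches are hypothetical helper names introduced for this comparison.

```python
import fnmatch
import glob
import os
import tempfile


def walk_matches(start_dir, pattern):
    """Collect matching paths with os.walk + fnmatch (dotfiles included)."""
    found = []
    for root, _dirs, files in os.walk(start_dir):
        for name in fnmatch.filter(files, pattern):
            found.append(os.path.join(root, name))
    return sorted(found)


def glob_matches(start_dir, pattern):
    """Collect matching paths with glob (dotfiles skipped by default)."""
    full = os.path.join(glob.escape(start_dir), "**", pattern)
    return sorted(glob.glob(full, recursive=True))


# Throwaway directory with one visible and one hidden file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("visible.c", ".hidden.c"):
        open(os.path.join(tmp, name), "w").close()
    print("os.walk:", [os.path.basename(p) for p in walk_matches(tmp, "*.c")])
    print("glob:   ", [os.path.basename(p) for p in glob_matches(tmp, "*.c")])
```

The os.walk list includes .hidden.c, while the glob list contains only visible.c.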

Universally searching with glob2

For recursive searching on Python versions older than 3.5, where glob.glob() has no recursive option, we may use glob2:

    import glob2

    files = glob2.glob('start_dir/**/*.c')  # ** refers to any number of subdirectories
    print(files)  # print them all!

Remember: glob2 is an external package - use pip install glob2. The ** wildcard in glob2 lets you recursively search across multiple subdirectories.

Large scale file handling

Performance matters when dealing with a large number of files. os.walk() can carry less overhead than the other methods when traversing big directory trees, though it is worth measuring on your own data:

    import os

    for root, dirs, files in os.walk('LandOfFiles'):
        # Your sophisticated file magic here
        pass

os.walk() is a generator: it yields one directory's (root, dirs, files) tuple at a time instead of compiling a list of every path up front, which keeps it light on memory.
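On big trees, the cheapest directory is the one never visited. In its default top-down mode, os.walk() lets you edit the dirs list in place to prune whole subtrees. The following is a sketch under assumptions: the skipped names (.git, node_modules) and the helper walk_pruned are illustrative choices, not something the original article prescribes.

```python
import os
import tempfile

# Hypothetical directory names to skip; adjust to your project.
SKIP_DIRS = frozenset({".git", "node_modules"})


def walk_pruned(start_dir, skip=SKIP_DIRS):
    """Yield file paths under start_dir without descending into skipped dirs."""
    for root, dirs, files in os.walk(start_dir):
        # Assigning to dirs[:] edits the list os.walk holds (top-down mode,
        # the default), so the removed directories are never visited.
        dirs[:] = [d for d in dirs if d not in skip]
        for name in files:
            yield os.path.join(root, name)


# Throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src"))
    os.makedirs(os.path.join(tmp, ".git"))
    open(os.path.join(tmp, "src", "main.c"), "w").close()
    open(os.path.join(tmp, ".git", "config"), "w").close()
    for path in walk_pruned(tmp):
        print(os.path.relpath(path, tmp))
```

The slice assignment matters: rebinding dirs to a new list (dirs = [...]) would leave the list os.walk() is using unchanged.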

Pattern matching precision with fnmatch

Sometimes, searches require more delicate handling. This is where fnmatch excels, offering filename pattern matching:

    import os
    import fnmatch

    matches = []
    for root, dirs, files in os.walk('src'):
        # fnmatch.filter() returns the names that match the shell-style pattern
        for filename in fnmatch.filter(files, '*.c'):
            matches.append(os.path.join(root, filename))  # Gotcha .c file

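This piece of code collects every .c file under src. As written it handles one pattern and one starting directory; to search with several patterns or across several directories, loop over them and apply the same filter to each.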
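One way to extend the snippet to several patterns and several starting directories is sketched below. The function name find_matching and the sample patterns are assumptions for illustration.

```python
import fnmatch
import os
import tempfile


def find_matching(directories, patterns):
    """Return paths under any of the directories whose filename matches any pattern."""
    matches = []
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                # any() stops at the first pattern that matches, so a file
                # matching two patterns is still listed only once
                if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                    matches.append(os.path.join(root, name))
    return matches


# Throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for sub, name in (("src", "main.c"), ("src", "main.h"), ("docs", "notes.txt")):
        os.makedirs(os.path.join(tmp, sub), exist_ok=True)
        open(os.path.join(tmp, sub, name), "w").close()
    roots = [os.path.join(tmp, "src"), os.path.join(tmp, "docs")]
    for path in find_matching(roots, ["*.c", "*.h"]):
        print(os.path.relpath(path, tmp))
```

If the listed directories overlap (one is inside another), the same file can be reported twice; deduplicating with a set is a simple fix in that case.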

Develop efficient generators

An efficient find_files() generator function can become a practical and elegant solution for recursively discovering files:

    import fnmatch
    import os

    def find_files(directory, pattern):
        for root, _, files in os.walk(directory):
            for file in files:
                if fnmatch.fnmatch(file, pattern):
                    yield os.path.join(root, file)  # Generators gonna generate!

    for file_path in find_files('src', '*.c'):
        print(file_path)  # Keep 'em coming!

Generators shine here: they produce one path at a time, so the memory footprint stays low even when iterating over very large directory trees.
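A side benefit of the generator approach is that the caller can stop early and the rest of the tree is never walked. Here is a minimal sketch using itertools.islice; find_files is repeated so the block runs on its own, and first_matches is a hypothetical helper name.

```python
import fnmatch
import itertools
import os
import tempfile


def find_files(directory, pattern):
    """Lazily yield paths under directory whose filename matches pattern."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(root, name)


def first_matches(directory, pattern, limit):
    """Return at most `limit` matches, stopping the walk once enough are found."""
    return list(itertools.islice(find_files(directory, pattern), limit))


# Throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, "file%d.c" % i), "w").close()
    print(len(first_matches(tmp, "*.c", 2)))
```

Which files come back first depends on the order os.walk() reports them, so treat the result as "some matches" rather than a sorted prefix.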