
How can I iterate over files in a given directory?

python
file-system
pathlib
directory-iteration
by Alex Kataev · Nov 10, 2024
TLDR

To iterate over the files in a directory with Python's pathlib:

from pathlib import Path

# Python 'for' loop - more like a 'foreach', am I right?
for file in Path('/your/directory').glob('*'):
    print(file)

A two-liner! .glob('*') matches every entry in the directory, files and subdirectories alike; replace '/your/directory' with your target directory.
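Because glob('*') returns subdirectories as well as files, a files-only variant is often handy. Here is a minimal sketch of one way to do it; list_files is a hypothetical helper name, not part of pathlib:

```python
from pathlib import Path


def list_files(directory):
    """Return the regular files directly inside `directory`, sorted by name."""
    # glob('*') yields files and subdirectories alike, so filter with is_file()
    return sorted(p for p in Path(directory).glob('*') if p.is_file())
```

Calling list_files('/your/directory') would then give you only the files, with subdirectories left out.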

Use specific file types

To filter files by type, use Path.glob('*.txt') to match only .txt files:

# If file extension '.txt' were a person, this would be stalking
for text_file in Path('/your/directory').glob('*.txt'):
    print(text_file)
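Path.glob() takes one pattern per call, so matching several extensions at once needs a different approach. One possible sketch filters on Path.suffix instead; files_with_suffixes is a hypothetical helper, and the case-insensitive comparison is an assumption about what you want:

```python
from pathlib import Path


def files_with_suffixes(directory, suffixes):
    """Return files in `directory` whose extension is one of `suffixes`."""
    # Compare in lower case so '.TXT' and '.txt' are treated the same
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )
```

For example, files_with_suffixes('/your/directory', {'.txt', '.md'}) would collect both text and Markdown files in one pass.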

Directory and file paths

pathlib makes joining paths intuitive: the / operator combines path segments. Here's a Pythonic way to build a full file path:

directory = Path('/your/directory')
# Stop! It's filepath hammer time ⚒️
for file_path in directory.glob('*'):
    # glob() already yields directory-prefixed paths, so join on the bare name
    full_path = directory / file_path.name
    print(full_path)

Recursive files

For deep-diving into directories, use rglob for recursion:

# Recursive or Recursin'? Get it? 🐍
for file in Path('/your/directory').rglob('*'):
    print(file)

rglob('*.py') lists all Python files in the directory and all of its subdirectories, just like a Python file detector!
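The rglob('*.py') case can be sketched as a small function. find_python_files is a hypothetical name, and returning paths relative to the root is just one convenient choice:

```python
from pathlib import Path


def find_python_files(root):
    """Return every .py file under `root`, at any depth, relative to `root`."""
    root = Path(root)
    # rglob searches `root` itself and every subdirectory below it
    return sorted(p.relative_to(root) for p in root.rglob('*.py') if p.is_file())
```

Relative paths keep the output short when the root directory sits deep in the file system.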

Efficiency with os.scandir()

For efficient directory scanning 😎, os.scandir() yields entry objects that carry each item's name and type info:

import os

with os.scandir('/your/directory') as it:
    # "You shall not pass!" - only files go through
    for entry in it:
        if entry.is_file():
            print(entry.name)
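The same entry objects can also tell files and directories apart in a single pass. A minimal sketch, with split_entries as a hypothetical helper name:

```python
import os


def split_entries(directory):
    """Return (file_names, dir_names) for the entries directly in `directory`."""
    files, dirs = [], []
    with os.scandir(directory) as entries:
        for entry in entries:
            # DirEntry caches type info from the directory listing,
            # so these checks usually need no extra system call
            if entry.is_dir():
                dirs.append(entry.name)
            elif entry.is_file():
                files.append(entry.name)
    return sorted(files), sorted(dirs)
```

Entries that are neither (a broken symlink, for instance) are simply left out of both lists.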

File system manipulation

An object-oriented approach using pathlib makes for clear code, without need for path joins:

path = Path('/your/directory')
# It's showtime for files only!
for file in path.iterdir():
    if file.is_file():
        print(file)

Adjust your viewpoint as we compare files in a directory to a hunt in a garden:

The garden 🏡 is your directory.
for treasure in garden:  # Your directory
    print(treasure)      # Treasure! A file in this case

And now, let's visually traverse the garden:

🏡🔍📄: [📄, 📄, 📄] # Every step in the garden reveals a file (document)

Each step is an iteration and every document is a file. Easy, isn't it?

Huge directories and file actions

For dealing with ginormous 😱 directories without consuming much memory, glob.iglob() yields file names lazily, one at a time:

import glob

# 'iglob' is 'glob', but on a diet
for filename in glob.iglob('/your/directory/**/*', recursive=True):
    print(filename)
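Because iglob() returns an iterator, you can stop early without ever building the full list of results. Here is a rough sketch using itertools.islice; first_matches is a hypothetical helper, and escaping the directory name with glob.escape() is a precaution for names that contain wildcard characters:

```python
import glob
import itertools
import os


def first_matches(directory, limit):
    """Return at most `limit` paths from a lazy, recursive search."""
    pattern = os.path.join(glob.escape(os.fspath(directory)), '**', '*')
    lazy_paths = glob.iglob(pattern, recursive=True)
    # islice stops pulling from the iterator after `limit` items,
    # so no complete list of results is ever built
    return list(itertools.islice(lazy_paths, limit))
```

Something like first_matches('/your/directory', 10) gives a quick peek at a huge tree.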

To manipulate files (rename or move) during the iteration:

# No '.temp' files on my watch!
for file in Path('/your/directory').glob('*'):
    if file.suffix == '.temp':
        new_name = file.with_suffix('.txt')
        file.rename(new_name)
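Renaming while iterating works for simple cases, but a more defensive version might snapshot the matches first and refuse to overwrite existing files. A sketch under those assumptions, with rename_suffix as a hypothetical helper name:

```python
from pathlib import Path


def rename_suffix(directory, old_suffix, new_suffix):
    """Rename files ending in `old_suffix` to `new_suffix`; return the new paths."""
    renamed = []
    # list() takes a snapshot first, so renames cannot disturb the iteration
    for path in list(Path(directory).glob(f'*{old_suffix}')):
        target = path.with_suffix(new_suffix)
        if not path.is_file() or target.exists():
            continue  # skip directories and never overwrite an existing file
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling rename_suffix('/your/directory', '.temp', '.txt') returns the paths that were actually renamed, which makes skipped files easy to spot.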

Different OS, different rules

Different systems come with different filesystem rules. Let pathlib abstract the path differences, and convert file names to and from bytes with os.fsencode and os.fsdecode:

import os
from pathlib import Path

current_directory = Path.cwd()
# os.sep holds the separator that suits your OS; pathlib applies it for you
system_separator = os.sep
# 'example.txt' is only a placeholder file name
file_path = current_directory / 'example.txt'
encoded_name = os.fsencode(file_path)
# Once you encode, you must decode (responsibly)!
decoded_name = os.fsdecode(encoded_name)

Embrace the power of Unix shell-style wildcards for file pattern matching:

# Wildcard patterns: the '.?[oa]' at the end of the file-hunting season
# ('?' matches any one character, '[oa]' matches 'o' or 'a': think '.so' or '.la')
for file in Path('/your/directory').glob('*.?[oa]'):
    print(file)
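To see what a wildcard pattern accepts without touching the disk, you can test names with PurePosixPath.match(), which applies the same glob-style rules. The file names below are made up purely for illustration:

```python
from pathlib import PurePosixPath

PATTERN = '*.?[oa]'
# Hypothetical file names, chosen to show hits and misses
names = ['libfoo.so', 'libfoo.la', 'main.o', 'notes.txt', 'archive.tar']

# Keep the names whose final component matches the pattern
matches = [name for name in names if PurePosixPath(name).match(PATTERN)]
print(matches)  # ['libfoo.so', 'libfoo.la']
```

'main.o' misses because the pattern demands exactly two characters after the dot, the second being 'o' or 'a'.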

Remember to check for symbolic links while you're at it, and follow them at your own risk 😉:

# It's symbolic link season!
for file in Path('/your/directory').glob('*'):
    if file.is_symlink():
        real_path = file.resolve()
        print(f'{file} -> {real_path}')

Leveraging efficient directory entries

The efficient way to get file stats is through os.scandir():

with os.scandir('/your/directory') as it:
    # File hide-and-seek 🙈
    for entry in it:
        if not entry.name.startswith('.') and entry.is_file():
            info = entry.stat()
            # "It weighs..." *drumroll* "this much!"
            print(f'{entry.name}: {info.st_size}')
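Building on the size lookup above, one possible follow-up is ranking files by size. This is only a sketch; largest_files and its count parameter are hypothetical names:

```python
import os


def largest_files(directory, count=3):
    """Return (name, size_in_bytes) for the `count` biggest visible files."""
    sizes = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip hidden dot-files and anything that is not a regular file
            if entry.is_file() and not entry.name.startswith('.'):
                sizes.append((entry.name, entry.stat().st_size))
    # Sort by size, biggest first; ties fall back to name order
    return sorted(sizes, key=lambda item: (-item[1], item[0]))[:count]
```

largest_files('/your/directory', 5) would then list the five heaviest files and their sizes in bytes.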