How can I iterate over files in a given directory?

by Alex Kataev · Nov 10, 2024

To iterate over the files in a directory with Python's pathlib:

from pathlib import Path

# Python 'for' loop - more like a 'foreach', am I right?
for file in Path('/your/directory').glob('*'):
    print(file)

A two-liner! .glob('*') matches every entry in the directory (subdirectories included), so replace '/your/directory' with your target directory.

Use specific file types

To filter files by type, employ Path.glob('*.txt') to match only .txt files:

# If file extension '.txt' were a person, this would be stalking
for text_file in Path('/your/directory').glob('*.txt'):
    print(text_file)
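
If you need several file types at once, one option is to compare suffixes instead of writing one glob per extension. The sketch below is only an illustration: files_with_suffixes is a hypothetical helper name, and the case-insensitive match is an assumption about what you want.

```python
from pathlib import Path


def files_with_suffixes(directory, suffixes):
    """Yield regular files in `directory` whose suffix is in `suffixes`."""
    # Lower-case both sides so '.TXT' and '.txt' are treated the same
    wanted = {suffix.lower() for suffix in suffixes}
    for entry in Path(directory).iterdir():
        # is_file() filters out subdirectories that happen to end in '.txt'
        if entry.is_file() and entry.suffix.lower() in wanted:
            yield entry
```

Calling files_with_suffixes('/your/directory', {'.txt', '.md'}) would then yield only the text and Markdown files at the top level of that directory.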

Directory and file paths

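pathlib also makes joining paths intuitive with the / operator. Keep in mind that glob() already yields paths that include the directory, so join only the bare file name:

directory = Path('/your/directory')
# Stop! It's filepath hammer time ⚒️
for file_path in directory.glob('*'):
    # file_path already starts with directory; rebuild it from the name alone
    full_path = directory / file_path.name
    print(full_path)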
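
To see what the / operator and the path attributes give you, here is a small sketch. It uses PurePosixPath so the result looks the same on every OS, and the 'reports' and 'summary.txt' names are made up for illustration.

```python
from pathlib import PurePosixPath

# Build a path piece by piece; no string concatenation needed
report = PurePosixPath('/your/directory') / 'reports' / 'summary.txt'

# Pull the path apart again
parts = (report.name, report.stem, report.suffix, str(report.parent))
print(report)
print(parts)
```

The same attributes (name, stem, suffix, parent) are available on the Path objects that glob() and iterdir() hand you.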

Recursive files

For deep-diving into directories, use rglob for recursion:

# Recursive or Recursin'? Get it? 🐍
for file in Path('/your/directory').rglob('*'):
    print(file)

rglob('*.py') yields every Python file in the directory and in all of its subdirectories, just like a Python file detector!
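
As a rough sketch of that detector, the hypothetical helper below (find_python_files is not part of pathlib) collects the matches and reports them relative to the starting directory:

```python
from pathlib import Path


def find_python_files(root):
    """Return all '.py' files under `root`, as sorted paths relative to it."""
    root = Path(root)
    return sorted(
        path.relative_to(root)
        for path in root.rglob('*.py')
        if path.is_file()  # skip directories that happen to be named '*.py'
    )
```

Relative paths keep the output short when the tree lives deep inside your file system.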

Efficiency with os.scandir()

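For efficient directory scanning 😎, os.scandir() yields os.DirEntry objects that carry the file type along with the name, so checks like is_file() usually need no extra system call: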

import os

with os.scandir('/your/directory') as it:
    # "You shall not pass!" - only files go through
    for entry in it:
        if entry.is_file():
            print(entry.name)
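
The same DirEntry objects can also tell files and subdirectories apart in one pass. This is a minimal sketch: split_entries is a hypothetical name, and passing follow_symlinks=False (so links are not counted as files or directories) is a choice made for this example.

```python
import os


def split_entries(directory):
    """Return (file names, subdirectory names) found directly in `directory`."""
    files, dirs = [], []
    with os.scandir(directory) as it:
        for entry in it:
            # follow_symlinks=False classifies the entry itself, not its target
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                files.append(entry.name)
    return sorted(files), sorted(dirs)
```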

File system manipulation

An object-oriented approach using pathlib makes for clear code, with no need for manual path joins:

path = Path('/your/directory')
# It's showtime for files only!
for file in path.iterdir():
    if file.is_file():
        print(file)

To picture it, compare iterating over a directory to a treasure hunt in a garden:

The garden 🏡 is your directory.

garden = Path('/your/directory').iterdir()  # Your directory
for treasure in garden:
    print(treasure)  # Treasure! A file (or a subdirectory) in this case

And now, let's visually traverse the garden:

🏡🔍📄: [📄, 📄, 📄]
# Every step in the garden reveals a file (document)

Each step is an iteration and every document is a file. Easy, isn't it?

Huge directories and file actions

For dealing with ginormous 😱 directories without consuming much memory, glob.iglob() yields file names one at a time instead of building the whole list first:

import glob

# 'iglob' is 'glob', but on a diet
for filename in glob.iglob('/your/directory/**/*', recursive=True):
    print(filename)
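
Because iglob() is lazy, you can stop early and never pay for the rest of the tree. A small sketch, assuming you only want a preview of the first few matches (first_matches is a hypothetical helper):

```python
import glob
from itertools import islice


def first_matches(pattern, limit):
    """Return at most `limit` paths matching `pattern`, scanning lazily."""
    # islice stops pulling from the iterator once `limit` items have arrived
    return list(islice(glob.iglob(pattern, recursive=True), limit))
```

For example, first_matches('/your/directory/**/*', 10) would give you up to ten paths without walking the entire directory tree.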

To manipulate files (rename or move) during the iteration, take a snapshot of the matches first so the renames cannot disturb the listing:

# No '.temp' files on my watch!
for file in list(Path('/your/directory').glob('*')):
    if file.suffix == '.temp':
        new_name = file.with_suffix('.txt')
        file.rename(new_name)
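
On some systems, rename() silently replaces an existing file with the target name. If that worries you, a cautious variant could look like the sketch below; rename_suffix is a hypothetical helper, and skipping files whose target already exists is just one possible policy.

```python
from pathlib import Path


def rename_suffix(directory, old_suffix, new_suffix):
    """Rename '*old_suffix' files to use new_suffix, never overwriting."""
    renamed = []
    # list() snapshots the matches before anything is renamed
    for file in sorted(Path(directory).glob(f'*{old_suffix}')):
        target = file.with_suffix(new_suffix)
        if file.is_file() and not target.exists():
            file.rename(target)
            renamed.append(target)
    return renamed
```

The function returns the new paths, so you can log exactly what was changed and what was skipped.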

Different OS, different rules

Different systems come with different filesystem rules. Let pathlib abstract the path separators, and use os.fsencode and os.fsdecode to convert file names to and from the filesystem encoding:

import os
from pathlib import Path

current_directory = Path.cwd()
# pathlib knows what separator suits your OS; os.sep shows which one it is
system_separator = os.sep

encoded_name = os.fsencode(current_directory)
# Once you encode, you must decode (responsibly)!
decoded_name = os.fsdecode(encoded_name)
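
To make the encode/decode pair concrete, here is a tiny round-trip sketch. The roundtrip name is hypothetical; the point is only that fsencode gives you bytes and fsdecode gives you a str back.

```python
import os
from pathlib import Path


def roundtrip(path):
    """Encode a path to bytes and decode it back to a string."""
    encoded = os.fsencode(path)     # accepts str, bytes or Path objects
    decoded = os.fsdecode(encoded)  # always returns str
    return encoded, decoded


encoded, decoded = roundtrip(Path('some') / 'file.txt')
print(type(encoded).__name__, decoded)
```

This is mostly useful when you talk to APIs that insist on bytes file names; for everyday iteration, plain Path objects are enough.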

Unix pattern matching and symbolic link checks

Embrace the power of Unix shell-style wildcards for file pattern matching. Here ? matches any single character and [oa] matches either 'o' or 'a':

# Wildcard patterns: this one catches names such as 'lib.so' or 'lib.la'
for file in Path('/your/directory').glob('*.?[oa]'):
    print(file)

Remember to check for symbolic links along the way; ignore them at your own risk 😉:

# It's symbolic link season!
for file in Path('/your/directory').glob('*'):
    if file.is_symlink():
        real_path = file.resolve()
        print(f'{file} -> {real_path}')
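
If you would rather collect the links than print them, a sketch along these lines may help. symlink_targets is a hypothetical helper name, and it only looks at the top level of the directory.

```python
from pathlib import Path


def symlink_targets(directory):
    """Map each symlink's name in `directory` to the path it resolves to."""
    targets = {}
    for entry in Path(directory).iterdir():
        if entry.is_symlink():
            # resolve() follows the link (and any chain of links) to a real path
            targets[entry.name] = entry.resolve()
    return targets
```

An empty dictionary means the directory holds no symbolic links, so there is nothing that could lead you outside it.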

Leveraging efficient directory entries

os.scandir() is also an efficient way to get file stats, because each DirEntry caches the result of its stat() call:

with os.scandir('/your/directory') as it:
    # File hide-and-seek 🙈
    for entry in it:
        if not entry.name.startswith('.') and entry.is_file():
            info = entry.stat()
            # "It weighs..." *drumroll* "this much!"
            print(f'{entry.name}: {info.st_size}')
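
Building on the same idea, you could rank the visible files by size. This is only a sketch: largest_files and its default limit of 3 are made up for the example.

```python
import os


def largest_files(directory, limit=3):
    """Return up to `limit` (name, size) pairs for the biggest visible files."""
    sizes = []
    with os.scandir(directory) as it:
        for entry in it:
            # Skip dotfiles and anything that is not a regular file
            if not entry.name.startswith('.') and entry.is_file():
                sizes.append((entry.name, entry.stat().st_size))
    # Biggest first
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:limit]
```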