How to read a file line-by-line into a list?

python

file-io

context-managers

memory-efficiency

by Alex Kataev · Aug 15, 2024

To read a file line by line into a list in Python, you can use a list comprehension with the open() function. This approach also strips trailing whitespace, including the line break, from each line:

lines = [line.rstrip() for line in open('file.txt', 'r', encoding='utf-8')]

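This condenses opening the file, stripping trailing whitespace and line breaks, and building the list into one line of code. Note that the file is never explicitly closed here, so treat this form as a quick shortcut rather than a best practice.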

Applying best practices for file I/O

When it comes to file input/output (I/O), it's critical to stick to best practices.

Auto-closing files using Context Managers

Always ensure your files are properly closed after use. A context manager (with) closes the file automatically when the block ends, even if an exception is raised inside it:

with open('file.txt', 'r', encoding='utf-8') as file:
    lines = [line.rstrip() for line in file]

# The file is now closed. Rest easy.
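To make the auto-closing behavior visible, here is a minimal sketch that checks the file object's closed attribute inside and after the with block. The temporary file and the names sample_path, closed_inside, and closed_after are made up for this demonstration.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "file.txt"
    sample_path.write_text("first\nsecond\n", encoding="utf-8")

    with open(sample_path, "r", encoding="utf-8") as file:
        lines = [line.rstrip() for line in file]
        closed_inside = file.closed  # still open inside the block

    closed_after = file.closed  # closed by the context manager

print(lines, closed_inside, closed_after)
```

The file variable still exists after the block, but any further read on it would raise a ValueError because the underlying file is closed.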

Navigating through Large Files

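Reading a large file in one call, for example with read() or readlines(), loads its entire contents into memory at once. Iterating over the file object instead reads one line at a time. Keep in mind that appending every line to a list, as below, still ends up holding the whole file in memory; the real saving comes when you process each line and then discard it: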

lines = []
with open('file.txt', 'r', encoding='utf-8') as file:
    for line in file:
        # One small step for code, one giant leap for processing.
        lines.append(line.rstrip())
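When the lines do not all need to be kept, the same loop can process each line and move on. The following is a small sketch under that assumption; the function count_matching_lines, the file app.log, and its contents are hypothetical and exist only to illustrate the idea.

```python
import tempfile
from pathlib import Path


def count_matching_lines(path, keyword):
    """Count lines containing keyword without storing the lines."""
    count = 0
    with open(path, "r", encoding="utf-8") as file:
        for line in file:  # only one line is held in memory at a time
            if keyword in line:
                count += 1
    return count


# Hypothetical log file, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "app.log"
    log_path.write_text(
        "INFO start\nERROR disk full\nINFO retry\nERROR timeout\n",
        encoding="utf-8",
    )
    error_count = count_matching_lines(log_path, "ERROR")

print(error_count)
```

Memory use here stays roughly constant regardless of file size, because no list of lines is ever built.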

Python 3.8's Walrus Operator

Python 3.8 introduced the walrus operator (:=), which assigns a value and evaluates it in a single expression. It lets the loop condition read and test each line in one step:

lines = []
with open('file.txt', 'r', encoding='utf-8') as file:
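    # Test the raw line: readline() returns '' only at end of file,
    # so a blank line in the middle does not stop the loop early.
    while (line := file.readline()):  # Walrus in action. Not an actual walrus, sorry.
        lines.append(line.rstrip())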
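One detail to watch with this pattern is blank lines. readline() returns an empty string only at end of file, while a blank line comes back as '\n', which becomes an empty (falsy) string once stripped. The sketch below, using a made-up three-line sample file, tests the raw line in the loop condition and strips afterwards so the blank line in the middle is preserved.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file with a blank line in the middle.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "file.txt"
    sample_path.write_text("alpha\n\nbeta\n", encoding="utf-8")

    lines = []
    with open(sample_path, "r", encoding="utf-8") as file:
        # The condition sees the unstripped line, so '\n' is truthy
        # and only the '' returned at end of file ends the loop.
        while (line := file.readline()):
            lines.append(line.rstrip())

print(lines)  # ['alpha', '', 'beta']
```

If the strip were applied inside the loop condition instead, the loop would stop at the blank line and 'beta' would never be read.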

Handling different file formats

Different files can require different handling:

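Text files: Use mode='r' (the default) to read in text mode, where the file's bytes are decoded into str using the given encoding.
Binary files: For files containing non-text data (like images), use mode='rb' to read raw bytes; no decoding is applied, so no encoding argument is passed.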
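The difference between the two modes shows up in the type of data you get back. Here is a minimal sketch that reads the same made-up file both ways; the file name and its contents are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical file containing the UTF-8 bytes for "café" plus a newline.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.dat"
    sample_path.write_bytes(b"caf\xc3\xa9\n")

    # Text mode: bytes are decoded into str.
    with open(sample_path, "r", encoding="utf-8") as file:
        text_content = file.read()

    # Binary mode: the raw bytes are returned untouched.
    with open(sample_path, "rb") as file:
        raw_content = file.read()

print(type(text_content).__name__, repr(text_content))
print(type(raw_content).__name__, repr(raw_content))
```

Text mode yields four characters plus the newline, while binary mode yields six bytes, because 'é' occupies two bytes in UTF-8.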

Respecting file encodings

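Set the character encoding explicitly so that non-ASCII characters are not misinterpreted. When the encoding argument is omitted, Python may fall back to a platform-dependent default, so name the file's actual encoding (most often 'utf-8'):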

with open('novel.txt', 'r', encoding='utf-8') as file:
    # Remember: A good novel is UTF-8-encoded.
    novel_lines = [line.strip() for line in file]
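To see what a mismatch looks like, the sketch below writes a made-up file in Latin-1 and then tries to read it as UTF-8. The file name legacy.txt and the flag utf8_failed are hypothetical; the point is only that the wrong encoding fails while the right one succeeds.

```python
import tempfile
from pathlib import Path

# Hypothetical legacy file saved as Latin-1 rather than UTF-8.
with tempfile.TemporaryDirectory() as tmp_dir:
    legacy_path = Path(tmp_dir) / "legacy.txt"
    legacy_path.write_bytes("naïve café\n".encode("latin-1"))

    # Decoding with the wrong encoding raises an error on these bytes.
    try:
        with open(legacy_path, "r", encoding="utf-8") as file:
            file.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Decoding with the encoding the file was written in works.
    with open(legacy_path, "r", encoding="latin-1") as file:
        lines = [line.rstrip("\n") for line in file]

print(utf8_failed, lines)
```

A mismatch does not always raise an error; some byte sequences decode without complaint into the wrong characters, which is why knowing the file's real encoding matters.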

Advanced tips and common pitfalls

Master file paths

To build file paths in a platform-independent way, use the pathlib module:

from pathlib import Path

file_path = Path('path/to/your/file.txt')
with file_path.open('r', encoding='utf-8') as file:
    lines = [line.strip() for line in file]
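For small files, pathlib also offers a shorter route. The following sketch joins path segments with the / operator and reads the lines with read_text() and splitlines(); the directory layout and file contents are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # The / operator joins segments using the right separator per platform.
    file_path = Path(tmp_dir) / "path" / "to" / "file.txt"
    file_path.parent.mkdir(parents=True)
    file_path.write_text("one\ntwo\nthree\n", encoding="utf-8")

    # read_text() opens, reads, and closes the file in one call;
    # splitlines() splits on line breaks and drops them.
    lines = file_path.read_text(encoding="utf-8").splitlines()

print(lines)  # ['one', 'two', 'three']
```

Because read_text() loads the whole file into memory at once, this shortcut suits small files better than large ones.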

Handle newline characters

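Each line read from a file ends with a newline character ('\n'), except possibly the last one. The strip() method removes it, along with any other leading and trailing whitespace. If you want to drop only the newline and keep indentation or trailing spaces, use rstrip('\n') instead.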
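The stripping methods differ in how much they remove, which a quick comparison on a single made-up line shows:

```python
# Hypothetical line as it would come from a file: leading spaces,
# a trailing space, a tab, and the newline.
raw_line = "  indented text \t\n"

fully_stripped = raw_line.strip()       # both ends, all whitespace
right_stripped = raw_line.rstrip()      # right end only, all whitespace
newline_only = raw_line.rstrip("\n")    # right end only, just the newline

print(repr(fully_stripped))   # 'indented text'
print(repr(right_stripped))   # '  indented text'
print(repr(newline_only))     # '  indented text \t'
```

Which one to pick depends on whether leading indentation or trailing spaces carry meaning in your file.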

Apply lazy evaluation for memory efficiency

To go a step further, use a generator function that yields lines one at a time. Only the current line is held in memory, which matters most for large files:

def read_lines(file_path): 
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            yield line.strip()

lazy_lines = read_lines('source_code.txt')
# Call the function and make Paul Graham proud.
for line in lazy_lines:
    print(line)
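Because the generator is lazy, you can take just the first few lines of a large file without reading the rest. Here is a rough sketch using itertools.islice with the same read_lines function; the file big.txt and its 1000 generated lines are assumptions for the demonstration.

```python
import itertools
import tempfile
from pathlib import Path


def read_lines(file_path):
    """Yield stripped lines one at a time."""
    with open(file_path, "r", encoding="utf-8") as file:
        for line in file:
            yield line.strip()


# Hypothetical large file, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    big_path = Path(tmp_dir) / "big.txt"
    big_path.write_text(
        "".join(f"line {i}\n" for i in range(1000)), encoding="utf-8"
    )

    lazy_lines = read_lines(big_path)
    # islice pulls only three items; the remaining lines are never processed.
    first_three = list(itertools.islice(lazy_lines, 3))
    # Closing the generator exits its with block and closes the file.
    lazy_lines.close()

print(first_three)  # ['line 0', 'line 1', 'line 2']
```

Calling close() on a partially consumed generator is worth the extra line, since the file otherwise stays open until the generator is garbage collected.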