Merge PDF files

by Anton Shumikhin · Feb 28, 2025
TLDR

Merge PDFs in Python with PyPDF2. Install the library with pip install PyPDF2, create a merger object, append each file in the order you want, and write the result:

```python
from PyPDF2 import PdfMerger

merger = PdfMerger()  # Who doesn't like a PDF party?

for pdf in ['file1.pdf', 'file2.pdf']:
    merger.append(pdf)  # Inviting PDFs to the party

merger.write("merged.pdf")  # The party photo album
merger.close()  # Bye, time to clean up
```

Three easy steps: import PdfMerger, call append(pdf) in a loop, and finish with write("merged.pdf"). It's as simple as driving a Tesla.
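
PyPDF2 has since been retired in favor of its successor, pypdf, where recent releases fold merging into PdfWriter. For readers on a newer stack, here is a minimal sketch, assuming that either package may be installed. The helper name load_merger_class is hypothetical and belongs to neither library; it simply picks whichever class is available, relying on both offering the append(), write() and close() calls used above.

```python
def load_merger_class():
    """Return a class offering append()/write()/close() for merging PDFs.

    Hypothetical helper: prefers pypdf's PdfWriter, falls back to
    PyPDF2's PdfMerger, and raises ImportError if neither is installed.
    """
    try:
        from pypdf import PdfWriter  # successor package
        return PdfWriter
    except ImportError:
        from PyPDF2 import PdfMerger  # legacy package used in this article
        return PdfMerger


def merge(paths, output):
    """Append each path in order and write the combined PDF to output."""
    merger = load_merger_class()()
    try:
        for path in paths:
            merger.append(path)
        merger.write(output)
    finally:
        merger.close()  # release resources even if a file fails to load
```

Calling merge(['file1.pdf', 'file2.pdf'], 'merged.pdf') should then behave like the TLDR snippet on either library.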

Nailing advanced merging

In the epic Game of Thrones of PDF operations, libraries like PyPDF2, PyMuPDF, and pdfrw are your dragons. Let's ride them:

Page-by-page conquest with PyPDF2

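Enjoy power over the fiefdom of page ranges using the pages keyword argument. It takes a zero-based (start, stop) tuple whose stop is exclusive, or a PageRange for slice-style ranges with an open end. Wage war on disorganization:

```python
from PyPDF2 import PageRange, PdfMerger

merger = PdfMerger()

# Selecting the fiercest pages for the fight
merger.append('file1.pdf', pages=(0, 3))  # We want the first three pages of file1.pdf
# A tuple is passed to range(), so (2, None) fails; PageRange handles the open end
merger.append('file2.pdf', pages=PageRange('2:'))  # The fight starts from page 3 till the end in file2.pdf

merger.write('merged.pdf')
merger.close()  # Victory drink time
```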
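
Because the pages tuple is zero-based with an exclusive stop, off-by-one slips are easy when someone asks for "pages 2 to 4". A minimal sketch of a conversion helper follows; to_page_tuple is a hypothetical name, not part of PyPDF2.

```python
def to_page_tuple(first_page, last_page):
    """Convert an inclusive, 1-based page span to a (start, stop) tuple.

    Hypothetical helper: the result is zero-based with an exclusive stop,
    the shape expected by the pages argument.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError(f"invalid page span: {first_page}-{last_page}")
    return (first_page - 1, last_page)
```

With it, merger.append('file1.pdf', pages=to_page_tuple(1, 3)) would select the same first three pages as pages=(0, 3).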

PyMuPDF: The dragon of high-performance merging

For merging faster than a Valyrian sword cut, saddle up PyMuPDF:

```python
import fitz  # PyMuPDF, short and classy

pdf_writer = fitz.open()

for pdf in ['file1.pdf', 'file2.pdf']:
    pdf_document = fitz.open(pdf)
    pdf_writer.insert_pdf(pdf_document)
    pdf_document.close()  # Another one bites the dust

pdf_writer.save('merged.pdf')
pdf_writer.close()  # And it's a wrap
```
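
PyMuPDF can also take only part of each source: insert_pdf accepts from_page and to_page, both zero-based and inclusive. The sketch below assumes a list of (path, first, last) spans. The names clamp_span and merge_spans are hypothetical, and the clamping policy is just one reasonable choice.

```python
def clamp_span(first, last, page_count):
    """Clamp a zero-based, inclusive page span to a document's bounds."""
    if page_count <= 0:
        raise ValueError("document has no pages")
    first = max(0, first)
    last = min(page_count - 1, last)
    if first > last:
        raise ValueError("span lies outside the document")
    return first, last


def merge_spans(spans, output):
    """Merge (path, first, last) spans into one PDF using PyMuPDF."""
    import fitz  # imported here so clamp_span works without PyMuPDF installed

    merged = fitz.open()  # new, empty document
    try:
        for path, first, last in spans:
            with fitz.open(path) as source:
                first, last = clamp_span(first, last, source.page_count)
                merged.insert_pdf(source, from_page=first, to_page=last)
        merged.save(output)
    finally:
        merged.close()
```

For example, merge_spans([('file1.pdf', 0, 2), ('file2.pdf', 2, 10**6)], 'merged.pdf') would take the first three pages of file1.pdf and everything from page 3 onward in file2.pdf.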

Conquering directories with a batch

When playing the game of thrones with directories, use your armies (os.listdir() or glob.glob()) for batch attacks:

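```python
import glob

from PyPDF2 import PdfMerger

output_name = 'directory_merged.pdf'
# glob returns files in arbitrary order, so sort them; skip a leftover output file
pdf_files = sorted(f for f in glob.glob('*.pdf') if f != output_name)

merger = PdfMerger()
for pdf in pdf_files:
    merger.append(pdf)  # Here comes my army

merger.write(output_name)
merger.close()  # Tell them Winter came for House Globs
```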
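
Merge order follows list order, and a plain alphabetical sort puts part10.pdf before part2.pdf. One hedged way to marshal the army is a natural sort that compares digit runs as numbers. The helper names natural_key and collect_pdfs are hypothetical.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that compares digit runs as numbers, so part2 < part10."""
    parts = re.split(r"(\d+)", Path(path).name.lower())
    return [int(p) if p.isdecimal() else p for p in parts]


def collect_pdfs(directory, exclude=()):
    """Return the directory's *.pdf paths in natural order, minus excluded names."""
    excluded = set(exclude)
    pdfs = [p for p in Path(directory).glob("*.pdf") if p.name not in excluded]
    return sorted(pdfs, key=natural_key)
```

Passing exclude=['directory_merged.pdf'] keeps a previous run's output from being merged into the next one.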

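Remember: if you hand the merger open file objects instead of paths, open inputs with 'rb' and the output with 'wb'. Binary mode protects you from I/O white walkers, since text mode mangles PDF bytes. And close the castle gates (close every file and the merger itself) to avert resource leaks and truncated output.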
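
A minimal sketch of that advice, assuming PyPDF2's PdfMerger accepts binary file objects for both append() and write(). The function name merge_open_files and its merger_factory parameter are hypothetical; the parameter exists only so the merger class can be swapped out.

```python
from contextlib import ExitStack


def merge_open_files(paths, output_path, merger_factory=None):
    """Merge PDFs passed as binary file objects; every handle is closed on exit."""
    if merger_factory is None:
        from PyPDF2 import PdfMerger  # imported lazily, only when needed
        merger_factory = PdfMerger

    merger = merger_factory()
    with ExitStack() as stack:
        stack.callback(merger.close)  # registered first, so it runs last
        for path in paths:
            source = stack.enter_context(open(path, "rb"))  # binary read
            merger.append(source)
        with open(output_path, "wb") as target:  # binary write
            merger.write(target)
```

ExitStack closes every input even when one of the later files fails to open or parse, which is the resource-leak case the reminder above warns about.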

Advanced methods to level up

Every professional Pythonista needs a few good grimoires up their sleeve:

PDFly casting spells

The pdfly cat command is an incantation for console wizards:

$ pdfly cat file1.pdf file2.pdf -o merged.pdf
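
To cast the same spell from inside a Python script, one option is to shell out with subprocess. This is a sketch that assumes pdfly is installed and on PATH and uses only the arguments shown in the command above. The function names are hypothetical.

```python
import shutil
import subprocess


def build_pdfly_cat_command(inputs, output):
    """Build the argument list for `pdfly cat`, mirroring the command above."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["pdfly", "cat", *map(str, inputs), "-o", str(output)]


def run_pdfly_cat(inputs, output):
    """Run pdfly if it is on PATH; raises CalledProcessError if it fails."""
    if shutil.which("pdfly") is None:
        raise FileNotFoundError("pdfly executable not found on PATH")
    subprocess.run(build_pdfly_cat_command(inputs, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.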

Conjuring merge functions

Master the ancient art of spell-making by crafting your own merge function:

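```python
from PyPDF2 import PdfMerger


def merge_pdfs(file_list, output_name):
    """Append every PDF in file_list, in order, and write output_name."""
    merger = PdfMerger()
    try:
        for pdf in file_list:
            merger.append(pdf)
        merger.write(output_name)
    finally:
        merger.close()  # Runs even if one of the files fails to load


file_names = ['file1.pdf', 'file2.pdf']
merge_pdfs(file_names, 'hogwarts_story.pdf')  # A magical story awaits
```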
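
A merge function is easier to reuse once it can be driven from the command line. The sketch below wraps the same append-and-write loop in argparse. The script layout, the option names and the default output name are illustrative assumptions, not something the libraries prescribe.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments for a hypothetical merge script."""
    parser = argparse.ArgumentParser(description="Merge PDF files in order.")
    parser.add_argument("inputs", nargs="+", help="PDF files to merge")
    parser.add_argument("-o", "--output", default="merged.pdf",
                        help="name of the merged file")
    return parser.parse_args(argv)


def main(argv=None):
    """Merge the PDFs named on the command line into the output file."""
    from PyPDF2 import PdfMerger  # imported here so parsing works without it

    args = parse_args(argv)
    merger = PdfMerger()
    try:
        for pdf in args.inputs:
            merger.append(pdf)
        merger.write(args.output)
    finally:
        merger.close()
```

Calling main() from an if __name__ == "__main__": guard turns the file into a script that could be run as python merge.py file1.pdf file2.pdf -o hogwarts_story.pdf, where merge.py is a placeholder file name.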

Safeguarding ancient knowledge

Practice defensive spellcasting (error handling) to protect your repository of PDF artifacts:

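```python
from PyPDF2.errors import PdfReadError

try:
    merge_pdfs(file_names, 'safe_hogwarts_story.pdf')
except FileNotFoundError as e:
    print(f"A scroll went missing: {e}")
except PdfReadError as e:
    print(f"Dementors attacked the script: {e}")  # Where's Harry when you need him?
```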

Remember: Aim for the Defense Against the Dark Arts professorship with robust error handling.
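
One way to practice that defense is to screen inputs before any merging starts. The sketch below is a cheap pre-flight check under stated assumptions: it only confirms that each file exists and begins with the %PDF- marker. Some real-world files carry stray bytes before the marker, so treat a failure as a warning rather than a verdict. The name find_invalid_inputs is hypothetical.

```python
from pathlib import Path


def find_invalid_inputs(paths):
    """Return (path, reason) pairs for inputs that fail a basic PDF check.

    The check looks only at existence and the first five bytes; it does
    not confirm that the rest of the file is well formed.
    """
    problems = []
    for path in map(Path, paths):
        if not path.is_file():
            problems.append((str(path), "missing"))
            continue
        with path.open("rb") as handle:
            if handle.read(5) != b"%PDF-":
                problems.append((str(path), "not a PDF header"))
    return problems
```

A script can print the returned pairs and stop early, so the Dementors are named before the merge begins instead of halfway through it.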