
How to convert a webpage into a PDF using Python

python
pdfkit
weasyprint
pypdf2
by Alex Kataev · Sep 12, 2024
TLDR

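To convert a webpage to PDF in Python, use the pdfkit package, which you can install with pip install pdfkit. pdfkit is a thin wrapper around the wkhtmltopdf command-line tool, so you also need to install wkhtmltopdf itself (see the installation steps below). With both in place: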

```python
import pdfkit

pdfkit.from_url('http://example.com', 'output.pdf')  # Voila! Converting web to PDF is as easy as Py!
```

The from_url call fetches the specified webpage, renders it with wkhtmltopdf, and saves the result as output.pdf.
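If you would rather keep the PDF in memory (for example, to return it from a web app), pdfkit also accepts False as the output path and then returns the PDF as bytes. The sketch below wraps that call; the helper names url_to_pdf_bytes and looks_like_pdf are hypothetical, and the code assumes pdfkit and wkhtmltopdf are already installed.

```python
def looks_like_pdf(data):
    """Cheap sanity check: PDF files normally begin with the b'%PDF-' header."""
    return isinstance(data, bytes) and data.startswith(b"%PDF-")


def url_to_pdf_bytes(url):
    """Render `url` with pdfkit and return the PDF as bytes (hypothetical helper)."""
    import pdfkit  # imported lazily so looks_like_pdf() works without pdfkit installed

    data = pdfkit.from_url(url, False)  # False: return bytes instead of writing a file
    if not looks_like_pdf(data):
        raise ValueError("wkhtmltopdf did not return PDF data")
    return data
```

Usage (needs network access and wkhtmltopdf): pdf = url_to_pdf_bytes('http://example.com'). The header check is only a guard against obviously wrong output, not a full PDF validation.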

Step-by-step guide

Here's a detailed guide to turning a webpage into a PDF document with Python packages. The spotlight is on pdfkit, a thin wrapper that hands the actual rendering to the wkhtmltopdf command-line tool, which lays out the page with a WebKit browser engine.

Before you can start coding, make sure wkhtmltopdf is correctly installed:

  • For macOS: brew install --cask wkhtmltopdf
  • For Debian/Ubuntu: sudo apt-get install wkhtmltopdf
  • For Windows: download the installer from the wkhtmltopdf releases page and add its bin folder to your system PATH.
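If wkhtmltopdf is installed but not on your PATH (common on Windows), pdfkit can be pointed at the binary explicitly through pdfkit.configuration(wkhtmltopdf=...). The sketch below first asks shutil.which and then tries a few fallback locations; the candidate paths, including the Windows one (assumed to be the official installer's usual default), and the helper names find_wkhtmltopdf and make_pdfkit_config are assumptions for illustration.

```python
import os
import shutil

# Assumed install locations to try when wkhtmltopdf is not on PATH;
# the Windows entry is assumed to be the official installer's default.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
    "/usr/local/bin/wkhtmltopdf",
    "/usr/bin/wkhtmltopdf",
)


def find_wkhtmltopdf(candidates=DEFAULT_CANDIDATES):
    """Return a path to the wkhtmltopdf executable, or None if none is found."""
    on_path = shutil.which("wkhtmltopdf")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def make_pdfkit_config():
    """Build a pdfkit configuration that points at the located binary."""
    import pdfkit  # lazy import: find_wkhtmltopdf() works without pdfkit installed

    path = find_wkhtmltopdf()
    if path is None:
        raise RuntimeError("wkhtmltopdf not found; install it or add it to PATH")
    return pdfkit.configuration(wkhtmltopdf=path)
```

Usage: pdfkit.from_url('http://example.com', 'output.pdf', configuration=make_pdfkit_config()).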

If the wkhtmltopdf installation process intimidates you, consider the WeasyPrint library instead. A quick pip install weasyprint gets the Python package, although on some platforms you also need to install the Pango system libraries it depends on.
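With WeasyPrint, the core call is HTML(url=...).write_pdf(path), and page size and margins come from a CSS @page rule that can be passed in through the stylesheets argument. The sketch below builds such a rule as a string and applies it; the helper names page_css and webpage_to_pdf are hypothetical, and the default A4/1cm values are arbitrary choices.

```python
def page_css(size="A4", orientation="portrait", margin="1cm"):
    """Build a CSS @page rule; WeasyPrint reads paper size and margins from @page."""
    return f"@page {{ size: {size} {orientation}; margin: {margin}; }}"


def webpage_to_pdf(url, output_path, **page_options):
    """Render `url` to `output_path` with WeasyPrint (hypothetical helper)."""
    from weasyprint import CSS, HTML  # lazy import: needs WeasyPrint and its Pango libraries

    extra_css = CSS(string=page_css(**page_options))
    HTML(url=url).write_pdf(output_path, stylesheets=[extra_css])
```

Usage: webpage_to_pdf('http://example.com', 'output.pdf', orientation='landscape'). Keeping the CSS generation separate makes the layout settings easy to inspect without rendering anything.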

Customizing PDF output

Besides pdfkit, you can render the page with Qt's Chromium-based browser engine through PyQt5 (pip install PyQt5 PyQtWebEngine) and customize the resulting PDF with QPrinter:

```python
from PyQt5.QtCore import QUrl
from PyQt5.QtPrintSupport import QPrinter
from PyQt5.QtWebEngineWidgets import QWebEngineView
from PyQt5.QtWidgets import QApplication

app = QApplication([])  # Let's fire up the Python Photocopier!
web = QWebEngineView()

printer = QPrinter(QPrinter.PrinterResolution)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName('output.pdf')  # Name your soon-to-be-born PDF!


def on_printed(success):
    # Qt calls this once printing has finished
    if success:
        print("PDF generated successfully! Who's the Pythonista now?")
    else:
        print('PDF generation failed.')
    app.quit()


def print_to_pdf(ok):
    # loadFinished passes True when the page loaded successfully
    if not ok:
        print('Failed to load the page.')
        app.quit()
        return
    web.page().print(printer, on_printed)


web.loadFinished.connect(print_to_pdf)
web.setUrl(QUrl('http://example.com'))
app.exec_()
```

Note: don't be hasty. Start the PDF creation from the web.loadFinished signal, so that printing begins only after the page has finished loading.

Specific scenarios and troubleshooting

Python allows us to cover a wide range of scenarios:

Merging PDF documents

With PyPDF2 (whose development has since continued in the pypdf package), merging PDFs is as easy as pie:

```python
from PyPDF2 import PdfMerger  # named PdfFileMerger in older PyPDF2 releases

merger = PdfMerger()
merger.append('output1.pdf')
merger.append('output2.pdf')  # Uniting the PDF clans!
merger.write('combined.pdf')  # Hello, world! Meet our newly formed PDF!
merger.close()
```
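When you have converted many pages into separate files, plain alphabetical sorting puts part10.pdf before part2.pdf. The sketch below merges every PDF in a directory in "natural" order using the same PdfMerger calls as above; the helper names natural_key and merge_directory are hypothetical, and skipping the output file (so a rerun does not merge it into itself) is a design choice, not a PyPDF2 feature.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that puts 'part2.pdf' before 'part10.pdf' (case-insensitive)."""
    parts = re.split(r"(\d+)", Path(path).name)
    # re.split with a capturing group alternates text and digit runs,
    # so ints and strs always line up at the same positions.
    return [int(part) if part.isdecimal() else part.lower() for part in parts]


def merge_directory(directory, output_path):
    """Merge every PDF in `directory` into `output_path`; return how many were merged."""
    from PyPDF2 import PdfMerger  # lazy import so natural_key() works without PyPDF2

    output = Path(output_path).resolve()
    pdfs = sorted(
        (p for p in Path(directory).glob("*.pdf") if p.resolve() != output),
        key=natural_key,
    )
    merger = PdfMerger()
    try:
        for pdf in pdfs:
            merger.append(str(pdf))
        merger.write(str(output))
    finally:
        merger.close()
    return len(pdfs)
```

Usage: merge_directory('pages', 'combined.pdf').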

Problems during installation

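Did you encounter a hobgoblin (or error, in muggle terms)? The most common one with pdfkit is an OSError saying that no wkhtmltopdf executable was found, which means the binary is missing or not on your PATH; either install it or pass its location through pdfkit.configuration(wkhtmltopdf=...). With WeasyPrint, an OSError on import usually points to missing system libraries such as Pango. For anything else, each library's official documentation covers a broad spectrum of installation issues.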
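Before digging into the docs, a quick preflight check can tell you which backend is actually usable. The sketch below asks wkhtmltopdf for its version (via its --version flag) and tries to import WeasyPrint, treating an OSError as missing native libraries; the helper names wkhtmltopdf_version and weasyprint_available are hypothetical.

```python
import shutil
import subprocess


def wkhtmltopdf_version():
    """Return wkhtmltopdf's version string, or None if it is not usable."""
    path = shutil.which("wkhtmltopdf")
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


def weasyprint_available():
    """True if WeasyPrint and its native libraries (such as Pango) can be loaded."""
    try:
        import weasyprint  # noqa: F401
    except (ImportError, OSError):  # OSError: the package is there, system libraries are not
        return False
    return True
```

Usage: print(wkhtmltopdf_version() or 'wkhtmltopdf not found on PATH', weasyprint_available()).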

Formatting the output

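Give your resulting PDF baby a shape by configuring the QPrinter from the PyQt5 example before calling web.page().print():

```python
from PyQt5.QtGui import QPageLayout, QPageSize

printer.setPageSize(QPageSize(QPageSize.A4))
printer.setPageOrientation(QPageLayout.Portrait)
```

With these calls, you control the page size and orientation.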
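If you took the pdfkit route instead, the equivalent settings are wkhtmltopdf flags passed through pdfkit's options dictionary, where each key is a flag name without the leading dashes. The sketch below maps page size, orientation, and a uniform margin onto those flags; the helper name pdfkit_options and the default values are assumptions for illustration.

```python
def pdfkit_options(page_size="A4", orientation="Portrait", margin="10mm"):
    """Map layout settings onto wkhtmltopdf flags.

    pdfkit turns each key into a '--key value' command-line flag,
    so keys are written without the leading dashes.
    """
    options = {
        "page-size": page_size,
        "orientation": orientation,
        "encoding": "UTF-8",
    }
    for side in ("top", "right", "bottom", "left"):
        options[f"margin-{side}"] = margin
    return options
```

Usage: pdfkit.from_url('http://example.com', 'output.pdf', options=pdfkit_options(orientation='Landscape')).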

Choosing the right tool for the job

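For every job, there's a perfect tool. pdfkit relies on wkhtmltopdf, whose older WebKit engine can run some JavaScript but is no longer actively maintained; WeasyPrint uses its own layout engine with solid CSS print support but does not execute JavaScript; xhtml2pdf is pure Python built on ReportLab, with more limited CSS support; and the PyQt5 approach renders with Qt WebEngine (Chromium). Explore them to find your Python package soulmate for converting webpages to PDFs.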