
How to Pretty Print HTML to a file, with indentation

by Nikita Barsukov · Dec 29, 2024
TLDR

Ease into your Python adventure by letting BeautifulSoup handle the HTML beautification. Kick-start the journey with pip install beautifulsoup4, parse your HTML string, send it for a spa day courtesy of the prettify() method, and write the result to a file like a cherished diary entry:

```python
from bs4 import BeautifulSoup

# Your precious lucky HTML ticket goes here!
html_content = "<html_here>"
soup = BeautifulSoup(html_content, 'html.parser')

# And here it comes, more radiant than ever: one tag per line, indented by depth
pretty_html = soup.prettify()

# Open a file and throw in the beauty (explicit encoding keeps non-ASCII text intact)
with open('formatted.html', 'w', encoding='utf-8') as file:
    file.write(pretty_html)
```

Voilà, your chaotic HTML has transformed into a formatting masterpiece, waiting for you in formatted.html!
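
To see what the spa day actually does, here is a minimal sketch with a concrete input. The sample string, the helper name prettify_to_file, and the temporary directory are all made up for illustration; only BeautifulSoup, the 'html.parser' backend, and prettify() come from the recipe above.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def prettify_to_file(html: str, path: Path) -> str:
    """Indent `html` with BeautifulSoup and write it to `path` as UTF-8."""
    soup = BeautifulSoup(html, "html.parser")
    pretty = soup.prettify()
    path.write_text(pretty, encoding="utf-8")
    return pretty


# Hypothetical sample input, squeezed onto a single line
sample = "<html><body><p>Hello</p></body></html>"

# Write into a throwaway directory so the sketch leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "formatted.html"
    pretty_html = prettify_to_file(sample, out_path)
    written = out_path.read_text(encoding="utf-8")

print(pretty_html)
```

Each tag and each piece of text lands on its own line, indented by depth. Keep in mind that prettify() adds whitespace, which can change how inline elements render in a browser, so treat the output as something to read rather than something to ship.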

The code makeover toolbox

Although BeautifulSoup is our HTML beautification protagonist, different scenarios require various supporting characters. Here's a handy toolbox for alternative prettifiers and enhancements.

From chaos to neatness with lxml strings

When your HTML already lives in an lxml tree, don't sweat it. Leverage lxml.etree.tostring() and set the stage for beauty with pretty_print=True. Pass encoding='unicode' as well: it makes tostring() return a regular Python string (str) instead of bytes, so the result can go straight into a file opened in text mode:

```python
from lxml import etree

# Your canvas is ready! Paint away using your HTML arsenal!
html_element = etree.Element("html")

# Time to invite lxml to set the stage for the beautiful HTML maestro!
pretty_html = etree.tostring(html_element, encoding='unicode', pretty_print=True)

# Now, just sit back and enjoy the masterpiece we call 'formatted_lxml.html'
with open('formatted_lxml.html', 'w') as file:
    file.write(pretty_html)
```

Ta-da! You've got a tidy-fied HTML file that's more readable than a children's book!
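
An empty html element does not show much indentation, so here is a small sketch with a nested tree. The body and p children and the "Hello world!" text are invented for the demo; etree.SubElement() and the tostring() arguments are regular lxml calls. It also shows the difference the encoding argument makes.

```python
from lxml import etree

# Build a tiny nested tree (hypothetical content, just for the demo)
root = etree.Element("html")
body = etree.SubElement(root, "body")
paragraph = etree.SubElement(body, "p")
paragraph.text = "Hello world!"

# Without an encoding argument, tostring() hands back bytes
as_bytes = etree.tostring(root, pretty_print=True)

# With encoding="unicode", it hands back a str ready for a text-mode file
as_text = etree.tostring(root, pretty_print=True, encoding="unicode")

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text)
```

Note that this is lxml's XML serializer at work, which is why an element without children comes out self-closed.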

html5print: The Swiss army knife

If you're looking for a code razzle-dazzle, let me present html5print. It styles not just HTML, but CSS and JS too!

```python
from html5print import HTMLBeautifier

# Put your untidy HTML in a queue for a makeover
html = "<your_html_code>"

# Voila, HTML is ready for the red carpet! (4 = spaces per indent level)
pretty_html = HTMLBeautifier.beautify(html, 4)

# Step right in, the spotlight is on 'html5_output.html'!
with open('html5_output.html', 'w') as file:
    file.write(pretty_html)
```

Write and tidy with yattag

Ever wished for a double-duty tool? Something that generates HTML and keeps it tidy? Look no further than yattag!

```python
from yattag import Doc, indent

# Roll out the blueprint!
doc, tag, text = Doc().tagtext()

with tag('html'):
    with tag('body'):
        with tag('p'):
            text('Hello world!')

# And... Tidy up! Years of house chores finally paid off!
pretty_html = indent(doc.getvalue())

# Revel in the awe of 'yattag_output.html'!
with open('yattag_output.html', 'w') as file:
    file.write(pretty_html)
```

This Pythonic combo lets you control indentation like a FIFA console game!

Correctness before prettiness

The golden rule: always strive for semantic correctness before you send your HTML to a beauty pageant. Thankfully, the W3C Markup Validation Service is here to save the day. It's like your HTML's personal fitness trainer, ensuring your HTML is in stellar shape!