How do I unescape HTML entities in a string in Python 3.1?
Use the unescape() function from Python's html module, available since Python 3.4. It turns HTML entities back into their intended characters.
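A minimal sketch of that call, assuming Python 3.4 or newer; the sample string and variable names are made up for illustration:

```python
import html

# A string as it might come out of an HTML page or feed.
escaped = "&lt;p&gt;Fish &amp; chips&lt;/p&gt;"

# html.unescape() resolves named and numeric references in one pass.
plain = html.unescape(escaped)
print(plain)  # <p>Fish & chips</p>
```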
Legacy extension
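If you're still partying in the world of Python 3.1, html.unescape() isn't there yet (it arrived in 3.4), but the unescape() method of the HTMLParser class from the html.parser module has got your back. Be aware that this method was deprecated in Python 3.4 and removed in Python 3.9.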
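Because the legacy method no longer exists on current interpreters, one hedged way to write code that works on both old and new versions is to try the modern import first and fall back only when it is missing. The sample string and the name legacy_result are illustrative:

```python
try:
    # Python 3.4+: the supported spelling.
    from html import unescape
except ImportError:
    # Older Python 3 releases: fall back to the parser method, which was
    # deprecated in 3.4 and removed in 3.9.
    from html.parser import HTMLParser
    unescape = HTMLParser().unescape

legacy_result = unescape("Tom &amp; Jerry &quot;forever&quot;")
print(legacy_result)  # Tom & Jerry "forever"
```

On an interpreter that still ships the method, the fallback branch is equivalent to calling HTMLParser().unescape(...) directly.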
HTMLParser().unescape() converts everything from common entities such as &amp; to rarer species like &quot; in your strings.
Alternative and helpful methods
Legend of xml.sax.saxutils
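There's a lesser-known hero: the xml.sax.saxutils module. It too has an unescape() function, though by default it handles only &amp;, &lt;, and &gt;; any other entity has to be supplied through its optional entities dictionary.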
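A short sketch of xml.sax.saxutils.unescape(), showing both the default behaviour and the optional entities mapping. The sample text and the variable names are illustrative:

```python
from xml.sax.saxutils import unescape

text = "1 &lt; 2 &amp;&amp; &quot;x&quot;"

# By default only &lt;, &gt; and &amp; are converted.
default_result = unescape(text)
print(default_result)  # 1 < 2 && &quot;x&quot;

# Extra entities are passed as a dict of {escaped: replacement}.
extended_result = unescape(text, {"&quot;": '"'})
print(extended_result)  # 1 < 2 && "x"
```

This makes it a reasonable fit for XML-style text, while full HTML entity coverage still calls for html.unescape().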
For those who prefer to keep to Python's homegrown capabilities, this is another excellent built-in solution.
Craft your own regex hero
For situations where you're dealing with complex strings, or if you're just a regex maestro, you can forge your own mighty function.
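The original snippet is not preserved here, so the following is only one possible sketch of such a function. The name unescape_entities and the pattern are assumptions for illustration; the name lookup relies on the standard html.entities.name2codepoint table:

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#123; and &#x7B; style references.
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_entities(text):
    """Replace HTML entities in text; unknown ones are left untouched."""

    def replace(match):
        body = match.group(1)
        try:
            if body[:2].lower() == "#x":
                return chr(int(body[2:], 16))      # hexadecimal reference
            if body.startswith("#"):
                return chr(int(body[1:]))          # decimal reference
            return chr(name2codepoint[body])       # named entity
        except (KeyError, ValueError, OverflowError):
            return match.group(0)                  # not recognised: keep as-is

    return _ENTITY_RE.sub(replace, text)


print(unescape_entities("&lt;b&gt; &amp; &euro; &#65; &#x42; &bogus;"))
# <b> & € A B &bogus;
```

Unlike html.unescape(), this sketch requires the trailing semicolon and does not apply the HTML5 special cases, so treat it as a learning exercise, not a drop-in replacement.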
This baby uses its own regex pattern to hunt down entities and replace them with the corresponding Unicode characters.
Unicode and hex escaping: trap for the tricksters!
Even sneaky escaped unicode characters cannot hide:
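Assuming "escaped unicode characters" refers to decimal numeric character references such as &#8364;, a minimal illustration with html.unescape() (Python 3.4+) might look like this; the sample string is made up:

```python
import html

# &#8364; is the euro sign, &#38; is the ampersand.
decimal_text = "Price: &#8364;5 &#38; up"
decoded_decimal = html.unescape(decimal_text)
print(decoded_decimal)  # Price: €5 & up
```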
Same goes for regular hexadecimal pranksters:
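Hexadecimal references use the &#x...; form. A small sketch, again assuming html.unescape() on Python 3.4+ and an invented sample string:

```python
import html

# &#x3C; and &#x3E; are the angle brackets, &#x20AC; is the euro sign.
hex_text = "&#x3C;tag&#x3E; costs &#x20AC;5"
decoded_hex = html.unescape(hex_text)
print(decoded_hex)  # <tag> costs €5
```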