
Converting HTML to plain text in PHP for e-mail

by Anton Shumikhin · Aug 30, 2024
TLDR

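Convert HTML to plain text in PHP using the built-in function strip_tags(). It removes the HTML tags and leaves the text between them, which is often enough for a simple e-mail body. It does not decode entities such as &amp; and does not add whitespace where tags used to be, so follow up with html_entity_decode() when the markup contains entities.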

Code snippet:

$plainText = strip_tags("<h1>Title</h1><p>Message</p>"); // $plainText is "TitleMessage": fast and simple, just like instant noodles, but nothing separates the two blocks!

For preserving formatting like line breaks, replace <br> tags with \n before using strip_tags().
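The line-break advice above can be sketched as a small helper. This is a minimal sketch, not a full HTML renderer: the function name htmlToPlainText() and the list of block-level tags it treats as paragraph boundaries are assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn simple HTML into plain text while keeping line breaks.
 */
function htmlToPlainText(string $html): string
{
    // Each <br> (or <br/>, <br />) becomes a single newline.
    $text = preg_replace('/<br\s*\/?>/i', "\n", $html);
    // Closing block-level tags become a blank line so paragraphs stay separated.
    $text = preg_replace('/<\/(p|div|h[1-6]|li|tr)>/i', "\n\n", $text);
    // Remove whatever tags are left.
    $text = strip_tags($text);
    // Decode entities such as &amp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of three or more newlines, then trim both ends.
    $text = preg_replace("/\n{3,}/", "\n\n", $text);
    return trim($text);
}

echo htmlToPlainText('<h1>Title</h1><p>First line<br>Second line</p><p>Tom &amp; Jerry</p>'), PHP_EOL;
```

Regular expressions are good enough for simple, well-formed markup like a typical e-mail template. For arbitrary HTML, a real parser or one of the tools described below is the safer choice.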

Keep the style with html2text

If you want to convert HTML but aren't ready to let go of all the formatting, then use the open-source html2text library. This maintains the soul of your HTML document while converting it to plain text.

Composer installation:

composer require html2text/html2text

Code snippet:

require 'vendor/autoload.php';
use Html2Text\Html2Text;

$html = "<h1>Title</h1><p>Message</p>";
$converter = new Html2Text($html);
$text = $converter->getText(); // The heading and the paragraph end up on separate lines. Your HTML tags are packed off to vacation!

Dealing with UTF-8 characters

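Working with UTF-8 characters? strip_tags() leaves multibyte characters alone, so valid UTF-8 input needs no special treatment. Trouble usually comes from input in another encoding or from entities left undecoded. Use mb_convert_encoding() to normalize the input to UTF-8 before stripping tags, and html_entity_decode() afterwards. Avoid the 'HTML-ENTITIES' target: it turns the characters into entities instead of preserving them, and it is deprecated as of PHP 8.2.

$html = "<p>Some UTF-8 text: Ω≈ç√∫˜µ≤≥÷</p>";
$utf8Html = mb_convert_encoding($html, 'UTF-8', 'UTF-8'); // Scrubs invalid byte sequences; change the third argument if the source uses another encoding
$plainText = html_entity_decode(strip_tags($utf8Html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
// $plainText is "Some UTF-8 text: Ω≈ç√∫˜µ≤≥÷". The special characters aren't aliens anymore!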
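When the HTML arrives in an encoding other than UTF-8, the conversion step matters more. The following is a minimal sketch under the assumption that the caller knows the source encoding; the function name htmlToUtf8Text() and its parameters are hypothetical, and the mbstring extension must be enabled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: strip tags from HTML in a known source encoding
 * and return UTF-8 plain text with entities decoded.
 */
function htmlToUtf8Text(string $html, string $sourceEncoding = 'UTF-8'): string
{
    // Re-encode only when the input is not already declared as UTF-8.
    if (strcasecmp($sourceEncoding, 'UTF-8') !== 0) {
        $html = mb_convert_encoding($html, 'UTF-8', $sourceEncoding);
    }
    // strip_tags() only looks for ASCII angle brackets, so UTF-8 bytes pass through untouched.
    $text = strip_tags($html);
    // Turn entities such as &Omega; or &le; into the actual UTF-8 characters.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Latin-1 input: the byte \xE9 is "é" in ISO-8859-1.
echo htmlToUtf8Text("<p>caf\xE9 &Omega;</p>", 'ISO-8859-1'), PHP_EOL;
```

Remember to declare the same charset in the e-mail's Content-Type header, otherwise the recipient's mail client may still display the text incorrectly.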

Pro level: lynx solution

For extreme HTML enthusiasts, the lynx text browser combined with proc_open() in PHP renders HTML the way a terminal browser would, which usually reads better than stripped tags. This requires lynx to be installed on the server.

Lynx conversion example:

// $htmlContent holds the HTML to convert.
$descriptorspec = array(
    0 => array("pipe", "r"), // stdin
    1 => array("pipe", "w"), // stdout
    2 => array("pipe", "w"), // stderr
);
$process = proc_open('lynx -dump -stdin', $descriptorspec, $pipes);
if (is_resource($process)) {
    fwrite($pipes[0], $htmlContent); // The HTML says goodbye and dives into stdin
    fclose($pipes[0]);
    $textContent = stream_get_contents($pipes[1]); // The plain text emerges from stdout, wondering what just happened
    fclose($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[2]);
    proc_close($process); // Clean up before deciding whether to throw
    if ($errors) {
        throw new Exception("Error converting HTML: " . $errors);
    }
    echo $textContent;
}

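Tip: Treat both the HTML and lynx's output as untrusted. Pass the HTML through stdin as shown instead of interpolating it into the command string, and check the result before mailing it, because nobody wants an executable surprise in their mail!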
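Since lynx may not be installed everywhere, the proc_open() pattern above can be wrapped with a fallback to strip_tags(). This is a rough sketch: the function names pipeThroughCommand() and htmlToTextWithFallback() are hypothetical, and it assumes a Unix-like shell where a missing command produces a non-zero exit code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pipe $input through an external command.
 * Returns the command's stdout, or null if it could not start or exited non-zero.
 */
function pipeThroughCommand(string $command, string $input): ?string
{
    $spec = [
        0 => ['pipe', 'r'], // stdin
        1 => ['pipe', 'w'], // stdout
        2 => ['pipe', 'w'], // stderr (captured so it does not leak into PHP's own output)
    ];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        return null;
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    $exitCode = proc_close($process);

    return ($exitCode === 0 && $output !== false) ? $output : null;
}

/**
 * Hypothetical helper: use the external converter when it works, strip_tags() otherwise.
 */
function htmlToTextWithFallback(string $html, string $command = 'lynx -dump -stdin'): string
{
    return pipeThroughCommand($command, $html) ?? strip_tags($html);
}
```

The sketch writes all input before reading any output, which is fine for e-mail-sized documents but can block on very large inputs or very chatty error output. For those cases, non-blocking streams with stream_select() are the usual answer.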

A word about License and Contributions

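When using html2text, check the license of the exact package you install. Different html2text implementations have been distributed under different terms, including the Eclipse Public License and the GPL. These licenses come with their own dressing room and diet, so compatibility with other scripts and libraries may be an issue.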

Contributing to open-source projects is like voting. It makes the tools better, and when everybody does it, everyone wins! So head to the html2text repository to contribute and #MakeHTMLGreatAgain.

Look up other libraries such as php-variable-sanitizer and useful-php-scripts for advanced sanitation and functionalities. Remember to read and understand the licensing terms before using them, especially for commercial use.