
Strip html from string Ruby on Rails

ruby
html-sanitizer
rails-html-sanitizer
xss-attacks
by Alex Kataev · Oct 31, 2024
TLDR

To strip HTML tags from a string in Ruby on Rails, use the strip_tags method:

clean_string = strip_tags("<b>Sample</b> text with <a href='link'>HTML</a>.") # Returns: "Sample text with HTML."

Remember to include ActionView::Helpers::SanitizeHelper if you are venturing outside the kingdom of views or helpers, for example into a model or a plain Ruby class.

What happens to attribute values

The strip_tags function takes care of the children, really: it keeps the text nodes inside elements and throws away the tags themselves, attributes included. Text that lives in an attribute such as value is therefore not preserved:

html_string = '<input type="text" value="Clean <b>Text</b>">'
clean_string = strip_tags(html_string)
# Returns: "" because <input> has no text content, and the value attribute is dropped along with the tag.

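Under the hood, strip_tags hands the work to the full sanitizer, which you can also call directly as ActionView::Base.full_sanitizer.sanitize(html_string). Since it removes every tag, it helps keep the daunting specter of XSS attacks at bay.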

Handling complex HTML with Nokogiri

For an HTML structure that looks like spaghetti code, Nokogiri can be your Swiss Army knife. Control it with XPath like a pro:

require 'nokogiri'
doc = Nokogiri::HTML(html_string)
text = doc.xpath("//text()").text
# Boom! It discards all HTML tags faster than you discard your diet plans.
# .text joins the decoded text nodes; .to_s would re-escape entities such as &amp;.

Simply put, Nokogiri makes advanced manipulation of HTML/XML look like child's play.

Loofah: Your sanitization companion

Meet Loofah, a gem that wears many hats:

require 'loofah'
scrubbed_string = Loofah.scrub_fragment(html_string, :strip).to_text
# Voila! All HTML tags are gone and special characters are playing nice.
# Built-in scrubbers are selected by symbol (:strip, :prune, :escape), not by class.

Loofah is all about extensive sanitization and aligns well with Rails' "safety first" mantra.

Customizing HTML tag allowance

What if you want to play favorites with HTML tags and attributes? sanitize with :tags and :attributes is your answer, and %w() your secret handshake:

safe_string = sanitize(html_string, tags: %w(a img), attributes: %w(href src)) # Guess what? This allows <a> and <img> tags with href and src attributes. You're welcome!

Scenario-based solutions

Here are a few common scenarios and a solution for each:

Retaining simple formatting: An art form

How to keep basic text formatting while warding off XSS threats?

sanitized_string = sanitize(html_string, tags: %w(b i u br), attributes: %w()) # It be like: Bold, italics, underline, line break? Yeah baby, let's do it!

Custom sanitization: Not as hard as it sounds

Building a tailor-made cure with Loofah:

scrubber = Loofah::Scrubber.new do |node|
  node.remove if node.name == "script"
end
clean_html = Loofah.fragment(html_string).scrub!(scrubber).to_html
# No more <script> tags and their content; because, who invited them anyway!

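Make links secure with rails-html-sanitizer and keep your cyber world safe. Outside a view, instantiate the safe-list sanitizer yourself; the base Rails::Html::Sanitizer class does not do the work on its own:

sanitized_string = Rails::Html::SafeListSanitizer.new.sanitize(html_string, tags: %w(a), attributes: %w(href)) # It's like a bouncer for your <a> tags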