
HTML character decoding in Objective-C / Cocoa Touch

by Anton Shumikhin · Oct 29, 2024
TLDR

To quickly decode HTML entities in Objective-C, use NSAttributedString with NSHTMLTextDocumentType:

NSString *htmlString = @"&amp; &lt; &gt;";
NSData *data = [htmlString dataUsingEncoding:NSUTF8StringEncoding];
NSAttributedString *decoded = [[NSAttributedString alloc] initWithData:data
    options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType}
    documentAttributes:nil
    error:nil];
NSString *result = [decoded string];

The elegant NSAttributedString translates HTML entities into plain text without breaking a sweat. Import UIKit (or AppKit on macOS), which declares this initializer and NSHTMLTextDocumentType, and your job is already halfway done!

Understanding the ropes: Various approaches

While NSAttributedString is your quick, turnkey solution, consider the alternatives below for more nuanced, customized handling of HTML entities.

GitHub gem: NSString HTML category

Developers on GitHub have published handy NSString categories for HTML. Such a category offers methods for decoding HTML entities, encoding text into HTML, and even converting HTML to plain text. The call below uses gtm_stringByUnescapingFromHTML from the GTMNSString+HTML category in Google Toolbox for Mac:

// Unescape the HTML entities in one line!
NSString *plainRssFeedText = [originalRssFeedText gtm_stringByUnescapingFromHTML];

Decoding with NSScanner: one entity at a time

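NSScanner earns its keep when parsing strings with mixed content or when you need fine-grained control over HTML entity extraction. NSScanner has no built-in entity method, so the loop below looks entities up in a small table that you can extend:

NSString *html = @"The &quot;NSScanner&quot; can decode &amp; too!";
NSDictionary *entities = @{@"&quot;": @"\"", @"&amp;": @"&", @"&lt;": @"<", @"&gt;": @">"};
NSMutableString *decodedString = [NSMutableString stringWithCapacity:[html length]];
NSScanner *scanner = [NSScanner scannerWithString:html];
[scanner setCharactersToBeSkipped:nil]; // keep the whitespace between words
while (![scanner isAtEnd]) {
    NSString *text = nil;
    if ([scanner scanUpToString:@"&" intoString:&text]) [decodedString appendString:text]; // Save a turtle, add to the string!
    if ([scanner isAtEnd]) break;
    BOOL matched = NO;
    for (NSString *entity in entities) {
        if ([scanner scanString:entity intoString:NULL]) {
            [decodedString appendString:entities[entity]]; // NSScanner: "I've got the entity, boss!"
            matched = YES;
            break;
        }
    }
    // Mind the "&" symbol when it stands alone, or your HTML might go on strike!
    if (!matched && [scanner scanString:@"&" intoString:NULL]) [decodedString appendString:@"&"];
}

Two details keep the ride smooth: pre-size the result string with stringWithCapacity:, and make sure every pass through the loop consumes at least one character (here, the lone-ampersand branch) so the scan cannot loop forever. Setting charactersToBeSkipped to nil stops the scanner from silently dropping whitespace.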
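Named entities are only half the story: feeds often contain numeric references such as &#38; or &#x26;. Here is a minimal sketch of how NSScanner could handle those, using scanInt: and scanHexInt:. The function names DecodeNumericEntities and StringFromCodePoint are hypothetical helpers made up for this example, not part of Foundation or any library mentioned above.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns nil for values that are not valid Unicode scalars.
static NSString *StringFromCodePoint(unsigned int cp) {
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return nil;
    if (cp <= 0xFFFF) {
        unichar c = (unichar)cp;
        return [NSString stringWithCharacters:&c length:1];
    }
    cp -= 0x10000; // encode as a UTF-16 surrogate pair
    unichar pair[2] = {(unichar)(0xD800 + (cp >> 10)), (unichar)(0xDC00 + (cp & 0x3FF))};
    return [NSString stringWithCharacters:pair length:2];
}

// Hypothetical helper: decodes decimal (&#38;) and hexadecimal (&#x26;)
// references and copies everything else through unchanged.
static NSString *DecodeNumericEntities(NSString *input) {
    NSMutableString *output = [NSMutableString stringWithCapacity:[input length]];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    [scanner setCharactersToBeSkipped:nil]; // do not drop whitespace

    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        if ([scanner scanUpToString:@"&#" intoString:&chunk]) [output appendString:chunk];
        if ([scanner isAtEnd]) break;

        NSUInteger start = [scanner scanLocation]; // where "&#" begins
        [scanner scanString:@"&#" intoString:NULL];

        unsigned int codePoint = 0;
        BOOL parsed = NO;
        if ([scanner scanString:@"x" intoString:NULL]) {
            parsed = [scanner scanHexInt:&codePoint];
        } else {
            int decimal = 0;
            parsed = [scanner scanInt:&decimal] && decimal > 0;
            if (parsed) codePoint = (unsigned int)decimal;
        }

        NSString *decoded = nil;
        if (parsed && [scanner scanString:@";" intoString:NULL]) {
            decoded = StringFromCodePoint(codePoint);
        }
        if (decoded) {
            [output appendString:decoded];
        } else {
            // Malformed reference: keep "&#" literally and resume right after it.
            [output appendString:@"&#"];
            [scanner setScanLocation:start + 2];
        }
    }
    return [output copy];
}
```

Malformed input such as a bare "&#" is passed through untouched, and the scanner always moves forward by at least two characters, so the loop terminates.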

Google Toolbox for Mac: the straightforward way

The Google Toolbox for Mac offers gtm_stringByUnescapingFromHTML, a method to make decoding characters as straightforward as a math test in first grade.

// Google toolbox making decoding look like a piece of cake!
NSString *decodedStr = [encodedStr gtm_stringByUnescapingFromHTML];

String manipulation with NSMutableString: Efficiency is key

NSMutableString comes to string manipulation's rescue when there are frequent mutations or replacements involved:

NSMutableString *mutableHtml = [NSMutableString stringWithString:htmlString];
[mutableHtml replaceOccurrencesOfString:@"&amp;"
                             withString:@"&"
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, [mutableHtml length])];
// Replace the other entities too (decode &amp; last, so "&amp;lt;" is not decoded twice).
// Go on! The world won't replace itself!
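To make the "other entities" part concrete, here is a small table-driven sketch of the same replacement technique. DecodeBasicEntities is a hypothetical helper name, and the table only covers the five entities predefined by XML; anything beyond that is left alone.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes only &lt; &gt; &quot; &apos; and &amp;.
static NSString *DecodeBasicEntities(NSString *input) {
    NSMutableString *result = [input mutableCopy];
    // Order matters: &amp; comes last so that "&amp;lt;" ends up as "&lt;"
    // instead of being decoded a second time into "<".
    NSArray *table = @[
        @[@"&lt;", @"<"],
        @[@"&gt;", @">"],
        @[@"&quot;", @"\""],
        @[@"&apos;", @"'"],
        @[@"&amp;", @"&"]
    ];
    for (NSArray *pair in table) {
        [result replaceOccurrencesOfString:pair[0]
                                withString:pair[1]
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, [result length])];
    }
    return [result copy];
}
```

Each pass rescans the whole string, which is fine for short snippets; for long documents with many distinct entities, a single-pass scanner is likely the better fit.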

Converting NSData back to NSAttributedString: Full circle

For a holistic approach on iOS 7 and later, convert your HTML string to NSData and hand that data to NSAttributedString:

// Convert the HTML string to NSData, and the NSData to NSAttributedString. Talk about a 360!
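As a rough sketch of that round trip, the helper below wraps the conversion in one function. PlainTextFromHTML is a hypothetical name chosen for this example. It also passes NSCharacterEncodingDocumentAttribute, on the assumption that the input is UTF-8; without it, non-ASCII characters may come out garbled.

```objective-c
#import <TargetConditionals.h>
#if TARGET_OS_OSX
#import <AppKit/AppKit.h>
#else
#import <UIKit/UIKit.h>
#endif

// Hypothetical helper: call it on the main thread.
// Returns nil (and fills *error, if provided) when the import fails.
static NSString *PlainTextFromHTML(NSString *html, NSError **error) {
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    if (data == nil) return nil;

    NSDictionary *options = @{
        NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
        // Tell the importer the bytes are UTF-8.
        NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)
    };
    NSAttributedString *attributed =
        [[NSAttributedString alloc] initWithData:data
                                         options:options
                                documentAttributes:NULL
                                           error:error];
    return [attributed string];
}
```

Because this goes through a full HTML import, tags are stripped as well as entities decoded, and block-level markup can introduce extra line breaks in the result.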

Handling those edge cases and special characters

Character Entity References and the reserved nature of the ampersand (&) need special mention.

Wading through special characters

  • Character Entity References: &amp; or &#38; are not some secret codes from another planet, but your HTML characters in disguise. Decode them right to prevent any alien invasion!
  • Reserved Ampersands: Caution! "&" is the VIP of HTML and needs special handling in places like RSS feeds. Treat it well, or it might wreak havoc!
  • Asynchronous Decoding: HTML requests taking too long? Shift the load of decoding off the main thread. Multitasking, yay!
  • NSAttributedString Main Thread Use: Although we love pushing work off the main thread, remember that the HTML importer behind NSAttributedString should be called on the main thread. After all, who doesn't like some special attention!
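The last two points pull in opposite directions, so here is one possible way to combine them: do the slow preparation on a background queue, then hop to the main queue only for the HTML import. DecodeHTMLEntitiesAsync is a hypothetical helper written for this illustration, and the background step is just a stand-in for whatever heavy work (download, parsing) your app really does.

```objective-c
#import <TargetConditionals.h>
#if TARGET_OS_OSX
#import <AppKit/AppKit.h>
#else
#import <UIKit/UIKit.h>
#endif

// Hypothetical helper: callable from any thread.
// The completion block always runs on the main queue; decoded may be nil.
static void DecodeHTMLEntitiesAsync(NSString *html,
                                    void (^completion)(NSString *decoded)) {
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
        // Background work goes here; this sketch only encodes the string.
        NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];

        dispatch_async(dispatch_get_main_queue(), ^{
            NSAttributedString *attributed = nil;
            if (data != nil) {
                NSDictionary *options = @{
                    NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                    NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)
                };
                // The HTML import itself happens on the main thread.
                attributed = [[NSAttributedString alloc] initWithData:data
                                                              options:options
                                                   documentAttributes:NULL
                                                                error:NULL];
            }
            completion([attributed string]);
        });
    });
}
```

If you need decoding that is truly free of the main thread, the NSScanner or replacement approaches above are the safer choice, since they rely only on Foundation string handling.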

Visualization

Let's take a quick illustrative detour for understanding HTML character decoding:

Input: Encoded HTML entities like &amp; &lt; &gt;
Process: 🤖🔍(decoding) // Decoding the HTML hieroglyphics...
Output: Decoded characters like &, <, >

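Still having doubts? Let's decode a string in practice. Note that stringByDecodingHTMLEntities is not a Foundation method; the snippet assumes a third-party NSString category (such as NSString+HTML) that provides it:

NSString *htmlString = @"Wanna learn decoding? &amp; We got &lt; you covered!";
NSString *decodedString = [htmlString stringByDecodingHTMLEntities];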

You get readable text, as simple as ABC:

Before: The encoded slogan 📜: "Join the club &amp; discover fun!"
After: The real deal 💌: "Join the club & discover fun!"

Adapt and conquer: Tips from the trench

Unit Tests: Battle-tested code is the best code. Thoroughly validate with different HTML content.

Keep up-to-date: The Apple Developer Documentation should be your daily newspaper. Watch out for changes in your friendly neighborhood functions!

Optimization: Make Instruments your best friend to optimize your string decoding. Plus, it's free!
