HTML character decoding in Objective-C / Cocoa Touch
To quickly decode HTML entities in Objective-C, use NSAttributedString with the NSHTMLTextDocumentType document type.
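A minimal sketch of this approach, assuming an iOS target with UIKit, ARC, and a call made on the main thread. The helper name DecodeHTMLEntities is made up for illustration:

```objectivec
#import <UIKit/UIKit.h>

// Hypothetical helper (not from the original article): decodes HTML entities
// by letting NSAttributedString's HTML importer parse the string.
// Assumes ARC and a call on the main thread.
static NSString *DecodeHTMLEntities(NSString *html) {
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    if (data == nil) {
        return nil;
    }

    NSDictionary *options = @{
        NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
        // Stating the encoding explicitly avoids mis-decoded non-ASCII text.
        NSCharacterEncodingDocumentAttribute : @(NSUTF8StringEncoding)
    };

    NSError *error = nil;
    NSAttributedString *attributed =
        [[NSAttributedString alloc] initWithData:data
                                         options:options
                              documentAttributes:NULL
                                           error:&error];
    if (attributed == nil) {
        NSLog(@"HTML decoding failed: %@", error);
        return nil;
    }
    // The plain-text view of the parsed document has the entities resolved.
    return attributed.string;
}
```

Because the importer parses real HTML, any tags in the input are interpreted as markup too, and whitespace may be normalized along the way.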
The elegant NSAttributedString translates HTML entities into plain text without breaking a sweat. Note that the HTML importer is declared in UIKit (AppKit on macOS) rather than in Foundation itself, so import the right framework and your job is already halfway done!
Understanding the ropes: Various approaches
While NSAttributedString is your quick, turnkey solution, consider the alternative methods for more nuanced and customized handling of HTML entities.
GitHub gem: NSString HTML category
Developers on GitHub have created a nifty NSString category for HTML. It offers methods for decoding HTML entities, encoding text into HTML, and even converting HTML to plain text.
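The GitHub category itself is not reproduced here. To give a rough idea of the pattern, the sketch below adds a single encoding method to NSString through a category. The category name HTMLSketch and the sketch_ prefix are assumptions made for this example and are not the real library's API:

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical category for illustration only (assumes ARC). The GitHub
// category mentioned above has its own method names and covers more cases.
@interface NSString (HTMLSketch)
- (NSString *)sketch_stringByEscapingForHTML;
@end

@implementation NSString (HTMLSketch)

- (NSString *)sketch_stringByEscapingForHTML {
    NSMutableString *escaped = [self mutableCopy];
    // The ampersand must be escaped first; otherwise the '&' introduced by
    // the later replacements would be escaped a second time.
    NSArray<NSArray<NSString *> *> *pairs = @[
        @[ @"&", @"&amp;" ],
        @[ @"<", @"&lt;" ],
        @[ @">", @"&gt;" ],
        @[ @"\"", @"&quot;" ],
        @[ @"'", @"&#39;" ]
    ];
    for (NSArray<NSString *> *pair in pairs) {
        [escaped replaceOccurrencesOfString:pair[0]
                                 withString:pair[1]
                                    options:NSLiteralSearch
                                      range:NSMakeRange(0, escaped.length)];
    }
    return [escaped copy];
}

@end
```

The prefix on the method name is a common precaution for categories on system classes, since it lowers the chance of colliding with a method that Apple or another library adds later.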
Decoding with NSScanner: one entity at a time
NSScanner earns its keep when parsing strings with mixed content types or when you need extremely granular HTML entity extraction. If you go this route, reserve capacity for the result string up front and check for the end of the input on every pass, for a smooth ride and no infinite loops.
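One way the scanning loop could look is sketched here. It handles decimal and hexadecimal numeric references plus a deliberately tiny table of named entities, and copies anything it does not recognize through unchanged. All three helper names are hypothetical, and ARC is assumed:

```objectivec
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: turns a Unicode code point into an NSString, or
// returns nil for values that are not valid scalar values.
static NSString *StringFromCodePoint(unsigned long code) {
    if (code == 0 || code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF)) {
        return nil;
    }
    uint32_t littleEndian = CFSwapInt32HostToLittle((uint32_t)code);
    return [[NSString alloc] initWithBytes:&littleEndian
                                    length:sizeof(littleEndian)
                                  encoding:NSUTF32LittleEndianStringEncoding];
}

// Hypothetical helper: decodes the text between '&' and ';', or returns nil.
static NSString *DecodeEntityBody(NSString *body) {
    // Deliberately tiny table; HTML defines many more named entities.
    NSDictionary<NSString *, NSString *> *named = @{
        @"amp" : @"&", @"lt" : @"<", @"gt" : @">",
        @"quot" : @"\"", @"apos" : @"'"
    };
    NSString *match = named[body];
    if (match != nil) {
        return match;
    }
    if (![body hasPrefix:@"#"] || body.length < 2) {
        return nil;
    }
    unichar second = [body characterAtIndex:1];
    BOOL isHex = (second == 'x' || second == 'X');
    NSString *digitString = [body substringFromIndex:(isHex ? 2 : 1)];
    const char *digits = digitString.UTF8String;
    char *end = NULL;
    unsigned long code = strtoul(digits, &end, isHex ? 16 : 10);
    if (end == digits || *end != '\0') {
        return nil;  // no digits at all, or trailing junk
    }
    return StringFromCodePoint(code);
}

static NSString *DecodeEntitiesWithScanner(NSString *input) {
    NSCharacterSet *bodyChars = [NSCharacterSet
        characterSetWithCharactersInString:
            @"#0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    scanner.charactersToBeSkipped = nil;  // do not silently drop whitespace
    // Decoded text is never longer than the input, so this is a safe hint.
    NSMutableString *result = [NSMutableString stringWithCapacity:input.length];

    while (!scanner.isAtEnd) {
        NSString *text = nil;
        if ([scanner scanUpToString:@"&" intoString:&text]) {
            [result appendString:text];
        }
        if (scanner.isAtEnd) {
            break;
        }
        // The scanner now sits on '&'; step over it.
        NSUInteger afterAmpersand = scanner.scanLocation + 1;
        scanner.scanLocation = afterAmpersand;

        NSString *body = nil;
        NSString *decoded = nil;
        if ([scanner scanCharactersFromSet:bodyChars intoString:&body] &&
            [scanner scanString:@";" intoString:NULL]) {
            decoded = DecodeEntityBody(body);
        }
        if (decoded != nil) {
            [result appendString:decoded];
        } else {
            // Not an entity we know: keep the '&' literally and rescan from
            // the character right after it. Progress is always made.
            [result appendString:@"&"];
            scanner.scanLocation = afterAmpersand;
        }
    }
    return [result copy];
}
```

Every pass through the loop either reaches the end of the input or moves past at least one ampersand, which is what rules out an infinite loop on malformed text such as a stray "&".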
Google Toolbox for Mac: the straightforward way
The Google Toolbox for Mac offers gtm_stringByUnescapingFromHTML, a method that makes decoding characters as straightforward as a first-grade math test.
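Calling it could look like the sketch below. It assumes Google Toolbox for Mac has already been added to the project and that its GTMNSString+HTML.h header is on the include path. The wrapper function DecodeWithGTM is hypothetical:

```objectivec
#import <Foundation/Foundation.h>
// Assumption: Google Toolbox for Mac is part of the project, and this header
// declares its HTML helpers on NSString.
#import "GTMNSString+HTML.h"

// Hypothetical wrapper around the GTM method named in the text above.
static NSString *DecodeWithGTM(NSString *html) {
    return [html gtm_stringByUnescapingFromHTML];
}
```

Since this is a plain string transformation with no UIKit involvement, it is a reasonable candidate for use in model code or on background queues.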
String manipulation with NSMutableString: Efficiency is key
NSMutableString comes to the rescue when frequent mutations or replacements are involved.
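As a sketch of the idea, the helper below replaces a handful of common entities in place on a single mutable copy instead of allocating a new string per replacement. The function name DecodeBasicEntities and the choice of entities are assumptions made for this example:

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): in-place replacement of a few common
// entities on one NSMutableString.
static NSString *DecodeBasicEntities(NSString *input) {
    NSMutableString *text = [input mutableCopy];
    // "&amp;" is handled last so that "&amp;lt;" ends up as the literal text
    // "&lt;" instead of being decoded twice into "<".
    NSArray<NSArray<NSString *> *> *pairs = @[
        @[ @"&lt;", @"<" ],
        @[ @"&gt;", @">" ],
        @[ @"&quot;", @"\"" ],
        @[ @"&#39;", @"'" ],
        @[ @"&apos;", @"'" ],
        @[ @"&amp;", @"&" ]
    ];
    for (NSArray<NSString *> *pair in pairs) {
        [text replaceOccurrencesOfString:pair[0]
                              withString:pair[1]
                                 options:NSLiteralSearch
                                   range:NSMakeRange(0, text.length)];
    }
    return [text copy];
}
```

This only scales to a short, fixed list of entities; anything involving numeric references or the full named-entity table is better served by a scanner or a dedicated library.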
Converting NSData back to NSAttributedString: Full circle
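For a holistic approach on iOS 7+, convert your HTML string to NSData using UTF-8, then hand that data to NSAttributedString's initWithData:options:documentAttributes:error: initializer with NSHTMLTextDocumentType as the document type. The string property of the result is your decoded plain text.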
Handling those edge cases and special characters
Character Entity References and the reserved nature of the ampersand (&) need special mention.
Wading through special characters
- Character Entity References: &amp;amp; or &amp;#38; are not some secret codes from another planet, but your HTML characters in disguise (both stand for a plain "&"). Decode them right to prevent any alien invasion!
- Reserved Ampersands: Caution! "&" is the VIP of HTML and needs special handling in places like RSS feeds. Treat it well, or it might wreak havoc!
Navigating around multithreading
- Asynchronous Decoding: HTML requests taking too long? Shift the fetching and any hand-rolled decoding (NSScanner, NSMutableString) off the main thread. Multitasking, yay!
- NSAttributedString Main Thread Use: Although we love pushing work off the main thread, remember that NSAttributedString's HTML importer prefers being called on the main thread; Apple's documentation warns against calling it from a background thread. After all, who doesn't like some special attention!
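The two points above can be combined: do the slow I/O on a background queue, then hop to the main queue only for the NSAttributedString conversion. The following is a sketch under those assumptions, targeting iOS with UIKit and ARC. Both function names are hypothetical:

```objectivec
#import <UIKit/UIKit.h>

// Hypothetical helper: may be called from any thread. The HTML importer runs
// on the main queue, and the completion block is called there as well.
static void DecodeHTMLOnMainQueue(NSString *html,
                                  void (^completion)(NSString *decoded)) {
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    dispatch_async(dispatch_get_main_queue(), ^{
        NSAttributedString *attributed = nil;
        if (data != nil) {
            NSDictionary *options = @{
                NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
                NSCharacterEncodingDocumentAttribute : @(NSUTF8StringEncoding)
            };
            attributed = [[NSAttributedString alloc] initWithData:data
                                                          options:options
                                               documentAttributes:NULL
                                                            error:NULL];
        }
        // decoded is nil if the conversion failed.
        completion(attributed.string);
    });
}

// Hypothetical usage: read the HTML off the main thread, decode on it.
static void LoadAndDecode(NSURL *fileURL,
                          void (^completion)(NSString *decoded)) {
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
        // Slow work such as file or network I/O stays in the background.
        NSString *html = [NSString stringWithContentsOfURL:fileURL
                                                  encoding:NSUTF8StringEncoding
                                                     error:NULL];
        DecodeHTMLOnMainQueue(html ?: @"", completion);
    });
}
```

If the main-thread cost of the importer becomes a problem for large inputs, a pure string decoder that has no UIKit dependency can run entirely on the background queue instead.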
Visualization
Let's take a quick illustrative detour to understand HTML character decoding. Still having doubts? Feed the decoder a string full of HTML entities and you get readable text back, as simple as ABC. For example, "Caf&amp;eacute; &amp;amp; Bar" decodes to "Café & Bar".
Adapt and conquer: Tips from the trenches
Unit Tests: Battle-tested code is the best code. Thoroughly validate with different HTML content.
Keep up-to-date: The Apple Developer Documentation should be your daily newspaper. Watch out for changes in your friendly neighborhood functions!
Optimization: Make Instruments your best friend to optimize your string decoding. Plus, it's free!