
How to check if a String contains only ASCII?

by Alex Kataev · Dec 28, 2024
TLDR

To check whether a String contains only ASCII characters, use Java's String.matches() method with the regex pattern ^\\p{ASCII}*$. Just use the following code:

boolean isAscii(String input) {
    return input.matches("^\\p{ASCII}*$");
}

Introduce your string to this isAscii function, and it will return true if it's an ASCII-only string (an empty string counts too), or false if the string strays into non-ASCII lands.
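If the check runs often, one possible variation is to compile the pattern once instead of letting String.matches() recompile it on every call. The sketch below assumes a hypothetical helper class named AsciiRegex; it also drops the ^ and $ anchors, since Matcher.matches() already requires the whole input to match.

```java
import java.util.regex.Pattern;

final class AsciiRegex {
    // Compiled once and reused across calls.
    private static final Pattern ASCII_ONLY = Pattern.compile("\\p{ASCII}*");

    static boolean isAscii(String input) {
        // matches() succeeds only if the entire input matches the pattern.
        return ASCII_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello, World!")); // true
        System.out.println(isAscii("H\u00e9llo"));    // false (contains é)
    }
}
```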

Dry-land loop approach

For larger strings, say the size of the Pacific, a plain for-each loop may save you some computational time compared with the regex:

boolean isPureAscii(String input) {
    for (char c : input.toCharArray()) {
        if (c > 127) {
            // anything above 127 smells funny, could be alien, better check
            return false;
        }
    }
    return true; // nothing smells, all characters seem to belong to ASCII-land
}

This code navigates through each character to see if it's carrying an ASCII visa (char value < 128).
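The same visa check can also be written with streams, which avoids the array copy made by toCharArray(). This is only a sketch; the class name AsciiStream is made up for illustration.

```java
final class AsciiStream {
    static boolean isPureAscii(String input) {
        // chars() yields each UTF-16 code unit as an int; ASCII covers 0..127.
        return input.chars().allMatch(c -> c < 128);
    }
}
```

Whether this is faster than the explicit loop depends on the JVM and the input, so measure before relying on it.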

Venturing into the Charset sea

When charsets become a part of your toolkit, java.nio.charset.Charset shows the path to check ASCII conformance. Here's your GPS:

import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

boolean isAsciiEncoded(String input) {
    CharsetEncoder encoder = Charset.forName("US-ASCII").newEncoder(); // prepare US-ASCII boat
    return encoder.canEncode(input); // return true only if input fits nicely in our ASCII boat
}

This code checks if the input string is ready for an ASCII cruise using the US-ASCII charset.

Guava's home route

For those comfortable in Guava's backyard, CharMatcher.ascii() offers a familiar solution:

import com.google.common.base.CharMatcher;

boolean isOnlyAscii(String input) {
    return CharMatcher.ascii().matchesAllOf(input); // if ASCII, Guava welcomes
}

Here, CharMatcher.ascii() matches every ASCII character, including non-printable ones like tabs and line feeds.

Beware the "ISO-8859-1" whirlpool

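The "ISO-8859-1" (Latin-1) encoding covers more than the ASCII inhabitants: it maps all 256 values from 0 to 255, so characters such as é (U+00E9) encode just fine. Watch your step here: a string that fits ISO-8859-1 is not necessarily ASCII, so this charset is no strict ASCII check. Call it the ASCII gray area if you will.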
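To see the whirlpool in action, here is a small comparison of the two encoders. The class and method names (EncoderComparison, fitsUsAscii, fitsLatin1) are invented for this sketch; the charset constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

final class EncoderComparison {
    static boolean fitsUsAscii(String input) {
        // True only if every character is in the 0..127 range.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(input);
    }

    static boolean fitsLatin1(String input) {
        // True for any character in the 0..255 range, so not an ASCII check.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(input);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // ends with é (U+00E9)
        System.out.println(fitsUsAscii(cafe)); // false
        System.out.println(fitsLatin1(cafe));  // true
    }
}
```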

Diving with non-printable ghosts

Dealing with non-printable characters? ASCII includes the control characters U+0000 through U+001F as well as U+007F (DEL). Be aware of them, since a string can pass ASCII validation while still being haunted by these unseen entities.
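If the ghosts should be kept out as well, one option is to narrow the range to printable ASCII, 0x20 (space) through 0x7E (tilde). The helper below is a hypothetical sketch, not part of any library.

```java
final class PrintableAscii {
    static boolean isPrintableAscii(String input) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Reject control characters (below 0x20), DEL (0x7F), and anything non-ASCII.
            if (c < 0x20 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this also rejects tabs and line feeds, which may or may not be what you want.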

Paddling through extended ASCII

The term "extended ASCII" can be confusing: it refers to various 8-bit encodings that add characters in the 128-255 range, which is wider than standard ASCII (0-127). For authenticity, follow the 0-127 trail, the true ASCII territory.

Jumping in the UTF-16 swimming pool

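Java Strings love to swim in the UTF-16 pool, where each char is a 16-bit code unit. Every character outside ASCII has a value > 127, and characters beyond the Basic Multilingual Plane (many emoji, for example) are stored as two surrogate chars, both far above 127. Keep this in mind as it might rock your ASCII-volleyball game!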
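A quick way to convince yourself that surrogate pairs cannot sneak past a char-based check is to compare it with a code-point-based one. The class SurrogateDemo below is a made-up illustration; chars() and codePoints() are standard String methods.

```java
final class SurrogateDemo {
    // Checks each UTF-16 code unit.
    static boolean asciiByChar(String input) {
        return input.chars().allMatch(c -> c < 128);
    }

    // Checks each full Unicode code point.
    static boolean asciiByCodePoint(String input) {
        return input.codePoints().allMatch(cp -> cp < 128);
    }

    public static void main(String[] args) {
        String s = "hi\uD83D\uDE00"; // "hi" followed by U+1F600 as a surrogate pair
        System.out.println(s.length());                      // 4
        System.out.println(s.codePointCount(0, s.length())); // 3
        System.out.println(asciiByChar(s));                  // false
        System.out.println(asciiByCodePoint(s));             // false
    }
}
```

Both checks agree, because each half of a surrogate pair lies in the range 0xD800 to 0xDFFF.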

Spotting the edge of the Earth

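Be cautious of the Earth's edges: empty strings and null inputs. An empty string passes the checks above because it contains no non-ASCII characters, while calling input.matches() or input.toCharArray() on null throws a NullPointerException. Early validation keeps you from sailing off the edge or misjudging your data. It's always safety first!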
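One way to patrol the edges is a small null-safe wrapper. Treating null as "not ASCII" is an assumption made for this sketch; depending on your application, throwing an IllegalArgumentException may be the better choice. The class name SafeAscii is hypothetical.

```java
final class SafeAscii {
    static boolean isAsciiOrFalse(String input) {
        if (input == null) {
            return false; // assumption: null is treated as "not ASCII"
        }
        // An empty string has no characters to reject, so it returns true.
        return input.chars().allMatch(c -> c < 128);
    }
}
```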

Sailing with alternative regex patterns

Love regex, but not fond of the usual route? Try \\A\\p{ASCII}*\\z. Here \\A anchors the match at the very start of the input and \\z at the very end, so the whole journey must be an ASCII character sequence:

boolean isAsciiRegexAlt(String input) {
    return input.matches("\\A\\p{ASCII}*\\z");
}

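This may make the journey's start and end more visible, all within the regex map. Since String.matches() already requires the entire string to match, both patterns return the same result.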
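Where the anchors really earn their keep is with Matcher.find(), which is satisfied by any matching substring. The sketch below uses a hypothetical AnchorDemo class to contrast the two modes.

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    // matches(): the entire input must match, with or without anchors.
    static boolean viaMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): any matching substring is enough, so anchors decide the outcome.
    static boolean viaFind(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // ends with é, so not pure ASCII
        System.out.println(viaMatches("\\p{ASCII}*", cafe));       // false
        System.out.println(viaFind("\\p{ASCII}*", cafe));          // true (finds "caf")
        System.out.println(viaFind("\\A\\p{ASCII}*\\z", cafe));    // false
    }
}
```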