Escaping regex specials
Special characters such as ., |, ?, *, and + must be escaped with a backslash (\), which is written as \\ inside a Java string literal:
String input = "Section 4.2";
String[] sections = input.split("\\."); // Yes indeed, dot is a special character!
// sections = ["Section 4", "2"] -> Finally split into rightful sections
Alternatively, use Pattern.quote(), which turns the whole delimiter into a literal pattern so that no character in it is treated as special:
String[] sections = input.split(Pattern.quote("."));
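As a small sketch of the two approaches side by side, the following splits on a pipe, which is also a regex metacharacter. The class name EscapeDemo and its two helper methods are made up for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class EscapeDemo {
    // Escapes the pipe by hand with a backslash.
    static String[] splitEscaped(String input) {
        return input.split("\\|");
    }

    // Lets Pattern.quote() turn an arbitrary delimiter into a literal pattern.
    static String[] splitQuoted(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String row = "id|name|price";
        System.out.println(Arrays.toString(splitEscaped(row)));
        System.out.println(Arrays.toString(splitQuoted(row, "|")));
        // Unescaped, "|" is an empty alternation that matches the empty string
        // at every position, so the result is not the three fields.
        System.out.println(Arrays.toString(row.split("|")));
    }
}
```

Pattern.quote() is the more convenient choice when the delimiter comes from user input or configuration and its contents are not known in advance.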
Including separators
Retain separators when splitting with lookarounds in your regex:
String[] partsIncluded = "apple-banana-cherry".split("(?<=-)");
// partsIncluded = ["apple-", "banana-", "cherry"] -> apple didn't fall far from the hyphen!
String[] partsAhead = "apple-banana-cherry".split("(?=-)");
// partsAhead = ["apple", "-banana", "-cherry"]
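The lookbehind and lookahead can also be combined so that each separator ends up as its own element. This is a minimal sketch; KeepDelimiterDemo and splitKeepingHyphens are hypothetical names chosen for the example.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class KeepDelimiterDemo {
    // Splits at the zero-width positions just after and just before every hyphen,
    // so each hyphen becomes a separate token.
    static String[] splitKeepingHyphens(String input) {
        return input.split("(?<=-)|(?=-)");
    }

    public static void main(String[] args) {
        // ["apple", "-", "banana", "-", "cherry"]
        System.out.println(Arrays.toString(splitKeepingHyphens("apple-banana-cherry")));
    }
}
```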
Result array size control
Control the size of the result array by specifying a limit parameter:
String[] limitParts = "apple-banana-cherry".split("-", 2);
// limitParts = ["apple", "banana-cherry"] -> Cherry has a new sidekick named banana!
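The limit also decides what happens to trailing empty strings, which is easy to overlook. The sketch below compares a limit of 0 (the default), a negative limit, and a positive limit on the same input; LimitDemo and splitWithLimit are illustrative names only.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class LimitDemo {
    static String[] splitWithLimit(String input, int limit) {
        return input.split("-", limit);
    }

    public static void main(String[] args) {
        String input = "a-b--";
        // limit 0 (same as split("-")): trailing empty strings are dropped -> 2 elements
        System.out.println(Arrays.toString(splitWithLimit(input, 0)));
        // negative limit: trailing empty strings are kept -> 4 elements
        System.out.println(Arrays.toString(splitWithLimit(input, -1)));
        // limit 2: at most two elements, the rest stays unsplit -> "a" and "b--"
        System.out.println(Arrays.toString(splitWithLimit(input, 2)));
    }
}
```

A negative limit is worth considering when empty trailing fields carry meaning, for example in delimited records with optional columns.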
Regex gymnastics and Pattern, Matcher usage
Convert regex patterns to Pattern objects and use Matcher for executing regex operations:
Pattern pattern = Pattern.compile("-");
Matcher matcher = pattern.matcher("java-tips-tools");
while (matcher.find()) {
    // Put your processing clothes on! We're going pattern-matching!
}
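One possible way to fill in the loop body is to record where each match starts. The following sketch does that with Matcher.start(); FindDemo and hyphenPositions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class FindDemo {
    // Returns the index of every hyphen found by repeated calls to find().
    static List<Integer> hyphenPositions(String text) {
        Pattern pattern = Pattern.compile("-");
        Matcher matcher = pattern.matcher(text);
        List<Integer> positions = new ArrayList<>();
        while (matcher.find()) {
            positions.add(matcher.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(hyphenPositions("java-tips-tools")); // [4, 9]
    }
}
```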
Fetch matched content with Matcher.group():
Pattern pattern = Pattern.compile("([A-Z]+)-([0-9]+)");
Matcher matcher = pattern.matcher("AB-123");
if (matcher.matches()) {
    String letters = matcher.group(1); // AB
    String numbers = matcher.group(2); // 123
    // Who said letter grades and numbers can't get along?
}
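Groups can also be given names, which tends to read better than numeric indexes once a pattern has more than a couple of groups. This is a sketch using the same pattern with named groups; GroupDemo, parseCode, and the group names letters and digits are choices made for the example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class GroupDemo {
    private static final Pattern CODE =
            Pattern.compile("(?<letters>[A-Z]+)-(?<digits>[0-9]+)");

    // Returns {letters, digits}, or an empty array when the whole input does not match.
    static String[] parseCode(String input) {
        Matcher matcher = CODE.matcher(input);
        if (!matcher.matches()) {
            return new String[0];
        }
        return new String[] { matcher.group("letters"), matcher.group("digits") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseCode("AB-123"))); // [AB, 123]
    }
}
```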
Error handling in splitsville
Handle unexpected formats with proper error management:
String ipAddress = "192-168-1-1";
String[] octets = ipAddress.split("-");
if (octets.length != 4) {
    throw new IllegalArgumentException("Invalid IP address format. We're strict on rules here!");
}
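A length check alone lets some malformed input through: with the default limit, a trailing separator is silently dropped, and the parts are never checked to be numbers. The sketch below tightens the check under the same hyphen-separated format; OctetDemo and parseOctets are hypothetical names, and the 0 to 255 range is the usual octet range rather than something the snippet above enforces.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class OctetDemo {
    // Parses input such as "192-168-1-1" into four ints, rejecting anything else.
    static int[] parseOctets(String address) {
        // A limit of -1 keeps trailing empty strings, so "1-2-3-4-" yields five parts.
        String[] parts = address.split("-", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 octets but found " + parts.length);
        }
        int[] octets = new int[4];
        for (int i = 0; i < parts.length; i++) {
            try {
                octets[i] = Integer.parseInt(parts[i]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Octet is not a number: \"" + parts[i] + "\"", e);
            }
            if (octets[i] < 0 || octets[i] > 255) {
                throw new IllegalArgumentException("Octet out of range 0-255: " + octets[i]);
            }
        }
        return octets;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseOctets("192-168-1-1"))); // [192, 168, 1, 1]
    }
}
```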
No regex? No problem!
The StringTokenizer class splits strings without the complications of regex. Its Javadoc describes it as a legacy class whose use is discouraged in new code, so reach for it mainly when maintaining older code:
StringTokenizer tokenizer = new StringTokenizer("java:tips:tools", ":");
while (tokenizer.hasMoreTokens()) {
    String token = tokenizer.nextToken();
    // Token party! Everyone's invited!
}
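One behavioral difference is worth seeing in code: StringTokenizer skips empty fields between consecutive delimiters, while split() keeps them. The sketch below collects tokens into a list so the two can be compared; TokenizerDemo and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of any library.
class TokenizerDemo {
    // Collects every token produced by StringTokenizer for the given delimiters.
    static List<String> tokenize(String input, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "java::tools";
        // StringTokenizer yields two tokens: the empty field is skipped.
        System.out.println(tokenize(input, ":"));
        // split() yields three elements: the empty field in the middle is kept.
        System.out.println(Arrays.toString(input.split(":")));
    }
}
```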
Conquer regex complexities
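Check that an input follows a specific structure with String.matches(), which succeeds only when the regex matches the entire string (the capturing groups here simply mark the two parts):
String input = "AB-100";
boolean isMatch = input.matches("([A-Z]{2})-(\\d+)");
// isMatch = true -> AB congratulates 100!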
Navigating negation and quantities
Exclude certain characters using negated character classes:
String data = "apple!banana?cherry";
String[] words = data.split("[^a-zA-Z]+");
// words = ["apple", "banana", "cherry"] -> Uninvited special characters shown the door!
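If the input starts with a non-letter, the same split returns an empty string as its first element. A minimal sketch of one way to handle that is to filter empty strings out afterwards; WordsDemo and words are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of any library.
class WordsDemo {
    // Splits on runs of non-letters and drops any empty strings from the result.
    static List<String> words(String data) {
        return Arrays.stream(data.split("[^a-zA-Z]+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String data = "!apple?banana";
        // The raw split has three elements because of the leading "!".
        System.out.println(data.split("[^a-zA-Z]+").length); // 3
        System.out.println(words(data)); // [apple, banana]
    }
}
```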
Use quantifiers like + to match one or more occurrences:
String serialNumbers = "A123-B456-C789";
String[] numbers = serialNumbers.split("-\\D+");
// numbers = ["A123", "456", "789"] -> "-\\D+" consumes each hyphen plus the letters that follow it
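To see what the + quantifier changes when it is applied to the delimiter itself, the sketch below splits an input with repeated hyphens using "-" and then "-+". The names QuantifierDemo and splitOnHyphenRuns, and the sample input, are made up for the illustration.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class QuantifierDemo {
    // Treats any run of one or more hyphens as a single delimiter.
    static String[] splitOnHyphenRuns(String input) {
        return input.split("-+");
    }

    public static void main(String[] args) {
        String input = "A123--B456---C789";
        // Without the quantifier, every extra hyphen produces an empty string.
        System.out.println(Arrays.toString(input.split("-")));
        // With "-+", the result is ["A123", "B456", "C789"].
        System.out.println(Arrays.toString(splitOnHyphenRuns(input)));
    }
}
```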