Search and replace with regular expressions (2)

Many simple search and replace operations can be performed using the String.replaceAll() method. Sometimes more flexibility is required: for example, if not every instance of the expression needs replacing, or if the replacement string is not fixed. For these cases, instances of Matcher provide a find() method, which we look at here. The same idiom can also be used simply to find and process instances of a pattern in a string, without necessarily appending anything to another string.

The Matcher.find() method

Using Matcher.find() is similar in some ways to using Matcher.matches(). We first need to compile a Pattern representing our regular expression and then from this construct a Matcher around the string that we want to process. But unlike when we use matches(), our expression is now the pattern that we want to find as a portion of the string, rather than as the whole string. And since the pattern can occur multiple times in the string being searched, we call the find() method in a loop. Each call returns true if another match was found, and false once there are no more matches.

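To perform the "replacement" as we go along, we actually build up a new StringBuffer that will contain the new version of the string with the replacements made. Two methods of the Matcher object help us with this: appendReplacement() appends the unmatched text leading up to the current match, followed by the replacement for that match, and appendTail() appends whatever remains after the last match.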

So keeping with our example of removing HTML 'bold' tags, the code now looks like this:

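import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String removeBoldTags(CharSequence htmlString) {
  Pattern patt = Pattern.compile("<b>([^<]*)</b>");
  Matcher m = patt.matcher(htmlString);
  StringBuffer sb = new StringBuffer(htmlString.length());
  while (m.find()) {
    // Group 1 is the text between the opening and closing tags
    String text = m.group(1);
    // ... possibly process 'text' ...
    // Appends the unmatched text before this match, then the replacement;
    // quoteReplacement() stops '$' and '\' in 'text' being treated specially
    m.appendReplacement(sb, Matcher.quoteReplacement(text));
  }
  // Append whatever follows the last match
  m.appendTail(sb);
  return sb.toString();
}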
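
As an illustration of a replacement string that is not fixed, here is a minimal sketch in which the "possibly process" step converts the bold text to upper case. The class name BoldToUpperCase and the method name upperCaseBoldText are made up for this example; the rest follows the same idiom as above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: the names are illustrative only
class BoldToUpperCase {
  private static final Pattern BOLD = Pattern.compile("<b>([^<]*)</b>");

  static String upperCaseBoldText(CharSequence html) {
    Matcher m = BOLD.matcher(html);
    StringBuffer sb = new StringBuffer(html.length());
    while (m.find()) {
      // The replacement is computed separately for each match
      String replacement = m.group(1).toUpperCase(Locale.ROOT);
      // Quote it so that any '$' or '\' is inserted literally
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints "Pay NOW or LATER."
    System.out.println(upperCaseBoldText("Pay <b>now</b> or <b>later</b>."));
  }
}
```

This kind of per-match computation is something a single call to String.replaceAll() with a fixed replacement string cannot express.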

You'll notice that the parameter passed in is not specifically a String but actually just any old CharSequence. The CharSequence interface introduced in Java 1.4 is implemented by String and by a few other classes (such as StringBuffer and CharBuffer) that can hold a 'sequence of characters'. On the other hand, the appendX() methods were originally written to work only with StringBuffer. It would have been nice if they'd worked with any old Appendable, but the latter interface did not exist when the regular expressions API was added (in Java 1.4; Appendable was added in Java 5). From Java 9, overloads that take a StringBuilder are also available.
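
To make the point about CharSequence concrete, the sketch below passes a String, a StringBuilder and a CharBuffer to the same method. The wrapper class CharSequenceInputs is a made-up name for this example, and the method body simply repeats the idiom shown earlier.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, used only to make the example self-contained
class CharSequenceInputs {
  static String removeBoldTags(CharSequence htmlString) {
    Matcher m = Pattern.compile("<b>([^<]*)</b>").matcher(htmlString);
    StringBuffer sb = new StringBuffer(htmlString.length());
    while (m.find()) {
      m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    String html = "Some <b>bold</b> text";
    // All three calls are accepted because each argument is a CharSequence
    System.out.println(removeBoldTags(html));
    System.out.println(removeBoldTags(new StringBuilder(html)));
    System.out.println(removeBoldTags(CharBuffer.wrap(html)));
  }
}
```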

Group 0

You may recall from our discussion of capturing groups that there is always a group 0, which refers to the entire string when using the matches() method. When using the find() method, group 0 refers to the entire portion of the string found to match the expression on the most recent successful call to find().
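
The difference between group 0 and group 1 can be seen in a small sketch such as the following, which records both groups and the start position of each match. The class GroupZeroDemo and the method describeMatches are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: the names are illustrative only
class GroupZeroDemo {
  static List<String> describeMatches(CharSequence html) {
    Matcher m = Pattern.compile("<b>([^<]*)</b>").matcher(html);
    List<String> descriptions = new ArrayList<>();
    while (m.find()) {
      // group(0) is the whole matched portion, tags included;
      // group(1) is only the text captured between the tags;
      // start() is the index at which the whole match begins
      descriptions.add(m.group(0) + " -> " + m.group(1) + " @" + m.start());
    }
    return descriptions;
  }

  public static void main(String[] args) {
    // prints "<b>x</b> -> x @2" and then "<b>yz</b> -> yz @15"
    for (String line : describeMatches("a <b>x</b> and <b>yz</b>")) {
      System.out.println(line);
    }
  }
}
```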

Find without the replace

Of course, you can use Matcher.find() without performing any replacement. This is useful, for example, if you just want to count or process instances of a particular pattern within a string.
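
For instance, counting the occurrences of a pattern might look something like the sketch below. The class PatternCounter and the method countMatches are made-up names; the only regular expression API calls used are Pattern.matcher() and Matcher.find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: the names are illustrative only
class PatternCounter {
  static int countMatches(Pattern pattern, CharSequence input) {
    Matcher m = pattern.matcher(input);
    int count = 0;
    // Each successful find() moves on to the next non-overlapping match
    while (m.find()) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    Pattern bold = Pattern.compile("<b>([^<]*)</b>");
    // prints 2
    System.out.println(countMatches(bold, "<b>one</b> and <b>two</b>"));
  }
}
```

Since nothing is being appended anywhere, no StringBuffer is needed in this case.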


Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.