Capturing groups

Capturing groups are an extremely useful feature of regular expression matching that allow us to query the Matcher to find out what the part of the string was that matched against a particular part of the regular expression.

Let's look directly at an example. Say we write an expression to parse dates in the format DD/MM/YYYY. We can write an expression to do this as follows:

[0-9]{2}/[0-9]{2}/[0-9]{4}

(In principle, we could make the regular expression put further constraints on the date. For example, the month part could be just [0-1][0-9], since the first digit of a month number must be 0 or 1. But for this example, we'll accept any number with the correct number of digits and assume that further range checking would then take place when a match was found.)

As it stands, this expression will tell us if a given string matches the required date format, but it won't help us read what the date is. This is where capturing groups come in. We re-write the expression as follows.

([0-9]{2})/([0-9]{2})/([0-9]{4})

The brackets surround the parts of the expression whose corresponding string we want to "remember". These bracketed expressions are called groups, and are number from 1 upwards from left to right.

Now, we can "pull out" these elements of the string with the following code:

Pattern datePatt = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
...
Matcher m = datePatt.matcher(dateStr);
if (m.matches()) {
  int day   = Integer.parseInt(m.group(1));
  int month = Integer.parseInt(m.group(2));
  int year  = Integer.parseInt(m.group(3));
}

Note that to use capturing groups, we basically have to use the explicit Pattern/Matcher means of matching. In advanced use of capturing groups, there are exceptions where we can actually refer to a captured group from inside the expression itself. When performing a search and replace with a regular expression, we can also refer to groups by their number from inside the replacement string.

Group 0

Capturing groups start at group number 1, as in the example above. There is also a group 0, which is always the entire string that matched. See also the section on search and replace using the Matcher.find() method.

Alternatives and optional capturing groups

On the following page, we look at using alternatives in capturing groups.


If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants.

Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.