More on character classes

We looked at some basic regular expressions that included character classes: a "choice" of characters to match, placed inside square brackets. For example, [Tt] will match either T or t. On this page we'll look at some more possibilities with character classes.

Character ranges

A useful feature is that we can specify a range of characters by placing a hyphen between the first and last character of the range. For example, to match any lower case letter, we can write:

[a-z]

Similarly, to match a digit, we can write:

[0-9]
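As a minimal sketch of how these two ranges behave in Java, we can test them with String.matches(), which only succeeds if the whole string matches the expression. The class and method names below are made up for illustration.

```java
// Illustrative sketch: CharRangeDemo and its method names are hypothetical.
class CharRangeDemo {
    // True if the whole string is exactly one lower case letter a-z.
    static boolean isLowerCaseLetter(String s) {
        return s.matches("[a-z]");
    }

    // True if the whole string is exactly one digit 0-9.
    static boolean isDigit(String s) {
        return s.matches("[0-9]");
    }

    public static void main(String[] args) {
        System.out.println(isLowerCaseLetter("q")); // true
        System.out.println(isLowerCaseLetter("Q")); // false: upper case is outside a-z
        System.out.println(isDigit("7"));           // true
        System.out.println(isDigit("42"));          // false: a class matches a single character
    }
}
```

Note that a character class on its own always stands for exactly one character, which is why "42" does not match [0-9].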

We can combine single characters and ranges, and/or combine multiple ranges:

Expression     Meaning
[a-zA-Z]       A lower or upper case letter in the range A-Z.
[0-9A-F]       A hexadecimal digit (0-9 or A-F).
[0-9A-Fa-f]    A hexadecimal digit, either upper or lower case.
[ 0-9]         A space or digit.
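To see combined ranges in use, here is a small sketch that checks whether a string is a two-character hexadecimal byte by writing the class [0-9A-Fa-f] twice in a row. The class name and the "hex byte" scenario are assumptions chosen for the example.

```java
// Illustrative sketch: HexClassDemo and isHexByte are hypothetical names.
class HexClassDemo {
    // True if the string is exactly two hexadecimal digits, in either case.
    // Each [0-9A-Fa-f] matches one character, so two classes match two characters.
    static boolean isHexByte(String s) {
        return s.matches("[0-9A-Fa-f][0-9A-Fa-f]");
    }

    // True if the string is a single space or a single digit.
    static boolean isSpaceOrDigit(String s) {
        return s.matches("[ 0-9]");
    }

    public static void main(String[] args) {
        System.out.println(isHexByte("fF"));     // true
        System.out.println(isHexByte("G0"));     // false: G is not a hexadecimal digit
        System.out.println(isSpaceOrDigit(" ")); // true
    }
}
```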

Negation

To say "any character not in this class", we put a hat symbol ^ (also called a caret) immediately after the opening square bracket. So for example, to say "not a digit", we would write the following:

[^0-9]
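One possible use of a negated class, sketched below, is to strip every non-digit character from a string with String.replaceAll(). The class and method names are hypothetical.

```java
// Illustrative sketch: NegationDemo and its method names are hypothetical.
class NegationDemo {
    // Removes every character that is not a digit 0-9.
    static String keepDigits(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    // True if the string is exactly one character that is not a digit.
    static boolean isSingleNonDigit(String s) {
        return s.matches("[^0-9]");
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("a1b22c"));  // 122
        System.out.println(isSingleNonDigit("x")); // true
        System.out.println(isSingleNonDigit("5")); // false
    }
}
```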

Intersection

An operation called intersection essentially means "in this class AND in this one". It is really useful when we combine an intersection with a negation to say "in this class BUT NOT in this one". The intersection uses two ampersands. Here is the syntax:

[0-9&&[^5]]
[a-z&&[^aeiouy]]

The first of these says "any digit except 5"; the second says "any lower case letter except a vowel" (with y counted as a vowel here).

Note that a single ampersand on its own (&) simply represents that literal character.
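The following sketch tries out both intersection expressions, plus a class containing a single ampersand, using String.matches(). The class and method names are invented for the example.

```java
// Illustrative sketch: IntersectionDemo and its method names are hypothetical.
class IntersectionDemo {
    // True if the string is exactly one digit other than 5.
    static boolean isDigitExceptFive(String s) {
        return s.matches("[0-9&&[^5]]");
    }

    // True if the string is exactly one lower case letter that is not a, e, i, o, u or y.
    static boolean isLowerCaseConsonant(String s) {
        return s.matches("[a-z&&[^aeiouy]]");
    }

    // A single & inside a class is just the ampersand character itself.
    static boolean isAOrAmpersandOrB(String s) {
        return s.matches("[a&b]");
    }

    public static void main(String[] args) {
        System.out.println(isDigitExceptFive("4"));    // true
        System.out.println(isDigitExceptFive("5"));    // false
        System.out.println(isLowerCaseConsonant("b")); // true
        System.out.println(isLowerCaseConsonant("y")); // false: y is excluded by [^aeiouy]
        System.out.println(isAOrAmpersandOrB("&"));    // true
    }
}
```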

Named character classes

Some 'shortcuts' exist for common character classes (such as [0-9]) in the form of named character classes.
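This page does not list the named classes themselves, so the sketch below picks two that are defined by Java's java.util.regex.Pattern class as an assumption about what is meant: \d, which by default matches the same characters as [0-9], and \p{Lower}, which by default matches the same characters as [a-z]. The class and method names are hypothetical. Remember that a backslash must be doubled inside a Java string literal.

```java
// Illustrative sketch: NamedClassDemo and its method names are hypothetical.
class NamedClassDemo {
    // \d is the predefined class for a digit; by default it is equivalent to [0-9].
    static boolean isDigit(String s) {
        return s.matches("\\d");
    }

    // \p{Lower} is the POSIX-style class for a lower case letter; by default [a-z].
    static boolean isLowerCaseLetter(String s) {
        return s.matches("\\p{Lower}");
    }

    public static void main(String[] args) {
        System.out.println(isDigit("7"));           // true
        System.out.println(isDigit("x"));           // false
        System.out.println(isLowerCaseLetter("q")); // true
        System.out.println(isLowerCaseLetter("Q")); // false
    }
}
```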

Next...

On the next page, we'll look at a special character class: the dot.



Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.