More on character classes
We looked at some basic regular expressions that
included character classes: a "choice" of character to match placed
inside square brackets. For example, [Tt] will match against either
T or t. On this page we'll look at some more possibilities with character classes.
A useful feature is that we can specify a range of characters by
placing a hyphen between the start and end characters. For example,
to match any lower case letter, we can write [a-z].
Similarly, to match a digit, we can write [0-9].
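As a minimal sketch of how these ranges behave in Java, the following uses Pattern.matches() from java.util.regex to test a single-character string against each range. The class and method names (RangeDemo, isLowerCaseLetter, isDigit) are hypothetical and chosen only for this illustration.

```java
import java.util.regex.Pattern;

class RangeDemo {
    // Hypothetical helper: true if s is exactly one lower case letter a-z.
    static boolean isLowerCaseLetter(String s) {
        return Pattern.matches("[a-z]", s);
    }

    // Hypothetical helper: true if s is exactly one digit 0-9.
    static boolean isDigit(String s) {
        return Pattern.matches("[0-9]", s);
    }

    public static void main(String[] args) {
        System.out.println(isLowerCaseLetter("q")); // lower case letter
        System.out.println(isLowerCaseLetter("Q")); // upper case, outside the range
        System.out.println(isDigit("7"));           // digit
    }
}
```

Note that Pattern.matches() requires the whole input to match, so each of these helpers accepts only a one-character string.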
We can combine single characters and ranges, and/or combine multiple ranges:
|[a-zA-Z]||A lower or upper case letter (a-z or A-Z).|
|[0-9A-F]||A hexadecimal digit (0-9 or A-F)|
|[0-9A-Fa-f]||A hexadecimal digit, either upper or lower case.|
|[ 0-9]||A space or digit.|
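One possible use of a combined class such as [0-9A-Fa-f] is checking that every character of a string is a hexadecimal digit. The sketch below does this one character at a time so that it relies only on the character class itself. The names HexDemo, isHexDigit and isHexString are assumptions made for the example.

```java
import java.util.regex.Pattern;

class HexDemo {
    // Compile once and reuse: a digit, or a letter A-F in either case.
    private static final Pattern HEX_DIGIT = Pattern.compile("[0-9A-Fa-f]");

    // Hypothetical helper: true if c is a hexadecimal digit.
    static boolean isHexDigit(char c) {
        return HEX_DIGIT.matcher(String.valueOf(c)).matches();
    }

    // Hypothetical helper: true if s is non-empty and consists only of hex digits.
    static boolean isHexString(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isHexDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isHexString("1aF9"));
        System.out.println(isHexString("12G4"));
    }
}
```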
To say "not in the range...", we put a hat symbol (^)
at the beginning of the character class expression. So for example, to say "not a digit",
we would write [^0-9].
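To illustrate a negated class in practice, here is a small sketch that removes every character that is not a digit from a string, using String.replaceAll() with the class [^0-9]. The class and method names (NegationDemo, keepDigits) are hypothetical.

```java
class NegationDemo {
    // Hypothetical helper: replace each non-digit character with nothing,
    // leaving only the digits of the input.
    static String keepDigits(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("a1b22c")); // prints 122
    }
}
```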
An operation called intersection essentially means "in this class AND in this one". It is most useful when we combine an intersection with a negation to say "in this class BUT NOT in this one". The intersection operator is written as two ampersands (&&). For example:
[0-9&&[^5]]
[a-z&&[^aeiou]]
The first of these says a digit except 5; the second says
any lower case letter except the vowels.
Note that a single ampersand on its own (&) simply represents that character.
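The following sketch tries out intersection with negation in Java, using the classes [0-9&&[^5]] (a digit except 5) and [a-z&&[^aeiou]] (a lower case letter that is not a vowel). It also checks that a lone ampersand inside a class is treated as a literal character. The names IntersectionDemo, isDigitExceptFive, isLowerCaseConsonant and isAOrAmpersand are assumptions made for the example.

```java
import java.util.regex.Pattern;

class IntersectionDemo {
    // Hypothetical helper: a single digit that is not 5.
    static boolean isDigitExceptFive(String s) {
        return Pattern.matches("[0-9&&[^5]]", s);
    }

    // Hypothetical helper: a single lower case letter that is not a vowel.
    static boolean isLowerCaseConsonant(String s) {
        return Pattern.matches("[a-z&&[^aeiou]]", s);
    }

    // Hypothetical helper: a single & inside a class is just that character,
    // so this class matches either 'a' or '&'.
    static boolean isAOrAmpersand(String s) {
        return Pattern.matches("[a&]", s);
    }

    public static void main(String[] args) {
        System.out.println(isDigitExceptFive("4"));
        System.out.println(isDigitExceptFive("5"));
        System.out.println(isLowerCaseConsonant("b"));
        System.out.println(isLowerCaseConsonant("e"));
        System.out.println(isAOrAmpersand("&"));
    }
}
```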
Named character classes
Some 'shortcuts' exist for common character classes (such as [0-9]) in
the form of named character classes.
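As an illustration, java.util.regex provides predefined classes such as \d (a digit), \s (a whitespace character) and \w (a "word" character: a letter, digit or underscore). This selection is an assumption made for the example and is not necessarily the full set of named classes this page has in mind. In a Java string literal the backslash must itself be escaped, hence "\\d". The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class NamedClassDemo {
    // Hypothetical helper: \d is a shortcut for [0-9] in Java's default mode.
    static boolean isDigit(String s) {
        return Pattern.matches("\\d", s);
    }

    // Hypothetical helper: \s matches a single whitespace character.
    static boolean isWhitespace(String s) {
        return Pattern.matches("\\s", s);
    }

    // Hypothetical helper: \w matches a letter, digit or underscore.
    static boolean isWordCharacter(String s) {
        return Pattern.matches("\\w", s);
    }

    public static void main(String[] args) {
        System.out.println(isDigit("7"));
        System.out.println(isWhitespace(" "));
        System.out.println(isWordCharacter("_"));
    }
}
```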
On the next page, we'll look at a special character class: the dot.