

Introduction to regular expressions in Java

Regular expressions are a bit like Marmite: people either love them or hate them. Essentially, they are a syntax for describing patterns to match against strings.

Pattern matching by hand-coding or with regular expressions

Suppose you want to answer the question: does a given string contain a run of 10 consecutive digits? You could hand-code this: step through the characters in the string, counting how many digits you have seen in a row and resetting the count whenever you hit a non-digit. As soon as the count reaches 10, you have your answer. So in Java, the code would look something like this:

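public boolean hasTenDigits(String s) {
  // number of consecutive digits seen so far
  int noDigitsInARow = 0;
  for (int len = s.length(), i = 0; i < len; i++) {
    char c = s.charAt(i);
    if (Character.isDigit(c)) {
      // extend the current run; stop as soon as it reaches 10
      if (++noDigitsInARow == 10) {
        return true;
      }
    } else {
      // a non-digit breaks the run, so start counting again
      noDigitsInARow = 0;
    }
  }
  return false;
}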
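
To see how the hand-coded loop behaves, here is a minimal sketch that wraps it in a small class and runs it on a few sample strings. The class name HandCodedDemo and the sample inputs are just illustrative assumptions, and the method is made static only so that it can be called from main().

```java
class HandCodedDemo {
    // Same counting loop as above, made static for easy calling.
    static boolean hasTenDigits(String s) {
        int digitsInARow = 0;
        for (int i = 0, len = s.length(); i < len; i++) {
            if (Character.isDigit(s.charAt(i))) {
                if (++digitsInARow == 10) {
                    return true;
                }
            } else {
                digitsInARow = 0;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasTenDigits("call 0123456789 now")); // true
        System.out.println(hasTenDigits("12345 67890"));         // false: the space resets the count
        System.out.println(hasTenDigits("123456789"));           // false: only nine digits
    }
}
```

One detail worth knowing: Character.isDigit() accepts any Unicode decimal digit, not just the characters 0 to 9, so this loop is slightly more permissive than a pattern written with [0-9].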

The strengths and weaknesses of this code are obvious:

  • it's quite a few lines of code for a conceptually simple thing;
  • but at least you can understand the code fairly easily once you start picking through it.

Doing the same thing with a regular expression looks something like this:

public boolean hasTenDigits(String s) {
  return s.matches(".*[0-9]{10}.*");
}
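
As a side note, String.matches() has to match the whole string, which is why the expression is padded with ".*" on either side. By default the dot does not match line breaks, so the one-liner can miss digits in a string that spans several lines. A possible alternative, sketched here under the assumption that you want "contains" behaviour on any input, is to use Pattern and Matcher.find() from java.util.regex. The class name RegexTenDigits and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

class RegexTenDigits {
    // Compiled once and reused; String.matches() compiles its pattern on every call.
    private static final Pattern TEN_DIGITS = Pattern.compile("[0-9]{10}");

    // find() looks for the pattern anywhere in the input, so no ".*" padding is needed.
    static boolean hasTenDigits(String s) {
        return TEN_DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        String twoLines = "ref:\n0123456789";
        System.out.println(twoLines.matches(".*[0-9]{10}.*")); // false: '.' stops at the line break
        System.out.println(hasTenDigits(twoLines));            // true
        System.out.println(hasTenDigits("123456789"));         // false: only nine digits
    }
}
```

For single-line input the two versions give the same answer, so the one-liner is fine for getting started; the precompiled pattern mainly pays off when the same check is run many times.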

You'll probably agree that we've more or less reversed the above two points. Now, we have a nice succinct piece of code, but it does rely on you understanding a nasty piece of syntax. The argument in favour of regular expressions is:

  • once you do understand the nasty piece of syntax, regular expressions let you "see the wood for the trees": you don't have to pick through long pieces of sprawling code when doing simple pattern matching on strings.

On the next page, we'll get going with basic expressions with String.matches().

In case you already know something about regular expressions and want to skip ahead, here are some of the later topics currently covered by this tutorial:

Regular expression examples

Finally, we'll look at a couple of examples of using regular expressions:

  • guessing the IP's country code from the referrer string with regular expressions;
  • scraping HTML: how to pull out data from the HTML or XML at a particular URL, a task often called "HTML scraping" or "screen scraping".

Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved.