Regular expression example: IP location

On the previous page, we showed how a regular expression can be used to extract the country code from the referrer string, looking at the simplest case of a Yahoo referrer string.

Parsing the Google referrer string

Recall that the Google referrer strings look as follows:

http://www.google.com/search?hl=fr&q=dictionary+french
http://www.google.co.in/search?hl=en&q=java+programming
http://www.google.com.au/search?hl=en&q=sidney+shopping
http://www.google.bg/search?hl=bg&q=red+wine

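As you can see, each URL carries a language code in its hl parameter, and some use a country-specific Google domain (google.co.in, google.com.au, google.bg) rather than google.com.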

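We'll propose treating these referrers as two types of case: google.com with a language code, where we take the country from the language code, and a country-specific Google domain, where we take the country from the domain.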

Of course, this isn't perfect. For example, there are many Spanish speakers living in the southern states of the US who may well use google.com but have the language configured to be es. With our simplistic method here, we'd mistakenly say they were in Spain. In the first URL above, we will say the user is in France on the basis of the language code fr, but they could just as well be in Canada. And ultimately there's nothing to stop a user in Spanish-speaking Peru using the Australian site google.com.au and configuring their language to be Italian.

We're also pretending, slightly erroneously, that country and language codes are the same thing. In some cases this isn't true, and in some cases a language can be specified with a locational variant (e.g. fr-CA for Canadian French), which would be a better clue. We'll ignore these issues here. In many cases, the simplistic methodology we outline here turns out to be a reasonable first approximation.
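As a rough illustration of the two-case approach described above, here is a minimal sketch in Python. The pattern names GOOGLE_HOST and HL_PARAM and the function guess_country are hypothetical, chosen for this example only, and the sketch assumes referrers of the form http://www.google.<suffix>/... as in the sample URLs. It is not the implementation developed on the following pages, and it deliberately shares the limitations just discussed.

```python
import re

# Hypothetical pattern: captures everything after "www.google." up to the first "/".
GOOGLE_HOST = re.compile(r"^https?://www\.google\.([a-z.]+)/", re.IGNORECASE)
# Hypothetical pattern: captures the first two letters of the hl parameter,
# so a variant such as fr-CA is reduced to fr (the region is ignored here).
HL_PARAM = re.compile(r"[?&]hl=([A-Za-z]{2})")

def guess_country(referrer):
    """Return a two-letter code guessed from a Google referrer, or None."""
    host = GOOGLE_HOST.match(referrer)
    if host is None:
        return None  # not a Google referrer of the assumed form
    suffix = host.group(1).lower()
    if suffix == "com":
        # Case 1: google.com, so fall back on the language code.
        # This treats the language code as if it were a country code.
        hl = HL_PARAM.search(referrer)
        return hl.group(1).lower() if hl else None
    # Case 2: country-specific domain, e.g. "co.in" or "com.au";
    # the last label of the suffix is taken as the country code.
    return suffix.rsplit(".", 1)[-1]

if __name__ == "__main__":
    for url in (
        "http://www.google.com/search?hl=fr&q=dictionary+french",
        "http://www.google.co.in/search?hl=en&q=java+programming",
        "http://www.google.com.au/search?hl=en&q=sidney+shopping",
        "http://www.google.bg/search?hl=bg&q=red+wine",
    ):
        print(url, "->", guess_country(url))
```

For the four sample URLs, this sketch gives fr, in, au and bg respectively. Note that a google.com referrer with hl=en would yield en, which is a language code rather than a country, so a real implementation would need some mapping or filtering step at that point.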

On the next page, we consider these two types of case in turn: google.com with a language code, and a country-specific Google domain.