Home  Collections intro  Lists  Maps  Sets  Which collection class?  Sorting  Hashing  Advanced
 Video lecture: hash tables  Bloom filters

Writing a hash function in Java:
a practical guide to implementing hashCode()

If you started by reading this site's introduction to hash maps, then you probably saw the example of the hash function of the String class. This function works by combining the values of all characters making up the string. On this page, we'll look at some rough-and-ready patterns for making a hash code function without going into too much of the theory.

Basic hash function guidelines

What we basically want to achieve with a hash function is as follows:

  • we want to combine all data values that vary into the hash code;
  • we want the bits of the data that vary most randomly to affect the lower bits of the hash code;
  • given a set of typical random keys, we want hash codes where as many bits as possible have roughly 50-50 chance of being a 0 or 1 (in other words are "as randomly distributed as possible"), especially in the lower bits;
  • we want the hash function to execute quickly: in other words, it should be a "couple of lines of code" written with a few simple arithmetic operations.

The desire to have hash codes that have a random distribution especially in the lower bits isn't necessarily a universal requirement, but it is desirable because of the way Java hash maps work1.

Here are some examples of data fields and how we might achieve the above goals. Note that we'll just give the "raw" hash function here, and look separately at how to actually build hash functions into a class.

Object fieldsExample hash codeComments
Two ints, each fairly randomly distributed between 0 and a fairly large number2.x ^ yXORing two numbers with roughly random distribution results in another number still with roughly random distribution3, but which now depends on the two values.
Two objects whose classes already have good hash functions (for example Strings or generally other standard library classes).x.hashCode() ^ y.hashCode()If the two classes already have good (randomly distributed) hash code functions, then you can treat the two hash codes as "randomly distributed integers" and XOR them.
Two ints that have a biased distribution.(x * p) ^ y
Where p is a prime number4 and/or odd number close to a power of 2, such as 17 or 33.

Multiplying one of the numbers (or, put another way, shifting plus adding/subtracting from itself) will help to ensure that the "biased" bits of one number interact with the more "random" bits of the other, so that the bias is "ironed out" over more of the bits.

enums

Treat as "biased ints":

(x.hashCode() * 17) ^ y.hashCode()

Enums typically have low values (favour the lower bits), so they should be combined with a multiplication to spread them across a wider bit range of the hash code.
Two doubles or floats.

Wrap the primitive in a Float or Double object then call its hashCode() method.

new Float(x).hashCode() ^ new Double(y).hashCode()

(new Double(x).hashCode() >> 13) ^ new Double(y).hashCode()

For efficiency (but the JIT compiler may remove the need) you could directly copy the code from Double.hashCode() or Float.hashCode() directly into your hash function. See the hashCode() method of Point2D for an example of this.

Hash codes of floats and doubles, in particular those representing powers of 2 (including negative powers of 2: 1/2, 1/4 etc), typically have more variation in the upper bits than in the lower ones. The HashMap's secondary hash code alleviates this problem somewhat, but it may be worth combining the hash codes by shifting one of them right by several bits so that more variation ends up in the lower bits.

If you're really stuck...

If you're really stuck on what to use as a hash function, a last resort (that is actually not as bad as it sounds) is to append all of the variables in string format and then use String.hashCode()

public int hashCode() {
  int hc = cachedHashCode;
  if (hc == 0) {
    String varstr = var1 + var2 + var3;
    hc = varstr.hashCode();
  }
  return hc;
}

Because creating the string is a little bit of an overhead, you'll probably want to cache the hash code once calculated, as in this example.

Making classes hash map compatible

Once you have decided on a hash function, you need to actually "plug it in" to your Java class. As discussed on the next page, you essentially do this by overriding hashCode() and equals() in the class in question.

1. Since Java 1.4, the HashMap and related classes uses a table that always has a power of two as the number of buckets. In other words, it will always take the x lowest bits of a given hash code to determine the bucket number. The HashMap implementation also uses an additional hash function to try and "spread" the randomness of bits in the hash function. In other words, a hash function which has some random bits, even if not the low-order bits, will work to some extent, but still not as well as if it had random low-order bits. Note that this means that the common guidance to "always use a prime number of buckets" doesn't apply to Java HashMaps; even if you construct a HashMap with a prime number, this capacity will be rounded up to the next-biggest power of two.
2. By "fairly large", we mean large at least in proportion to the number of integers we'll typically put in the map. Generally we mean at least as large as the square of this number. So if we're going to put 200 numbers in the map, we want them to be distributed fairly randomly between 0 and 200 x 200 = 400000 at the very least.
3. At each bit of the two numbers to combine, a 0 is output if the two bits are equal, else a 1. In other words, in 50% of the combinations, a 1 will be output. So if the two input bits each have a roughly 50-50 chance of being 0 or 1, then so too will the output bit.
4. Actually choosing which number to use as p can be a black art, but in many cases those mentioned will work fairly well. If you know something more precise about the number distributions, you may be able to choose a better operation to average out the bias. For example, if you know that "x is always a fairly random multiple of 8 and y is always fairly random", then you could use (x >> 3) ^ y (effectively divide x by 8 before XORing with y).

comments powered by Disqus

Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved. If you have any feedback on the Java collections tutorials in this section or about the content of this site in general, please leave a message on the Javamex forum.