
Writing a hash function in Java (advanced)

Object fields: Two ints, each fairly randomly distributed between 0 and a fairly large number^{2}.
Example hash code: x ^ y
Comments: XORing two numbers with roughly random distributions gives another number with a roughly random distribution^{3}, but which now depends on both values.

Object fields: Two objects whose classes already have good hash functions (for example, Strings or other standard library classes generally).
Example hash code: x.hashCode() ^ y.hashCode()
Comments: If the two classes already have good (randomly distributed) hash code functions, you can treat the two hash codes as "randomly distributed integers" and XOR them.

Object fields: Two ints that have a biased distribution.
Example hash code: (x * p) ^ y, where p is a prime^{4} and/or odd number close to a power of 2, such as 17 or 33.
Comments: Multiplying one of the numbers (put another way, shifting it and adding/subtracting it from itself) helps ensure that the "biased" bits of one number interact with the more "random" bits of the other, so that the bias is "ironed out" over more of the bits.

Object fields: Two enums.
Example hash code: Treat as "biased ints": (x.hashCode() * 17) ^ y.hashCode()
Comments: Enums typically have low values (favouring the lower bits), so they should be combined with a multiplication to spread them across a wider bit range of the hash code.

Object fields: Two doubles or floats.
Example hash code: Wrap each primitive in a Float or Double object and call its hashCode() method: new Float(x).hashCode() ^ new Double(y).hashCode(), or (new Double(x).hashCode() >> 13) ^ new Double(y).hashCode()
Comments: For efficiency (though the JIT compiler may remove the need) you could copy the code from Double.hashCode() or Float.hashCode() directly into your hash function; see the hashCode() method of Point2D for an example. Hash codes of floats and doubles, in particular those representing powers of 2 (including negative powers of 2: 1/2, 1/4 etc), typically have more variation in the upper bits than in the lower ones. HashMap's secondary hash code alleviates this problem somewhat, but it may be worth combining the hash codes by shifting one of them right by several bits so that more variation ends up in the lower bits. (From Java 8 onwards, the static methods Float.hashCode(x) and Double.hashCode(x) avoid creating the wrapper objects.)
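Putting several of the recipes above together, here is a hedged sketch of a hash function for a class with an enum field, a biased int field and a double field. The class, its fields and the multiplier 17 are all invented for illustration (and ordinal() is used rather than hashCode() so that the enum genuinely contributes the small, biased values described above):

```java
// Hypothetical class combining the table's recipes; names are illustrative.
public class Reading {
    public enum Unit { CELSIUS, FAHRENHEIT }

    private final Unit unit;     // enum: small values, favours the lower bits
    private final int sensorId;  // biased int: typically small positive numbers
    private final double value;  // double: variation mostly in the upper bits

    public Reading(Unit unit, int sensorId, double value) {
        this.unit = unit;
        this.sensorId = sensorId;
        this.value = value;
    }

    @Override
    public int hashCode() {
        // Multiply to spread the small enum/int values over more bits, and
        // shift the double's hash right so its upper-bit variation reaches
        // the low-order bits that HashMap's bucket selection depends on.
        int h = (unit.ordinal() * 17) ^ sensorId;
        h = (h * 17) ^ (Double.hashCode(value) >> 13);
        return h;
    }
}
```

Double.hashCode(value) is the Java 8+ static equivalent of new Double(value).hashCode(); on older JVMs you would use the wrapper form from the table.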
If you're really stuck on what to use as a hash function, a last resort (which is actually not as bad as it sounds) is to concatenate all of the variables in string form and then use String.hashCode():
public int hashCode() {
  int hc = cachedHashCode;
  if (hc == 0) {
    String varstr = "" + var1 + var2 + var3;
    hc = varstr.hashCode();
    cachedHashCode = hc;   // store for subsequent calls
  }
  return hc;
}
Because creating the string is a little bit of an overhead, you'll probably want to cache the hash code once it has been calculated, as in this example. (If the calculated hash code happens to be 0 it will be recalculated on every call, but this is harmless.)
Once you have decided on a hash function, you need to actually "plug it in" to your Java class. As discussed on the next page, you essentially do this by overriding hashCode() and equals() in the class in question.
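As a sketch of that "plugging in", the two methods are overridden as a consistent pair over the same fields; the class and field names here are invented for illustration:

```java
// Illustrative class: equals() and hashCode() must use the same fields,
// so that objects that compare equal always have equal hash codes.
public final class Point {
    private final int x, y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // "Biased ints" recipe from the table above; 17 is an illustrative
        // odd multiplier close to a power of two.
        return (x * 17) ^ y;
    }
}
```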
1. Since Java 1.4, HashMap and related classes have used a table whose number of buckets is always a power of two. In other words, for a table of 2^n buckets, the bucket number is always determined by the n lowest bits of a given hash code. The HashMap implementation also applies a secondary hash function to try and "spread" the randomness of the bits in the supplied hash code. So a hash function that has some random bits, even if they are not the low-order ones, will work to some extent, though not as well as one with random low-order bits. Note that this means the common guidance to "always use a prime number of buckets" doesn't apply to Java HashMaps: even if you construct a HashMap with a prime capacity, that capacity will be rounded up to the next-biggest power of two.
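The power-of-two table size means the bucket index is effectively hash & (capacity - 1), which keeps only the lowest bits. A minimal sketch (the capacity and hash values are illustrative, and this mirrors rather than calls HashMap's internal logic):

```java
// For a power-of-two capacity n, (hash & (n - 1)) keeps exactly the
// lowest log2(n) bits of the hash code -- the bits that pick the bucket.
public class BucketDemo {
    static int bucketIndex(int hash, int powerOfTwoCapacity) {
        return hash & (powerOfTwoCapacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 16;  // illustrative power-of-two capacity
        System.out.println(bucketIndex(0x12345678, capacity)); // low 4 bits: 8
        System.out.println(bucketIndex(0xABCDEF03, capacity)); // low 4 bits: 3
    }
}
```

Whatever happens in the upper bits of the hash code, only the masked low-order bits decide the bucket, which is why low-order randomness matters most.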
2. By "fairly large", we mean large at least in proportion to the number of integers we'll typically put in the map: generally, at least as large as the square of that number. So if we're going to put 200 numbers in the map, we want them to be distributed fairly randomly between 0 and at least 200 x 200 = 40,000.
3. At each bit position of the two numbers being combined, a 0 is output if the two bits are equal, otherwise a 1. In other words, 2 of the 4 possible input combinations output a 1. So if each of the two input bits has a roughly 50-50 chance of being 0 or 1, then so does the output bit.
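This claim can be checked directly by enumerating the four possible input-bit pairs:

```java
// XOR truth table: exactly 2 of the 4 input combinations output a 1,
// so unbiased input bits give an unbiased output bit.
public class XorDemo {
    static int countOnesInTruthTable() {
        int ones = 0;
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                if ((a ^ b) == 1) ones++;
            }
        }
        return ones;
    }

    public static void main(String[] args) {
        System.out.println(countOnesInTruthTable() + " of 4 combinations output 1"); // 2 of 4
    }
}
```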
4. Actually choosing which number to use as p can be a black art, but in many cases those mentioned will work fairly well. If you know something more precise about the number distributions, you may be able to choose a better operation to average out the bias. For example, if you know that "x is always a fairly random multiple of 8 and y is always fairly random", then you could use (x >> 3) ^ y (effectively divide x by 8 before XORing with y).
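For instance, if x is always a multiple of 8, its bottom three bits are always zero and carry no information; shifting right by 3 moves the informative bits down before the XOR. A small sketch (the sample values are illustrative):

```java
// If x is always a multiple of 8, (x >> 3) discards the three always-zero
// low bits so the "random" part of x lands in the low-order bits before XOR.
public class ShiftDemo {
    static int combine(int x, int y) {
        return (x >> 3) ^ y;
    }

    public static void main(String[] args) {
        int x = 5 * 8;  // a "random" multiple of 8 (illustrative)
        int y = 29;     // a "random" int (illustrative)
        System.out.println(combine(x, y)); // (40 >> 3) ^ 29 = 5 ^ 29 = 24
    }
}
```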
Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved.