a practical guide to implementing

If you started by reading this site's introduction to hash maps,
then you probably saw the example of the hash function of the `String` class. This function works by **combining** the values of
all characters making up the string. On this page, we'll look at some rough-and-ready
patterns for making a hash code function without going into too much of the theory.

What we basically want to achieve with a hash function is as follows:

- we want to combine
**all data values that vary**into the hash code; - we want the
**bits of the data that vary most randomly**to affect the**lower bits**of the hash code; - given a set of typical random keys, we want hash codes where
as many bits as possible have roughly 50-50 chance of being a 0 or 1 (in other words are "as randomly distributed as possible"), especially in the lower bits; - we want the hash function to execute quickly: in other words, it should be a "couple of lines of code" written with a few simple arithmetic operations.

The desire to have hash codes that have a random distribution especially in the
**lower bits** isn't necessarily a universal requirement, but it is
desirable because of the way Java hash maps work^{1}.

Here are some examples of data fields and how we might achieve the above goals. Note that we'll just give the "raw" hash function here, and look separately at how to actually build hash functions into a class.

Object fields | Example hash code | Comments |
---|---|---|

Two ints, each fairly randomly distributed between 0 and a fairly large number^{2}. | x ^ y | XORing two numbers with roughly random distribution results in another number still with roughly random distribution^{3}, but which now depends on the two values. |

Two objects whose classes already have good hash functions (for example Strings or generally other standard library classes). | x.hashCode() ^ y.hashCode() | If the two classes already have good (randomly distributed) hash code functions, then you can treat the two hash codes as "randomly distributed integers" and XOR them. |

Two ints that have a biased distribution. | (x * p) ^ yWhere p is a prime number^{4} and/or odd number close to a power of 2, such as 17 or 33. | Multiplying one of the numbers (or, put another way, shifting plus adding/subtracting from itself) will help to ensure that the "biased" bits of one number interact with the more "random" bits of the other, so that the bias is "ironed out" over more of the bits. |

enums | Treat as "biased ints":
| Enums typically have low values (favour the lower bits), so they should be combined with a multiplication to spread them across a wider bit range of the hash code. |

Two doubles or floats. | Wrap the primitive in a
| For efficiency (but the JIT compiler may remove the need) you could directly copy the code from Hash codes of floats and doubles, in particular those representing powers of 2 (including negative powers of 2: 1/2, 1/4 etc), typically have |

If you're really stuck on what to use as a hash function, a last resort
(that is actually not as bad as it sounds) is to append all of the variables in
string format and then use `String.hashCode()`

public int hashCode() { int hc = cachedHashCode; if (hc == 0) { String varstr = var1 + var2 + var3; hc = varstr.hashCode(); } return hc; }

Because creating the string is a little bit of an overhead, you'll probably want to cache the hash code once calculated, as in this example.

Once you have decided on a hash function, you need to actually "plug it in" to your
Java class. As discussed on the next page, you essentially do this by
overriding `hashCode()` and `equals()`
in the class in question.

1. Since Java 1.4, the `HashMap` and related classes uses a table that always
has a power of two as the number of buckets. In other words, it will always take
the *x* lowest bits of a given hash code to determine the bucket number.
The `HashMap` implementation also uses an additional hash function to try
and "spread" the randomness of bits in the hash function. In other words, a hash
function which has *some* random bits, even if not the low-order bits, will
work to some extent, but still not as well as if it had random low-order bits.
Note that this means that the common guidance to "always use a prime number of
buckets" doesn't apply to Java `HashMap`s; even if you construct a `HashMap`
with a prime number, this capacity will be rounded up to the next-biggest power
of two.

2. By "fairly large", we mean large at least in proportion to the *number* of
integers we'll typically put in the map. Generally we mean at least as large as the
*square* of this number. So if we're going to put 200 numbers in the map, we
want them to be distributed fairly randomly between 0 and 200 x 200 = 400000 at the very least.

3. At each bit of the two numbers to combine, a 0 is output if the two bits are equal, else a 1. In other words, in 50% of the combinations, a 1 will be output. So if the two input bits each have a roughly 50-50 chance of being 0 or 1, then so too will the output bit.

4. Actually choosing which number to use as `p` can be a black art, but in many cases those mentioned will work fairly well. If you know something more precise about the number distributions, you may be able to choose a better operation to average out the bias. For example, if you know that "`x` is always a fairly random multiple of 8 and `y` is always fairly random", then you could use `(x >> 3) ^ y` (effectively divide `x` by 8 before XORing with `y`).

Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.