On the previous page, we discussed some hash code
statistics: specifically, the number of **collisions** we'd expect if we took
*n* hash codes from a given range of possible values. We mentioned, for example, that
if we take on the order of 100,000 random 32-bit hash codes, we'd expect on the order of 2 collisions,
with around 8 collisions if we take about 200,000, and 32 if we take about 400,000, and so on. Recall that we're talking
purely about the number of pairs^{1} of objects with **identical hash codes**, not about the
actual number of **collisions in the hash table** (i.e. the number of buckets that
would end up with more than one item), which we expect to be much greater. A couple of interesting
possibilities emerge at this point:
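As a quick sanity check on those figures, the expected number of colliding pairs can be estimated with the standard birthday approximation, n(n-1) / (2 × 2^{k}) for n random k-bit hash codes. The sketch below (class and method names are our own) shows that the powers of two closest to the quoted figures give almost exactly 2, 8 and 32 expected collisions:

```java
public class CollisionEstimate {

    // Expected number of colliding pairs among n random k-bit hash codes,
    // using the birthday approximation: n(n-1) / (2 * 2^k).
    static double expectedCollisions(long n, int bits) {
        return (double) n * (n - 1) / (2.0 * Math.pow(2, bits));
    }

    public static void main(String[] args) {
        // These roughly match the figures quoted above for 32-bit codes:
        System.out.println(expectedCollisions(131_072, 32)); // ~2  ("order of 100,000")
        System.out.println(expectedCollisions(262_144, 32)); // ~8  ("about 200,000")
        System.out.println(expectedCollisions(524_288, 32)); // ~32 ("about 400,000")
    }
}
```

Note that doubling n quadruples the expected number of collisions, which is why the quoted figures go up in powers of 4.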

- We could use the **hash code as the key to the hash table** instead of the actual object. This would mean, of course, that there'd be a **risk** that we'd ask for the object mapped to a given key and actually get the **wrong value** (i.e. one mapped to another object with the same hash code). But the risk would be **calculable**, and might be small enough for some applications.
- If the number of collisions was too great for the number of objects we want to key on, we could use a **wider hash code**. For example, using 64-bit hash codes would mean we could take on the order of a **billion** random objects before expecting to get a hash code collision (we'd expect an average of 2 collisions with 2^{33} = around 8 billion objects).
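To illustrate the first idea, here's a minimal sketch (the class and method names are hypothetical, not a standard API) of a map keyed on the `hashCode()` of a string rather than the string itself. A lookup with a different string that happened to share the stored key's hash code would silently return the wrong value, which is exactly the calculable risk described above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a map keyed on the hash code of the real key.
public class HashKeyedCounts {
    private final Map<Integer, Integer> counts = new HashMap<>();

    void put(String key, int value) {
        // Store only the 32-bit hash code, not the (possibly large) key.
        counts.put(key.hashCode(), value);
    }

    Integer get(String key) {
        // If some other string shares key's hash code, we'd silently get
        // that string's value back instead.
        return counts.get(key.hashCode());
    }

    public static void main(String[] args) {
        HashKeyedCounts counts = new HashKeyedCounts();
        counts.put("apple", 3);
        counts.put("banana", 7);
        System.out.println(counts.get("apple"));  // 3
        System.out.println(counts.get("cherry")); // null (almost certainly)
    }
}
```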

So why store just the hash codes rather than the keys themselves, if we know the result isn't completely accurate?
Well, partly for speed (comparing two numbers is quicker than comparing two
strings) but mainly to **reduce memory usage** in the case where our keys are relatively
large. Potentially we can also **reduce the number of objects created** to store
the structure, if we're prepared to write our own map implementation: if our keys are
primitive types (`int`s or `double`s), we can actually store them in a
single array. Of course, storing only the hash codes would also mean that we **couldn't iterate
over the keys** as we can with a regular Java hash map or hash set. So the technique would
only be suitable for retrieving the mapped *values*, or for determining *if* a
given key is in the map.
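The second use case — determining whether a given key is (probably) in the set — can be sketched as follows, storing nothing but the 32-bit hash codes in a single sorted `int` array (the class and method names here are our own, not a standard API). Note that there's no way to iterate over or recover the original keys:

```java
import java.util.Arrays;

// Hypothetical sketch: a "probably contains" set storing only hash codes.
public class HashCodeSet {
    private final int[] codes; // one 32-bit code per key, no objects per entry

    HashCodeSet(String[] keys) {
        codes = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            codes[i] = keys[i].hashCode();
        }
        Arrays.sort(codes); // sort once so lookups can binary-search
    }

    boolean probablyContains(String key) {
        // A false positive occurs only if some stored key happens to
        // share this key's hash code.
        return Arrays.binarySearch(codes, key.hashCode()) >= 0;
    }

    public static void main(String[] args) {
        HashCodeSet set = new HashCodeSet(new String[] {"apple", "banana"});
        System.out.println(set.probablyContains("banana")); // true
        System.out.println(set.probablyContains("cherry")); // false (almost certainly)
    }
}
```

The memory saving comes from the single primitive array: there's one 4-byte slot per key, with no per-entry node objects and no references to the key strings themselves.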

On the next pages, we consider keying on the hash code, and using hash codes with a BitSet to significantly reduce memory usage for the purpose of duplicate elimination.
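As a small taste of the BitSet idea before those pages, here's a hedged sketch (our own class and method names): one bit per possible hash code value, with the sign bit cleared so the code can serve as a non-negative `BitSet` index. Two distinct strings sharing an index would be wrongly flagged as duplicates, again with the small, calculable probability discussed above:

```java
import java.util.BitSet;

// Hypothetical sketch: duplicate detection via one bit per hash code value.
public class DuplicateFilter {
    private final BitSet seen = new BitSet();

    boolean isProbablyDuplicate(String s) {
        // Clear the sign bit: BitSet indices must be non-negative, so we
        // effectively fold the 32-bit hash into 31 bits.
        int index = s.hashCode() & 0x7FFFFFFF;
        if (seen.get(index)) {
            return true;  // seen this hash code before (probable duplicate)
        }
        seen.set(index);
        return false;     // definitely not seen before
    }

    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter();
        System.out.println(filter.isProbablyDuplicate("apple"));  // false
        System.out.println(filter.isProbablyDuplicate("banana")); // false
        System.out.println(filter.isProbablyDuplicate("apple"));  // true
    }
}
```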

1. We're of course making an approximation: we're not distinguishing between the case
of *n* colliding *pairs*, and the case where more than 2 objects have an identical hash code.
Our calculations are based on the assumption that each collision involves a different *pair* of colliding objects.
For most practical cases, this turns out to be a good approximation.