Improving our hash function
On the previous page, we looked at the structure of a hash map, which assigns each key/value pair to one of a number of "buckets", or linked lists, based on a hash function applied to the key.
In our simple but impractical example, we took the length of the string as the hash function. If we solve the problem of collisions by keeping a linked list in each bucket, then taking the string length as the hash function will work in principle. But it has an obvious problem if, say, we want our keys to be 100,000 words of English. In that case, the linked list at array position 6, along with those at other common string lengths, will still have thousands of entries, while the higher positions will have practically none. Sure, it's better to scan a list of 10,000 entries than a list of 100,000 when looking for a key. But our hash function of taking the string length really isn't adequate. We need to choose a better hash function that ideally has these properties:

- it distributes the mappings evenly across the buckets;
- it does so whatever the size of the array (in other words, whatever modulus we apply).
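To see the clustering problem concretely, here is a minimal sketch of bucketing strings by length (the class and method names are my own, for illustration, not from the original example):

```java
import java.util.List;

// Toy bucketing scheme: the hash of a string is simply its length.
public class LengthHash {

    // Our naive hash function: the length of the key.
    static int lengthHash(String key) {
        return key.length();
    }

    // Bucket index for a table of the given capacity.
    static int bucketOf(String key, int capacity) {
        return lengthHash(key) % capacity;
    }

    public static void main(String[] args) {
        // Common English words cluster in a handful of buckets.
        List<String> words = List.of("cat", "dog", "sun", "map", "hash", "tree", "apple");
        for (String w : words) {
            System.out.println(w + " -> bucket " + bucketOf(w, 16));
        }
        // "cat", "dog", "sun" and "map" all land in bucket 3: with real
        // dictionaries, buckets for common lengths hold thousands of entries.
    }
}
```

Running this shows four of the seven sample words colliding in a single bucket, which is exactly the skew described above.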
In practice, we combine these requirements and define our task as coming up with a hash function that distributes hash codes evenly over the entire range of possible integers (in Java, that means a 4-byte `int`, with about 4 billion possible values). Then we assume that, whatever the size of the array in our hash map (in other words, whatever the modulus), the mappings will be distributed more or less evenly.
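The second step of that assumption, reducing a full-range 32-bit hash code to a bucket index, might be sketched like this (the helper name is hypothetical, not Java's actual `HashMap` internals, which use bitwise AND with a power-of-two capacity):

```java
// Sketch: map an arbitrary 32-bit hash code to a bucket index in [0, capacity).
public class BucketIndex {

    // Math.floorMod handles negative hash codes correctly,
    // unlike the plain % operator, which can return a negative result.
    static int indexFor(int hashCode, int capacity) {
        return Math.floorMod(hashCode, capacity);
    }

    public static void main(String[] args) {
        // If hash codes are spread evenly over all ~4 billion ints,
        // the resulting indices spread evenly over the buckets,
        // whatever capacity we choose.
        String[] keys = {"alpha", "beta", "gamma", "delta"};
        for (String k : keys) {
            System.out.println(k + " -> bucket " + indexFor(k.hashCode(), 16));
        }
    }
}
```

Note that `String.hashCode()` can be negative, which is why the reduction step needs `Math.floorMod` (or masking) rather than a bare `%`.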
On the next page, we'll look at the string hash function used by the Java String class, and examine some of the features that make it more adequate.
Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved. If you have any feedback on the Java collections tutorials in this section or about the content of this site in general, please leave a message on the Javamex forum.