Strong (non-secure) hash functions

Having defined the related notions of strong and secure hash functions, we'll look at an example of a hash function that we would designate strong but not secure. In other words, collisions cropping up by chance are negligible in everyday use, but we could easily calculate or find items (even if by trial and error) that would have collisions if we deliberately wanted to.

Typical uses of a strong, non-secure, hash function would be where:

A strong hash function will tend to have these properties:

In practice, these constraints nearly always point towards a 64-bit hash function. The width clearly needs to be "several bits more than 32". Other widths around 64 would be equally or even more adequate, but all modern processors have support for 64-bit arithmetic.

An example 64-bit hash function

As an example, we'll look a Java implementation of the 64-bit hash function suggested in Numerical Recipes (3rd Edition). The essential idea is as follows:

The authors suggest initialising the table with an XORShift generator, seeded with what is (presumably) a value that they found to give good results. (XORShift generators have a slight weakness that they can go through sequences with few bits set; I assume that the authors found this starting value to give numbers that avoided that property.) In Java, using their start value and XORShift parameters, we could code this as follows:

public class StrongHash {
  private static final long[] byteTable;
  
  static {
    byteTable = new long[256];
    long h = 0x544B2FBACAAF1684L;
    for (int i = 0; i < 256; i++) {
      for (int j = 0; j < 31; j++) {
        h = (h >>> 7) ^ h;
        h = (h << 11) ^ h;
        h = (h >>> 10) ^ h;
      }
      byteTable[i] = h;
    }
  }

Now, to calculate the hash code of some byte sequence, we proceed as mentioned:

private static final long HSTART = 0xBB40E64DA205B064L;
private static final long HMULT = 7664345821815920749L;

public static long hashCodeOf(byte[] bytes) {
  long h = HSTART;
  final long hmult = HMULT;
  final long[] ht = byteTable;
  for (int i = 0; i < bytes.length; i++) {
    h = (h * hmult) ^ ht[bytes[i] & 0xff];
  }
  return h;
}

Recall that ANDing with 255 (0xff hex) converts a (signed) Java byte into an unsigned value in the range 0-255, which we can then use as an index into the array of random numbers.

To produce a hash of other objects using this method, we have a couple of options:


1. For example, a hash function that simply "multiplies and adds" successive values directly (such as that used by the Java String class, or those auto-generated by IDEs such as Eclipse) is prone to accumulating zeroes in its lower bits when successive bytes have zeroes in those bits (as an example, look at what happens when you call hashCode() on strings consisting of sequences of spaces).