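If we want to check that a hash we've been sent matches one we've just calculated,
we can use the MessageDigest.isEqual() method, passing in the two byte arrays
representing the hashes. Arrays.equals() performs the same byte-by-byte comparison,
but in recent JDK versions MessageDigest.isEqual() is implemented so that its running
time does not depend on where the two arrays first differ. That makes it the safer
choice whenever an attacker might be able to time the comparison.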
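A minimal sketch of that check is shown below. It assumes SHA-256 as the hash algorithm, and the class name HashCheck is purely illustrative. The "received" hash is simulated by hashing the same message locally.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; SHA-256 is an assumed choice of algorithm.
public class HashCheck {

    // Hash the data with SHA-256.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Not expected: SHA-256 is a standard algorithm in every JDK.
            throw new IllegalStateException(e);
        }
    }

    // True if the hash we were sent matches the one we calculate ourselves.
    static boolean matches(byte[] data, byte[] receivedHash) {
        return MessageDigest.isEqual(sha256(data), receivedHash);
    }

    public static void main(String[] args) {
        byte[] message = "Hello, world".getBytes(StandardCharsets.UTF_8);
        // Stand-in for a hash received from elsewhere (e.g. alongside a download).
        byte[] received = sha256(message);
        System.out.println(matches(message, received)); // true
        System.out.println(matches("Hello, world!".getBytes(StandardCharsets.UTF_8), received)); // false
    }
}
```

Note that MessageDigest.isEqual() only compares the two arrays. It is up to us to recompute the hash over exactly the data the sender hashed.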
Secure hash algorithms
In principle, the MessageDigest class has a pluggable architecture similar to
that of the Cipher class: we pass
in the name of the algorithm we want to use, and the security architecture finds
a suitable provider that can fulfil that request. In practice, Sun's JDK
ships with only a handful of hashes: MD2, MD5 and several SHA variants
(SHA-1, SHA-256, SHA-384 and SHA-512). In fairness, there aren't currently many
other choices in any case.
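To see the pluggable architecture in action, the sketch below asks java.security.Security which MessageDigest algorithms the installed providers register. It then reports which provider MessageDigest.getInstance() selects for each one. The class name ListDigests is only for illustration. The exact list depends on the JDK version and installed providers, so a modern JDK will typically show more algorithms than the handful listed above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical example class name.
public class ListDigests {

    // Every MessageDigest algorithm name registered by the installed
    // providers (Security.getAlgorithms returns the names upper-cased).
    static Set<String> availableDigests() {
        return new TreeSet<>(Security.getAlgorithms("MessageDigest"));
    }

    // Look up a digest by name and report which provider fulfilled the request.
    static String providerFor(String algorithm) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).getProvider().getName();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        for (String name : availableDigests()) {
            System.out.println(name + " -> " + providerFor(name));
        }
    }
}
```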
Figure 1 shows the relative performance of these different hash algorithms.
As we'll discuss below, there is to some extent a tradeoff between security and
performance.
Figure 1: Performance of standard secure hash functions.
(Timings from a 2GHz Pentium running Java 6 under Windows XP;
each point is actually the mean of 20 measurements.)
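Below is a rough way to make a similar comparison on your own machine. It is not the code used to produce Figure 1; the class and method names (DigestTiming, meanNanos, timeAll) are hypothetical. The sketch hashes a block of random data repeatedly and reports the mean time per hash. Single-process timings like these are sensitive to JIT compilation and other noise. For serious measurements a dedicated harness such as JMH is more reliable.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical micro-benchmark sketch; not the harness behind Figure 1.
public class DigestTiming {

    // Mean time in nanoseconds to hash `data` with the named algorithm,
    // averaged over `runs` measurements after one untimed warm-up pass.
    static double meanNanos(String algorithm, byte[] data, int runs) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            md.digest(data); // warm-up, not timed
            long total = 0;
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                md.digest(data); // digest() also resets md for the next run
                total += System.nanoTime() - start;
            }
            return (double) total / runs;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // Time each algorithm in turn, preserving the order they were given in.
    static Map<String, Double> timeAll(String[] algorithms, byte[] data, int runs) {
        Map<String, Double> results = new LinkedHashMap<>();
        for (String alg : algorithms) {
            results.put(alg, meanNanos(alg, data, runs));
        }
        return results;
    }

    public static void main(String[] args) {
        byte[] data = new byte[64 * 1024];
        new Random(42).nextBytes(data);
        String[] algs = {"MD2", "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512"};
        timeAll(algs, data, 20).forEach((alg, ns) ->
            System.out.printf("%-8s %10.1f microseconds%n", alg, ns / 1000.0));
    }
}
```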
Now, we can summarise the general characteristics of these hash functions.
MD2
MD2 is one of the earliest hash functions developed by Ron Rivest at RSA Security.
Early attacks were published on MD2's compression function (one of the components
of the hash algorithm), and later work has described preimage and collision attacks
on the full function that, although faster than brute force, remain impractical.
Aside from these theoretical weaknesses, the main practical reason for avoiding
MD2 is that it is extremely slow compared to other algorithms (see Figure 1).
It is a 128-bit hash, meaning that we would expect to find a collision
by chance after hashing around 2^64 sets of data. Many consider this unacceptably
low for new applications, which may need to cope with the volumes of data that
people will be working with several years into the future.
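The 2^64 figure comes from the birthday paradox. With N = 2^n possible hash values, the chance of at least one collision among k random inputs is approximately 1 - e^(-k^2/2N). The sketch below evaluates that standard approximation, assuming the hash behaves like a random function. The class name BirthdayBound is hypothetical.

```java
// Hypothetical helper illustrating the birthday bound for an n-bit hash,
// assuming hash outputs behave like uniformly random values.
public class BirthdayBound {

    // Approximate probability of at least one collision after hashing k
    // random inputs: p ≈ 1 - exp(-k^2 / (2N)), where N = 2^bits.
    static double collisionProbability(double k, int bits) {
        double outputs = Math.pow(2, bits);           // N possible hash values
        return -Math.expm1(-(k * k) / (2 * outputs)); // 1 - e^(-x), accurate for tiny x
    }

    // Inputs needed for a 50% chance of a collision: sqrt(2N ln 2).
    static double inputsForHalfChance(int bits) {
        return Math.sqrt(2 * Math.pow(2, bits) * Math.log(2));
    }

    public static void main(String[] args) {
        double k = Math.pow(2, 64);
        System.out.println(collisionProbability(k, 128));               // about 0.39
        System.out.println(inputsForHalfChance(128) / Math.pow(2, 64)); // about 1.18
    }
}
```

So for a 128-bit hash, hashing 2^64 inputs gives roughly a 39% chance of a collision, and a 50% chance is reached at about 1.18 × 2^64 inputs.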
MD5
MD5 is a later hash function developed by Ron Rivest. It is one of the most common
hash algorithms in use today. Like MD2, it is a 128-bit hash function but, unlike its
predecessor, it is one of the fastest "secure" hash functions in common use, and
the fastest provided in Java 6.
Unfortunately, it is now considered insecure. Aside from the relatively small hash size,
there are widely published methods to find collisions analytically
in a trivial amount of time. For example, Vlastimil Klima has published a
C program to find MD5 collisions
in around 30 seconds on an average PC. If you need security, don't use MD5!
Although insecure for cryptographic purposes, MD5 still makes a good general-purpose
strong hash function because of its speed. In non-security applications such as
finding duplicate files on a hard disk (where your threat model does not include
somebody deliberately trying to fool your system), MD5 is a good choice.
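The duplicate-file case might be sketched as follows. The class and method names (DuplicateFinder, md5Hex, findDuplicates) are hypothetical. The sketch walks a directory tree, computes the MD5 of each regular file, and groups files that share a hash. Two files with the same MD5 are almost certainly identical. If certainty matters, a final byte-by-byte comparison of each group (for example with Files.mismatch() on Java 12 and later) would confirm it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class: groups files under a directory by MD5.
public class DuplicateFinder {

    // MD5 of a file's contents as a lower-case hex string, read in chunks
    // so that large files don't have to fit in memory.
    static String md5Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // not expected: standard JDKs include MD5
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Groups regular files under `root` by MD5 and keeps only groups with
    // more than one member, i.e. files with (almost certainly) identical contents.
    static List<List<Path>> findDuplicates(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, List<Path>> byHash = new HashMap<>();
        for (Path f : files) {
            byHash.computeIfAbsent(md5Hex(f), k -> new ArrayList<>()).add(f);
        }
        List<List<Path>> dups = new ArrayList<>();
        for (List<Path> group : byHash.values()) {
            if (group.size() > 1) {
                dups.add(group);
            }
        }
        return dups;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (List<Path> group : findDuplicates(root)) {
            System.out.println(group);
        }
    }
}
```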
SHA algorithms
SHA (Secure Hash Algorithm) refers collectively to a family of hash functions
developed by the US National Security Agency (NSA). The algorithms differ in
hash size and (in principle) offer corresponding levels of security: