Java tutorials home  Java cryptography  Encryption intro  Keys  Symmetric encryption  AES/block ciphers  Block modes (ECB, CTR, OFB)  Asymmetric encryption  RSA in Java  Comparison of algorithms  Key sizes  Hash functions
Search this site:
Threads Database Profiling Regular expressions Random numbers Compression Exceptions C Equivalents in Java

 Help us improve this site: Take the Javamex survey!

Secure hash functions in Java (ctd)

Now we've seen an introduction to hash functions, let's look at how to compute a hash in Java. As mentioned, secure hashes are sometimes called message digests. And in fact, the main class for computing them in Java is java.security.MessageDigest. We get an instance of MessageDigest, feed it an array (or several arrays) of bytes, then call its digest() method to get the resulting hash:

byte[] data = ....

MessageDigest md = MessageDigest.getInstance("SHA-1");
md.update(data);
byte[] hash = md.digest();

If we want to check that a given hash we're sent matches one we've just calculated, then we can use the MessageDigest.isEqual() method, passing in the two byte arrays representing the hashes (we could also use Arrays.equals()— it really is just a byte-by-byte comparison!).

Secure hash algorithms

In principle, you can see that the MessageDigest class has a similar pluggable architecture to the Cipher class: we pass in the name of the algorithm we want to use, and the security architecture finds a suitable provider that can fulfil that request. In practice, Sun's JDK ships with only a handful of hashes: MD2, MD5 and several SHA variants (SHA-1, SHA-256, SHA-384 and SHA-512). In fairness, there aren't currently many other choices in any case.

Figure 1 shows the relative performance of these different hash algorithms. As we'll discuss below, to some extent, there's a tradeoff between security and performance.

Figure 1: Performance of standard secure hash functions.
(Timings from a 2GHz Pentium running Java 6 under Windows XP;
each point is actually the mean of 20 measurements.)

Now, we can summarise the general characteristics of these hash functions.

MD2

MD2 is one of the earliest hash functions developed by Ron Rivest at RSA Security. To date, no full attack on MD2 has been published, but attacks have been published on the compression function (one of the components of the hash algorithm). Aside from this partial attack, the main reason for avoiding MD2 is that it is extremely slow compared to other algorithms (see Figure 1).

It is a 128-bit hash, meaning that we would expect to find a collision by chance after hashing 264 sets of data. Many consider this unacceptably low for new applications, considering they may need to cope with the volumes of data that people will be working with several years into the future.

MD5

MD5 is a later hash function developed by Ron Rivest. It is one of the most common hash algorithms in use today. Like MD2, it is a 128-bit hash function but, unlike its predecessor, it is one of the fastest "secure" hash functions in common use, and the fastest provided in Java 6.

Unfortunately, it is now considered insecure. Aside from the relatively small hash size, there are well-published methods to find collisions analytically in a trivial amount of time. For example, Vlastimil Klima has published a C program to find MD5 collisions in around 30 seconds on an average PC. If you need security, don't use MD5!

Although insecure, MD5 still makes a good general strong hash function due to its speed. In non-security applications such as finding duplicate files on a hard disk (where you're not trying to protect against the threat model of somebody deliberately fooling your system), MD5 makes a good choice.

SHA algorithms

SHA (Secure Hash Algorithm) refers collectively to various hash functions developed by the US National Security Agency (NSA). The various algorithms are based on differing hash sizes and (in principle) offer corresponding levels of security:

AlgorithmWidthCharacteristics/comments
SHA-1160 bits

Design based on MD4/MD5. After MD5, probably the next most commonly used hash function.

Insecure against an adversary with considerable resources, but otherwise moderately secure in the short term. A known attack can find collisions with 263 complexity (whereas by chance, we'd expect 280). NIST considers this "plainly within the realm of feasibility for a high resource attacker" and has mandated that federal agencies withdraw use of SHA-1 for certain purposes by 2010 (NIST, 20061).

Depending on the expected confidentiality lifetime of data, new applications should probably avoid SHA-1 if they can, although there is no publically known attack that is practical in the short term.

SHA-256256 bits

Secure by current knowledge. The best partial attack I'm aware of is a reported attack on 24 of the 64 rounds of SHA-256 (Sanadhya & Palash Sarkar, 20082) which the authors say "do not threaten the security of the full SHA-2 family".

Note that SHA-384 is a waste of CPU time: it performs the same calculations as SHA-512 and then disregards some of the bits.

SHA-384384 bits
SHA-512512 bits
Summary of SHA algorithms.

1. See NIST (2006), NIST Comments on Cryptanalytic Attacks on SHA-1.
2. Sanadhya, S. K. & Sarkar, P. (2008), New Collision Attacks against Up to 24-Step SHA-2 in Progress in Cryptology - INDOCRYPT 2008, Springer.


Written by Neil Coffey. Copyright © Javamex UK 2010. All rights reserved.