Secure hash functions in Java
As you probably know if you've worked on certain algorithms such
as hash tables, a hash function
takes pieces of data and "randomly" maps them to integers. Generally, the
resulting integers, called hashes or hash codes, are of some
specified fixed number of bits. For example, Java collections use 32-bit hash codes.
In cryptography applications, we often need a so-called secure hash function.
Secure hash functions are generaelly a subset of hash functions that
fulfil at least two extra criteria:
- it must be computationally impossible to reverse the mapping, that is,
go from a hash code to a message or piece of data that would have generated that
- it must be infeasible for a collision to occur:
that is, for two messages to be found that have the same hash code.
In order to fulfil these criteria (or at least, as a by-product of needing to fulfil these criteria), secure hash functions generally have these characteristics:
- they are slower to compute than the hash codes typically used
to key hash maps;
- they are wider (i.e. have more bits) than weak hash codes.
Secure hash codes are typically 128 bits wide at the very least; compare that, for example,
to the 32-bit codes returned by Java's hashCode() method, or the 64-bit
hash codes recommended for key-less hash maps in Numerical Recipes.
Applications of secure hash functions
Secure hash functions actually have various applications. A very
common case is verifying the integrity of data. When we send some data, we append
a hash of that data; on the receiving end, we re-hash the received data and check
that the computed hash equals that sent; if any of the data has changed then
(with overwhelming probability), the computed hash value will no longer match
the original. Another case is where we need to authenticate some data,
i.e. produce a kind of integrity check that only a party with a given private key
could produce. (In this case, the general solution is to combine a hash code with encryption.)
In other cases, a secure hash function is useful to represent a particular item of data.
For example, for the purpose of checking passwords, we need only store a hash
of that password. When somebody enters their password, if the computed hash of what
they entered matches the hash stored in the password file/database, we assume they
knew the password.
This scheme, sometimes called compare by hash (CBH) can be used to search
for duplicates of data on a hard drive or for synching data between multiple machines.
Similarly, another example are the databases that various law enforcement agencies keep
of known "disapproved" files that they want to search people's hard drives for.
In these applications, keeping a database of the actual file contents, and/or transmitting
and comparing those entire contents, would be impractical. Instead, only hashes
are stored and compared.
More broadly, secure hash functions are useful in a variety of cases where
we need a trapdoor function (i.e. one that cannot feasibly be reversed),
especially where we need one with a limited or fixed-size result.
Next: computing hashes and choosing a hash algorithm
On the next page, we look at two issues: