In choosing a given key size or hash width,
we need to consider the possibility of collision attacks.
Collision attacks essentially stem from the observation that if you take two sets of
random values from some range, then the size of each set need only be around the square root
of the number of potential values for there to be a good chance of at least one collision,
or value present in both groups. For example, let's say you have a random number generator
that generates numbers fairly between 1 and 65536 (=216). Now, you generate 256 numbers
and call them Group A. And then you generate another 256 numbers and call them Group B. There'll
be a greater than 60% chance that at least one number in Group A is also in Group B.
At present, collision attacks are not generally practical with 128-bit keys
and hashes, but are arguably going to become practical quite soon,
certainly within the required "confidentially lifetime" of
many people's data.
A common variant is called a birthday attack or example of the
birthday problem. It usually refers to observing repetitions that
are likely to occur given enough applications of a particular function or algorithm.
It stems from the observation that if you pick 23 people
at random, there's a better than 50% chance that some pair will share the same birthday.
The phenomenon extends most obviously to hash functions. Let's say we
use a 128-bit MD5 hash to verify the integrity of a message. There are 2128
possible hash codes. But after we have hashed "only" 264 random messages
(the square root of 2128), then just by chance it is more likely
than not that some pair of those messages will have the same hash code. With
SHA-1, a 160-bit hash, then a pair only becomes likely after around 280
messages. (Note that we're just talking about collisions that naturally would occur
by random chance; we're not referring to active attacks against these particular hash functions.)
A similar phenomenon occurs if we encrypt a large number of blocks using
certain block cipher modes. In OFB (output feedback) mode,
then after encrypting in the order of 264 blocks, it's likely that we'll
re-use part of the previous cipher stream. At present, this is an inconceivably large
amount of data; in 20 or 30 years, it may well be a realistic amount of data to encrypt with
a given key.
In a meet-in-the-middle attack, Group A is typically some pre-calculated set of
encrypted blocks, and Group B is a set of "real" encrypted blocks. The idea is to
use the pre-calculated set of blocks to try and find at least one "real" encrypted
block that you can decrypt. A typical version of the attack is as follows:
- we pre-calculate the encrypted version of some known or suspected block of plaintext
(for example, some known file header, known part of a connection protocol etc)
for about 2n/2 possible keys, where the key length is 2n—
note that this is the square root of the possible number of keys, and much much fewer
than half the keys that we'd expect for a brute-force attack;
- then, we sit and make lots of observations: scan lots of encrypted conversations, scan lots of
encrypted files etc; after an average of 2n/2 observations,
we'd expect a collision— i.e. for one of our pre-calculated
encrypted blocks to occur— with over 60% certainty (assuming our prediction of the plaintext was correct,
- you can trade more pre-calculated cipher texts for fewer required
observations and vice versa, or just trade fewer of either for a smaller percentage
chance of getting a collision.
Collision attacks are particularly deadly for DES, with its 56-bit key size and
small block size.
If you pre-calculate 230 keys (about a billion, or about 8 GB of data),
then after 220 observations (about a million), there is around a 1% chance
of a collision; pre-calculating 232 keys gives roughly that chance after
218 observations (around 200,000). These are very feasible calculations
and not an unreasonable number of encryptions in some applications, and are part of the
reason why you shouldn't entrust data of any value to DES.
At present, scaling these calculations up to 128-bit key sizes (and a 128-bit
block size) sounds infeasible. Working with 262 precalculations
and 262 observations gives us around a 6% chance of a collision,
which might be just about the cut-off point for "worth an expensive attack" for
But storing 262 blocks requires about 50 million terabytes
(and we can't compute them fast enough each time to not have to store them).
And we're talking about 4 billion billion observations. For now, this doesn't
constitute a practical attack. But how far-fetched will
these numbers appear in 10 years, 20 years, 30 years...?