How does java.util.Random work and how good is it?
The java.util.Random class implements what is generally called a
linear congruential generator (LCG). An LCG is essentially a
formula of the following form:
numberi+1 = (a * numberi + c) mod m
In other words, we begin with some start or "seed" number which ideally is "genuinely
unpredictable", and which in practice is "unpredictable enough".
For example, the number of milliseconds— or even nanoseconds—
since the computer was switched on is available on most systems. Then, each time we want a random
number, we multiply the current seed by some fixed number, a, add another fixed number,
c, then take the result modulo another fixed number, m. The number a is
generally large. This method of random number
generation goes back pretty much to the dawn of computing1. Pretty much every
"casual" random number generator you can think of—
from those of scientific calculators to 1980s home computers to currentday C and Visual Basic
library functions— uses some variant of the above formula to generate its random numbers.
LCG parameters used by java.util.Random
The actual parameters used by java.util.Random are
essentially taken from the UNIX rand48 generator
(though with a slightly different seeding function). For
reasons discussed later, only the top 32 bits of each 48 bits
generated are used.
With these parameters, the resulting random number generator appears to
be about as "good as it gets" for an LCG.
Depending on the values chosen for a, c and and m, the quality of
random numbers produced by this method varies between "unbelievably disastrous" and
"OK for casual applications". For practical reasons, it is generally common to do one of the following:
- make a
close to a power of 2 (so that the multiplication can be performed by shifting and adding/subtracting2),
or;
- make m a power of 2, often the register size of the machine (such as 232 for
a 32-bit machine) so that the modulo is carried out either "for free", or at worst via an AND
operation rather than an expensive division3.
With or without these constraints, values for the parameters are then generally sought so that:
- every possibly value between 0 and m-1 inclusive is generated before the pattern
repeats itself;
- the numbers generated are as "statistically random" as we can get them (see below).
Since for a given "current seed" value, the "next seed" will always be completely
predictable based on that value, the series of numbers must repeat after at most m
generations. This is called the period of the random number generator. In
the case of java.util.Random, m is 248 and the other values
have indeed been chosen so that the generator has its maximum period. Therefore:
The period of the java.util.Random generator is 248.
In decimal, 248 is a few hundred million million. That might sound like enough—
and it is for certain applications— but it does mean some quite severe limitations in
other cases. For example, consider an application where you pull out a number of 2-integer
pairs (and where you use the full range of the integer). One integer has 232
possible values. So the number of possible combinations of 2-integer pairs is 232 * 232,
or 264. In other words, java.util.Random will not be able to
produce every possible combination. Of course, even a generator that produced "perfect" random
numbers with a 248 period would have this limitation.
For some testing or scientific applications, that would be bad enough. But it turns out that
with LCGs, things are actually worse:
- When taking combinations of values, e.g. for coordinates, the resulting
pairs, triples etc always have a particular mathematical relationship, sometimes
described as "falling in the planes".
- Not all bits are produced with equal randomness: the lower the bit
in the number generated, the less "random" it actually is. This can introduce artefects
if you write code such as if (r.nextInt() > = 2) {}. See the section
on the randomness of bits with LCGs for an illustration
and guidelines on minimising this problem.
Next...
See the two pages linked to above for more details on the flaws of the LCG method.
For a better alternative that is trivial to implement in a few lines of Java, see
the XORShift generator.
See also:
- the introduction to random numbers in Java, in which
we gave a checklist of dos and dont's of using java.util.Random;
- a look at random sampling: the problem of
picking x random choices from a total of n, a problem that many
people implement inefficiently or wrongly, and whose correct solution turns out to
be quite simple.
1. It is generally attributed to Dick Lehmer, who appears to have intoduced it
formally in a 1948 conference paper.
2. For example, 65539 is 216+3. So 65539x can be calculated as x<<16+x+x+x. Shift and addition/subtraction instructions are generally much faster operations than mutliplication.
3. For example, to calculate a number modulo 248— the modulus chosen in
java.util.Random, we AND it with (248-1).