Java tutorials home  java.util.Random  Random number generators  XORShift  High quality random  Seeding generators  Entropy  SecureRandom  Random sampling  Random simulations and nextGaussian()

Randomness of bits with LCGs

In addition to the rather serious problem of pairs or triples (etc) of numbers "falling in the planes", random numbers produced by LCGs such as java.lang.Random have another serious drawback: in the numbers produced, randomness depends on bit position. That is, in the numbers produced by (say) Random.nextInt(), the lower bits will be less random than the upper bits.

As a graphic example of the problem, let's suppose we create a 256x256 pixel square, with each pixel coloured either black or white. We'll decide on the colour of the pixel depending on the bottom bit of successive values returned by Random.nextInt(). So the program would look something as follows:

Random r = new Random();
BufferedImage bimg = new BufferedImage(256, 256,
    BufferedImage.TYPE_BYTE_BINARY);
int w = bimg.getWidth();
for (int y = 0; y < bimg.getHeight(); y++) {
  for (int x = 0; x < w; x++) {
    int bit = r.nextInt() & 1;
    bimg.setRGB(x, y, (bit == 0) ? 0 : 0xffffff);
  }
}

So when we run this, we might reasonably expect to see our square filled with "snow" with no discernible pattern. Well, Figure 1 shows what we actually get. We can clearly see long "stripes", "checkerboards" and other patterns with much greater frequency and regularity than we would expect by chance1. This happens because we are using the part of the integer which has the lowest degree of randomness. It also has the lowest period.

We can improve on Figure 1 by taking higher bits. Figure 2 shows what happens if we replace the line in bold with the following:

int b = (r.nextInt() >>> 2) & 1;

The result is better, but there are still some clear "spikes" where runs of the same value occur again with much higher frequency than we'd expect.

Now of course, the degisners of java.lang.Random knew perfectly well about these limitations. Even Figure 1 is actually better than it could have been. When we call nextInt(), this method actually discards the very lower 16 bits of every number it generates: the 32-bit integer that it returns us is actually the top 32 bits of a 48-bit number generated internally. And if we want to generate a series of random bits, then we whould ideally pull them from the part of that 48-bit number that is known to be most random: namely, the topmost bit. And that's precisely what Random.nextBoolean() does. Figure 3 shows the resulting plot if we use Random.nextBoolean().

Figure 1: 256x256 of the low bit of successive calls to Random.nextInt().

Figure 2: 256x256 of the third-lowest bit of successive calls to Random.nextInt().

Figure 3: 256x256 of successive bits returned by Random.nextBoolean().

To get round this problem as best we can, we need to at least do the following:

  • always use the appropriate method for retrieving a random number from java.lang.Random:
    • to generate a random bit, use nextBoolean(), and not some other invention;
    • to generate a random byte, or a random number within a particular range, use the version of nextInt() that takes an upper bound;
  • essentially the same point expressed another way: use all the bits of the number you are given! If you call nextInt() and then only use the bottom few bits, you're actually discarding the part of the number that is most random!
  • avoid the idiom nextInt() % n— if n is a power of 2, then we're effectively taking bits off the bottom. (You should actually avoid this idiom anyway, as for non-powers of 2, it actually biases the results.)
  • avoid using Random for test data where it will pass through cases such as:
    if (randomNumber() >= n) {
       ...
    }
    
    If n is a power of 2, then the condition is effectively saying "test if the number is positive, then test if bit X is set". For example, >= 32 actually equates to "if (the number is positive and) bit 5 is set". Whether the number is positive or negative depends on the setting of the top bit, so will at least introduce some randomness, but we could also be introducing a subtle source of largely non-randomness as illustrated above. So the pattern of when the code inside the block is called could exhibit some of the kinds of patterns that we see in the figures above, with the conditional block being executed in "test" mode with greater regularity, or a greater number of times in succession etc, than will happen in actual usage. For what are now hopefully obvious reasons, you especially shouldn't do a test like this:
    if ((randomNumber() & 0xffff) >= n) {
       ...
    }
    

In any application, the behaviour of java.lang.Random (and LCGs in general) described here is liable to introduce subtle patterns into your "random" data. Clearly, this makes the class unsuitable for "serious" uses of random numbers such as Monte Carlo experiments, gambling applications, cryptography etc. But even for fairly "casual" applications such as generating test data for an application, the disparity in randomness across the bits of generated numbers can cause problems, such as the final point in the list above. So the bottom line is:

If you can't easily predict the impact that this behaviour is going to have on your application, use another generator that produces bits with equal randomness.

1. It's important to consider, of course, that what seem to us as unlikely will occasionally happen "by pure chance". For example, consider a block of 8 pixels. With each pixel having a perfect 0.5 probability of being set, than there is still a 1 in 256 chance that these 8 pixels will form some given pattern, such as a checkerboard. (And the chance of there being "a checkerboard somewhere" is of course greater if we allow, say, either 0101 or 1010 to count.) Similarly, taking 256x256=65536 pixels, by pure chance we would expect on average to find one run of 16 pixels set to the same value. Although our technique for spotting the patterns is informal here, it is clear in Figure 1 that these patterns occur with far greater frequency and regularity than expected by chance.

comments powered by Disqus

Written by Neil Coffey. Copyright © Javamex UK 2013. All rights reserved.