Concurrent structures and collections in Java 5

Usage and performance of ConcurrentHashMap (1)

A key structure implemented as part of the Java Collections API is the HashMap. It allows you to 'map' or associate keys with values and crops up in all sorts of uses such as:

HashMaps can be very important on server applications for caching purposes. And as such, they can receive a good deal of concurrent access. Before Java 5, the standard HashMap implementation had the weakness that accessing the map concurrently meant synchronizing on the entire map on each access. Synchronizing on the whole map fails to take advantage of a possible optimisation: because hash maps store their data in a series of separate 'buckets', it is in principle possible to lock only the portion of the map that is being accessed. This optimisation is generally called lock striping.

Java 5 brings a hash map optimised in this way in the form of ConcurrentHashMap. A combination of lock striping plus judicious use of volatile variables gives the class two highly concurrent properties:

To illustrate the benefits of ConcurrentHashMap over an ordinary (but synchronized) HashMap, Figure 1 shows the results of a test to measure the difference in throughput between a synchronized HashMap1 and a ConcurrentHashMap. In each case, a given number of threads concurrently sit in a tight loop choosing a random key and seeing if that key is in the map. If not, the thread has an approximately 10% chance of adding a mapping of that key to the map; if the key is already mapped, there is a 1% chance that the mapping will be removed. After running a few thousand iterations, each thread sleeps for a small random number of milliseconds before running the loop again. Thus, the program aims to simulate a moderate-ish amount of contention on the map, as might be experienced by an important cache map on a busy server. The "throughput" measurement is the total number of accesses (actually in thousands of accesses) to the map performed by all threads together in two seconds.

Figure 1: Throughput of ConcurrentHashMap vs a synchronized HashMap
under moderate contention as might be experienced by a frequently referenced cache on a busy server.

Figure 1 shows results of this test running in Java 1.5.0_08 under Linux on a dual processor Xeon server with HyperThreading enabled2. The improvement to throughput from using a ConcurrentHashMap in this case is clear: up to 4 threads, throughput more or less doubles with a doubling of threads when using a ConcurrentHashMap. In contrast, it takes four times the number of threads to only double throughput with a synchronized HashMap. After four threads, the throughput with a synchronized HashMap rapidly plateaus, whilst ConcurrentHashMap continues to give more throughput– albeit at a slower rate– as more threads are added. Goetz et al report more or less similar results on an 8-processor Sparc V880 (pp. 242-3): throughput with a synchronized map plateaus quickly (and in fact drops with more than one thread), whilst throughput with a ConcurrentHashMap continues to climb up to around 16 threads (twice as many threads as processors). Note that their test produces more contention on the map than in my test here.

The concurrency parameter

When constructing a ConcurrentHashMap, you can pass in a concurrency parameter which is intended to indicate "the estimated number of concurrently updating threads". The default concurency is 16 threads. The map is essentially split into this many segments (or to the nearest higher power of two).

Interestingly in the test here, there is a small but consistent advantage in overestimating the number of updating threads. The blue line shows throughput for the case where the estimate equals the actual number of threads, whereas the purple line shows results for an estimate of double the number of threads. (This may explain why Sun chose a default of 16 even though most machines currently have far fewer processors than this.)

A result which I am currently unsure how to explain is that increasing the concurrency parameter in the above test increases throughput even with only one thread! This is apparent in the graph above, and was consistently noticed in other experiments I ran. (In one extreme case, for example, changing the concurrency level from 1 to 64 gave about 30% extra throughput.) So if we optimise for multiple threads, it appears that in this case, single-threaded access isn't penalised.

More on using ConcurrentHashMap

Now we have seen the performance benefits of ConcurrentHashMap, on the next page, we'll look further at the functionality of ConcurrentHashMap.

Notes:
1. The synchronized variant of HashMap was created with the Collections.synchronizedMap(new HashMap(...)) wrapper.
2. i.e. depending on what instructions they are running, the machine has hardware support for executing between 2 and 4 threads simultaneously at a given instant.


Written by Neil Coffey. Copyright © Javamex UK 2008. All rights reserved.