When is it safe to omit synchronization?
The inclusion of this section may seem a bit surprising given the previous sections.
So far, we've been trying to hammer home the fact that whenever you have shared data,
there must be some synchronization mechanism, be it via the
the volatile keyword, or one of
the new concurrency utilities such as the Java 5 atomic classes, components such as explicit locks or Semaphores etc.
We've tried to stress that "patterns" that try to avoid any
kind of synchronization, such as the infamous double-checked locking, are wrong.
So now I want to introduce an interesting exception:
You can access a shared variable without synchronization iff
your program performs adequately in all possible outcomes resulting
from not having the synchronization.
So what do we mean by this? Well, imagine we have a single integer value, x. We
are going to allow multiple threads to read and write it concurrently, but
we will not declare it volatile or synchronize on it in any way.
Each concurrent thread will perform an operation consisting of:
- reading the current value of x;
- performing some calculation to get the "next" value of x;
- storing that value back into x.
Now, depending on the JVM and architecture we're running on, any of the
following could happen:
- Possibility 1: total optimisation in the VM
- The JVM could completely optimise away the reads and writes to memory,
meaning that each thread effectively worked on its own copy of x,
as though x was thread-local. The general principle
is: if you don't tell the JVM that a variable is accessed by different threads, then
it is allowed to assume that they aren't.
- Possibility 2: JVM hits main memory each time and CPU guarantees of visibility
- On the other end of the scale, the JVM might not make any optimisation at all,
and the CPU might guarantee that reads/writes to a non-volatile variable
effectively behave as though they were volatile— in this case:
- we have a plain, boring old race condition, in which two threads executing in parallel
could be "very unlucky" and both read and then write the same value, effectively
"missing an update";
- how likely we are to hit the race condition
depends on the time taken to calculate the new value of x relative to the
frequency the operation is performed;
- Possibility 3: JVM hits main memory, but memory barrier constraints in architecture
- The JVM might not make such an optimisation, but details of the underlying
architecture might mean that writing of the new value of the variable was
delayed until the next memory barrier (in Java terms, generally
the next write to a volatile variable,
or exit from a synchronized block by that thread); in this case:
- we still
have a race condition;
- but the chance of hitting it now also depends on the
frequency with which the parallel threads hit a memory barrier.
If this sounds a bit complicated, it is, and that's why the rule of thumb is
not to try and omit synchronization unless you're really sure it's safe!
But if we could show that our code behaves correctly in all three cases, then
we could potentially leave out synchronization.
Coming back to the above possibilities,
the idea of having a variable where it doesn't effectively matter whether it behaves as a
thread-local variable or not may sound a bit strange. But it turns out that Sun's JDK
implementation has at least one example.
Example: implementation of ConcurrentSkipListMap
An interesting example occurs in the implementation of ConcurrentSkipListMap. We won't go into
all the details of the skip list algorithm here. But essentially, every time an item is
added to the map, a random number needs to be generated to decide on exactly how
the new object is added to the skip list's internal structure.
The random number generation
algorithm used is Marsaglia's XORShift, which uses a single integer state variable
(i.e. we essentially have a read-calculate-set each time). Since the aim is to eliminate
contention on put operations as much as possible, it wouldn't be desirable
to introduce an extra lock on the random number generator. An option could be
an AtomicInteger, or possibly even plain volatile variable with the
risk of occasional duplicates when a race condition occurred.
But in fact, what the designers decided was "sod it, let's just use a plain old
integer variable". No synchronization. No volatile. No atomic variable. No nothing.
For the random number generation in ConcurrentSkipListMap, this
is generally an appropriate decision because:
- if the JVM/CPU behaves as in case (1), then each thread has its own random
number generator... so what?
- in case (2), we get a race condition, but each number is generated in a very
small number of operations, so the risk is small;
- in case (3), the ConcurrentSkipListMap does access other
volatile references inside its internal structure as a matter of course, so again,
the risk of a race condition
isn't actually so great;
- in either case, the consequences of a race condition on the random number generator
would be a duplicated number, and hence a slightly suboptimal skip list— provided this
didn't happen too often, it wouldn't tend to matter.
After all this thought process, I'm sure you'll agree that cases where it's safe to leave out
synchronization are going to be rare and difficult to justify. This case in the JDK has surely
involved some thrashing things out among concurrency experts working on the
collections library. The take-home message should generally be "don't do this at home".