Context and process switching
Following our discussion of thread scheduling and Java,
we now turn to look in more detail at the issue of context switching. Roughly speaking,
this is the procedure that takes place when the system switches between threads running on
the available CPUs.
Switching between threads will have some overhead:
- the thread scheduler must actually manage the various thread structures and make
decisions about which thread to schedule next and where, and every time the thread
running on a CPU actually changes (often referred to as a context switch),
there'll be some negative impact due to e.g. the interruption of the instruction
pipeline or the fact that the processor cache may no longer be relevant;
- switching between threads of different processes
(that is, switching to a thread that belongs to a different process from the
one last running on that CPU) will
carry a higher cost, since the address-to-memory mappings must be changed, and
the contents of the cache will almost certainly be irrelevant to the
newly scheduled process.
Context switches typically appear to cost somewhere between 1 and 10 microseconds
(i.e. between a thousandth and a hundredth of a millisecond). The fastest case is a switch
between same-process threads with little memory contention; the slowest is a switch between
threads of different processes.
So the following are acceptable:
- a modest number of fast-case switches:
e.g. a thousand per second per CPU will generally be much less than 1% of CPU usage for the
context switch per se;
- a few slower-case switches in a second, but where each
switched-in thread can do, say, a millisecond or so of real work (and ideally
several milliseconds) once switched in. The more memory addresses the thread
accesses (or the more cache lines it hits), the more milliseconds we want it to run
uninterrupted for.
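The arithmetic behind these figures can be checked with a short sketch. It uses only the 1 to 10 microsecond range quoted above; the class and method names are invented for illustration, and the calculation covers only the direct cost of the switch, not the time a thread then spends refilling the cache.

```java
// Rough estimate of CPU time lost to the context switch itself.
// SwitchOverhead and overheadFraction are hypothetical names for this sketch.
class SwitchOverhead {

    /** Fraction of one CPU-second spent switching (0.01 means 1%). */
    static double overheadFraction(long switchesPerSecond, double microsPerSwitch) {
        double microsPerSecond = 1_000_000.0;
        return switchesPerSecond * microsPerSwitch / microsPerSecond;
    }

    public static void main(String[] args) {
        // A thousand fast-case switches per second at about 1 microsecond each.
        System.out.println("fast case fraction: " + overheadFraction(1000, 1.0));
        // A thousand slow-case switches per second at about 10 microseconds each.
        System.out.println("slow case fraction: " + overheadFraction(1000, 10.0));
    }
}
```

With a thousand switches per second this gives roughly 0.1% of a CPU in the fast case and roughly 1% in the slow case.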
So the worst case is generally where we have several "juggling" threads which, each time they
are switched in, do only a tiny amount of work (but do some work, thus hitting
memory and contending with one another for resources) before being switched out again.
What causes too many slow context switches in Java?
Every time we deliberately change a thread's status or attributes (e.g. by sleeping,
waiting on an object, changing the thread's priority etc.), we will cause a context switch. But usually
we don't do those things often enough in a second for it to matter.
Typically, the cause of excessive context switching comes from contention on shared resources,
particularly synchronized locks:
- rarely, a single object very frequently synchronized on could become a bottleneck;
- more frequently, a complex application has several different objects that are
each synchronized on with moderate frequency, but overall, threads find it difficult to make
progress because they keep hitting different contended locks at regular intervals.
The second case is generally worse, because the juggling threads, each time they make
a tiny bit of progress, fight for shared CPU cache, thus making each other less efficient
each time they're switched in.
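As a rough illustration of the contended pattern, the sketch below has several threads entering the same synchronized block very often, each time doing only a tiny piece of work. It also sums ThreadInfo.getBlockedCount() for the workers, which reports how many times a thread blocked trying to enter a monitor. The class, field and method names are invented for this example, and the counts will vary from run to run and machine to machine.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: all names here are hypothetical.
class ContendedLockDemo {
    private static final Object LOCK = new Object();
    private static long sharedCounter;

    /**
     * Starts the given number of threads, each entering the same synchronized
     * block many times. Returns the total number of times the workers blocked
     * while trying to enter a monitor.
     */
    static long run(int threads, int iterationsPerThread) {
        synchronized (LOCK) { sharedCounter = 0; }
        AtomicLong totalBlocked = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < iterationsPerThread; n++) {
                    synchronized (LOCK) {   // tiny critical section, entered very often
                        sharedCounter++;
                    }
                }
                // Read this thread's blocked count while the thread is still alive.
                ThreadInfo info = ManagementFactory.getThreadMXBean()
                        .getThreadInfo(Thread.currentThread().getId());
                if (info != null) {
                    totalBlocked.addAndGet(info.getBlockedCount());
                }
            });
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) join(t);
        return totalBlocked.get();
    }

    /** Current value of the shared counter, read under the same lock. */
    static long counter() {
        synchronized (LOCK) { return sharedCounter; }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long blocked = run(4, 100_000);
        System.out.println("counter=" + counter() + ", times blocked=" + blocked);
    }
}
```

A high blocked count relative to the amount of work done is one hint that threads are spending their time queuing for a lock rather than making progress.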
Avoiding contention and context switches in Java
Firstly, before hacking at your code, a sensible first course of action is
upgrading your JVM, particularly if you are not yet using
Java 6. Most new JVM releases have come with improved synchronization.
Then, a high-level solution to avoiding synchronized lock contention
is generally to use the various classes from the Java 5 concurrency framework
(see the java.util.concurrent package). For example, using a
ConcurrentHashMap instead of a HashMap with appropriate synchronization
can easily double the throughput with 4 threads and treble it with 8 threads
(see the aforementioned link for some ConcurrentHashMap performance measurements).
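The two approaches can be sketched side by side as follows. This is a minimal illustration rather than a benchmark: the class and method names are invented, the key scheme is arbitrary, and Map.merge assumes Java 8 or later. Both maps give the same totals; the difference is that the synchronized wrapper makes every update queue on a single lock, whereas ConcurrentHashMap lets updates to different parts of the map proceed in parallel.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: MapChoice and its methods are hypothetical names.
class MapChoice {

    /** A HashMap with "appropriate synchronization": one lock guards the whole map. */
    static Map<String, Integer> lockedMap() {
        return Collections.synchronizedMap(new HashMap<String, Integer>());
    }

    /** The java.util.concurrent alternative. */
    static Map<String, Integer> concurrentMap() {
        return new ConcurrentHashMap<String, Integer>();
    }

    /** Updates the map from several threads and returns the sum of all values. */
    static long hammer(Map<String, Integer> map, int threads, int updatesPerThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < updatesPerThread; n++) {
                    // merge is performed atomically by both map implementations
                    map.merge("key" + (n % 16), 1, Integer::sum);
                }
            });
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long total = 0;
        for (int v : map.values()) total += v;
        return total;
    }

    public static void main(String[] args) {
        System.out.println("synchronized HashMap total: " + hammer(lockedMap(), 4, 100_000));
        System.out.println("ConcurrentHashMap total:    " + hammer(concurrentMap(), 4, 100_000));
    }
}
```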
Explicit lock classes (such as ReentrantLock) offer a
replacement for synchronized that often gives
better concurrency.
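A minimal sketch of how an explicit lock might replace a synchronized block is shown below. The LockedCounter class is invented for this example. Besides the basic lock()/unlock() pairing, it shows tryLock(), which lets a thread carry on with other work instead of being suspended when the lock is busy; synchronized has no equivalent.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: LockedCounter is a hypothetical class.
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    /** Equivalent of a synchronized increment: waits until the lock is available. */
    void increment() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock();   // always release, even if the body throws
        }
    }

    /** Increments only if the lock is free right now; returns whether it did. */
    boolean tryIncrement() {
        if (!lock.tryLock()) {
            return false;    // lock is busy: the caller can do something else
        }
        try {
            value++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedCounter counter = new LockedCounter();
        counter.increment();
        System.out.println("tryIncrement succeeded: " + counter.tryIncrement());
        System.out.println("value: " + counter.get());
    }
}
```

Whether this actually beats synchronized depends on the JVM version and the level of contention, so it is worth measuring in your own application.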
At a lower level, solutions include holding on to locks for less time
and (as part of this), reducing the "housekeeping" involved in managing a lock.
The Java 5 atomic
classes such as AtomicInteger
effectively provide a way to access a shared variable with "less housekeeping",
thus improving throughput.
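As a sketch of the idea, the counter below is shared between threads without any synchronized block or explicit lock. The HitCounter class and its method names are invented for this example; AtomicInteger and its incrementAndGet() and get() methods are the standard ones from java.util.concurrent.atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: HitCounter is a hypothetical class.
class HitCounter {
    private final AtomicInteger hits = new AtomicInteger();

    /** Atomically adds one and returns the new value; no lock is taken. */
    int record() {
        return hits.incrementAndGet();
    }

    int current() {
        return hits.get();
    }

    /** Increments one shared counter from several threads and returns the final count. */
    static int countConcurrently(int threads, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < hitsPerThread; n++) {
                    counter.record();
                }
            });
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.current();
    }

    public static void main(String[] args) {
        System.out.println("final count: " + countConcurrently(4, 100_000));
    }
}
```

A thread that loses a race on an atomic update simply retries rather than being suspended, so a busy counter like this does not in itself force a context switch.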