The overhead of native calls in Java

Compared to a pure Java method call, calling a user-written native method usually carries a significant overhead. The reasons have largely to do with optimisations that the JVM can make for regular Java methods but not for native ones, plus some extra work involved in the call itself:

  • the JVM can't inline the native method;
  • the JVM doesn't know enough about the method to make optimisations that it could make when compiling a regular Java method (for example, it has to assume that all of the parameters passed in are always used);
  • the JVM can't make other optimisations that it could make if it were dynamically compiling the code (e.g. compiling a constant parameter as a constant operand to a machine instruction rather than placing it on the stack and reading it off again);
  • in order to make the call into the DLL or library, the JVM may have to perform extra work, such as rearranging items on the stack.

What all this boils down to is that as of Hotspot 1.6.0, a call to a native method takes just over 200 clock cycles. Some more precise timings I made on a 1.86 GHz Pentium under Windows XP are shown in the following table (see note 1). I took timings for calls to three different static native methods, which took one, three and five integer parameters respectively. As the figures show, the majority of the overhead is in the act of making a native call per se rather than in placing individual parameters on the stack:

No. of int parameters to native method    Clock cycles / JNI call
1                                         234
3                                         239
5                                         244

Table: JNI call overhead under Windows XP
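
The measurement approach described in note 1 (many repeated calls timed in nanoseconds, warm-up runs discarded, mean taken over several runs) could be sketched roughly as follows. This is not the code used to produce the figures above. The class name, the method names and the stand-in method constantValue() are hypothetical, and because the stand-in is pure Java, the figure it prints says nothing about JNI: for a real measurement it would be replaced by a static native method from a library you have compiled and loaded yourself. The 1.86 GHz clock speed is the one quoted in the text.

```java
import java.util.function.IntSupplier;

final class CallTimingSketch {
    // Written to after each run so that the loop result is actually used.
    static volatile int blackhole;

    // Hypothetical stand-in for the method under test. A real JNI measurement
    // would call a static native method here instead.
    static int constantValue() {
        return 42;
    }

    // Converts elapsed time to clock cycles per call.
    // 1 GHz is one cycle per nanosecond, so cycles = nanoseconds * GHz.
    static double cyclesPerCall(long elapsedNanos, long calls, double clockGHz) {
        return (elapsedNanos * clockGHz) / calls;
    }

    // Times the given number of calls and returns the elapsed nanoseconds.
    static long timeCalls(IntSupplier method, long calls) {
        int sum = 0;
        long start = System.nanoTime();
        for (long i = 0; i < calls; i++) {
            sum += method.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sum;
        return elapsed;
    }

    // Discards the warm-up runs, then returns the mean cycles/call
    // over the measured runs.
    static double meanCyclesPerCall(IntSupplier method, int warmupRuns,
                                    int measuredRuns, long callsPerRun,
                                    double clockGHz) {
        for (int i = 0; i < warmupRuns; i++) {
            timeCalls(method, callsPerRun);
        }
        double total = 0.0;
        for (int i = 0; i < measuredRuns; i++) {
            long nanos = timeCalls(method, callsPerRun);
            total += cyclesPerCall(nanos, callsPerRun, clockGHz);
        }
        return total / measuredRuns;
    }

    public static void main(String[] args) {
        double cycles = meanCyclesPerCall(
                CallTimingSketch::constantValue, 5, 10, 1_000_000L, 1.86);
        System.out.printf("approx. %.1f cycles/call%n", cycles);
    }
}
```

The loop itself and the call through the IntSupplier contribute to the measured time, so a careful measurement would also time an empty baseline and subtract it.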

So, how good or bad is 200 clock cycles? Well, for an occasionally-called method that in turn makes a Windows API call, this overhead of the Java/native interface is surely negligible. The cases where more consideration is needed are, for example, methods performing mathematical operations that we might have nativised in the hope of a speedup. We must take into account, for example, that:

  • a basic arithmetic operation typically takes 2 clock cycles or thereabouts (see note 2) on Intel hardware;
  • in many cases (e.g. a method that performs a simple operation on its parameters), Hotspot and other modern JVMs do a very effective job of optimising away the cost of a pure Java method call.

So this means that a native method performing a few simple operations on its parameters probably won't be worthwhile.
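
To put rough numbers on this, the sketch below compares the two costs using the figures from the text: about 200 cycles per native call and about 2 cycles per basic operation. Everything else is an assumption made for illustration: the class and method names are hypothetical, the pure Java call is assumed to be optimised away entirely, and nativeCostRatio is an invented parameter expressing how much cheaper each operation is in native code (0.5 meaning half the cycles).

```java
final class NativeBreakEvenSketch {
    // Figures taken from the text; both are approximate.
    static final double CALL_OVERHEAD_CYCLES = 200.0;
    static final double CYCLES_PER_OP = 2.0;

    // Cost of doing the work in pure Java, assuming the call itself is free.
    static double javaCost(long ops) {
        return ops * CYCLES_PER_OP;
    }

    // Cost of doing the work natively: fixed call overhead plus the work,
    // scaled by the assumed per-operation cost ratio.
    static double nativeCost(long ops, double nativeCostRatio) {
        return CALL_OVERHEAD_CYCLES + ops * CYCLES_PER_OP * nativeCostRatio;
    }

    // Number of operations per call at which the two costs are equal.
    // If native code is no cheaper per operation, it never breaks even.
    static double breakEvenOps(double nativeCostRatio) {
        if (nativeCostRatio >= 1.0) {
            return Double.POSITIVE_INFINITY;
        }
        return CALL_OVERHEAD_CYCLES / (CYCLES_PER_OP * (1.0 - nativeCostRatio));
    }

    public static void main(String[] args) {
        System.out.println(breakEvenOps(0.5));
    }
}
```

Under these assumptions, native code that halves the per-operation cost only breaks even at 200 operations per call, which is why a method doing just a few simple operations is unlikely to benefit.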

Native methods in the standard libraries

The eagle-eyed will have noticed various native methods in the JDK libraries that perform relatively simple tasks. For example, ByteBuffer.put() writes a single byte to memory; we really don't want a 200+ clock cycle overhead to such a simple method.

For this reason, native methods in the standard library don't necessarily go through the JNI but can actually be treated specially by the JIT compiler. For example, under Hotspot (and presumably other good JIT-compiling JVMs), the various ByteBuffer methods are actually compiled directly to single machine instructions as appropriate.
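
For reference, the kind of code that benefits from this treatment looks like the following minimal sketch, which writes and reads single bytes through a direct ByteBuffer. The class and method names are hypothetical, and the code does not itself show what the JIT compiler emits: whether each put() and get() really becomes a single machine instruction depends on the JVM and can only be confirmed by inspecting the compiled code.

```java
import java.nio.ByteBuffer;

final class DirectBufferSketch {
    // Fills a direct buffer with the bytes 0, 1, 2, ... and sums them back.
    static int sumAfterFill(int size) {
        ByteBuffer buf = ByteBuffer.allocateDirect(size);
        for (int i = 0; i < size; i++) {
            buf.put((byte) i);      // single-byte write
        }
        buf.flip();                 // switch from writing to reading
        int sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.get();       // single-byte read
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAfterFill(4)); // prints 6
    }
}
```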


1. The native method in question simply returned a constant value. You should always take timings such as these with appropriate quantities of salt: they're quite difficult to make reliably. I took reasonable precautions (taking nanosecond timings of a large number of repeated calls; taking mean measurements of a number of runs; ignoring measurements while the JVM was "warming up") and encouragingly, the actual calculated number of cycles/call came extremely close to a whole number of cycles (for example, in the last case, the actual calculated value was 244.032 clock cycles to 3 decimal places).
2. On modern CPU architectures, the number of clock cycles required by a given instruction is a little complex because it depends, for example, on how quickly the required data is made available to the given part(s) of the CPU and on those components becoming available; these factors in turn depend on surrounding instructions. But for example, a series of additions on registers can typically run at the "burst" speed of 2 clock cycles per instruction.


Written by Neil Coffey. Copyright © Javamex UK 2009. All rights reserved.