Monday 30 December 2013

JVM Escape Analysis

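I recently came across a nice example of Oracle's HotSpot JVM using escape analysis to perform stack allocation rather than heap allocation. (Strictly speaking, HotSpot's JIT compiler achieves this through scalar replacement, breaking the object up into its fields, rather than placing a whole object on the stack; the effect on garbage is the same.)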

I'm sure that in many codebases across the planet, the following code fragments are familiar territory:
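A minimal sketch of the kind of fragment meant here, assuming a dispatcher that holds a list of listeners. The class name `EventDispatcher`, the `onEvent(long)` signature and the `addListener` method are hypothetical and chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dispatcher; all names here are illustrative assumptions.
final class EventDispatcher {
    // Hypothetical listener interface.
    interface Listener {
        void onEvent(long event);
    }

    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    void dispatchEvent(long event) {
        // The enhanced for-loop obtains an Iterator from the list on every call.
        for (Listener listener : listeners) {
            listener.onEvent(event);
        }
    }
}
```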



When trying to write garbage-free or low-garbage code (as we do at LMAX), it is necessary to think about any code that may allocate unnecessary objects. Every invocation of the dispatchEvent method in the code above will create an Iterator object, since the enhanced for-loop is just syntactic sugar for calling List.iterator() and then hasNext()/next() on the result.
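To make the hidden allocation visible, here is a rough sketch of the two equivalent forms side by side. `Runnable` stands in for the listener type purely to keep the example self-contained; the class and method names are assumptions:

```java
import java.util.Iterator;
import java.util.List;

final class DesugaredLoop {
    // Enhanced for-loop, as it would normally be written.
    static void dispatchWithForEach(List<Runnable> listeners) {
        for (Runnable listener : listeners) {
            listener.run();
        }
    }

    // Approximately what the compiler emits for the loop above.
    static void dispatchWithIterator(List<Runnable> listeners) {
        // A new Iterator is requested from the list on every call.
        Iterator<Runnable> iterator = listeners.iterator();
        while (iterator.hasNext()) {
            Runnable listener = iterator.next();
            listener.run();
        }
    }
}
```

Both methods visit the same elements in the same order; the second simply spells out the Iterator that the first one hides.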

Using a bytecode viewer (I use ASM Bytecode Outline) to inspect the dispatchEvent method shows the creation and use of an Iterator object (via List.iterator()):

   L5
    LINENUMBER 32 L5
    ALOAD 1
    INVOKEINTERFACE java/util/List.iterator ()Ljava/util/Iterator;
    ASTORE 3
   L6
   FRAME APPEND [java/util/Iterator]
    ALOAD 3
    INVOKEINTERFACE java/util/Iterator.hasNext ()Z
    IFEQ L7
    ALOAD 3
    INVOKEINTERFACE java/util/Iterator.next ()Ljava/lang/Object;
    CHECKCAST epickrram/example/Listener
    ASTORE 4

This would seem like an excellent candidate for escape analysis: the Iterator never leaves the dispatchEvent method, so if the JIT compiler is smart enough, it should allocate the Iterator object on the stack and avoid touching the heap.

This can be tested by running a simple test while printing out the GC activity:
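A test along these lines could look like the sketch below. It is an assumption-laden reconstruction, not the original harness: the class name `EscapeAnalysisTest`, the `CountingListener` helper and the `run` method are all hypothetical. The exact GC logging flags used for the output shown here are not stated; `-verbose:gc` is one option that prints a line per collection, and `-XX:+DoEscapeAnalysis` / `-XX:-DoEscapeAnalysis` toggle the optimisation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness, e.g.:
//   java -verbose:gc -XX:+DoEscapeAnalysis EscapeAnalysisTest
//   java -verbose:gc -XX:-DoEscapeAnalysis EscapeAnalysisTest
final class EscapeAnalysisTest {
    interface Listener {
        void onEvent(long event);
    }

    // Counts events without allocating anything itself.
    static final class CountingListener implements Listener {
        long count;

        @Override
        public void onEvent(long event) {
            count++;
        }
    }

    private final List<Listener> listeners = new ArrayList<>();

    void dispatchEvent(long event) {
        // The only candidate allocation per call is the Iterator used here.
        for (Listener listener : listeners) {
            listener.onEvent(event);
        }
    }

    // Dispatches the given number of events to one listener and returns its count.
    static long run(int iterations) {
        EscapeAnalysisTest dispatcher = new EscapeAnalysisTest();
        CountingListener listener = new CountingListener();
        dispatcher.listeners.add(listener);
        for (int i = 0; i < iterations; i++) {
            dispatcher.dispatchEvent(i);
        }
        return listener.count;
    }

    public static void main(String[] args) {
        System.out.println("Event Count: " + run(10_000_000));
    }
}
```

The listener deliberately does no allocation of its own, so any young-generation collections in the log can be attributed to the loop's Iterator.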


The output with -XX:+DoEscapeAnalysis is:

Event Count: 10000000 

The output with -XX:-DoEscapeAnalysis is:

0.113: [GC [PSYoungGen: 23552K->416K(27456K)] 23552K->416K(90176K), 0.0010690 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
Total time for which application threads were stopped: 0.0012140 seconds
...
0.451: [GC [PSYoungGen: 94576K->0K(94656K)] 94592K->336K(157376K), 0.0011190 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
Total time for which application threads were stopped: 0.0012220 seconds

Event Count: 10000000


So we can see that the JVM is very kindly keeping the Iterator objects off the heap when escape analysis is enabled (the default since Java SE 6u23).

In the particular example I was looking at, I noticed that we only ever added a single Listener instance, so the length of the listeners list was always one. In this case it is wasteful to construct the new Iterator instance, even if it is allocated on the stack. The list of Listener objects can be replaced by a single reference to a Listener:
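As a sketch of that change, assuming there is never more than one listener, the dispatcher might be reduced to something like this. The names `SingleListenerDispatcher` and `setListener`, and the null check, are hypothetical:

```java
// Hypothetical single-listener variant; names are illustrative assumptions.
final class SingleListenerDispatcher {
    interface Listener {
        void onEvent(long event);
    }

    // A single reference replaces the List<Listener>.
    private Listener listener;

    void setListener(Listener listener) {
        this.listener = listener;
    }

    void dispatchEvent(long event) {
        // No list and no Iterator: just one direct call.
        Listener current = listener;
        if (current != null) {
            current.onEvent(event);
        }
    }
}
```

The trade-off is flexibility: this form only works while the single-listener assumption holds.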


This small optimisation makes an order-of-magnitude difference when compared using a Caliper benchmark. The benchmark has two tests: Access, which calls a single listener instance, and Iteration, which iterates over a list of size one.