I was talking to @fjeremic in his github issue he shared some decompiled bytecode. He seems to have found the issue, have a look.
The OpenJ9 JIT compiler takes as input the Java bytecode of the method to be compiled and it translates the bytecode into what we call “trees”. If you have a look at the second comment you can see I’ve added some options to the java
command:
bin/java -Xjit:'limit={com/travisfw/voxelsphere/lwjgl3/Vector3.taxiDistance()F}(count=0,lastOptIndex=5,lastOptSubIndex=1,traceFull,log=log.trace),disableAsynccompilation,enablefastscorchingrecompilation' -Xnoaot -Xmx4G -Xms4G -jar test.jar
The limit
option takes a regex as input and it only JIT compiles methods that match the regex. Since in this case we gave it an exact signature, only one method will ever be JIT compiled. This isolates the problem. To find this method I did A/B testing on a list of compiled methods using a binary search to find which method we have to compile in order to exhibit the bug. This is how I originally arrived at that signature.
Moving on after the curly braces specifying the method regex come the parenthesis in which we can apply options to that particular compile. In this case we added a few of them. count=0
says to start compiling this method on the 0th (first) invocation of the method in the JVM interpreter. This just means we will compile it right away. lastOptIndex=5,lastOptSubIndex=1
limits optimizations. There are several hundred optimizations in the OpenJ9 JIT compiler which are executed in sequential order. This option says to stop performing optimizations after the 5th one, and within the 5th optimization stop performing transformations after the 1st one. traceFull,log=log.trace
says to generate a JIT trace file on disk by name of log.trace
and to trace everything (full).
disableAsynccompilation
disables asynchronous compilation, meaning when a compile is requested the JVM will block until the compile is finished, and then it will execute the compied body. This prevents the method being interpreted while the JIT compile is happening in a background thread. If this were to happen we risk the program completing and exiting before we have finished compiling the method. Finally, this particular problem happen at a high optimization level, we have 5: cold, warm, hot, very-hot, scorching. enablefastscorchingrecompilation
lowers the thresholds needed to reach a scorching compilation which throws every single optimization we have (even the quite CPU expensive ones) at the method.
All in all this command will generate a log.trace
file with very useful information for OpenJ9 JIT developers. The file is textual and you can open it up in any editor. It has multiple sections. One of the first things you will see is the input bytecodes though. It looks like this:
=======>com/travisfw/voxelsphere/lwjgl3/Vector3.taxiDistance()F
<ilgen
method="com/travisfw/voxelsphere/lwjgl3/Vector3.taxiDistance()F">
<request> {OrdinaryMethod(<NULL>)} </request>
+------------- Byte Code Index
| +-------------------- OpCode
| | +------------- First Field
| | | +------------- Branch Target
| | | | +------- Const Pool Index
| | | | | +------------- Constant
| | | | | |
V V V V V V
0, JBaload0getfield
1, JBgetfield 13
4, JBfstore1
5, JBiconst0
6, JBistore2
7, JBfload1
8, JBinvokestatic 24
11, JBaload0getfield
12, JBgetfield 15
15, JBfstore1
16, JBiconst0
17, JBistore2
18, JBfload1
19, JBinvokestatic 24
22, JBfadd
23, JBaload0getfield
24, JBgetfield 17
27, JBfstore1
28, JBiconst0
29, JBistore2
30, JBfload1
31, JBinvokestatic 24
34, JBfadd
35, JBreturn1
Then we translate the bytecode into “tree” intermediate language (IL) representation:
n1n BBStart <block_2> [ 0x11957580] bci=[-1,0,7] rc=0 vc=0 vn=- li=- udi=- nc=0
n5n fstore <auto slot 1>[#368 Auto] [flags 0x5 0x0 ] [ 0x119576c0] bci=[-1,4,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n4n floadi com/travisfw/voxelsphere/lwjgl3/Vector3.x F[#367 final Shadow +8] [flags 0xa0605 0x0 ] [ 0x11957670] bci=[-1,1,7] rc=1 vc=0 vn=- li=- udi=- nc=1
n3n aload this<'this' parm Lcom/travisfw/voxelsphere/lwjgl3/Vector3;>[#366 Parm] [flags 0x40000107 0x0 ] (X!=0 sharedMemory ) [ 0x11957620] bci=[-1,0,7] rc=1 vc=0 vn=n7n istore <auto slot 2>[#369 Auto] [flags 0x3 0x0 ] [ 0x11957760] bci=[-1,6,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n6n iconst 0 [ 0x11957710] bci=[-1,5,7] rc=1 vc=0 vn=- li=- udi=- nc=0
n10n ResolveCHK [#307] [ 0x11957850] bci=[-1,8,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n9n fcall java/lang/Math.abs(F)F[#370 unresolved notAccessed static Method] [flags 0x400 0x0 ] [ 0x11957800] bci=[-1,8,7] rc=2 vc=0 vn=- li=- udi=- nc=1
n8n fload <auto slot 1>[#368 Auto] [flags 0x5 0x0 ] [ 0x119577b0] bci=[-1,7,7] rc=1 vc=0 vn=- li=- udi=- nc=0
n13n fstore <auto slot 1>[#368 Auto] [flags 0x5 0x0 ] [ 0x11957940] bci=[-1,15,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n12n floadi com/travisfw/voxelsphere/lwjgl3/Vector3.y F[#371 final Shadow +12] [flags 0xa0605 0x0 ] [ 0x119578f0] bci=[-1,12,7] rc=1 vc=0 vn=- li=- udi=- nc=1
n11n aload this<'this' parm Lcom/travisfw/voxelsphere/lwjgl3/Vector3;>[#366 Parm] [flags 0x40000107 0x0 ] (X!=0 sharedMemory ) [ 0x119578a0] bci=[-1,11,7] rc=1 vc=0 vn
n15n istore <auto slot 2>[#369 Auto] [flags 0x3 0x0 ] [ 0x119579e0] bci=[-1,17,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n14n iconst 0 [ 0x11957990] bci=[-1,16,7] rc=1 vc=0 vn=- li=- udi=- nc=0
n18n ResolveCHK [#307] [ 0x11957ad0] bci=[-1,19,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n17n fcall java/lang/Math.abs(F)F[#370 unresolved notAccessed static Method] [flags 0x400 0x0 ] [ 0x11957a80] bci=[-1,19,7] rc=2 vc=0 vn=- li=- udi=- nc=1
n16n fload <auto slot 1>[#368 Auto] [flags 0x5 0x0 ] [ 0x11957a30] bci=[-1,18,7] rc=1 vc=0 vn=- li=- udi=- nc=0
n23n treetop [ 0x11957c60] bci=[-1,27,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n19n fadd [ 0x11957b20] bci=[-1,22,7] rc=2 vc=0 vn=- li=- udi=- nc=2
n9n ==>fcall
n17n ==>fcall
n22n fstore <auto slot 1>[#368 Auto] [flags 0x5 0x0 ] [ 0x11957c10] bci=[-1,27,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n21n floadi com/travisfw/voxelsphere/lwjgl3/Vector3.z F[#372 final Shadow +16] [flags 0xa0605 0x0 ] [ 0x11957bc0] bci=[-1,24,7] rc=1 vc=0 vn=- li=- udi=- nc=1
n20n aload this<'this' parm Lcom/travisfw/voxelsphere/lwjgl3/Vector3;>[#366 Parm] [flags 0x40000107 0x0 ] (X!=0 sharedMemory ) [ 0x11957b70] bci=[-1,23,7] rc=1 vc=0 vn
n25n istore <auto slot 2>[#369 Auto] [flags 0x3 0x0 ] [ 0x11957d00] bci=[-1,29,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n24n iconst 0 [ 0x11957cb0] bci=[-1,28,7] rc=1 vc=0 vn=- li=- udi=- nc=0
n28n ResolveCHK [#307] [ 0x11957df0] bci=[-1,31,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n27n fcall java/lang/Math.abs(F)F[#370 unresolved notAccessed static Method] [flags 0x400 0x0 ] [ 0x11957da0] bci=[-1,31,7] rc=2 vc=0 vn=- li=- udi=- nc=1
n26n fload <auto slot 1>[#368 Auto] [flags 0x5 0x0 ] [ 0x11957d50] bci=[-1,30,7] rc=1 vc=0 vn=- li=- udi=- nc=0
n30n freturn [ 0x11957e90] bci=[-1,35,7] rc=0 vc=0 vn=- li=- udi=- nc=1
n29n fadd [ 0x11957e40] bci=[-1,34,7] rc=1 vc=0 vn=- li=- udi=- nc=2
n19n ==>fadd
n27n ==>fcall
n2n BBEnd </block_2> [ 0x119575d0] bci=[-1,35,7] rc=0 vc=0 vn=- li=- udi=- nc=0
These trees then go through the series of optimizations and then get fed into the backend code generator which performs instructions selection, register assignment, and binary encoding, among several other passes.
We use this tree representation to analyze and manipulate the program in the JIT compiler and to carry out optimizations. The trace files we use to see how the method morphs as it transitions from optimization to optimization. This is what was used to identify the problem described in the OpenJ9 issue.
Hope that sheds some light on the debugging tools and techniques we use over at the OpenJ9 project and how JIT compiles within the JVM are debugged when there are problems. Feel free to ask any further questions if you’re curious!
Really nice, thanks a lot for such a detailed explanation.