Little Performance Test

Hello,

recently, I sad down to make a very little performance comparison between Groovy 2.0 with @CompileStatic and Java. See this article. Here are the measurements for Kotlin 0.1.2580:

Kotlin(static if): 1837ms Kotlin(instance if): 1619ms

I know, a lot of people to a lot of very difficult work in a very smart and clever way and then someone comes along with performance and security ans stuff ... But I just wanted to drop this as a note since I was interested in the figures.

Below find the code. Note, that there is no measurement for the “ternary if” case since not meaningful in Kotlin.

open class Fib() {

  class object {
  fun fibStaticIf(n: Int) : Int {
           if(n >= 2)
           return fibStaticIf(n - 1) + fibStaticIf(n - 2)
           else
           return 1
  }
  }

  fun fibIf(n: Int) : Int {
  if (n >= 2)
           return fibIf(n - 1) + fibIf(n - 2);
  else
           return 1;
  }
}

fun main(args : Array<String>)
{
  var start = System.currentTimeMillis();
  Fib.fibStaticIf(40);
  System.out.println("Kotlin(static if): " + (System.currentTimeMillis() - start) + “ms”);

  &nbsp;&nbsp;start = System.currentTimeMillis();
  &nbsp;&nbsp;Fib().fibIf(40);
  &nbsp;&nbsp;System.out.println("Kotlin(instance if): " + (System.currentTimeMillis() - start) + "ms");

}

Guys,

there is good news :-). The code below

class object {   fun fibStaticIf(n: Int) : Int {            return if(n >= 2)            fibStaticIf(n - 1) + fibStaticIf(n - 2)            else            1   }   }

  fun fibIf(n: Int) : Int {
  return if (n >= 2)
           fibIf(n - 1) + fibIf(n - 2)
           else
           1
  }


in contrast compares very well when executed to the performance in Java. Here are the figures for the changed code from above:

Kotlin(static if): 1017ms Kotlin(instance if): 993ms

This looks really good now :-). The reason I didn't do it that way from the beginning was that this did not compile:

return   if(n >= 2)           fibStaticIf(n - 1) + fibStaticIf(n - 2)   else           1

At least the IDE shows some red lines. So I thought I cannot be done at all in Kotlin. Somebody might not have formatted the code that way to begin with. Only some former Smalltalk developers would have had the idea to format the code that way ;-). When playing with the code a bit in order to prepare a post to ask why the code above does not work in Kotlin I figured out that this here does compile, though:

return if(n >= 2)           fibStaticIf(n - 1) + fibStaticIf(n - 2)   else           1

And as already said, the performance figures are really good now :-)

Cheers, Oliver

I created this issue: http://youtrack.jetbrains.com/issue/KT-2685

Thanks a lot for your work. It seems that we must be doing something suboptimal in the byte code, as, ideally, Kotlin should be exactly as fast as Java. We'll look into it

Thanks fo your comments!

- The Kotlin static version is slower than plain Java. Of course, Kotlin isn't using invokestatic, but rather invokevirtual on the class object.

Putting this code into a class object is rather strange. To correspont to Java's "static" case, it should be a plain package-level function. Then the invokevirtual problem will be gone, I think.

- The Kotlin static version becomes faster if we prefix the recursive calls with "this.". Doing so will basically remove the bytecode instructions that fetch the static class object field and the execution speed matches the instanced version. The Kotlin compiler could probably do this automatically.

Good point.

- Decompiling the Kotlin code reveals a lot of extra bytecode instructions, compared to plain Java. These are mostly lots of "nop" and some "athrow"s. Not sure what's going on here, but more bytecode instructions do affect JIT effectiveness (negatively), even if they don't do anything.

Yes, we should clean this up.

Could you please check with master brunch. I've put some optimizations and on my setup performance is the same as Java

Thanks for the changes Alex.

I can confirm performance is now the same between Kotlin and Java (instanced versions), without using this. on the class object calls. I can also confirm that moving the class object methods to the package level matches the performance of the static Java versions.

The redundant bytecodes remain on some of the compiled methods, but I think you didn’t make any changes regarding those. They don’t seem to affect performance in this particular case anyway.

Could you please a bit elaborate on redundant byte code. What exactly do you mean?

I described it in my first reply above, but it seems to have been edited with someone else's reply. Weird.

Anyway, you’ll see what I mean if you decompile the classes generated by Kotlin. As an example, here’s the Java version of fibIf (using JDK7’s javac):

public int fibIf(int n) { //   0   0:iload_1    //   1   1:iconst_2    //   2   2:icmplt          21 //   3   5:aload_0    //   4   6:iload_1    //   5   7:iconst_1    //   6   8:isub            //   7   9:invokevirtual   #4   <Method int fibIf(int)> //   8   12:aload_0    //   9   13:iload_1    //   10   14:iconst_2    //   11   15:isub            //   12   16:invokevirtual   #4   <Method int fibIf(int)> //   13   19:iadd            //   14   20:ireturn    //   15   21:iconst_1    //   16   22:ireturn    }

and here's the same method using Kotlin:

@JetMethod(flags=0x00000010, returnType="I") public final int fibIf(@JetValueParameter(name="n", type="I") int n) { //   0   0:iload_1    //   1   1:iconst_2    //   2   2:icmplt          27 //   3   5:aload_0    //   4   6:iload_1    //   5   7:iconst_1    //   6   8:isub            //   7   9:invokevirtual   #26  <Method int fibIf(int)> //   8   12:aload_0    //   9   13:iload_1    //   10   14:iconst_2    //   11   15:isub            //   12   16:invokevirtual   #26  <Method int fibIf(int)> //   13   19:iadd            //   14   20:ireturn    //   15   21:nop            //   16   22:nop            //   17   23:nop            //   18   24:nop            //   19   25:nop            //   20   26:athrow           //   21   27:iconst_1    //   22   28:ireturn    //   23   29:nop            //   24   30:nop            //   25   31:athrow           //   26   32:nop            //   27   33:nop            //   28   34:nop            //   29   35:athrow           }

You'll notice that the two methods are identical up to instruction #14, but after that the Kotlin instructions are very different.

That's interesting. Seems to be something new introduced in ASM4 probably (part of their compute frame algorithm) - never saw it before. Unlikely to effect performance but still ugly.