Kotlin native performance

What is the current status of Kotlin Native performance?

I understand that it’s not reasonable to expect it to be as fast as on the JVM, but I see performance drops of 100 times or so.

My application is a multiplatform library that implements a programming language called KAP. My benchmarks, written in KAP and run on the interpreter written in Kotlin, run on the order of 100 times slower on the native version than on the JVM.

My question is: Is there any plan to make Kotlin Native significantly faster for my use case? The reason I’m asking is that I’m considering dropping multiplatform support and making my implementation pure JVM. That would make development a lot easier, and no one would want to run the native version anyway with performance the way it is.

For reference, this is my project. If anyone wants information as to how to run my benchmarks, let me know and I’ll be happy to explain: https://github.com/lokedhs/array

Hello, @Loke! Could you explain how to run those benchmarks? I see the build instructions in the project, but unfortunately no details on the benchmarks themselves.

Sure. The test case I used is the classic implementation of Fibonacci, implemented in the worst possible way. It’s actually a special test case for lambda functions in my language and not really representative of what normal code looks like. However, since it’s so slow, it’s a useful way to test the performance difference between the JVM and native.

To run the JVM version, simply run:

$ ./gradlew gui:run

When the UI appears, type the following (making sure to press Return between the two lines):

fibtest ← λ{ x←⍵ ◊ if(x≤1) {1} else { (⍞fibtest x-1) + ⍞fibtest x-2} }
⍞fibtest¨⍳26

This should compute the first 26 Fibonacci numbers:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

On my machine this takes about 1 second.

To try it with the native version, run the following:

$ ./gradlew build
$ array/build/bin/linux/releaseExecutable/array.kexe --lib-path=array/standard-lib

The native version does not load the KAP standard library by default, so it needs another command first, but other than that it’s the same code as in the JVM example:

use("standard-lib.kap")
fibtest ← λ{ x←⍵ ◊ if(x≤1) {1} else { (⍞fibtest x-1) + ⍞fibtest x-2} }
⍞fibtest¨⍳26

On my machine this runs for 50 seconds.
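
For readers who do not know KAP, the benchmark above is roughly equivalent to the following Kotlin sketch. This is not code from the project: the names fibTest and fibTestAll are hypothetical, it assumes ⍳26 yields 0 to 25 (which matches the output shown earlier), and it works on plain Int rather than the interpreter's boxed values, so it will not reproduce the timings quoted in this thread.

```kotlin
// Naive doubly recursive Fibonacci, mirroring the KAP lambda:
// inputs 0 and 1 both map to 1, everything else recurses twice.
fun fibTest(x: Int): Int =
    if (x <= 1) 1 else fibTest(x - 1) + fibTest(x - 2)

// Apply fibTest to each of 0..n-1, analogous to ⍞fibtest¨⍳n in KAP.
fun fibTestAll(n: Int): List<Int> = (0 until n).map { fibTest(it) }
```

Calling fibTestAll(26) gives the same 26 numbers as the KAP output, ending in 121393.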

Hello, @Loke! The main performance problem with 1.4-M1 was HashMap performance. It has already been fixed in https://github.com/JetBrains/kotlin-native/commit/c485511f1b97aabf8f8496535b6126f7f145435c. This improvement will be included in the upcoming 1.4-M3.
You can also get a somewhat faster build by using the experimental allocator (freeCompilerArgs += ["-Xallocator=mimalloc"] in your Gradle file).
The combination of 1.4-M3 and the experimental allocator speeds up your example 9 times on my machine (macOS, Intel Core i7).
The K/N version will still be slower than the JVM version even after this improvement, but detailed profiling of such an application is quite inefficient because of the large number of recursive calls. It seems that the K/N GC simply works more slowly with this many objects. It would be great if, once 1.4-M3 is published and you have checked with that version, you could extract the long-running code from your application into a separate example.

Thank you very much for your analysis. It was very helpful.
A 9 times performance improvement puts it within an order of magnitude of the JVM performance, which should be considered acceptable, especially given that my application seems to trigger everything that makes Kotlin Native slow. It’s doing deep recursion on very large arrays of boxed numbers.
I plan to add an optimisation where arrays of integers or floating-point numbers get specialised implementations backed by arrays of unboxed numbers. Based on what you have told me, it would seem that the native version would benefit a lot more from this than the JVM?

Yep, unboxing should help speed up your example.
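
A minimal sketch of the specialisation idea discussed here, not taken from the KAP source. In Kotlin, Array<Long> stores references to boxed Long objects, whereas LongArray stores the primitive values directly. The NumberArray interface and the class and function names below are hypothetical and only illustrate how a generic and a specialised representation could sit behind one abstraction.

```kotlin
// Hypothetical abstraction over an array of integers inside an interpreter.
interface NumberArray {
    val size: Int
    fun valueAt(index: Int): Long
}

// Generic implementation: elements are stored as references to boxed Long objects.
class BoxedNumberArray(private val values: Array<Long>) : NumberArray {
    override val size: Int get() = values.size
    override fun valueAt(index: Int): Long = values[index]
}

// Specialised implementation: LongArray holds primitive values, no per-element object.
class UnboxedLongArray(private val values: LongArray) : NumberArray {
    override val size: Int get() = values.size
    override fun valueAt(index: Int): Long = values[index]
}

// Works against either representation through the shared interface.
fun sumAll(array: NumberArray): Long {
    var total = 0L
    for (i in 0 until array.size) total += array.valueAt(i)
    return total
}
```

Both sumAll(BoxedNumberArray(arrayOf(1L, 2L, 3L))) and sumAll(UnboxedLongArray(longArrayOf(1L, 2L, 3L))) return 6; only the storage differs, so the unboxed variant gives the garbage collector far fewer objects to track.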

Please check on your own machine first once 1.4-M3 is published, because I use a different platform and the run took more than 1 minute for me on 1.4-M1. It should become faster, but the speed-up factor could be different.

I have some questions regarding K/N performance:

  1. Is it possible to use K/N in embedded real-time systems the way C++ is used? If not, why? Is it possible to customize the compiler, or strip out some functionality, to accomplish that?
  2. Is there a compiler option to reduce the binary size, which seems to be far larger than that of a C++ binary?

The answer to the first question strongly depends on what you call a real-time system and what your requirements are. Even a modern JVM can be used for real-time work (though not embedded). I do not think it will ever be possible to do some hardware-specific optimizations in Kotlin Native, simply because Kotlin is fundamentally a safe language, but if you are not doing something really low-level, it should work for you.

Perhaps. It depends on your definition of real-time (https://en.wikipedia.org/wiki/Real-time_computing#Criteria_for_real-time_computing).

Kotlin is only suitable for soft real-time. Languages with fully automatic memory management are poorly suited to hard or firm real-time systems because of unpredictable latencies (especially Kotlin, where the lack of value types makes things even worse).

Binary size is affected by linker options and by whether optimizations are enabled or disabled.