Deep nested inline functions run slowly and can causes an error at compile


#1

Repro is synthetic, but I think demonstative enough. See below for explanation.

inline fun stopwatch (f: () -> Unit): Long{
    val startTime = System.nanoTime()
    f()
    val estimatedTime = System.nanoTime() - startTime
    println("- exec time ${estimatedTime.toDouble() / 1000 / 1000 / 1000} sec")
    return estimatedTime
}

fun main(args: Array<String>) {

    val t1 = Test()
    stopwatch{t1.test()}
    println("${t1.i}")

    val t2 = Test()
    stopwatch{t2.test()}
    println("${t2.i}")

    val t3 = Test10Inline()
    stopwatch{t3.test()}
    println("${t3.i}")

    val t4 = Test10Inline()
    stopwatch{t4.test()}
    println("${t4.i}")

    val t5 = TestManyInline()
    stopwatch{t5.test()}
    println("${t5.i}")

    val t6 = TestManyInline()
    stopwatch{t6.test()}
    println("${t6.i}")
}

// clear case - everything is OK
class Test{
    var i = 0L
    private fun f00() {i++}
    private fun f01() {f00();f00()}
    private fun f02() {f01();f01()}
    private fun f03() {f02();f02()}
    private fun f04() {f03();f03()}
    private fun f05() {f04();f04()}
    private fun f06() {f05();f05()}
    private fun f07() {f06();f06()}
    private fun f08() {f07();f07()}
    private fun f09() {f08();f08()}
    private fun f10() {f09();f09()}

    private fun f11() {f10();f10()}
    private fun f12() {f11();f11()}
    private fun f13() {f12();f12()}
    private fun f14() {f13();f13()}
    private fun f15() {f14();f14()}
    private fun f16() {f15();f15()}
    private fun f17() {f16();f16()}
    private fun f18() {f17();f17()}
    private fun f19() {f18();f18()}
    private fun f20() {f19();f19()}
    fun test() {
        f20();
    }
}

// works but too long - about
class Test10Inline{
    var i = 0L
    private inline fun f00() {i++}
    private inline fun f01() {f00();f00()}
    private inline fun f02() {f01();f01()}
    private inline fun f03() {f02();f02()}
    private inline fun f04() {f03();f03()}
    private inline fun f05() {f04();f04()}
    private inline fun f06() {f05();f05()}
    private inline fun f07() {f06();f06()}
    private inline fun f08() {f07();f07()}
    private inline fun f09() {f08();f08()}
    private inline fun f10() {f09();f09()}

    private fun f11() {f10();f10()}
    private fun f12() {f11();f11()}
    private fun f13() {f12();f12()}
    private fun f14() {f13();f13()}
    private fun f15() {f14();f14()}
    private fun f16() {f15();f15()}
    private fun f17() {f16();f16()}
    private fun f18() {f17();f17()}
    private fun f19() {f18();f18()}
    private fun f20() {f19();f19()}
    fun test() {
        f20();
    }
}

// compilation failed
class TestManyInline{
    var i = 0L
    inline fun f00() {i++}
    inline fun f01() {f00();f00()}
    inline fun f02() {f01();f01()}
    inline fun f03() {f02();f02()}
    inline fun f04() {f03();f03()}
    inline fun f05() {f04();f04()}
    inline fun f06() {f05();f05()}
    inline fun f07() {f06();f06()}
    inline fun f08() {f07();f07()}
    inline fun f09() {f08();f08()}
    inline fun f10() {f09();f09()}

    inline fun f11() {f10();f10()}
    inline fun f12() {f11();f11()}
    fun f13() {f12();f12()}
    fun f14() {f13();f13()}
    fun f15() {f14();f14()}
    fun f16() {f15();f15()}
    fun f17() {f16();f16()}
    fun f18() {f17();f17()}
    fun f19() {f18();f18()}
    fun f20() {f19();f19()}
    fun test() {
        f20();
    }
}

What I do
In all thee cases we call function f20 that do 2^20 increments of i. In the first case we do it without kotlin inline, in the second we inline 10 inner levels, the third we try to inline more levels. Every test runs twice. stopwatch is naive realization of stopwatch to show execution time.

What I expected

  1. Results must be the same and all cases executes without errors
  2. Compilation time can grow with inlining.
  3. Execution time is roughly the same (does not differ more than 2-3 times).

What I get

  1. Class TestManyInline does not compile with runtime error [Internal Error] org.jetbrains.kotlin.codegen.CompilationException: Back-end (JVM) Internal error: wrong code generated java.lang.StackOverflowError. If you don’t try to inline some more.
  2. If we remove TestManyInline we can see that execution of Test.test() 10-100 times faster than Test10Inline.test().
  3. Compilation only part with Test.test() 3-4 times faster than only Test10Inline.test()
  4. Results are the same.
  5. We see warnings on inline functions (that is expected): Expected performance impact of inlining 'private final inline fun f00(): Unit defined in Test10Inline' can be insignificant. Inlining works best for functions with lambda parameters

Example of output for Test.test() and Test10Inline.test():

- exec time 0.004929329 sec
1048576
- exec time 8.11111E-4 sec
1048576
- exec time 0.06781020299999999 sec
1048576
- exec time 0.056903366999999996 sec
1048576

Environment:

  • win10-64
  • IDEA Community IC-143.1653.2
  • Kotlin JPS plugin version 1.0.0-beta-3595-IJ143-32
  • Kotlin Compiler version 1.0.0-beta-3595
  • JRE 1.8.0_66b18

Question:
Is it a bug or expected behaviour?


#2

Such structure of inlining results in exponential method size grow,
you can check that:

Test.class size is 2.9 kb
Test10Inline.class is 326kb.

In second case runtime jit compiler can’t perform further optimizations cause of big method size.

Stack overflow exception could be bypassed by increasing thread stack size in compiler options.
But already ‘f12’ function will exceed java limit on method size (65536 bytes).

Non-exponential case works as expected:


inline fun stopwatch (f: () -> Unit): Long{
    val startTime = System.nanoTime()
    f()
    val estimatedTime = System.nanoTime() - startTime
    println("- exec time ${estimatedTime.toDouble() / 1000 / 1000 / 1000} sec")
    return estimatedTime
}

fun main(args: Array<String>) {
    val t1 = Test()
    stopwatch{t1.test()}
    println("${t1.i}")

    val t2 = Test()
    stopwatch{t2.test()}
    println("${t2.i}")

    val t3 = TestManyInline()
    stopwatch{t3.test()}
    println("${t3.i}")

    val t4 = TestManyInline()
    stopwatch{t4.test()}
    println("${t4.i}")

}


// clear case - everything is OK
class Test{
    var i = 0L
    private fun f00() {i++}
    private fun f01() {f00();}
    private fun f02() {f01();}
    private fun f03() {f02();}
    private fun f04() {f03();}
    private fun f05() {f04();}
    private fun f06() {f05();}
    private fun f07() {f06();}
    private fun f08() {f07();}
    private fun f09() {f08();}
    private fun f10() {f09();}

    private fun f11() {f10();}
    private fun f12() {f11();}
    private fun f13() {f12();}
    private fun f14() {f13();}
    private fun f15() {f14();}
    private fun f16() {f15();}
    private fun f17() {f16();}
    private fun f18() {f17();}
    private fun f19() {f18();}
    private fun f20() {f19();}
    fun test() {
        f20();
    }
}

class TestManyInline {
    var i = 0L
    private inline fun f00() {i++}
    private inline fun f01() {f00();}
    private inline fun f02() {f01();}
    private inline fun f03() {f02();}
    private inline fun f04() {f03();}
    private inline fun f05() {f04();}
    private inline fun f06() {f05();}
    private inline fun f07() {f06();}
    private inline fun f08() {f07();}
    private inline fun f09() {f08();}
    private inline fun f10() {f09();}

    private inline fun f11() {f10();}
    private inline fun f12() {f11();}
    private inline fun f13() {f12();}
    private inline fun f14() {f13();}
    private inline fun f15() {f14();}
    private inline fun f16() {f15();}
    private inline fun f17() {f16();}
    private inline fun f18() {f17();}
    private inline fun f19() {f18();}
    private inline fun f20() {f19();}
    fun test() {
        f20();
    }
}
- exec time 3.1897699999999996E-4 sec
1
- exec time 3.28E-6 sec
1
- exec time 8.43E-6 sec
1
- exec time 6.44E-7 sec
1

#3

The microbenchmarking mantra: you should always use JMH for benchmarking performance. Microbenchmarks are inherently very hard to get right, and without JMH are nearly impossible to get right


#4

Thank you for the imortant note. Agree with every word. I used JMH and VisualJVM to find exact slow place (it is not so fatal in real code) and after that placed this simple code to make code indepenndent of any other libs.

This is the essence of repro. I find ineffetive place in my code and write synthetic excessive repro. More short repro can look like this:

inline fun foo(f: () -> Unit) {
    f()
    f()
}
fun testS(){
    var i = 0L
    foo{foo{foo{foo{foo{foo{foo{foo{i++}}}}}}}} //add foos to drop compiler
    println("$i")
}

note that foo() can be written in way that does not mean two necessory f() executions. They can be inlined but not executed.

Anyway, it is not fine when a compiler fall with stack trace. I expected that compiler sould show me info/warning/error and compile code (if possible).
My mistake was that I thought inline is recomendation (as in some other languages), but it seems that Kotlin forces it as can. If so, then compiler IMHO should get meaningful error/warning message. And the best way for me is warning message without inlining in byte code when inlining is dangerous or very slow.