Inlining Arrays of Lambdas

derektom14 · June 27, 2020, 6:48am

Suppose you want to sort a list of objects by certain values in order:

data class Data(val a: Boolean, val b: Boolean, val c: Boolean, val d: Int)

The easiest way to do it is to just list the properties you want to sort by in lambdas:

val comparatorLambdas = compareBy<Data>({ it.a }, { it.b }, { it.c }, { it.d })

Alternatively, you could list the properties in a compareBy/thenBy chain:

val comparatorChain = compareBy<Data> { it.a }
            .thenBy { it.b }
            .thenBy { it.c }
            .thenBy { it.d }

Or, you could construct the comparator entirely manually:

private inline fun Int.ifDiff(onDiff: (Int) -> Unit) = if (this != 0) onDiff(this) else Unit

val comparatorManual = object : Comparator<Data> {
        override fun compare(o1: Data, o2: Data): Int {
            o1.a.compareTo(o2.a).ifDiff { return it }
            o1.b.compareTo(o2.b).ifDiff { return it }
            o1.c.compareTo(o2.c).ifDiff { return it }
            o1.d.compareTo(o2.d).let { return it }
        }
    }

How efficient is each comparator? Let’s look at a benchmark of using each one to sort an array of 100,000 elements:

Benchmark                                   (N)  Mode  Cnt   Score   Error  Units
ComparatorBenchmarkHelper.sortChain      100000  avgt   10  37.531 ± 1.532  ms/op
ComparatorBenchmarkHelper.sortManual  100000  avgt   10  26.763 ± 0.257  ms/op
ComparatorBenchmarkHelper.sortLambdas    100000  avgt   10  93.089 ± 3.965  ms/op

We see that sortLambdas is incredibly slow compared to sortChain and sortManual, with sortManual being clearly the fastest. Why? Because sortChain creates nested Comparators, leading to multiple virtual calls, and sortLambdas has an array of lambdas, leading to an iteration as well as multiple virtual calls.

This would be much improved if it were possible to inline a vararg of lambdas. Take the above compareBy call, which uses this method:

public fun <T> compareBy(vararg selectors: (T) -> Comparable<*>?): Comparator<T> {
    require(selectors.size > 0)
    return Comparator { a, b -> compareValuesByImpl(a, b, selectors) }
}

Inlining compareValuesByImpl, we would get:

public fun <T> compareBy(vararg selectors: (T) -> Comparable<*>?): Comparator<T> {
    require(selectors.size > 0)
    return Comparator { a, b ->
        for (fn in selectors) {
            val v1 = fn(a)
            val v2 = fn(b)
            val diff = compareValues(v1, v2)
            if (diff != 0) return diff
        }
        return 0
    }
}

If we were able to inline selectors, a compiler would be able to resolve the require at compilation time, and from there, it could unroll the loop and inline each lambda, effectively becoming comparatorManual (except that compareValues also does unnecessary null checks, which an inlining compiler could know to eliminate on non-null values).

(This is a bit similar to my prior topic on Sequences and Inlined Lambdas, and would be one of the optimizations that its concepts would create. This one happens to be more immediately apparent, as a Comparator can easily become a major part of code execution when sorting large lists, so it’s important that they be as efficient as possible, and existing vararg compareBy as simply intolerable performance.)

fvasco · June 27, 2020, 5:37pm

Hi, @derektom14,
it looks interesting to me, can you provide a reproducer using different data size?

Maybe it is possible to change the actual compareBy with something like:

fun <T> compareBy(vararg selectors: (T) -> Comparable<*>?): Comparator<T> =
        selectors.drop(1).fold(compareBy(selectors.first())) { c, s -> c.thenBy(s) }

What do you think?

derektom14 · June 27, 2020, 9:59pm

That method is effectively the same as comparatorChain, except that you’re no longer inlining the lambdas, so the performance is instead the same as comparatorLambdas:

Benchmark                                    (N)  Mode  Cnt   Score   Error  Units
ComparatorBenchmarkHelper.sortChain       100000  avgt   10  38.646 ± 2.322  ms/op
ComparatorBenchmarkHelper.sortEfficient   100000  avgt   10  28.239 ± 2.376  ms/op
ComparatorBenchmarkHelper.sortLambdas     100000  avgt   10  87.527 ± 4.053  ms/op
ComparatorBenchmarkHelper.sortLambdasAlt  100000  avgt   10  84.818 ± 4.888  ms/op

I can add different sizes, but I don’t expect it to tell much of a different story, as each run will still take roughly n log(n) comparisons, so all of the times should increase while maintaining the same relative values.

derektom14 · June 27, 2020, 10:24pm

Here are the benchmarks with varying values of N:

Benchmark                                   (N)  Mode  Cnt   Score    Error  Units
ComparatorBenchmarkHelper.sortChain        1000  avgt   10   0.223 ±  0.003  ms/op
ComparatorBenchmarkHelper.sortChain       10000  avgt   10   2.972 ±  0.035  ms/op
ComparatorBenchmarkHelper.sortChain      100000  avgt   10  38.040 ±  1.075  ms/op
ComparatorBenchmarkHelper.sortEfficient    1000  avgt   10   0.165 ±  0.004  ms/op
ComparatorBenchmarkHelper.sortEfficient   10000  avgt   10   2.176 ±  0.036  ms/op
ComparatorBenchmarkHelper.sortEfficient  100000  avgt   10  27.268 ±  0.215  ms/op
ComparatorBenchmarkHelper.sortLambdas      1000  avgt   10   0.470 ±  0.043  ms/op
ComparatorBenchmarkHelper.sortLambdas     10000  avgt   10   6.672 ±  0.736  ms/op
ComparatorBenchmarkHelper.sortLambdas    100000  avgt   10  99.819 ± 22.008  ms/op

The order remains clear.

fvasco · June 28, 2020, 7:07am

Hi, @derektom14,
did you understand why sortLambdasAlt is so different from sortChain.
What you effectively measured?
Can you share your code?

derektom14 · June 28, 2020, 8:26am

The reason is due to inlining. The chain relies on two methods, compareBy and thenBy:

@kotlin.internal.InlineOnly
public inline fun <T> compareBy(crossinline selector: (T) -> Comparable<*>?): Comparator<T> =
    Comparator { a, b -> compareValuesBy(a, b, selector) }

@kotlin.internal.InlineOnly
public inline fun <T> Comparator<T>.thenBy(crossinline selector: (T) -> Comparable<*>?): Comparator<T> =
    Comparator { a, b ->
        val previousCompare = this@thenBy.compare(a, b)
        if (previousCompare != 0) previousCompare else compareValuesBy(a, b, selector)
    }

When the lambdas are inlined, their invocations no longer require a virtual call.

However, in your alternative, you are using a vararg of lambdas, and as the topic of this thread points out, those can’t be inlined. Therefore, you end up creating multiple lambda objects, and for each step in the chain, a virtual call must be made by the selector for each parameter.

This exact phenomenon is covered in this Medium article.

Topic		Replies	Views
How to change the order of compareBy in Kotlin Android	3	7138	September 22, 2017
In-place sort for IntArray Support	8	3419	April 3, 2020
How are Lambdas compiled to bytecode Language Design	7	2203	October 15, 2019
In place sorting	2	1416	March 31, 2015
Sequences and Inlined Lambdas Language Design	25	4315	September 5, 2022

Inlining Arrays of Lambdas

Related topics