⁠inline vararg

H4xX0r1337 · July 22, 2019, 8:35am

I made a small benchmark to check all ways to populate an arraylist. Those are the results (code below):

Test #1: 416 ms
Test #2: 380 ms
Test #3: 152 ms

Test #3 (manually adding the values to an arraylist) is almost three times faster than kotlins dedicated arrayListOf function.

The arrayListOf function could be improved like that:

inline fun <T> arrayListOf(inline vararg content: T): ArrayList<T> {
    return ArrayList<T>(content.size).apply { content.forEach { add(it) } }
}

The compiler can count the arguments fed into the function and determine content.size. The forEach statement will be compiled into 10 ArrayList.add statements. This will reduce overhead.

Benchmark:

import kotlin.system.measureTimeMillis

fun main() {
    val millis1 = measureTimeMillis {
        repeat(5000000) {
            arrayListOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        }
    }
    println("Test #1: $millis1 ms")

    val millis2 = measureTimeMillis {
        repeat(5000000) {
            val list = ArrayList<Int>()
            list.add(1)
            list.add(2)
            list.add(3)
            list.add(4)
            list.add(5)
            list.add(6)
            list.add(7)
            list.add(8)
            list.add(9)
            list.add(10)
        }
    }
    println("Test #2: $millis2 ms")

    val millis3 = measureTimeMillis {
        repeat(5000000) {
            val list = ArrayList<Int>(10)
            list.add(1)
            list.add(2)
            list.add(3)
            list.add(4)
            list.add(5)
            list.add(6)
            list.add(7)
            list.add(8)
            list.add(9)
            list.add(10)
        }
    }
    println("Test #3: $millis3 ms")
}

fvasco · July 22, 2019, 9:02am

On which platform you get these results?

Are you considered to write a good benchmark using a test framework?

How differents solutions performs on different sizes?

H4xX0r1337 · July 22, 2019, 9:11am

I get this result on the JVM.
The Test #3 is always faster, but more so on smaller sizes. The difference with 20 elements is rather small.
I don’t know about the “good benchmark” yet. But since inlining would get rid of the if statement and get rid of the array overhead, it’s quite obvious.

fvasco · July 22, 2019, 9:48am

Ok, don’t worry.

Write a good test is hard, on JVM it is really hard.
I can read your test: “the first block is slower than last”, this is often true on JVM.

If you want to write a valuable test on JVM then you have to consider JMH.

See my comments here:

H4xX0r1337 · July 22, 2019, 9:59am

I’ll try JMH once I got some time. Will it do some warmup before the benchmark itself?

fvasco · July 22, 2019, 10:24am

Yes, it does.
You have to consider 8 minutes for each test, during you cannot use your PC.

H4xX0r1337 · July 22, 2019, 1:31pm

Done testing. Program: package de.unknownimport org.openjdk.jmh.annotations.Benchmarkimport java. - Pastebin.com
Output: # JMH version: 1.21# VM version: JDK 1.8.0_221, Java HotSpot(TM) 64-Bit Server - Pastebin.com

fvasco · July 22, 2019, 2:56pm

Great work @H4xX0r1337!

Are you sure? Proof it

Yes,
you have to test your own proposal, you missed it.

H4xX0r1337 · July 22, 2019, 3:21pm

I was talking about language design. My proposal was introducing inline varargs. Everything done with them would be compiled into multiple statements. It would result in the same compiled code like test #3, which is the most performant.

fvasco · July 22, 2019, 4:01pm

A loop unrolling performed at compile time, this will produce a larger code size, that can reduce speed on larger functions.

A macro can implement your suggestion, but it is unsupported.

H4xX0r1337 · July 22, 2019, 4:03pm

But that’s still better than creating an array with then get’s destroyed only to populate an ArrayList. Nobody will feed thousands of elements in arrayListOf.

Varia · July 22, 2019, 4:08pm

You have still yet to prove that unrolling is in fact faster using benchmarks. arrayListOf is implemented quite differently from your unrolled version, which means that your previous benchmark is not sufficent.

fvasco · July 22, 2019, 4:11pm

Yes, I agree.
It is true for small dataset.

H4xX0r1337 · July 22, 2019, 4:22pm

Did you even look at the benchmark?

Varia · July 22, 2019, 4:45pm

Yes. You are comparing

val list = ArrayList<Int>()
list.add(1)
list.add(2)
list.add(3)
list.add(4)
list.add(5)
list.add(6)
list.add(7)
list.add(8)
list.add(9)
list.add(10)

to

arrayListOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

where

public fun <T> arrayListOf(vararg elements: T): ArrayList<T> =
    if (elements.size == 0) ArrayList() else ArrayList(ArrayAsCollection(elements, isVarargs = true))

private class ArrayAsCollection<T>(val values: Array<out T>, val isVarargs: Boolean) : Collection<T> {
    /* other members omitted... */
    public fun toArray(): Array<out Any?> = /* inline function expanding to: */
        if (isVarargs && values.javaClass == Array<Any?>::class.java)
            (values as Array<Any?>)
        else
            java.util.Arrays.copyOf(values, values.size, Array<Any?>::class.java)
}

and the following constructor from OpenJDK:

    public ArrayList(Collection<? extends E> c) {
        elementData = c.toArray();
        size = elementData.length;
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    }

In other words, you are comparing an implementation that first creates the ArrayList, and then inserts the elements to an implementation that creates an intermediate Collection backed by the varargs array, which is then passed to the constructor of ArrayList, which in turn unwraps the array and ends up using that as its backing array.

H4xX0r1337 · July 22, 2019, 4:49pm

This was only an example. What about other Collections not using a backed array? The benchmark results show that test #3 is the best one yet.

Varia · July 22, 2019, 5:10pm

Specifying the initial capacity is not really relevant to this discussion, so your benchmark essentially just compares two versions. And those two differ in so many ways that it is not safe to say that the difference in performance is caused by unrolling. A more reasonable test would be to compare you own implementation of arrayListOf to the unrolled version.

fvasco · July 22, 2019, 5:34pm

@Varia I made that test and added the Blackhole, but @H4xX0r1337’s proposal remains valid.

However it is not clear for me what is proposable.

fvasco · July 22, 2019, 8:15pm

Hi @H4xX0r1337,
I squash here some personal considerations, I hope this helps.

The ⁠inline vararg modifier is not required, an inline function can automatically inline parameters, adding ⁠inline modifier does not helps a lot.

I think that the whole discussion can be reduce to consider array as a constant and able the compiler to optimizing it.
If I wrote the code 1+2 the compiler can replace it with 3, similar if I write arrayOf(1)+arrayOf(2) the compiler should replace it with arrayOf(1, 2), it replace arrayOf(0,0,0).size with 3 and, finally, it unroll for loop on constant array.

I think that the above consideration should be enough for the compiler to perform all the requested optimization.

Currently Hotspot compiler is able to perform this optimization, maybe Graal does or will do it. In such case this proposal is not really interesting.

Another consideration is about performance: in a real world application the impact can be negligible, so the compilation phase take longer, application are larger and there is no difference with the unoptimized version.
So we need a use case with valuable results.
E.g.: performing this optimization in a typical IA library can boost the performance of 5%
But “this optimization reduce the listOf(1,2,3) cost of 1 microsecond” is not a good starting point.

fvasco · July 23, 2019, 9:25am

@darksnake do you have some consideration about this enhancement?

Topic		Replies	Views
Inline variadic functions Language Design	8	3291	July 13, 2018
Performance Problem	8	3820	January 17, 2018
Deep nested inline functions run slowly and can causes an error at compile	3	3453	January 20, 2016
Proposal for iterable array in inline functions Language Design	1	1092	May 13, 2018
Faster Collection.map{} function by 44% Language Design	8	1381	May 5, 2024

⁠inline vararg

Related topics