The compiler can count the arguments fed into the function and determine content.size. The forEach statement will be compiled into 10 ArrayList.add statements. This will reduce overhead.
I get this result on the JVM.
Test #3 is always faster, but more so at smaller sizes. The difference with 20 elements is rather small.
I don’t know about the “good benchmark” yet. But since inlining would get rid of both the if statement and the array overhead, it seems quite obvious.
I was talking about language design. My proposal was introducing inline varargs. Everything done with them would be compiled into multiple statements. It would result in the same compiled code as test #3, which is the most performant.
But that’s still better than creating an array which then gets destroyed only to populate an ArrayList. Nobody will feed thousands of elements into arrayListOf.
You have yet to prove, using benchmarks, that unrolling is in fact faster. arrayListOf is implemented quite differently from your unrolled version, which means that your previous benchmark is not sufficient. You are comparing
val list = ArrayList<Int>()
list.add(1)
list.add(2)
list.add(3)
list.add(4)
list.add(5)
list.add(6)
list.add(7)
list.add(8)
list.add(9)
list.add(10)
to
arrayListOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
where
public fun <T> arrayListOf(vararg elements: T): ArrayList<T> =
if (elements.size == 0) ArrayList() else ArrayList(ArrayAsCollection(elements, isVarargs = true))
private class ArrayAsCollection<T>(val values: Array<out T>, val isVarargs: Boolean) : Collection<T> {
/* other members omitted... */
public fun toArray(): Array<out Any?> = /* inline function expanding to: */
if (isVarargs && values.javaClass == Array<Any?>::class.java)
(values as Array<Any?>)
else
java.util.Arrays.copyOf(values, values.size, Array<Any?>::class.java)
}
and the following constructor from OpenJDK:
public ArrayList(Collection<? extends E> c) {
elementData = c.toArray();
size = elementData.length;
if (elementData.getClass() != Object[].class)
elementData = Arrays.copyOf(elementData, size, Object[].class);
}
In other words, you are comparing an implementation that first creates the ArrayList, and then inserts the elements to an implementation that creates an intermediate Collection backed by the varargs array, which is then passed to the constructor of ArrayList, which in turn unwraps the array and ends up using that as its backing array.
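As an aside (my own illustration, not from the thread): the `getClass()` check in that constructor exists because `c.toArray()` may return an array whose runtime type is narrower than `Object[]`, and storing arbitrary elements into such an array would throw `ArrayStoreException`. A minimal sketch of the defensive copy:

```java
import java.util.Arrays;

public class CopyOfDemo {
    public static void main(String[] args) {
        // c.toArray() may return e.g. a String[] at runtime, not an Object[]:
        Object[] elementData = new String[] { "a", "b" };

        // Storing a non-String into a String[] would throw ArrayStoreException,
        // so the ArrayList constructor re-copies into a genuine Object[]:
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, elementData.length, Object[].class);

        elementData[0] = 42; // safe now: the backing array really is Object[]
        System.out.println(elementData.getClass() == Object[].class);
    }
}
```

This is exactly why Kotlin’s `ArrayAsCollection.toArray` checks `values.javaClass == Array<Any?>::class.java` before handing the varargs array over without a copy.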
Specifying the initial capacity is not really relevant to this discussion, so your benchmark essentially just compares two versions. And those two differ in so many ways that it is not safe to say that the difference in performance is caused by unrolling. A more reasonable test would be to compare your own implementation of arrayListOf to the unrolled version.
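To make that concrete, a fair experiment would pit two implementations that differ only in the presence of the intermediate varargs array. A minimal sketch (the method names `viaVarargs` and `unrolled` are mine, and any real timing of the two should be done with a harness like JMH rather than a hand-rolled loop):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnrollComparison {
    // Mirrors the vararg path: intermediate array -> collection -> ArrayList
    static ArrayList<Integer> viaVarargs(Integer... elements) {
        return new ArrayList<>(Arrays.asList(elements));
    }

    // Mirrors the proposed unrolled compilation: no intermediate array
    static ArrayList<Integer> unrolled() {
        ArrayList<Integer> list = new ArrayList<>();
        list.add(1); list.add(2); list.add(3); list.add(4); list.add(5);
        list.add(6); list.add(7); list.add(8); list.add(9); list.add(10);
        return list;
    }

    public static void main(String[] args) {
        List<Integer> a = viaVarargs(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Integer> b = unrolled();
        System.out.println(a.equals(b)); // both paths must build the same list
    }
}
```

Only once the two variants are otherwise identical can a measured difference be attributed to unrolling itself.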
Hi @H4xX0r1337,
I’ll collect some personal considerations here; I hope this helps.
An inline vararg modifier is not required: an inline function can already inline its parameters automatically, so adding an inline modifier would not help much.
I think the whole discussion can be reduced to treating the array as a constant and enabling the compiler to optimize it.
If I write 1 + 2, the compiler can replace it with 3; similarly, if I write arrayOf(1) + arrayOf(2), the compiler should replace it with arrayOf(1, 2), replace arrayOf(0, 0, 0).size with 3 and, finally, unroll for loops over constant arrays.
I think these considerations should be enough for the compiler to perform all of the requested optimizations.
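To illustrate (in Java rather than Kotlin, with a hypothetical `concat` helper standing in for Kotlin’s array `+` operator): each of those rewrites must be semantics-preserving, i.e. the folded form must produce exactly what the original expression would:

```java
import java.util.Arrays;

public class FoldingDemo {
    // Hypothetical helper standing in for Kotlin's arrayOf(1) + arrayOf(2)
    static int[] concat(int[] left, int[] right) {
        int[] out = Arrays.copyOf(left, left.length + right.length);
        System.arraycopy(right, 0, out, left.length, right.length);
        return out;
    }

    public static void main(String[] args) {
        // arrayOf(1) + arrayOf(2)  ->  arrayOf(1, 2)
        System.out.println(Arrays.equals(concat(new int[]{1}, new int[]{2}),
                                         new int[]{1, 2}));
        // arrayOf(0, 0, 0).size  ->  3
        System.out.println(new int[]{0, 0, 0}.length == 3);
    }
}
```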
Currently the HotSpot compiler is able to perform this kind of optimization, and maybe Graal does or will do it too. In that case this proposal is not really interesting.
Another consideration is performance: in a real-world application the impact can be negligible, so the compilation phase takes longer and applications get larger, with no measurable difference from the unoptimized version.
So we need a use case with valuable results.
E.g.: “performing this optimization in a typical AI library boosts performance by 5%.”
But “this optimization reduces the cost of listOf(1, 2, 3) by 1 microsecond” is not a good starting point.