`for` statement performance improvement (Kotlin/JVM)

davidecannizzo · August 24, 2021, 3:25pm

While using Jetpack Compose, I noticed a fastForEach extension defined for some random–access collections to avoid the overhead of creating a new iterator.
At this point, I wonder if a change could be made to the Kotlin language so that tricks like those are not needed.
If the forEach extension method that is currently defined for Iterable<T> were an operator that for loops used, when available, in place of Iterable<T>.iterator(), a List<T>.forEach extension of such operator could use indexes.

This is how it’d look like:

inline operator fun <T> List<T>.forEach(action: (T) -> Unit) {
    for (i in indices) action(get(i))
}

fun foo() {

    val myList = listOf("a", "b", "c")

    // The expression that we're iterating is `myList`, whose type is
    // `List<*>`, so the compiler would choose the `List<T>.forEach`
    // operator defined above instead of creating an iterator.
    for (item in myList)
        println(item)
}

The same would apply to Arrays as well. We all know how much overhead creating objects does cause (more pressure to the GC, more memory being used, CPU time wasted creating and initializing the object, and so forth) — such design would allow to improve the performance of the for loop and the forEach extension for types that support random access. The jitter might optimize things while running even if using iterators, but this proposal would be trivial to implement and is an opportunity to emit more efficient bytecode in the first place.

Admittedly, there are some aspects that need to be defined, such as the forEach behavior when the list/array is modified while being iterated. However, I think this solution is worth being considered and worked on to define it better.

For backward–compatibility, a forEach method that is not defined with the operator keyword is allowed but ignored by for statements. When there is no forEach operator available for the type being iterated, the type is required to implement Iterable<T> and the compiler reverts to using Iterable<T>.iterator().

Regarding to targets other than the JVM, I’m not currently aware of how things work in Kotlin/Native, but I can definitely speak for Kotlin/JS — the forEach operators for List<T> and Array<T> would be expect definitions and each platforms would implement them the best. In the JVM, it would look like I’ve described, while in JS they would be defined as inline intrinsics and the compiler would emit for..of JS statements directly.

broot · August 24, 2021, 4:49pm

My two cents on this idea:

It would require to run some benchmarks first, because optimizations in JVM are really awesome and it is possible that in practice these iterator objects are not at all created. Still, it could make sense to optimize the bytecode for use with environments where optimizations may be not so advanced (Android? GraalVM?).
It is really a terrible idea to provide such forEach() override for the List interface. List provides random access, but it does not guarantee any performance on such access. It would make iterating over e.g. LinkedList much slower. I mean like tens or hundreds times slower (actually, it would become quadratic instead of linear). forEach() override could be provided for ArrayList, but usually we type properties/locals as just List, so such override would be rarely used. Another solution is to create extension on List, but check for ArrayList on runtime, but that would inline two separate implementations everywhere. Alternatively, such forEach() could be provided with another name to be chosen explicitly by the developer - similar to how Jetpack Compose did this.
Using indices in your example is really a bad idea. It creates another object for iterating (IntRange), probably even heavier than an iterator

But generally, I see your point. I’m just not sure if there is a good solution to this problem. I guess the main reason why Java/Kotlin decided to use iterators in the first place is because the list itself knows the best, how to iterate over it efficiently. Providing separate implementations of forEach() for different types of lists seems a little like a step back.

chipays · August 24, 2021, 6:13pm

Article related to your suggestion: Benchmarking Kotlin Collections api performance

If avoiding the iterator performance generally results in faster code, why use an iterator at all? The key lies in the type of backing list. The previous benchmarks were performed on ArrayLists. However swithing the backing list to a LinkedList paints a very different picture.

The fast versions without the Iterators are much slower than their standard counterparts. This is because collections without random access generally perform better when traversed using iterators. With an understanding of the predictable performance penalty it’s best to use an iterator when you are unsure of what type of List you will be operating on.

…

These seem to have been included as part of a larger optimization effort aimed at reducing measurement performance. Exploring the larger context gives us more insight into the justification here. As these were implemented in low level parts of the compose SDK some places which may be called multiple times a second and avoiding extra GC cycles would help maintain good performance. Which means that a potential next step for these benchmarks could be to gather data on the relative frequency of GC operations triggered by the respective runtimes due to the higher amount of allocations in the stdlib version vs. their “fast” counterparts.

davidecannizzo · August 24, 2021, 9:01pm

I understand your concerns.

In my opinion, the Kotlin compiler shouldn’t emit inefficient bytecode hoping for possible optimizations by the JVM magical jitter. Also, it should be noted that when apps first launch, they’re not jitted yet. Startup time and initial efficiency of apps would benefit from optimizations at the bytecode level.
Actually, I initially thought about forEach on LinkedLists being slow, and indeed I’ve said that random access collections only would have to use my proposal. My snippet wasn’t meant to be complete enough, and only intended to depict a general idea that might be worthwhile to experiment on and refine. Multiple dispatch would solve the LinkedList issue, but Kotlin isn’t a functional language
I thought that the indices property would have been optimized away by the Kotlin compiler and replaced with a normal (Java–like) for loop in the generated code. If an IntRange object is created, instead — well, that would be one more optimization that should be discussed on.

broot · August 24, 2021, 9:07pm

Ad.3.
Actually, it seems you are right. It is optimized to classic for loop with a counter.

davidecannizzo · August 24, 2021, 9:08pm

I can understand the point about premature optimization made in the article you mentioned, which interestingly discussed the same topic I opened up here. However, if some small optimizations like these can be made in the Kotlin language, anyone would benefit, whether in every specific case those are useful or premature.
Actually, I initially thought about LinkedLists being an issue, but I kind of mixed things up with C# and forgot that LinkedList also implements List. However, I believe we could come up with a solution for this. My snippet wasn’t meant to make it into a Kotlin release

davidecannizzo · August 24, 2021, 9:10pm

Well, I seemed to recall that from some time ago when I happened to wonder about optimizations made on ranges. Hopefully, it looks like the Kotlin team had already thought to optimize ranges away in some cases.

davidecannizzo · August 24, 2021, 10:19pm

Here’s what I’ve currently come up with for arrays, which is trivial:

inline fun <T> Array<T>.fastForEach(action: (T) -> Unit) {
    for (i in indices) action(get(i))
}

Obviously, as per my proposal that method would be named forEach and would be defined as an operator.

As to lists and the issue you mentioned when providing different implementations for lists that support random access and those that do not, I was trying to figure out a solution. If the forEach method were defined in the List<T> interface (as opposed to defining an extension) and a default implementation provided that just uses iterator(), it’d be implementers’ responsibility to override it for better performances, when possible. And ArrayList, rather than being a typealias to Java’s implementation, it could be a subclass of the latter that additionally overrides the forEach method, like follows:

class FastArrayList<T> : ArrayList<T>() {

    // This method would be named `forEach` and would
    // override its definition from `List<T>`.
    inline fun fastForEach(action: (T) -> Unit) {
        for (i in indices) action(get(i))
    }
}

This is how the newly–defined List<T>.forEach method would look like:

interface List<T> {
    ...

    inline operator fun forEach(action: (T) -> Unit) {
        iterator().forEach(action)
    }

    ...
}

So, if Kotlin doesn’t support multiple dispatch, we could at least take advantage from polymorphism… But, we could no longer inline methods, which is why the above snippets won’t compile anyway.

While I’m putting some research effort into this topic, it seems more difficult than I wanted to believe to come up with a reliable implementation that doesn’t lead to doubling the loop block in two separate implementations.

stachenov · August 31, 2021, 4:39am

Java has RandomAccess marker interface for cases like this, so there’s no problem to distinguish between slow and fast indexed access lists at least in the JVM.

That being said, however, there are more reasons why it’s not a generally good idea.

One example that immediately comes to mind is CopyOnWriteArrayList. It’s simply unsafe to use index-based iteration on it, just like on any other concurrent collection. Its iterators, however, guarantee to never throw.

Instead of improving the language itself, I think it would be better to simply optimize away iterators in cases where it’s possible.

madmax1028 · August 31, 2021, 8:59am

To achive a better iteration performance you could use Java’s Iterable.forEach(Consumer<E>) method. Some collection types have a specialized version of this method implemented. For example the ArrayList’s implementation doesn’t use iterators at all.

The only thing to note is that you need to tell compiler to use Consumer<E> parameter instead of (E) -> Unit:

// Kotlin's own inline function
ArrayList<Any>().forEach(::println)

// overridden version of Java's Iterable.forEach()
ArrayList<Any>().forEach(Consumer(::println))

// just uses a default method implementation based on iterator
LinkedList<Any>().forEach(Consumer(::println))

davidecannizzo · August 31, 2021, 11:09am

@madmax1028 Why should that achieve better iteration performance?
Looking at the OpenJDK implementation of ArrayList, I can’t find any evidence why it should be more efficient — it just does the same things that iterators do (the only difference is accessing the list size and a reference go elementData within the stack, but this is going to become irrelevant when code gets jitted, and should still be insignificant when interpreting).
Also, the reason why to avoid using iterators is to avoid instantiating one. But Iterable.forEach needs an instance of the Consumer interface (Java doesn’t support Kotlin inlining), which puts us right back where we started.
Furthermore, it leads to limited flow control and less optimization by the jitter (I can’t find any guarantee for which the jitter would treat it specially).

davidecannizzo · August 31, 2021, 11:21am

I know about that. The problem is, however, we cannot rely on the RandomAccess interface statically because in Kotlin it is common to have references typed List<T> although they actually reference an ArrayList<T>. So, there’s no compile–time way to set RandomAccess lists apart from others. And doing that at runtime would lead to inlining the same block twice, which I was already discussing with @broot.
Regarding your concerns as to concurrent lists like CooyOnWriteArrayList, it’d be best to limit the random access iteration to ArrayLists only, I think.
There is also the case I had already mentioned to be worked on, i.e. the currently undefined behavior we’d have when accessing the same list concurrently.

madmax1028 · August 31, 2021, 11:32am

If you want to go that way then how about something like this:

fun main() {
	ArrayList<Any>().fastForEach(::println)
//	LinkedList<Any>().fastForEach(::println) // doesn't compile
	
	ArrayList<Any>().fastForEach2(::println)
	LinkedList<Any>().fastForEach2(::println)
}

inline fun <LIST, ELEMENT> LIST.fastForEach(action: (ELEMENT) -> Unit) where LIST : List<ELEMENT>, LIST : RandomAccess {
	for (i in indices) {
		action(this[i])
	}
}

inline fun <ELEMENT> List<ELEMENT>.fastForEach2(action: (ELEMENT) -> Unit) {
	when (this) {
		is RandomAccess -> fastForEach(action)
		else -> forEach(action)
	}
}

davidecannizzo · August 31, 2021, 11:38am

@madmax1028 fastForEach2 would lead to duplicated block when inlined. I’ve already been discussing this with @broot and we couldn’t find any solution yet.

fastForEach is fine, though as you know it’s limited to the cases when the static type system can make sure that the receiver implements RandomAccess, which are rare cases.

As a suggestion, though, you could have called those two methods the same and have put to one of them a @JvmName annotation specifying a different name, if I’m not wrong. But this is only going to solve the naming mess, not the duplicated block issue.

madmax1028 · August 31, 2021, 11:40am

Yeah, that’s even better. The compiler will choose a more suited method by itself then.

davidecannizzo · August 31, 2021, 11:43am

…though we still have the duplicated block issue

davidecannizzo · August 31, 2021, 12:07pm

I was thinking that, if we can’t come up with a better solution that avoids duplicating the block, the Kotlin standard library could at least provide an additional List<T>.fastForEach method so developers can choose on their own. Most likely, a better name could be chosen, like randomAccessForEach, or possibly a shorter one.
When it comes to arrays, instead, the Array<T>.forEach method may already be improved to use indices rather than an iterator, if it wasn’t for the concurrent access issue I once mentioned, for which it’s likely the case to provide an additional method for arrays as well.

broot · August 31, 2021, 12:35pm

Concurrency shouldn’t be a problem in the case of arrays. But current implementation already does what you suggested - it does not use iterators, but indexes.

davidecannizzo · August 31, 2021, 1:30pm

I’ve just looked at the implementation of the Array<out T>.forEach method but it looks like it is using a normal for–each.
Since you said Kotlin already uses indices to iterate over arrays, I’ve investigated further, and indeed it turns out you’re right — the compiler actually translates array for–each statements to for statements with indices.
My proposal to make the forEach method an operator would standardize such behavior and allow that kind of optimization that is currently done by the compiler to be coded in the Kotlin standard library as:

inline fun <T> Array<out T>.forEach(action: (T) -> Unit) {
    for (index in indices) action(this[index])
}

davidecannizzo · August 31, 2021, 1:48pm

I was thinking that, in order to solve the duplicated block issue that we would currently have in the List<T>.fastForEach implementation, goto’s can help us. If Kotlin had goto statements,

inline fun <T> List<out T>.fastForEach(action: (T) -> Unit) {

    val size = size

    if (size < 1) return

    val iterator: Iterator<out T>?
    var item: T
    var index = 0

    when (this) {
        is ArrayList<*> -> {
            item = this[0]
            iterator = null
        }
        else -> {
            iterator = iterator();

            if (!iterator.hasNext()) return

            item = iterator.next()
        }
    }

    next@ action(item)

    when (iterator) {
        null -> {
            if (index < size) {
                item = this[++index]
                goto next
            }
        }
        else -> {
            if (iterator.hasNext()) {
                item = iterator.next()
                goto next
            }
        }
    }
}

Of course, since Kotlin doesn’t support goto statements, such an implementation would need to be an intrinsic that the compiler implements directly as JVM bytecode (which supports goto’s).
I’m not sure if that’s going to be faster, though. It may be worth experimenting.

Topic		Replies	Views
Operator fun iterator Language Design	4	1827	September 20, 2018
Iterating with: operator fun <T> Iterable<T>.invoke(f: T.(Int) -> Unit) Language Design	6	1423	April 22, 2018
forEachApply Language Design	2	1521	March 8, 2018
For-each on Iterator? Language Design	2	3186	January 3, 2018
For vs forEach	18	26168	October 17, 2018

`for` statement performance improvement (Kotlin/JVM)

Related topics