`for` statement performance improvement (Kotlin/JVM)

I don’t have much to contribute to the main idea itself, but specifically about the inlined code duplication issue, I feel like the solution here will come from bytecode optimizations in the Kotlin compiler because it can be smart enough to have the action in literally a go-to block and have both the fast and slow implementation just go-to that block. Also, remember that most applications that are actually released used some form of minifier, and so I wouldn’t be quick to focus too much on the code duplication issue since a minifier could and should be able to notice the same chunk of code duplicated twice and use the go-to solution for that (in fact I bet most already do). I’d suggest maybe make a small sample and compile it then run it through Proguard or an equivalent and decompile the minified version.

Regardless, there’s still a little issue with this puzzle, which is that the inline block duplication will slow down compilation speed, but there’s many-a-ways for that to be solved. Firstly, of course, the compiler can be smart enough to do the go-to solution. But even if not, there can be cases where it is statically-known at compile-time whether or not something is RandomAccess (like in your example ArrayList would be statically-known to implement RandomAccess) and so it then follows that the whole else branch in the inlined fastForEach2 can just be statically-removed.

So yeah just simply minifiers are really trained to handle cases like this and so really the “bigger” concern here would be looking at quirks like CopyOnWriteArrayList and how to handle those

1 Like

As for this implementation, it can actually be rewritten to valid and semantically-equal Kotlin code (this can be cleaned up to look much nicer but I’m too lazy to look up the stdlib functions for nice index manipulation and stuff):

import java.util.*
fun main(){
    arrayListOf("hello", "world", "here").fastForEach(::println)
    linkedListOf("hello", "world", "here").fastForEach(::println)
}

fun <T> linkedListOf(vararg elements: T): LinkedList<T> = LinkedList<T>().apply {
    for(element in elements){
        add(element)
    }
}

inline fun <T> List<out T>.fastForEach(action: (T) -> Unit) {

    val size = size

    if (size < 1) return

    val iterator: Iterator<T>?
    var item: T
    var index = -1

    when (this) {
        is ArrayList<*> -> item = this[++index].also {
            iterator = null
            // Debug printing just to see what really happens
            println(index)
        }
        else -> {
            iterator = iterator();

            if (!iterator.hasNext()) return

            // Debug printing just to see what really happens
            println("used iterator")
            item = iterator.next()
        }
    }

    do { action(item) }
	while(when (iterator) {
        null -> {
            if (index < size - 1) {
                item = this[++index]
                // Debug printing just to see what really happens
                println(index)
                true
            } else false
        }
        else -> {
            if (iterator.hasNext()) {
                // Debug printing just to see what really happens
                println("used iterator")
                item = iterator.next()
                true
            } else false
        }
    })
}

and it only inlines that lambda once, and so it works like a charm. Of course, it will also inline a couple lines of ArrayList code that won’t be used for a LinkedList (and vice versa) but those are negligible and probably can’t be avoided anyways.

And remember kids: Go-to is never the answer!

How about this?

inline fun <T> List<T>.fastForEach(action: (T) -> Unit)
{
	val size = size
	
	if (size < 1) return
	
	val iterator = when (this)
	{
		is RandomAccess -> null
		else            -> iterator()
	}
	
	var i = -1
	
	do
	{
		val cur = when (iterator)
		{
			null ->
				if (++i < size)
					this[i]
				else break
			else ->
				if (iterator.hasNext())
					iterator.next()
				else break
		}
		
		action(cur)
	}
	while (true)
}

This avoids retrieving the next value in two places

2 Likes

Well, I understand that “goto’s are never the answer.” I appreciate that Kotlin doesn’t support them. I thought of the solution I posted as JVM bytecode and I represented it as Kotlin code — I haven’t taken time to reason about that and find other solutions (I’ve had other things to do today).
Thanks for your contribution, it looks like a valid possibility.

1 Like

I appreciate your point of view. Personally, I usually prefer to avoid relying on optimizations that might happen but that are not guaranteed. This time, I think I can get to agree to let minifiers (and likely also the jitter) do the job for us — today, long method blocks are discouraged, and so even shorter will likely be our for–each blocks; therefore code size will likely be little of an issue anyway. And indeed, as you suggested, duplication will probably go away. I’m not very worried about compilation speed, either — mainly because of blocks being short.
Regarding CopyOnWriteArrayList, we probably don’t need to cover that — I think we may choose to limit this solution to ArrayLists only, as that’s likely the most common collection. CopyOnWriteArrayList is already less efficient on writing, so we may settle with it not being as efficient as possible when iterating.

1 Like

That looks nice. That’ll likely be shorter in the generated bytecode, which is better.

1 Like

Honestly, I didn’t expect this much contribution to this topic. I thought that almost no one cared about this, but I myself try to avoid creating more instances than needed, so I figured out we should get rid of iterators in the most common cases.

Thank you all

1 Like

Hmm… am I the only one here who don’t really see how this “goto thing” solves the problem of code duplication? :smiley: The problem was that the function is inlined into every place where Iterable.forEach() is used. Which means quite a lot of places. And in most cases half of this inlined code is a dead code. This goto solution looks like even more code to inline, so how does it solve the problem?

Also, I don’t think any minifiers, proguards or other compile-time optimizers could remove this redundant code. This is only possible at runtime or with very, very complicated static analysis.

But honestly, I don’t know if this is at all a problem. Usually, we prefer performance over the code size and it could be controlled with a compiler flag.

edit:
On the other hand, in most cases the difference in performance between using iterators or not using them is really negligible.

1 Like

It’s more about the lambda itself that is being inlined. The instructions to handle a while loop are quite easily to represent in bytecode, however, copying the block into each invoke call for it is what can result in duplication. For example, this:

inline fun test(block: () -> Unit) {
    block()
    block()
}

vs this:

inline fun test(block: () -> Unit) {
    repeat(2) {
        block()
    }
}

the 1st will, with a big enough block lambda, have bigger byte-code than the 2nd, which only inlines the parameter once and places it in (what is inlined to) a for loop. minifiers and proguard can remove this repeated code (as in specifically the inlined lambda itself, not the iteration logic) but also it can be minimized by making sure it inlines only once

1 Like

Only now I’ve realized that your latest post was a reply to @broot’s.
Indeed, you’re correct. It looks like I had mistaken @broot and turned their issue to another one that was probably going to be even more important.
What @broot is telling makes sense, but I think the solution we’ve currently come up with is short enough that code size won’t be much of an issue.
In many cases, especially those where performance actually matter, the for–each block and any branch involved (including methods being called) will likely be way bigger than the few JVM instructions that the solution you suggested will translate to.

1 Like

Indeed, just call them jumps, and they magically go away :wink:

1 Like

Ahh, right, I totally missed the fact that the code of the lambda will be inlined twice :man_facepalming:

1 Like