Faster Collection.map{} function by 44%

Hi,
default implementation of kotlin map function uses MutableCollection<>.add() method for building of result list. The add method has unnecessary overhead (checking of capacity size, update of size of list,…)

My idea is to use array as target and then convert that into list.
I tried two implementations - one iterator based and one index based and gained 5-15% performance. It is not game changer, but map is quite frequently used function, so it could have real-world impact.

What do you think about this implementation of map functions?

Test code based on GitHub - Kotlin/kotlinx-benchmark: Kotlin multiplatform benchmarking toolkit

package test

import org.openjdk.jmh.annotations.*
import java.util.concurrent.TimeUnit

@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
class TestBenchmark {
    private lateinit var data: List<String>

    @Setup
    fun setUp() {
        data = List(100) { it.toString() }
    }

    @Benchmark
    fun mapDefault() {
        data.map { it + it }
    }

    @Benchmark
    fun mapArray1() {
        data.mapArray1 { it + it }
    }

    @Benchmark
    fun mapArray2() {
        data.mapArray2 { it + it }
    }
}

inline fun <T, reified R> List<T>.mapArray1(transform: (T) -> R): List<R> {
    val result = arrayOfNulls<R?>(size)
    for (i in indices)
        result[i] = transform(this[i])
    return result.asList() as List<R>
}


inline fun <T, reified R> Collection<T>.mapArray2(transform: (T) -> R): List<R> {
    val iterator = iterator()
    return Array(size) { transform(iterator.next()) }.asList()
}

mapDefault: 1277464 ops/s - baseline
mapArray2: 1342551 ops/s + ~ 5% ops/s
mapArray1: 1480874 ops/s + ~ 15% ops/s

1 Like

This is interesting, I’ll look into it when I have some free time. Some comments for now:

  1. it + it may a pretty heavy operation compared to what we try to benchmark. My intuition says it should take much more than the mapping itself. I suggest using ints, not strings.
  2. You should either return the results of map or push it into the blackhole. Otherwise, JVM may entirely remove the map() and then you benchmark an empty function.
  3. Arrays also do size checks. If using lists, theoretically we do 2 such checks vs 1, but collections are most probably highly optimized in JVMs, so I wouldn’t assume what we see in the source code is what we run. It is good you looked for benchmarks.

But generally, I think the idea makes sense.

1 Like

Thanks broot, you are right. Now improvement is even more significant

Performance of mapArray2 is worse than default,
default
mapDefault 3928969 - baseline
mapArray1 5659579 ~ +44%
mapArray2 3279785 ~ -17%

package test

import org.openjdk.jmh.annotations.*
import org.openjdk.jmh.infra.Blackhole
import java.util.concurrent.TimeUnit

@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
class TestBenchmark {
    private lateinit var data: List<Int>

    @Setup
    fun setUp() {
        data = List(100) { it }
    }

    @Benchmark
    fun mapDefault(blackhole: Blackhole) {
        blackhole.consume(data.map { it + it })
    }

    @Benchmark
    fun mapArray1(blackhole: Blackhole) {
        blackhole.consume(data.mapArray1 { it + it })
    }

    @Benchmark
    fun mapArray2(blackhole: Blackhole) {
        blackhole.consume(data.mapArray2 { it + it })
    }
}

inline fun <T, reified R> List<T>.mapArray1(transform: (T) -> R): List<R> {
    val result = arrayOfNulls<R?>(size)
    for (i in indices)
        result[i] = transform(this[i])
    return result.asList() as List<R>
}


inline fun <T, reified R> Collection<T>.mapArray2(transform: (T) -> R): List<R> {
    val iterator = iterator()
    return Array(size) { transform(iterator.next()) }.asList()
}

@ybznek
please consider that not all List are instance of RandomAccess.

3 Likes

What is generally better is using a sequence because it isn’t building a collection and just lazily handling one item at a time

1 Like

I wouldn’t say sequences are “generally better”. It is like N*M vs M*N. Of course, at least theoretically we can do it in less CPU cycles and less allocations, but in practice I believe sequences are generally slower as of Kotlin 1.9. Maybe they could be improved in the future, but we are not there yet.

The thrust of the original question was trying to optimize how the allocation is done to speed up the map. If you do not actually need the collection it will generally be “better” because none of that allocation is ever done.

What is the source of this belief? I see no way that they are “generally slower” and certainly not slower than mapping from one collection into another collection.

I think the main point of the question is not to optimize allocation or avoid using collections. Point is to use the lower level array for the majority of the workload and then only wrap it into a collection as the last step - instead of working through the wrapper all the time.

I read such claims a few times in the internet, with some benchmarks to prove it. I never tried to test this myself, but the explanation actually makes a lot of sense. Sequences add overhead, they are complicated and can’t be inlined. Loops are obviously simple to the JVM and can be easily optimized. If Kotlin compiler will be ever able to fully inline sequences to produce a single loop with just operations inside, that will be for sure better. But for now it generates a loop with a chain of calls between operators. And apparently, this is generally slower than going multiple times in a loop.

See for example: Sequences and Inlined Lambdas , but as said above, I think I saw other similar discussions either here, on Stackoverflow or maybe Kotlin bugtracker.

Of course, there are cases where sequences are better, but speaking about the general superiority or about the default approach which we should choose, the answer is not that simple.

In the corrected example, I suspect that the reason mapArray2 has worse performance than the built-in map is because you’ve involved multiple List implementations. The setUp creates a java.util.ArrayList, and map also creates a java.util.ArrayList, but in using Array.asList(), you create a java.util.Arrays.ArrayList. When the JVM is dealing with multiple implementations of an interface, it doesn’t get as many optimization opportunities.