Map() processes all items and can be very slow

oliver · April 12, 2023, 1:09pm

I am working with huge lists and have run into some performance problems.
I narrowed it down to the following code:

    var counter1 = 0
    var counter2 = 0
    val list = ArrayList<Point>(100)
    for (i in 0 until 100) list.add(Point(i, 0))

    list.map {
        it.x
        ++counter1
    }.subList(0, 5).forEach {
        ++counter2
    }
    Log.i("XXXXXXXXXXXXX", "counter1 == $counter1, counter2 == $counter2")

I expected the output to be:
I/XXXXXXXXXXXXX: counter1 == 5, counter2 == 5

But it actually is:
I/XXXXXXXXXXXXX: counter1 == 100, counter2 == 5

This means that map() processes all elements even if only some of them are actually needed. Why isn’t it implemented so that it returns an Iterator that applies the transform function lazily?
That would be far more efficient for huge lists, or am I missing something here?
Are there more efficient alternatives to map(), or do I have to write my own transformation function?

broot · April 12, 2023, 3:09pm

Collection transformations don’t work on iterators, but on collections. map() produces an entirely new list, and that list “doesn’t know” it will be shortened in a later step, so it has the same size as the original list.
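
A minimal sketch of that behavior, using plain integers instead of the Point objects from the question. The function name eagerMapCalls is made up for illustration. It counts how often the transform runs and how large the mapped list is before subList() ever sees it:

```kotlin
// Hypothetical helper: returns (number of transform calls, size of the mapped list).
fun eagerMapCalls(size: Int, wanted: Int): Pair<Int, Int> {
    var calls = 0
    val source = List(size) { it }      // 0, 1, ..., size - 1
    val mapped = source.map {           // eager: builds a complete new list
        calls++
        it * 2
    }
    // Only now is the result trimmed; the work above has already been done.
    check(mapped.subList(0, wanted).size == wanted)
    return calls to mapped.size
}

fun main() {
    val (calls, mappedSize) = eagerMapCalls(size = 100, wanted = 5)
    println("transform calls = $calls, mapped size = $mappedSize")
    // prints: transform calls = 100, mapped size = 100
}
```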

If you are looking for lazy transformations, you should use sequences:

    list.asSequence()
        .map { ... }
        .take(5)
        .toList()

oliver · April 12, 2023, 3:37pm

Ah thanks, this makes sense. So I should use sequences over lists whenever possible:

    var counter1 = 0
    var counter2 = 0
    val list = ArrayList<Point>(100)
    for (i in 0 until 100) list.add(Point(i, 0))

    list.asSequence().map {
        it.x
        ++counter1
    }.take(5).forEach {
        ++counter2
    }
    Log.i("XXXXXXXXXXXXX", "counter1 == $counter1, counter2 == $counter2")

Now it’s printing:
I/XXXXXXXXXXXXX: counter1 == 5, counter2 == 5

broot · April 12, 2023, 3:48pm

Well, I don’t think we should use sequences whenever possible, but whenever we can benefit from them. Your example is a case where it is reasonable to choose sequences over collections.

As of now, sequences are generally a little slower and they require additional code. I personally use collections by default and switch to sequences if needed.

arocnies · April 12, 2023, 5:23pm

I second what broot said.
Kudos for properly tracking down the performance problem instead of optimizing prematurely.

Definitely don’t use sequences whenever possible.
A rule of thumb is to switch to sequences for performance when you’re dealing with large collections and many functional operations. Don’t optimize prematurely, but do consider whether your collection could grow without bound and prepare for that.
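
To illustrate why chained operations matter, here is a rough sketch that records the order in which the lambdas run. The helper names traceList and traceSequence are invented for this example. On a list, each step finishes over all elements and builds an intermediate list before the next step starts; on a sequence, each element flows through the whole chain and processing stops once take() is satisfied:

```kotlin
// Hypothetical helper: eager chain on a List, every step builds an intermediate list.
fun traceList(input: List<Int>): List<String> {
    val trace = mutableListOf<String>()
    val result = input
        .filter { trace.add("filter($it)"); it % 2 == 1 }
        .map { trace.add("map($it)"); it * 10 }
        .take(2)
    trace.add("result=$result")
    return trace
}

// Hypothetical helper: the same chain on a Sequence, evaluated element by element.
fun traceSequence(input: List<Int>): List<String> {
    val trace = mutableListOf<String>()
    val result = input.asSequence()
        .filter { trace.add("filter($it)"); it % 2 == 1 }
        .map { trace.add("map($it)"); it * 10 }
        .take(2)
        .toList()   // terminal operation: nothing runs before this call
    trace.add("result=$result")
    return trace
}

fun main() {
    val input = listOf(1, 2, 3, 4, 5)
    println(traceList(input))
    // [filter(1), filter(2), filter(3), filter(4), filter(5), map(1), map(3), map(5), result=[10, 30]]
    println(traceSequence(input))
    // [filter(1), map(1), filter(2), filter(3), map(3), result=[10, 30]]
}
```

With five elements the difference is eight lambda calls versus five. Whether that outweighs the overhead of the sequence machinery depends on the size of the collection and the length of the chain, so measuring is still the safer route.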

And of course use sequences when the kind of data you’re working with should be represented as a stream of items of unknown (potentially infinite) length.
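
As a small sketch of that last case, the standard library’s generateSequence() can describe a stream with no fixed end. The function name firstPowersOfTwo is just an illustrative choice:

```kotlin
// Hypothetical example: an unbounded stream of powers of two.
fun firstPowersOfTwo(count: Int): List<Long> =
    generateSequence(1L) { it * 2 }   // conceptually infinite; nothing is computed yet
        .take(count)                  // limits how many elements will be requested
        .toList()                     // terminal operation that actually pulls the values

fun main() {
    println(firstPowersOfTwo(5))
    // prints: [1, 2, 4, 8, 16]
}
```

The same pipeline could not be written on a list, because an eager collection would have to hold every element before take() could run.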
