Map() processed all items and can be very slow

I am working with huge lists and have some performance problems:
I narrowed it down to the following code:

    var counter1 = 0
    var counter2 = 0
    val list = ArrayList<Point>(100)
    for (i in 0 until 100) list.add(Point(i, 0))

    list.map {
        it.x
        ++counter1
    }.subList(0, 5).forEach {
        ++counter2
    }
    Log.i("XXXXXXXXXXXXX", "counter1 == $counter1, counter2 == $counter2")

I expected the output to be:
I/XXXXXXXXXXXXX: counter1 == 5 counter2 == 5

But it actually is:
I/XXXXXXXXXXXXX: counter1 == 100, counter2 == 5

This means that map() processes all elements even if only a part of them are really needed. Why is it not implemented in a way that it returns an Iterator over the transform function?
This would be way more efficient for huge lists or am I missing something here?
Are there other more efficient versions similar to map() or do I have to write my own transformation function?

Collection transformations don’t work on iterators, but on collections. map() produces entirely new list, the list “doesn’t know” it will be filtered at a later step - it has the same size as the original list.

If you look for lazy transformations then you should use sequences:

list.asSequence()
    .map { ... }
    .take(5)
    .toList()
2 Likes

Ah thanks, this makes sense. So I should use sequences over lists whenever possible:

    var counter1 = 0
    var counter2 = 0
    val list = ArrayList<Point>(100)
    for (i in 0 until 100) list.add(Point(i, 0))

    list.asSequence().map {
        it.x
        ++counter1
    }.take(5).forEach {
        ++counter2
    }
    Log.i("XXXXXXXXXXXXX", "counter1 == $counter1, counter2 == $counter2")

Now it’s printing:
I/XXXXXXXXXXXXX: counter1 == 5, counter2 == 5

Well, I don’t think we should use sequences whenever possible, but whenever we can benefit from them. Your example is a case when it is reasonable to choose sequences over collections.

As for now sequences are generally a little slower and they require additional code. I personally use collections by default and switch to sequences if needed.

2 Likes

I second what broot said.
Koddos for doing the proper loop of looking for the performance problem instead of pre-optimizing.

Definitely don’t use sequences whenever possible.
A rule of thumb is to switch to sequences for performance when you’re dealing with large collections and many functional operations. Don’t pre-optimize but you can consider if your collection will be unbounded and prepare for that.

And of course use sequences when the kind of data you’re working with should be represented as a stream of items of unknown (potentially infinite) length.