Very poor performance of immutable maps

Hello,

I’m using Kotlin Native. I have a routine for assembling stuff into a map:

var map = emptyMap<String, String>()
for (i in 1..500) {
  map += i.toString() to i.toString()
}

(representative code only, untested)

I’m finding this is really, truly, painfully slow — the real life version was taking about 800ms to assemble about 500 strings. However, the mutable version:

val map = HashMap<String, String>()
for (i in 1..500) {
  map[i.toString()] = i.toString()
}

…takes about 0.3ms.

This isn’t right — there’s no reason why the immutable map should be any slower than the mutable one. The only thing I can think of is that the immutable map implementation is physically copying the entire map each time I apply a modification. Is this normal, or is it just a bug in the Kotlin Native standard library, which I know is currently very immature?

The documentation doesn’t mention complexity of operations at all (that I can find)…

Immutable map is not supposed to be mutated! Your code indeed creates a new map and copies all of the map’s content to the new object. Please consider, what you are doing first.

If you want to append elements to the map, then you need a mutable map. Use immutable one only if you want to read it (and never write). Also you should never use emptyMap for anything but empty map constant.

1 Like

Er, no, that’s not true — immutable collections are perfectly able to be modified, quickly and easily (sloppy language here: actually what you do is you allocate a new top-level object which internally points at details of the implemention). This is a solved problem: functional programming languages do this as a matter of course.

e.g. here’s Scala’s collection complexity page, which shows that adding an item to an immutable map takes the same time as to a mutable one: Index of /overviews/collections

If this is not the case in Kotlin, then the docs need to say so clearly and explicitly, because it very much violates the Principle of Least Surprise.

4 Likes

You are using plus operation on a map. This operation, when used on immutable map, creates a copy of initial map, it is explicitly stated in the documentation:

Creates a new read-only map by replacing or adding an entry to this map from a given key-value pair.

You can also see the implementation in the source code: https://github.com/JetBrains/kotlin/blob/1.2.60/libraries/stdlib/src/kotlin/collections/Maps.kt#L539

You should remember, that contrary to java (and maybe Scala, I don’t know), Kotlin separates mutable and immutable collections. So you should not use immutable one instead of mutable, it is just a design mistake.

1 Like

Logically copying the map doesn’t necessitate physically copying the map.

To add an item to an immutable HashMap, all you do is duplicate the hash table, where each chain points at the corresponding chain from the original map; except that the one chain you need to modify gets a new list node added on the front. The tail points at the chain from the original map. So you have two cheap allocations, one for the hash table, one for the list node, and now you’ve made a logical copy of the map in O(1) time. If an immutable collection interface offers mutation methods, this is the expectation of how it’ll work.

But you’re right, the Kotlin implementation is backed by a mutable LinkedHashMap, which does a physical copy (it has to, because it’s mutable). That’s really disappointing — that’s not how immutable data structures are supposed to work at all.

It’s a massive library implementation flaw, and seriously needs to be highlighted in the documentation. How do I escalate? File a bug?

(I have found kotlinx.collections.immutable/proposal.md at master · Kotlin/kotlinx.collections.immutable · GitHub, which is a proposal for overhauling all this, but it dates from about 2016…)

3 Likes

Could you first write a use case, where you really need multiple mutations of immutable map, which could not be replaced by mutable map?

In my opinion, if you are really trying to change something in a immutable map, you are doing something wrong and need to revise your architecture.

1 Like

To clarify, Map interface in Kotlin represents a read-only map, not an immutable one. This interface allows to read values from a map, but it doesn’t allow to make any assumptions whether the map could be modified or not.
The only reasonable way to implement plus for a generic Map instance without knowing its actual implementation, is to copy it to another map adding or replacing the specified entry.

2 Likes

@darksnake: the standard use case is that it allows shared references to A while still allowing cheap computation B = A + Pair(). Because A never changes, it’s drastically easier to reason about and makes a lot of code much, much simpler — for example, A becomes trivially thread-safe. There are frequently memory benefits, too (as B, C, D etc will all share the bulk of their representation with A). As I said, this is routine in most other languages with immutable collections.

@ilya.gorbunov: yes, indeed, but that doesn’t mean that the concrete representation of the ImmutableMap class which is returned by emptyMap() can’t override plus with an efficient implementation (which is what I was expecting it would do).

Further investigation shows that the kotlin.collection docs are actually very careful about when they use ‘read-only’ and when they use ‘immutable’, so they are technically correct — but I still think it’s misleading; the API design suggests immutability.

Which, of course, they’re anything but…

val mutableMap = HashMap<String, String>()
val notImmutableMap: Map<String, String> = mutableMap
mutableMap.put("foo", "bar")
/* notImmutableMap has silently changed! */
4 Likes

The primary goal of Kotlin is not to be a functional language but rather to be highly compatible with Java. For this reason, the Collection interfaces are designed such that they can be compiled into Javas collection interfaces in order to make them fully compatible. Since Javas Map interface does not define a non-destructive insert operator, Kotlins Map cannot define one either. Instead, the plus operator is defined as an extension function. Extension functions are non-virtual, which means it must work with all implementations of the Map interface, including those that are actually mutable.

Scala chose a different route by defining a completely new hierarchy of collection types. This gives them complete freedom to add whatever they want to the interfaces so naturally they added non-destructive updates. Scalas collections, however, are incompatible with Javas collections.

That’s a completely reasonable compromise; the current approach avoids needing any Kotlin-specific collection implementation (which is good). But that also locks Kotlin into an implementation with some very unusual behaviour: adding an item to a map should not be O(n) (even if it’s an immutable map). The fact that it’s this way because it has to be to maintain Java compability is an implementation detail.

This needs to be clearly called out in the documentation — right now there’s no indication of complexity at all, or which operations are considered fast or slow. I should not have been surprised to discover this!

So I’d still like to try and get the documentation updated. What’s the best way to go about this?

4 Likes

I don’t think the documentation for the collections/maps are bad here. I think the problem is that most people when starting to use kotlin don’t understand that readonly is not equal to immutable.
People look at List or Map and see immutable even though the documentation only calls it read only.

I think the part of the documentation that maybe should be changed is https://kotlinlang.org/docs/reference/
IMO there should be an additional page (probably listed as other) discussing readonly vs immutable.
This seems to be a common point of confusion and there is no real part of the documentation discussing it. This would IMO help more than adding thousands of additional comments to every function handling mutable vs readonly collections.
I might look into writing up something tomorrow.

I just want to mention that the read-only variants are not necessarily slow to use in use cases like the initial code example - that one can be rewritten like this:

val map = (1..500).associate { it.toString() to it.toString() }

In my experience, rethinking the approach like this yields a reasonably efficient solution to most problems without falling back to mutable collections. The standard library is actually really mature in the functions it provides for this way of working.

4 Likes

Invoke toString once, using map…associate may reduce String allocation

It is kotlin way to do such things, it does not intermediate maps at all. The question was to use read-only maps the way they are used in different languages and not intended to be used in kotlin.

@fvasco Sure, that would have been even better, I was just going off the original example not looking to closely at what the values actually were.

@darksnake Exactly my point, that’ why I wanted to show the actual Kotlin way, as nobody had given an example yet.

@Wasabi375 The documentation does actually refer to the result of mapOf(), listOf() etc as being immutable: mapOf - Kotlin Programming Language So it would be natural to assume (as I did) that these would behave like traditional immutable data structures (which they don’t).

I do think this needs to be explicitly documented. I like the idea of a read-only vs immutable page, but complexity needs to be documented too, as Kotlin’s library doesn’t behave like people would automatically expect.

2 Likes

You have O(N^2) code, as every insertion creates new map object. Not sure what do you expect from it performance-wise, and generally it doesn’t correlate much with Kotlin/Native.

Only if you create the list/map with one element, otherwise it is specified as read-only.

I don’t fully agree with you on that. I think people with an deeper understanding of kotlin would expect this sort of behavior. Collection<T>.plus(element: T) is declared as an extension function therefor implying that the implementation is not dependent on the underlying implementation of the collection. This means that the only assumption the function has about the data is that it is read only, which means that it has to create a copy.
Your expected behavior is not possible to implement as an extension, because the underlying types don’t specify the necessary methods.

You are using functions written for read-only lists/maps and expect them to behave as if the lists where immutable. If you want the benefits of immutable data you should probably use kotlinx.collections.immutable.

1 Like

Oh, yeah — my mistake. (See how easy it is to misunderstand the documentation?)

But I don’t have that deeper understanding, or at least didn’t when I discovered this, so I don’t expect this behaviour. This is why this all needs to be explicitly called out in the documentation.

plus() being an extension function means that it has to copy the map, true — but that’s an implementation detail. It’s not at all obvious from the API, which suggests a traditional immutable map implementation, which encourages people who don’t read between the lines in the documentation (like I didn’t) to expect an O(1) implementation.

Your wrong. The fact that plus is an extension means that it only has access to the interfaces methods and nothing else. The interface is defined to be read-only and not immutable.

Methods in this interface support only read-only access to the list

So I don’t see where it suggests to be an immutable implementation