Associate / associateBy merge strategy default

Currently

listOf("a", "a", "b").associateBy { it } == mapOf("a" to "a", "b" to "b")

The current default is to silently choose the last value for each key. Throwing an exception would be safer. An unexpected choice of value is more likely to result in hard to detect bugs than an unexpected failure.

Also, to me at least, failure is the least surprising default strategy. It’s the safest option. There is no obvious default choice amongst the ways to choose a particular value (first, last, random, . . .) and they’re all closer to each other than to failure. Java’s Collectors.toMap() behaves this way (Java doing something is not by itself a good reason to do it, but it does lend weight to “less surprising” given many Kotlin developers are familiar with Java.)

1 Like

The original reason why such strategy was chosen is the performance: in order to ensure that there is no duplicate key inserted we’d have to search each key twice in the map.

I’m on my phone now so no idea if this will format properly or not, but based on the current impl of the associate… template and of MutableMap (so no idea if this has changed since the original impl!) this could be done with only a variable store + comparison extra because MutableMap.put now returns the previous value (if there was one)–

for (element in this) {
    val previous = destination.put(keySelector(element), element)
    if (previous == null) throw something
}
return destination

Obviously now there is also backwards compatibility to consider as well though!

Note that the map can contain null as a value mapped to some key, and put will return null for that key in the same way as it would return null when the key is missing.

Ah, yeah I see where I went wrong.

I was going off the signature / docs of MutableMap.put(K,V):V? “Return the previous value associated with the key, or null if the key was not present in the map.”

Since I saw put(K, V): V? I assumed the value could not be null because V != V? (and since I can’t think of any time I’ve used a map with null values this didn’t immediately strike me as wrong.) But V has no upper bound so can be nullable.

So, entirely my misunderstanding. (The put(…) docs could be more clear on this though! - put - Kotlin Programming Language)

So since Kotlin supports null values in maps, it’s impossible to detect whether or not a value for a previous key was present without calling containsKey(…) - not even the new methods in Java 8 will save the extra call since they don’t support null values.

If you still want to get that behavior, you can resort to more complex combo of groupingBy and reduce, though it doesn’t address the original problem with default behavior.

The idea of groupingBy + reduce is that you group the collection by key and then apply reducing function to an each group:

listOf("apple", "bar", "arc")
    .groupingBy { it[0] }
    .reduce { key, acc, element -> error("Duplicate key '$key' encountered for element '$element'") }

Since the reducing function is applied only when there are at least two elements in a group, you can throw an exception immediately from that function.

More about groupingBy and related extensions here: Grouping - Kotlin Programming Language

1 Like

Thanks! This is nicer than my current solution (we’ve got an Iterable.asStream() extension method for interop with our unconverted Java code so I added an extension method that does something like .asStream().collect(toMap(..))

Should kotlin really sacrifice safety for a little bit of extra performance? Immutable collections operations aren’t that performant to begin with. I’ve run into bugs with associateBy a few times now and they are very, very hard to debug.

It is easy to write what you need yourself. Here is a simplistic implementation:

fun <K, V> List<V>.safeAssociateBy(block: (V) -> K) =
     associateBy(block)
            .also { resultingMap -> check(resultingMap.size == size) }

Thanks jstuyts. This check is also so cheap it could be included by default, so new users wouldn’t run into this kind of problem.

1 Like

On https://kotlinlang.org/ they state that you should choose Kotlin because it’s “Concise”, “Safe”, “interoperable”, and “tool friendly”. I don’t see “fast” on that list, so you’d think the language would default to “safe”.

(Not that performance is unimportant, but until you’ve run a profiler you shouldn’t be opting for fast over safe! Unsafe+fast should be opt-in not the default.)

3 Likes