Associate / associateBy merge strategy default

jordan_stewart · May 24, 2017, 8:55pm

Currently

listOf("a", "a", "b").associateBy { it } == mapOf("a" to "a", "b" to "b")

The current default is to silently choose the last value for each key. Throwing an exception would be safer. An unexpected choice of value is more likely to result in hard to detect bugs than an unexpected failure.

Also, to me at least, failure is the least surprising default strategy. It’s the safest option. There is no obvious default choice amongst the ways to choose a particular value (first, last, random, . . .) and they’re all closer to each other than to failure. Java’s Collectors.toMap() behaves this way (Java doing something is not by itself a good reason to do it, but it does lend weight to “less surprising” given many Kotlin developers are familiar with Java.)

ilya.gorbunov · May 24, 2017, 9:49pm

The original reason why such strategy was chosen is the performance: in order to ensure that there is no duplicate key inserted we’d have to search each key twice in the map.

jordan_stewart · May 24, 2017, 11:36pm

I’m on my phone now so no idea if this will format properly or not, but based on the current impl of the associate… template and of MutableMap (so no idea if this has changed since the original impl!) this could be done with only a variable store + comparison extra because MutableMap.put now returns the previous value (if there was one)–

for (element in this) {
    val previous = destination.put(keySelector(element), element)
    if (previous == null) throw something
}
return destination

Obviously now there is also backwards compatibility to consider as well though!

ilya.gorbunov · May 25, 2017, 12:07am

Note that the map can contain null as a value mapped to some key, and put will return null for that key in the same way as it would return null when the key is missing.

jordan_stewart · May 25, 2017, 6:58am

Ah, yeah I see where I went wrong.

I was going off the signature / docs of MutableMap.put(K,V):V? “Return the previous value associated with the key, or null if the key was not present in the map.”

Since I saw put(K, V): V? I assumed the value could not be null because V != V? (and since I can’t think of any time I’ve used a map with null values this didn’t immediately strike me as wrong.) But V has no upper bound so can be nullable.

So, entirely my misunderstanding. (The put(…) docs could be more clear on this though! - put - Kotlin Programming Language)

So since Kotlin supports null values in maps, it’s impossible to detect whether or not a value for a previous key was present without calling containsKey(…) - not even the new methods in Java 8 will save the extra call since they don’t support null values.

ilya.gorbunov · May 25, 2017, 2:52pm

If you still want to get that behavior, you can resort to more complex combo of groupingBy and reduce, though it doesn’t address the original problem with default behavior.

The idea of groupingBy + reduce is that you group the collection by key and then apply reducing function to an each group:

listOf("apple", "bar", "arc")
    .groupingBy { it[0] }
    .reduce { key, acc, element -> error("Duplicate key '$key' encountered for element '$element'") }

Since the reducing function is applied only when there are at least two elements in a group, you can throw an exception immediately from that function.

More about groupingBy and related extensions here: Grouping - Kotlin Programming Language

jordan_stewart · May 25, 2017, 7:08pm

Thanks! This is nicer than my current solution (we’ve got an Iterable.asStream() extension method for interop with our unconverted Java code so I added an extension method that does something like .asStream().collect(toMap(..))

ricardomf · February 20, 2018, 6:50pm

Should kotlin really sacrifice safety for a little bit of extra performance? Immutable collections operations aren’t that performant to begin with. I’ve run into bugs with associateBy a few times now and they are very, very hard to debug.

jstuyts · February 20, 2018, 7:05pm

It is easy to write what you need yourself. Here is a simplistic implementation:

fun <K, V> List<V>.safeAssociateBy(block: (V) -> K) =
     associateBy(block)
            .also { resultingMap -> check(resultingMap.size == size) }

ricardomf · February 20, 2018, 7:17pm

Thanks jstuyts. This check is also so cheap it could be included by default, so new users wouldn’t run into this kind of problem.

jordan_stewart · February 20, 2018, 8:33pm

On https://kotlinlang.org/ they state that you should choose Kotlin because it’s “Concise”, “Safe”, “interoperable”, and “tool friendly”. I don’t see “fast” on that list, so you’d think the language would default to “safe”.

(Not that performance is unimportant, but until you’ve run a profiler you shouldn’t be opting for fast over safe! Unsafe+fast should be opt-in not the default.)

Topic		Replies	Views
Eliminating the usage of exceptions where possible (like accessing an index in a list) Language Design	4	2612	October 9, 2018
List to Map with key unchanged Libraries	5	1211	April 29, 2018
Map.withDefault not defaulting? Language Design	5	8515	November 8, 2023
Which methods are used for optimisation? Native	10	919	February 24, 2024
Kotlin collection.map supported yet	2	1156	February 26, 2012

Associate / associateBy merge strategy default

Related topics