Storing a map in code vs. in a file

In situations where we have to add a large, immutable string-to-string map, with all entries known at compile time, to our program:

What is the better option?

  1. store the map entries in code, e.g.:
        fun loadMap(): Map<String, String> = mutableMapOf<String, String>().also {
            it["K1"] = "V1"
            it["K2"] = "V2"
            it["K3"] = "V3"
            /* ... */
        }
    
  2. store the map in a resource file; load the file and deserialize it into a map when needed (see the sketch after this list)
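
For example, here is a minimal sketch of option (2), assuming the entries live in a classpath resource named entries.properties; the MapLoader name, the resource path, and the Properties format are all just illustrative, not part of the original question:

    import java.util.Properties

    object MapLoader {
        // Reads the entries from a classpath resource and copies them into an immutable Map.
        fun loadMap(path: String = "/entries.properties"): Map<String, String> {
            val props = Properties()
            requireNotNull(javaClass.getResourceAsStream(path)) { "Resource $path not found" }
                .use { props.load(it) }
            return props.entries.associate { (k, v) -> k.toString() to v.toString() }
        }
    }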

It appears to me that option (1) is far superior, with no limitations besides perhaps the method size limit.
(2) is potentially more maintainable, but that’s about it… am I correct?


I’m not 100% sure about the pros and cons.

Mostly because de-serialization happens even if you store the map in the code. In that case the de-serialization is done by the VM when it loads the class, and the “serialized” file format is simply whatever format the VM uses to store compiled code.

CPU-wise, I think it is better to have a small loop that goes over the data than a very long stretch of code that does not fit into the CPU cache.

Also, a file stored outside the code is much more flexible: it is easier to reuse, export, import, etc.

On the other hand, if your map is fixed and you reference the entries individually in your code, I think it is much better to use delegates than string keys. That way the compiler checks that each reference is valid, instead of you possibly having typos all around.
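
For instance, here is a minimal sketch of that idea using Kotlin's map-delegated properties; the entries and property names are just illustrative:

    // The fixed map; for map delegation the keys must match the property names below.
    val entries = mapOf(
        "host" to "example.com",
        "port" to "8080",
    )

    // References to these properties are checked by the compiler; a typo at a use site
    // becomes a compile-time error instead of a silent runtime lookup failure.
    val host: String by entries
    val port: String by entries

    fun main() {
        println(host) // example.com
        println(port) // 8080
    }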


I write constant data in source code only for maintainability; otherwise I prefer an external resource file.
I vote for 2.

This can be simplified to:

    val map = mapOf(
        "K1" to "V1",
        "K2" to "V2",
        "K3" to "V3",
        /* ... */
    )

to is an infix function that constructs Pair instances; it would worsen performance, especially for a large map, no?

Without a benchmark, I prefer readability over performance.

In our case, where a large map is constructed and we are too lazy to do a benchmark, I find it wiser to use the solution that is guaranteed to be more performant than one that may or may not perform just as well.

I understand the idea that “premature optimization is the root of all evil”, but I am not taking the risk of creating a large number of unnecessary Pair objects, plus the extra array created by the vararg function.

I find it wiser to use whatever is easier for a human to read, not whatever could potentially make my app start 5 ms faster.

Also, I wouldn’t call it a “guaranteed more performant solution”. One solution requires allocating Pair objects, but the JVM is pretty good at allocating short-lived objects. The other solution requires copying the whole data multiple times to grow the resulting map. I have no idea which is faster, but I wouldn’t be surprised if mapOf() won. But still… these are such small differences that we shouldn’t really care.


What do you mean by “copy the whole data”? Could you explain, please?

Do you know how growable, array-based data structures work (e.g. HashMap)? They simply preallocate more space than they currently need, and whenever they run out of space, they allocate a new, bigger one, copy the data there and discard the old one. In the case of maps they actually have to do more than simply copy, because they need to create new buckets and redistribute the items among them. If you create a map as in your example and add 2000 items to it, I believe it will have to do this 6 times already.

This doesn’t happen with mapOf(), because in that case we don’t add one item at a time; we know the needed map size upfront. Of course, we could also provide the size to the HashMap constructor and avoid growing while putting items in one at a time, but we didn’t do this in the example above.
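
As a rough sketch, pre-sizing the original loadMap example could look something like this; the capacity calculation is just an illustration for roughly 2000 entries:

    // Sized upfront so ~2000 entries fit without any grow-and-rehash steps
    // (the JDK rounds the capacity up to a power of two and applies its 0.75 load factor).
    fun loadMap(): Map<String, String> =
        LinkedHashMap<String, String>(2000 * 4 / 3 + 1).also {
            it["K1"] = "V1"
            it["K2"] = "V2"
            it["K3"] = "V3"
            /* ... */
        }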


Ah, I see what you mean.

    public fun <K, V> mapOf(vararg pairs: Pair<K, V>): Map<K, V> =
        if (pairs.size > 0) pairs.toMap(LinkedHashMap(mapCapacity(pairs.size))) else emptyMap()

pairs.size here is the key

Yes, when using mapOf(), the compiler knows the number of items at compile time and puts this size directly into the bytecode. If you add one item after another, the compiler doesn’t know what the final size will be.

Again, assuming this code comes from some code generator, it shouldn’t really matter, because in that case we could easily provide the proper capacity to the map.


Thanks, this is an important point that I missed

Others have addressed the performance concerns (which I agree with), but I also wanted to point out that there is a more concise and readable way to build the map the way you originally did, using the buildMap function:

    val map = buildMap {
        put("K1", "V1")
        put("K2", "V2")
        put("K3", "V3")
        /* ... */
    }

It will also let you specify the capacity if you know it.

This way might actually be faster than your original version, because I noticed that it uses an internal platform-specific function to do the building.
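
For example, here is a small sketch with the capacity passed explicitly; the entry count is just a placeholder:

    // Passing the known entry count so the builder map never has to grow.
    val map = buildMap<String, String>(capacity = 3) {
        put("K1", "V1")
        put("K2", "V2")
        put("K3", "V3")
    }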


It’s good to mention that using buildMap removes the compile-time-known size, but you can simply pass the size yourself, e.g. buildMap(10) { ... }, and as long as it’s within an order of magnitude or so, you should see very similar performance.

Which is why I mentioned that it also lets you specify the capacity if you know it.


There’s one thing I didn’t understand about the initial question. If @V2 has a strong desire to improve the performance of a piece of code that is executed only once, but without conducting any benchmarks, then why does it matter whether the data lives in the code or in an external resource?

This whole question was a “shower thought”; I had no actual issue to solve :joy:

and hey thanks for all the responses, I did learn something