Storing a map in code vs. in a file

In situations where we have to add a large, immutable string-to-string map, with all entries known at compile time, to our program:

What is the better option?

  1. store the map entries in code, e.g.:
        fun loadMap(): Map<String, String> = mutableMapOf<String, String>().also {
            it["K1"] = "V1"
            it["K2"] = "V2"
            it["K3"] = "V3"
            /* ... */
        }
    
  2. store the map in a resource file; load the file and deserialize it into a map when needed (see the sketch after this list)
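
For example, here is a minimal sketch of option (2), assuming the entries live in a classpath resource named entries.properties; the MapLoader name, the resource path, and the Properties format are all just illustrative, not part of the original question:

    import java.util.Properties

    object MapLoader {
        // Reads the entries from a classpath resource and copies them into an immutable Map.
        fun loadMap(path: String = "/entries.properties"): Map<String, String> {
            val props = Properties()
            requireNotNull(javaClass.getResourceAsStream(path)) { "Resource $path not found" }
                .use { props.load(it) }
            return props.entries.associate { (k, v) -> k.toString() to v.toString() }
        }
    }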

It appears to me that option (1) is far superior, with no limitations besides perhaps the method size limit.
(2) is potentially more maintainable, but that’s about it… am I correct?


I’m not 100% sure about the pros and cons.

Mostly because de-serialization happens even if you store the map in the code. In that case the de-serialization is done by the VM when it loads the class, and the “serialized” file format is simply whatever format the VM uses to store compiled code.

CPU-wise, I think it is better to have a small loop that goes over the data than a very long stretch of code that does not fit into the CPU cache.

Also, a file stored outside the code is much more flexible: it is easier to reuse, export, import, etc.

On the other hand, if your map is fixed and you reference the entries individually in your code, I think it is much better to use delegates than string keys. That way the compiler checks that each reference is valid, instead of you possibly having typos all around.
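
For instance, here is a minimal sketch of that idea using Kotlin's map-delegated properties; the entries and property names are just illustrative:

    // The fixed map; for map delegation the keys must match the property names below.
    val entries = mapOf(
        "host" to "example.com",
        "port" to "8080",
    )

    // References to these properties are checked by the compiler; a typo at a use site
    // becomes a compile-time error instead of a silent runtime lookup failure.
    val host: String by entries
    val port: String by entries

    fun main() {
        println(host) // example.com
        println(port) // 8080
    }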


I write constant data in source code only for maintainability; otherwise I prefer an external resource file.
I vote for 2.

This can be simplified to:

    val map = mapOf(
        "K1" to "V1",
        "K2" to "V2",
        "K3" to "V3",
        /* ... */
    )

to is an infix function that constructs Pair instances; it would worsen performance, especially for a large map, no?

Without a benchmark, I prefer readability over performance.

In our case, where a large map is constructed and we are too lazy to do a benchmark, I find it wiser to use the solution that is guaranteed to be more performant than one that may or may not perform just as well.

I understand the idea that “premature optimization is the root of all evil”, but I am not taking the risk of creating a large number of unnecessary Pair objects, plus the extra array created by the vararg function.

I find it wiser to use whatever is easier for a human to read, not whatever could potentially make my app start 5 ms faster.

Also, I wouldn’t call it a “guaranteed more performant solution”. One solution requires allocating Pair objects, but the JVM is pretty good at allocating short-lived objects. The other solution requires copying the whole data multiple times to grow the resulting map. I have no idea which is faster, but I wouldn’t be surprised if mapOf() won. But still… these are such small differences that we shouldn’t really care.


What do you mean by “copy the whole data”? Could you explain, please?

Do you know how growable, array-based data structures work (e.g. HashMap)? They simply preallocate more space than they currently need, and whenever they run out of space, they allocate a new, bigger one, copy the data there and discard the old one. In the case of maps they actually have to do more than simply copy, because they need to create new buckets and redistribute the items among them. If you create a map as in your example and add 2000 items to it, I believe it will have to do this 6 times already.

This doesn’t happen with mapOf(), because in that case we don’t add one item at a time; we know the needed map size upfront. Of course, we could also provide the size to the HashMap constructor and avoid growing while putting items in one at a time, but we didn’t do this in the example above.
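
As a rough sketch, pre-sizing the original loadMap example could look something like this; the capacity calculation is just an illustration for roughly 2000 entries:

    // Sized upfront so ~2000 entries fit without any grow-and-rehash steps
    // (the JDK rounds the capacity up to a power of two and applies its 0.75 load factor).
    fun loadMap(): Map<String, String> =
        LinkedHashMap<String, String>(2000 * 4 / 3 + 1).also {
            it["K1"] = "V1"
            it["K2"] = "V2"
            it["K3"] = "V3"
            /* ... */
        }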


Ah, I see what you mean.

    public fun <K, V> mapOf(vararg pairs: Pair<K, V>): Map<K, V> =
        if (pairs.size > 0) pairs.toMap(LinkedHashMap(mapCapacity(pairs.size))) else emptyMap()

pairs.size here is the key

Yes, when using mapOf(), the compiler knows the number of items at compile time and puts this size directly into the bytecode. If you add one item after another, the compiler doesn’t know what the final size will be.

Again, assuming this code comes from some code generator, it shouldn’t really matter, because in that case we could easily provide the proper capacity to the map.


Thanks, this is an important point that I missed

Others have addressed the performance concerns (which I agree with), but I also wanted to point out that there is a more concise and readable way to build the map the way you originally did, using the buildMap function:

    val map = buildMap {
        put("K1", "V1")
        put("K2", "V2")
        put("K3", "V3")
        /* ... */
    }

It will also let you specify the capacity if you know it.

This way might actually be faster than your original version, because I noticed that it uses an internal platform-specific function to do the building.
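
For example, here is a small sketch with the capacity passed explicitly; the entry count is just a placeholder:

    // Passing the known entry count so the builder map never has to grow.
    val map = buildMap<String, String>(capacity = 3) {
        put("K1", "V1")
        put("K2", "V2")
        put("K3", "V3")
    }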


It’s good to mention that using buildMap removes the compile-time-known size, but you can simply pass the size yourself, e.g. buildMap(10) { ... }, and as long as it’s within an order of magnitude or so, you should see very similar performance.

Which is why I mentioned that it also lets you specify the capacity if you know it.


There’s one thing I didn’t understand about the initial question. If @V2 has a strong desire to improve the performance of a piece of code that is executed only once, but without conducting any benchmarks, then why does it matter whether the data lives in the code or in an external resource?

This whole question was a “shower thought”; I had no actual issue to solve :joy:

and hey thanks for all the responses, I did learn something