store the map in a resource file; load file and de-serialize it into a map when needed
It appears to me that option (1) is far superior, with no limitations besides perhaps the method size limit.
(2) is potentially more maintainable but that’s about it… am I correct?
Mostly because de-serialization happens even if you store the map in the code. In that case the de-serialization is done by the VM when it loads the class, and the serialized format is simply whatever format the VM uses to store compiled code.
CPU-wise, I think it is better to have a small loop that goes over the data than a very long stretch of code that does not fit into the CPU cache.
Also, a file stored outside is much more flexible, easier to reuse, export, import etc.
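To make the resource-file option concrete, here is a minimal sketch of loading a map from text. The one-pair-per-line "key=value" format, the loadMap name and the sample data are assumptions for illustration only; nothing in the thread specifies a file format.

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical format: one "key=value" pair per line.
// Blank lines and lines starting with '#' are skipped.
fun loadMap(reader: Reader): Map<String, String> =
    reader.useLines { lines ->
        lines.map { it.trim() }
            .filter { it.isNotEmpty() && !it.startsWith("#") }
            .associate { line ->
                val key = line.substringBefore('=').trim()
                val value = line.substringAfter('=', "").trim()
                key to value
            }
    }

fun main() {
    // In a real application the reader would wrap a resource stream instead,
    // e.g. SomeClass::class.java.getResourceAsStream("/map.txt") (hypothetical path).
    val text = "# sample data\napple=red\nbanana=yellow\n"
    val map = loadMap(StringReader(text))
    println(map) // {apple=red, banana=yellow}
}
```

The loop that does the work stays small no matter how many entries the file holds, which is the point made above about code size.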
On the other hand, if your map is fixed and you reference the entries individually in your code, I think it is much better to use delegates than string keys. That way the compiler checks that each reference is valid, instead of leaving possible typos scattered all around.
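One possible reading of "delegates" here is Kotlin's map-backed delegated properties. The sketch below assumes that reading; the Settings class and its keys are hypothetical.

```kotlin
// Hypothetical settings class: each property is read from the backing map
// under the key equal to the property's name.
class Settings(values: Map<String, Any?>) {
    val host: String by values
    val port: Int by values
}

fun main() {
    val settings = Settings(mapOf("host" to "localhost", "port" to 8080))
    println(settings.host) // localhost
    println(settings.port) // 8080
    // A typo such as settings.prot is a compile error,
    // whereas a lookup like values["prot"] compiles and returns null at run time.
}
```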
In our case, where a large map is constructed and we are too lazy to run a benchmark, I find it wiser to use the solution that is guaranteed to be more performant than one that may or may not be as performant.
I understand the concept that “premature optimization is the root of all evil”, but I am not taking the risk of creating large amounts of unnecessary Pair objects, plus the extra array created for the vararg parameter.
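For reference, the two construction styles being compared look roughly like this. The keys and values are made up, and the sketch only shows where the Pair objects and the vararg array come from; it says nothing about which variant is faster.

```kotlin
// Variant A: mapOf with `to`. Each `to` call creates a Pair, and the
// vararg parameter of mapOf receives those pairs in an array.
val viaMapOf: Map<String, Int> = mapOf(
    "one" to 1,
    "two" to 2,
    "three" to 3
)

// Variant B: explicit puts into a mutable map. No Pair objects and no
// vararg array, but the map may have to grow as entries are added.
val viaPuts: Map<String, Int> = HashMap<String, Int>().apply {
    put("one", 1)
    put("two", 2)
    put("three", 3)
}

fun main() {
    println(viaMapOf == viaPuts) // true
}
```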
I find it wiser to use whatever is easier to read by a human and not whatever could potentially make my app start 5ms faster.
Also, I wouldn’t call it a “guaranteed more performant solution”. One solution requires allocating Pair objects, but the JVM is pretty good at allocating short-lived objects. The second solution requires copying the whole data multiple times to grow the resulting map. I have no idea which is faster, but I wouldn’t be surprised if it were mapOf(). But still… these are such small differences that we shouldn’t really care.
Do you know how growable data structures based on arrays work (e.g. HashMap)? They simply preallocate more space than they need right now, and whenever they run out of space, they allocate a new, bigger array, copy the data there and discard the old one. In the case of maps they actually have to do more than simply copy, because they need to create new buckets and redistribute the items among them. If you create a map as in your example and add 2000 items to it, I believe it will already have had to do this about 8 times (assuming the JVM HashMap defaults of an initial capacity of 16 and a load factor of 0.75).
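The growth behaviour described above can be approximated with a few lines of arithmetic. This is a simplified simulation rather than a measurement of the real class: it assumes the java.util.HashMap defaults (initial capacity 16, load factor 0.75, doubling on each resize) and a power-of-two initial capacity. The countResizes helper is hypothetical.

```kotlin
// Counts how many times a table would double while inserting `items` entries,
// assuming it doubles whenever size exceeds capacity * loadFactor.
fun countResizes(items: Int, initialCapacity: Int = 16, loadFactor: Double = 0.75): Int {
    var capacity = initialCapacity
    var resizes = 0
    for (size in 1..items) {
        if (size > capacity * loadFactor) {
            capacity *= 2 // new table allocated, all entries redistributed
            resizes++
        }
    }
    return resizes
}

fun main() {
    println(countResizes(2000))                         // 8
    println(countResizes(2000, initialCapacity = 4096)) // 0
}
```

Under these assumptions, 2000 insertions into a default-sized map trigger 8 doublings, while a map whose table is already large enough triggers none.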
This doesn’t happen with mapOf(), because in that case we don’t add one item at a time; we know the needed map size upfront. Of course, we can pass the size to the HashMap constructor and avoid growing while putting one item at a time. But we didn’t do this in the example above.
Yes, when using mapOf(), the compiler knows the number of items at compile time and puts this size directly in the bytecode. When adding one item after another, the compiler doesn’t know what the final size will be.
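Passing the size to the HashMap constructor could look like the sketch below. The capacityFor helper is hypothetical; it assumes the default load factor of 0.75 and picks a capacity large enough that the expected number of entries fits without a resize.

```kotlin
// Smallest requested capacity for which `expectedSize` entries stay at or below
// the resize threshold, assuming a load factor of 0.75.
fun capacityFor(expectedSize: Int): Int = (expectedSize / 0.75).toInt() + 1

fun main() {
    val expected = 2000
    // HashMap(int initialCapacity) sizes the table up front.
    val map = HashMap<String, Int>(capacityFor(expected))
    for (i in 0 until expected) map["key$i"] = i
    println(map.size) // 2000
}
```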
Again, assuming this code comes from some code generator, it shouldn’t really matter, because in that case we could easily provide the proper capacity to the map.
Others have addressed the performance concerns (which I agree with), but I also wanted to point out that there is a more concise and more readable way to build the map the way you originally did: the buildMap function.
It will also let you specify the capacity if you know it.
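A minimal sketch of buildMap with an explicit capacity follows. The buildSquares function and its contents are made up for illustration; only buildMap itself comes from the standard library.

```kotlin
// Builds a read-only map of n entries; the capacity argument lets the
// builder size the underlying map up front instead of growing it.
fun buildSquares(n: Int): Map<Int, Int> = buildMap(n) {
    for (i in 1..n) put(i, i * i)
}

fun main() {
    val squares = buildSquares(2000)
    println(squares.size) // 2000
    println(squares[3])   // 9
}
```

Inside the lambda the receiver is a MutableMap, so put and the indexing assignment both work; the result is exposed as a read-only Map.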
This way might actually be faster than your original version, because I notice that it uses an internal function to do the building in a platform-specific way.
It’s good to mention that using buildMap loses the size known at compile time, but you can simply pass the size, as in buildMap(10) { ... }, and as long as it’s within an order of magnitude or so, you should see very similar performance.
There’s one thing I didn’t understand about the initial question. If there is a strong desire by @V2 to improve the performance of a piece of code that needs to be executed once but without conducting any benchmarks, then what is the issue regarding data in the code or an external resource?