Questions about structural mutability

What I mean by structural mutation (I think it’s the correct technical term but I can’t find much on this topic) is for example if a lists mutates its internal representation to make place for a new entry or optimize for speed and so on.
An Array haves a fixed size, so could I assume that Kotlins in array never mutate their structure (meaning if I add, delete an entry, only that entry is touched). Is Kotlin guaranteeing this?

Also for mutable collections like lists and maps, is somewhere documented which functions mutate the structure of the collection itself and not only the entry (like I expect add to mutate the structure)?
Or is this never exposed meaning that even if the get function (mylist[42]) doesn’t mutate the structure of the List today, it still may mutate it in future versions?

So I’m interested in the special case of arrays and also in mutable collections in general because the answer could possibly differ.

I suggest specifying the use-case, for which, you are trying to gather these guarantees.

I can give an example, if the array is structurally immutable than two coroutines / threads could use different indexes of an array to read and write there without the need for locks or mutex.

Same for a map, if reading and writing of value to a key doesn’t change the structure of the map, different values could be accessed by different threads simultaneously, but what if you add a new key value pair to the map, would this lead to corruption if you access other keys at the same time or not?

I’m not sure if this topic haves relevance in situations without concurrency.

1 Like

In cases where the Kotlin collection is an alias for the JVM collection you can read the JVM docs.

For example, from ArrayList:

A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.

Java’s HashMap has a similar note.

Keep in mind that there are no such guarantees for List, Map, MutableList, etc since these are interfaces. There are different implementations and structure depends on the implementation.

As for arrays on JVM, their structure never changes.

Concurrent modification is easy to mess up, I suggest posting immutable messages and doing all modifications sequentially. For example, you could use a coroutine Actor.

1 Like

I know about actors and use them if they fit. What I’m trying to do is to create a pattern for which tells me when I can use concurrent modification and how I can make it easy for me.

The notice I make from your answer is that it’s ok with arrays and not ok with mutable collections (I avoid to define which implementation to use and work with the interfaces when ever I can).
But It would be nicer to get a definition for the array giving this guarantee that is independent from the jvm, one of the benefits of Kotlin is it that it can be portable (like to wasm in future).

It sounds like you want immutable collections. Kotlin currently has read-only collections in the stdlib but it does not have immutable collections in the stdlib.

There’s this kotlinx immutable collections library that isn’t yet released. It’s multiplatform so you could use it in WebAssembly.

I’m not sure if that’s really what I want. I want to mutate the elements, I just don’t want that mutating one element hinders another element to be mutated simultaneously. I actually want even less than that, I just want to know which operation would hinder me from touching the other elements and which wouldn’t.

If I’m right, the immutable collection you linked is more like a normal nonmutable collection in Kotlin (listOf) but with the difference that elements also can’t be mutated. But I do want to mutate the elements, that’s why I ask about the behaviour of mutable collections.

Ah, I see. So you still want shared mutable state, you just want the mutating to be done safely so that a mutation doesn’t break another concurrent mutation.

One way to solve this is to synchronize operations so that no two mutations/reads are happening at the same time. This would allow you to consume your mutable state concurrently*. You could also implement your own data structure that has the guarantees you want via synchronization.

Another option is to avoid mutable state entirely and just use message passing, but that’s a different approach.


Personally I haven’t found many uses for threadsafe mutable collections. For example, I rarely use HashTable which is a threadsafe implementation of Map in Java. I’ll usually synchronize the operations of the class mutating the state and just use a basic implementation such as HashMap.

If you want implementation detail guarantees, don’t use interfaces that don’t give those guarantees.

I don’t think your use-case applies outside Kotlin/JVM. Kotlin/Native doesn’t allow concurrent modification in the same way (see here) , and I’m not sure Kotlin/JS even allows multiple threads.

I’m asking to know if the interfaces give me this guarantees, so that I won’t use them the way if they don’t. That’s exactly the reason why I’m asking here. The interface could give this guarantee by convention. As an example the interface isn’t implementing add() but I can still trust that calling add() on it will add the element into the list (otherwise working with mutableList would be impossible, just imagine add(42) on MutableList would actually add 24 because it’s an implementation detail what the function does inside of its body). Interfaces don’t only give you signatures but also tell you what the functions should do (so the interface could tell you “this function is supposed to add an element without touching other elements”, the implementation detail would than be the question how to archive this). But I got my answer and know that Mutable Collections don’t give this guarantees, so I won’t use them in this way. It’s good that atleast it works for the Array case, actually I have a java program that already makes uses of this with an Array but back than I didn’t realise that this could be problematic.

I want to keep my code as portable as possible but this doesn’t mean I have to migrate now or in near future (I’m interested in wasm but that takes its time too) I haven’t looked in the topic in detail but thinks could change over time (you don’t really need multiple threads for this, coroutines are enough).

Information about thread-safety has traditionally been expressed in documentation, not code. And I’m not sure it would fit, as there’s such a range of behaviours. For example, Effective Java identifies five categories:

  • immutable
  • unconditionally thread-safe
  • conditionally thread-safe (needing external synchronisation in some cases)
  • not thread-safe (needing external synchronisation in all cases)
  • thread-hostile (can’t be made thread-safe)

And some of those need quite a bit more information before you can judge how to use them (such as which methods and patterns of use are safe and which need guarding; how that guarding needs to be done; performance implications; and so on).

(In fact, there are some semi-standard annotations — javax.annotation.concurrent in JSR305 — but that has only three categories, and I don’t think they’re widely used — possibly for that reason.)

Certainly, in the case of Java collections, different implementations have very different threading behaviours. Many are thread-compatible but not thread-safe in themselves (such as ArrayList); some are truly immutable; some give guarantees about thread-safety and/or performance (such as ConcurrentHashMap).

In the particular case of allowing a List to be mutated without disrupting any other threads using it, the CopyOnWriteArrayList class is a good solution: whenever it gets mutated, it takes a copy of its internal array, so as not to affect any code iterating through it. (This is especially appropriate for lists of event listeners, where dispatching events by iterating through the list is much more common than adding/removing listeners.)

1 Like

Unfortunately not true. Memory largely works per cache line, not byte. Look up the concept of false sharing. X86 processors are cache coherent so only slowed down by this, other architectures may require explicit synchronisation. Of course this assumes that the data type written can be written atomically (processors work on words - 32 or 64 bits - not bytes or bits, but may have byte instructions that don’t race anyway)

1 Like

Thank you for the information, that’s really unfortunate as you described it and takes away the whole idea. It was a concept one of my profs had shown to us, I blame him on this.

In general I’m sry that I don’t have the time to respond to every comment, Corona just reached my immediate surroundings in the last days leading to some unexpected chaos here.

No problem. False sharing is a very nasty one. Your professor is correct in that it is thread safe, and in a single core environment (even with hyperthreading) works as expected. Modern computers have more, and still need synchronization of memory access and caches so this can slow things down very much. Memory is orders of magnitude slower than L1 cache and on many architectures leads to full cache invalidation (there are some architectures were some of the cores share some degree of L3 cache).

The main lesson is that programming thread primitives is really hard and has many many caveats and pitfalls, both in correctness and in speed/overhead. It requires a deep understanding of how your chosen processor works to do it in an optimal way. In addition, the JVM doesn’t expose sufficiently detailed low-level controls to do this in detail. It can however intrinsify (do something special with standard library classes and not actually call the java code at all - as it is standard it knows the semantics) the standard library, so in almost all cases just stick with the standard threading components that can be optimized better and have had a lot of expert eyeballs (and usage).

1 Like

I have yet to encounter a higher-level language than assembly that permits false sharing. Java has a rather well-defined and quite strict memory model that does not permit false sharing. Standard C/C++11 has a weaker memory model, but it still does not permit false sharing:

Different threads of execution are always allowed to access (read and modify) different memory locations concurrently, with no interference and no synchronization requirements.

where

A memory location is

  • an object of scalar type (arithmetic
    type, pointer type, enumeration type)
  • or the largest contiguous sequence of bit fields of non-zero length

Source: Memory model - cppreference.com

This isn’t possible in Java or Kotlin. You would need a fully dependently typed language for this to model relationships between values but even then the type system is requiring you to solve these kinds of restrictions (from the implementor).

You could of course use Rusts concept of ownership management but it is very restrictive and not optimal at all.

Likewise you could use in/out parameters which copy the content during execution, manipulate the new location and copy the value back to original memory location.

Or better you synchronize all methods or let each class/object become its own actor communicating only over queues with other actors over messages, I think the latter was Alan Kay’s vision of object orientation.

Shared state is always logically unsafe if you can’t express the logic. But totalizing the world doesn’t help either when it comes to intractability or undecidability.

This is exactly the logic I’m talk about, you want contracts.
For instance, you could define a contract that x is in the list you want to add to and the size is incremented by one, but even then you don’t know if the other elements have changed, so you need to assert that all input elements of the original list will also be in the result list and the order must match.

Imagine how wonderful bloated such a signature can become probably chasing every newbe back to Java.

Or you can do it like in Idris, pulling the whole function block into the function signature requiring the implementor to satisfy the interface signature implying to decide the undecidable function equivalence.