Co-routine exits when asking for mutex.withLock

This is an intermittent problem that only occurs about once a week on one of our Linux servers.
We are running a Java application on OpenJDK 17 with an internal Kotlin component. It depends on Kotlin 1.6.10 and kotlinx-coroutines-core:1.6.0-native-mt, and runs as a server-side application that launches many coroutines simultaneously and synchronizes them using a single Mutex object.

Using log messages, we found that when the problem occurs our code fails to acquire the mutex at the withLock statement. However, the thread/coroutine does not appear to be blocked waiting. Instead, it seems to exit the function after reaching the withLock call. At least, that is what the thread dump suggests: no thread is actually waiting for the lock.

We tried running some simple coroutine diagnostic programs, but they all seem to work without problems.

As a next step, we intend to use kotlinx-coroutines-debug to output more diagnostics about the coroutine state when the problem occurs.
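A minimal sketch of what such diagnostics could look like, assuming the org.jetbrains.kotlinx:kotlinx-coroutines-debug artifact (same version as the core library) is on the classpath and that the debug agent can attach in your environment. The function name countSuspendedWhileLocked and the "waiter-N" coroutine names are hypothetical and only serve the illustration. It holds a Mutex, starts a few coroutines that wait on it, and then asks DebugProbes which coroutines are suspended:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.debug.DebugProbes
import kotlinx.coroutines.debug.State
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical helper: returns how many named waiters are reported as SUSPENDED
// while the mutex is held by someone else.
@OptIn(ExperimentalCoroutinesApi::class)
fun countSuspendedWhileLocked(): Int = runBlocking {
    DebugProbes.install() // must be called before the coroutines of interest are created
    try {
        val mutex = Mutex()
        mutex.lock() // hold the lock so every waiter has to suspend

        val waiters = List(3) { i ->
            launch(CoroutineName("waiter-$i")) {
                mutex.withLock { /* critical section */ }
            }
        }
        yield() // let the waiters run until they suspend inside withLock

        // Prints every known coroutine with its state and last observed stack trace.
        DebugProbes.dumpCoroutines()

        // The same information is available programmatically.
        val suspended = DebugProbes.dumpCoroutinesInfo().count { info ->
            info.state == State.SUSPENDED &&
                info.context[CoroutineName]?.name?.startsWith("waiter-") == true
        }

        mutex.unlock()
        waiters.joinAll()
        suspended
    } finally {
        DebugProbes.uninstall()
    }
}

fun main() {
    println("suspended waiters: ${countSuspendedWhileLocked()}")
}
```

Unlike a thread dump, the coroutine dump lists coroutines that are suspended and therefore not attached to any thread, which is the state a coroutine waiting in withLock would be in.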

We also increased the file descriptor limit from 1024 to 20000. Any other ideas on what to try?

I have never used Kotlin/Native, so apologies if I provide false info, but regarding the observation that the coroutine seems to exit the function at withLock:

This is how coroutines suspend, at least on the JVM. When a coroutine is about to suspend, it simply returns from the function and from every calling function further up the stack. This frees the thread to do something else while the coroutine is waiting. If the thread waited on the lock, that would mean it is blocking, not suspending.
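To illustrate, here is a small sketch (the names runDemo, holder, waiter, and other are made up for the example). Everything runs on the single thread of runBlocking. If withLock blocked that thread, the third coroutine could never run while the second one is waiting for the mutex:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Records the order of events; all coroutines share the single runBlocking thread.
fun runDemo(): List<String> = runBlocking {
    val events = mutableListOf<String>()
    val mutex = Mutex()

    val holder = launch {
        mutex.withLock {
            events += "holder: acquired"
            delay(100) // suspends while still holding the lock
            events += "holder: releasing"
        }
    }
    val waiter = launch {
        events += "waiter: calling withLock"
        // The mutex is taken, so this call suspends: the function returns
        // to the event loop and the thread is free to run other coroutines.
        mutex.withLock { events += "waiter: acquired" }
    }
    val other = launch {
        events += "other: running on the same thread"
    }

    joinAll(holder, waiter, other)
    events
}

fun main() {
    runDemo().forEach(::println)
}
```

The "other" coroutine gets to run while the waiter is parked in withLock, and a thread dump taken at that moment would show no thread waiting on the lock.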

I guess on native this is not an actual return: since we have direct access to the stack and the instruction pointer, I imagine it just replaces the stack and jumps to the dispatcher’s code. But this is pure speculation on my part. Either way, the thread doesn’t sleep at mutex.withLock().
