This is an intermittent problem that only occurs about once a week on one of our Linux servers.
We are running a Java application on OpenJDK 17 with an internal Kotlin component. It depends on Kotlin 1.6.10 and kotlinx-coroutines-core:1.6.0-native-mt. The server-side application launches many coroutines simultaneously and synchronizes them using a single Mutex object.
From log messages we determined that, when the problem occurs, our code fails to acquire the mutex at the withLock call. However, the thread/coroutine does not appear to be blocked waiting: it seems to leave the function after reaching withLock. At least, that is what the thread dump suggests, since no thread is shown waiting for the lock.
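For reference, the pattern described above looks roughly like the sketch below. This is not our production code: `guardedIncrement`, `runDemo`, `counter`, and the 5-second timeout are hypothetical names and values chosen for illustration. The sketch wraps `withLock` in `withTimeoutOrNull` so that a failed acquisition is logged instead of waiting indefinitely. Note that a coroutine waiting on a Mutex is suspended rather than blocking a thread, so a JVM thread dump would not be expected to show a thread waiting for the lock.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical shared state guarded by a single Mutex, as in the description.
val mutex = Mutex()
var counter = 0

// Hypothetical helper: tries to enter the critical section and reports a
// timeout instead of staying suspended forever. Returns true on success.
suspend fun guardedIncrement(id: Int, timeoutMs: Long = 5_000): Boolean {
    val acquired = withTimeoutOrNull(timeoutMs) {
        mutex.withLock {
            counter++ // critical section, no suspension points inside
        }
        true
    }
    if (acquired == null) {
        // Assumption: println stands in for whatever logger the application uses.
        println("coroutine $id: timed out waiting for mutex, isLocked=${mutex.isLocked}")
    }
    return acquired != null
}

// Launches n coroutines that all contend for the same mutex and returns
// the final counter value once all of them have completed.
fun runDemo(n: Int = 1_000): Int = runBlocking {
    counter = 0
    coroutineScope {
        repeat(n) { id ->
            launch(Dispatchers.Default) { guardedIncrement(id) }
        }
    }
    counter
}

fun main() {
    println("final counter = ${runDemo()}")
}
```

One caveat with this approach: the timeout also covers the critical section itself, so if the guarded code suspends, it can be cancelled part-way through. For diagnostics that may be acceptable; for production code the timeout value needs to be chosen with that in mind.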
We tried running some simple coroutine diagnostic programs but all seem to work without problems.
As a next step, we intend to use kotlinx-coroutines-debug to output more diagnostics about the coroutine state when the problem occurs.
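What we have in mind is along the lines of the sketch below, which uses `DebugProbes` from kotlinx-coroutines-debug to dump all coroutines while one of them is suspended on a locked Mutex. The function name `dumpWhileWaiting`, the coroutine name "waiter", and the 100 ms delay are illustrative assumptions, and `DebugProbes` is marked experimental, so the details may differ between versions.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.PrintStream
import kotlinx.coroutines.CoroutineName
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.debug.DebugProbes
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical helper: holds a mutex, lets a second coroutine suspend on it,
// and returns the coroutine dump taken while that coroutine is waiting.
@OptIn(ExperimentalCoroutinesApi::class)
fun dumpWhileWaiting(): String = runBlocking {
    // Probes only track coroutines created after install().
    DebugProbes.install()
    try {
        val mutex = Mutex()
        mutex.lock() // hold the lock so the launched coroutine must wait

        val waiter = launch(CoroutineName("waiter")) {
            mutex.withLock { println("waiter acquired the mutex") }
        }

        delay(100) // give the waiter time to reach withLock and suspend

        // Capture the dump in memory; dumpCoroutines() without arguments
        // writes to System.out instead.
        val buffer = ByteArrayOutputStream()
        DebugProbes.dumpCoroutines(PrintStream(buffer, true))

        mutex.unlock() // release so the waiter can finish
        waiter.join()
        buffer.toString()
    } finally {
        DebugProbes.uninstall()
    }
}

fun main() {
    println(dumpWhileWaiting())
}
```

In the real application the dump would be triggered when the problem is detected (for example, after a lock acquisition times out) rather than after a fixed delay.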
We also increased the file descriptor limit from 1024 to 20000. Are there any other ideas for what to try?