Coroutine memory management issue

I would like to raise a concern about the Kotlin 1.3 coroutine implementation for the JVM.

I wrote the following test code:

suspend fun execute() {
    while (true) {
        val buffer = receive()
        consume(buffer)
        println("GC should collect buffer here")
    }
}

suspend fun receive() = Buffer()

suspend fun consume(buffer: Buffer) = Unit

class Buffer

This code generates a ContinuationImpl state machine with the field L$0 : Object, which is used to hold the buffer reference.
The current implementation does not allow the GC to collect the buffer instance, not even when the println executes.
The state machine should release the L$0 reference before calling consume (fetch the buffer and put it on the JVM stack, set L$0 = null, invoke consume), but the L$0 field can probably be avoided altogether (to produce a smaller state machine).
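
To make the proposal concrete, here is a hand-written approximation of such a state machine. It is only a sketch, not the actual compiler output: the class name ExecuteStateMachine, the field name l0 (standing in for L$0) and the non-suspending receive/consume stand-ins are assumptions made for illustration.

```kotlin
class Buffer

// Hypothetical non-suspending stand-ins for the functions in the post.
fun receive(): Buffer = Buffer()
fun consume(buffer: Buffer) { /* may run for a long time */ }

// Hand-written approximation of a coroutine state machine (not compiler output).
class ExecuteStateMachine {
    var label = 0
    var l0: Any? = null // plays the role of the spilled L$0 field

    // Advances the machine by one state and returns to the caller.
    fun step() {
        when (label) {
            0 -> {
                l0 = receive() // the buffer is spilled into the field
                label = 1
            }
            1 -> {
                val buffer = l0 as Buffer // fetch the buffer onto the JVM stack
                l0 = null                 // release the field before the call
                consume(buffer)
                label = 0
            }
        }
    }
}
```

After the first step() the field holds the buffer; after the second step() it is null again, so the state object no longer keeps the buffer reachable while consume runs or afterwards.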

The following code works as expected (in the decompiled code the buffer lives in a local variable named var10000).

suspend fun execute() {
    while (true) {
        consume(receive())
        println("GC should collect buffer here")
    }
}

Is it possible to reduce the memory allocation as much as possible?

A similar issue occurs with

suspend fun execute() {
    var buffer = receive()
    consume(buffer)
    println("GC should collect buffer here")
    buffer = receive()
    consume(buffer)
    println("GC should collect buffer here")
}

and

suspend fun execute(buffer:Buffer) {
    consume(buffer)
    println("GC should collect buffer here")
}

Keep in mind that local variables have to be released before the consume invocation (it can involve long CPU tasks, locks or I/O operations).

As a consequence, when the coroutine reaches the end of its life, all local object references should be null.

Will this issue be addressed?
A handful of bytecode instructions in the state machine can prevent future, unexpected memory issues.
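
Until then, one possible source-level workaround for the parameter case is to pass the buffer through a one-shot holder that clears itself when read. This is only a sketch: Slot, take() and isEmpty are hypothetical names, and it relies on the observation above that a value passed directly as an argument expression is not stored in the state machine.

```kotlin
class Buffer

// Hypothetical one-shot holder: take() hands out the value and clears the slot.
class Slot<T : Any>(private var value: T?) {
    fun take(): T {
        val v = checkNotNull(value) { "slot already emptied" }
        value = null // the holder no longer references the value
        return v
    }

    val isEmpty: Boolean get() = value == null
}

suspend fun consume(buffer: Buffer) = Unit

suspend fun execute(slot: Slot<Buffer>) {
    // The buffer goes straight from take() into the call, with no named local.
    consume(slot.take())
    println("slot empty: ${slot.isEmpty}")
}
```

The state machine may still keep the Slot itself, but the Slot is empty by the time consume starts, so it does not pin the buffer. The caller must also drop its own reference to the buffer for this to help.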

I think you should create an issue for this. I don’t think there is going to be much of a discussion we can have here. This clearly is a problem with the current system, especially with long-running coroutines.

This issue also affects short-running, non-blocking/non-suspending coroutines.

My previous example

suspend fun execute() {
    var buffer = receive()
    consume(buffer)
    println("GC should collect buffer here")
    buffer = receive()
    consume(buffer)
    println("GC should collect buffer here")
}

requires twice the allocation space in the TLAB, or the older buffer has to survive a collection and be promoted out of the eden space.

Furthermore, I run many thousands of long-running coroutines in production, so many garbage instances pollute the heap.

It is an interesting requirement to put on the compiler. It is inconsistent with the normal behaviour of the Java compiler, but there is a clear memory cost here. The limitation is that it requires the Kotlin compiler to do an optimization pass over the code (which it currently doesn’t do, in this and many other cases), possibly using some form of SSA, with the added requirement to null local fields after they go out of scope (or even after their last use, which is normally earlier).

A garbage collector collects and disposes of all unreachable instances; a reference held in an unused (or no longer used) local variable is not considered a hard link.

To avoid premature instance finalization you have to use java.lang.ref.Reference.reachabilityFence; see the Reference (Java SE 11 & JDK 11) documentation for further details.

From my point of view, the current Kotlin compiler behavior violates the GC specification.
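
As a minimal sketch of how the fence is typically used (the Resource class and useResource function are hypothetical, and Java 9 or later is assumed), the call goes in a finally block so the object stays strongly reachable until that point:

```kotlin
import java.lang.ref.Reference

// Hypothetical resource whose cleanup must not race with its last use.
class Resource(val id: Int)

fun useResource(resource: Resource): Int {
    try {
        val id = resource.id
        // ... long-running work that only needs `id`, not `resource` ...
        return id * 2
    } finally {
        // Keeps `resource` strongly reachable at least until this point.
        Reference.reachabilityFence(resource)
    }
}
```

Without the fence, the JVM is allowed to treat resource as unreachable after its last use, even though the variable is still in scope.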

There is no guarantee that the JVM will do this; it is an optimization that you can use the fence to avoid/disable. More importantly, it is an optimization in the JVM/JIT compiler, not in the Java compiler. In a coroutine, all local variables that span a suspension point have to be stored as fields on the underlying coroutine state object. As a consequence of this implementation difference, the JIT/JVM is unable to detect that the field is now unused. Theoretically the JVM could be taught to look for Kotlin coroutines and implement the optimization (in practice very unlikely). Alternatively, setting the field to null is valid, but does require the Kotlin compiler to do the analysis (with a much clearer context) and the optimization.

I would suspect that the reason the rule you refer to exists is that when a local variable goes out of scope, this just allows the compiler to use its stack location for something else. This information is only stored in debugging information, but not in the regular bytecode, which only cares about the maximum stack size. Looking through bytecode for the last reference to a memory location is fairly trivial. However, reassignment is much trickier, especially in the context of branches. I would strongly suspect that there are edge cases where the JVM does not perform this optimization even though it would theoretically be possible, and where the memory leak could have a significant lifetime.

Despite all this, I agree that the behaviour you want is desirable and that its absence is a drawback of Kotlin coroutines as currently implemented. It highlights again the fact that Kotlin is not as close to JVM bytecode as Java is and that some optimization in the Kotlin compiler is worthwhile/warranted.

This has been a known issue for two years, and it will not be addressed in Kotlin 1.3.

So I have to consider suspending functions very tricky in production.

https://youtrack.jetbrains.com/issue/KT-16222