Starting a coroutine in a suspend function and _not_ waiting for it to finish

I want to implement a suspending function that starts a coroutine which runs a background job in a loop.

Imagine a connect() function which, once the connection is established, needs to spawn a coroutine that continuously receives data in the background, in an infinite loop, until it is canceled. Here’s some pseudocode:

suspend fun connect() {
    // connection is established here
    
    backgroundJob = launchInTheBackground { // This is the unclear part
        while (true) {
            val myData = receiveData() // This is a suspend function
            processReceivedData(myData)
        }
    }
}

suspend fun disconnect() {
    backgroundJob?.cancel()
    backgroundJob = null

    // disconnecting is done here
}

The launchInTheBackground part is what is unclear to me. What I want to do is to start a coroutine in the same scope that connect() is called in. This is important to make sure that connect() does not wait until the loop finishes (which it never does except when you cancel the coroutine).

So far, I had to pass the coroutine scope explicitly. Then I can replace launchInTheBackground with scope.launch. But having to explicitly pass a scope to the function is not exactly nice. I wonder if you guys can think of a better approach here?

Easy. Just surround your connect function body with coroutineScope {} and then use launch instead of the hypothetical launchInTheBackground.

This does not work. As the coroutineScope documentation states:

This function returns as soon as the given block and all its children coroutines are completed.

This means that coroutineScope will wait until the loop ends, which it never will.
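The waiting behaviour can be checked with a small sketch. The function names and the delay() standing in for receiveData() are assumptions made for illustration, not code from the question:

```kotlin
import kotlinx.coroutines.*

// Hypothetical connect() that wraps the receive loop in coroutineScope {}.
suspend fun connectWithCoroutineScope() {
    coroutineScope {
        launch {
            while (true) {
                delay(100) // stands in for the suspending receiveData()
            }
        }
        // coroutineScope returns only after the launched child completes,
        // so this function never returns on its own.
    }
}

// Returns true if connectWithCoroutineScope() returned before the timeout,
// false if the timeout had to cancel it.
suspend fun returnedInTime(timeoutMillis: Long): Boolean =
    withTimeoutOrNull(timeoutMillis) {
        connectWithCoroutineScope()
        true
    } ?: false
```

Calling returnedInTime(300) from a coroutine should yield false, because only the timeout's cancellation ends the loop.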


Oh, whoops, sorry, I got confused about your use case. Make your function an extension on CoroutineScope, and then use launch normally.

See this for more details as to why that’s the case:

(Fun fact: you can actually still use launch inside of a suspend fun, it’s just that auto-cancellation then wouldn’t work. If you want to explicitly do that anyway, and allow the user to cancel the job only explicitly, then call CoroutineScope(coroutineContext).launch { ... } inside of your suspending function and make sure to expose the resulting job.)

Have you looked at channels/flows? KotlinConf 2019 has a pretty nice talk on structured concurrency that includes an example with channels.

You can always do GlobalScope.launch if you want to fire and forget (for better or worse).

The convention is to make your functions an extension function on CoroutineScope if they launch anything, and a suspend function if they simply suspend (not both). In this case I think you can get away with an extension function. Your resulting call would be GlobalScope.connect().


Your suspend methods should not take a CoroutineScope and should wait for all work to finish. Methods that launch coroutines without waiting should not suspend and should take a CoroutineScope parameter or receiver.

This is the pattern you’ll see in the Kotlin coroutines library and you can read more about it here: Coroutine Context and Scope

You are on the right track taking the CoroutineScope, just remove those suspend keywords.

While avoiding the CoroutineScope parameter may seem nice, it’s important for readability and reasoning about code if suspend methods finish their work before they return and any method that starts concurrent work is identifiable by taking a CoroutineScope parameter/receiver.
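A minimal sketch of that convention, assuming a made-up launchReceiveLoop() and a counter standing in for processReceivedData():

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Counts loop iterations; a stand-in for processReceivedData().
val receivedCount = AtomicInteger(0)

// Not a suspend function: it returns right away and hands back the Job.
// The CoroutineScope receiver signals that it starts concurrent work.
fun CoroutineScope.launchReceiveLoop(): Job = launch {
    while (isActive) {
        delay(10) // stands in for the suspending receiveData()
        receivedCount.incrementAndGet()
    }
}
```

The caller decides which scope owns the loop, for example `val job = someScope.launchReceiveLoop()`, and can stop it later with `job.cancel()`.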


The problem is that in the “// connection is established here” part in my pseudo code above, suspend functions are called to establish the connection, which is why I make connect() a suspend function.

Now, you could argue that the part after “// connection is established here” could be extracted into a separate function, but the thing is that that loop must be started, otherwise communication won’t work. And always having to call a second function after connect just to actually finish the connection setup results in a confusing API. If the background read loop always has to be started in order to communicate, why not start it right there in connect() and stop it in disconnect()? There is no meaningful state where the connection is established but the loop isn’t running.


You can include establishing the connection in the launched coroutine.


That might work. Initially I wanted to reply that doing that would make it more difficult to report errors. However, the same is already true of the background loop: if something goes wrong there, I have to figure out how to forward exceptions. This is a classic problem with a few typical solutions:

  1. Exceptions in that background coroutine are stashed, and the next IO call immediately re-throws the stashed exception. This is how POSIX IO behaves. For example, if a socket connection breaks, the next recv() / send() call will immediately return with error.
  2. Add a callback that is invoked as soon as an exception is thrown.
  3. Rely on supervisorScope to handle the coroutine failure. Seems the most idiomatic approach, but I am the least experienced with this one.

Not failing the parent when children fail isn’t exactly “handling” the failure, though you may want to use supervisorScope whichever route you go.

When accessing the result of a Deferred, you get the original error. This per-use failure reporting works like your #1.

Installing a CoroutineExceptionHandler lets you capture failures that aren’t caught, passed to a parent, or wrapped up as a result. This more general handler behavior matches your #2.
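Both behaviours can be sketched roughly like this. The function names, the IOException, and the standalone scope are assumptions for illustration only:

```kotlin
import kotlinx.coroutines.*
import java.io.IOException

// Like #1: the failure is stored in the Deferred and rethrown on await().
suspend fun awaitRethrows(): Throwable? = supervisorScope {
    val result = async<Unit> { throw IOException("connection lost") }
    try {
        result.await()
        null
    } catch (e: IOException) {
        e // the stored failure surfaces at the point of use
    }
}

// Like #2: an uncaught failure in a launched coroutine reaches the handler.
suspend fun handlerCaptures(): Throwable? {
    val captured = CompletableDeferred<Throwable>()
    val handler = CoroutineExceptionHandler { _, e -> captured.complete(e) }
    // Hypothetical stand-in for the background scope; SupervisorJob keeps
    // one failing child from cancelling the others.
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)
    scope.launch { throw IOException("receive loop failed") }.join()
    scope.cancel()
    return withTimeoutOrNull(1000) { captured.await() }
}
```

In both cases the caller ends up with an IOException, either at the await() call or through the handler callback.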


So, back to this. The problem with establishing the connection in the launched coroutine is that the connect() function then exits too early. It must suspend/block (suspend if a coroutine is used, block with traditional threads, though I’d greatly prefer the former) until the connection itself is established, and then exit once it has launched the background receive loop (launchInTheBackground above). So, I still have to go back to the suspend function.

To summarize: This case seems to only be possible with a suspend function:

suspend fun connect(backgroundScope: CoroutineScope) {
    // suspending functions that set up the connection are called here
    // connect() MUST block/suspend until these are done!
    backgroundJob = backgroundScope.launch {
        runReceiveLoop() // runs until the background job is canceled
    }
}

suspend fun disconnect() {
    backgroundJob?.cancel()
    backgroundJob = null

    // disconnecting is done here
}

The combination of having to wait until the connection is set up and launching the background receive loop seems to only be possible with a suspend function that takes a background scope.

One detail that may have been overlooked here is that the supplied scope does not have to be the same scope the suspending connect() function was called in. So I wonder if the “your suspend methods should not take a CoroutineScope and should wait for all work to finish” applies even with this detail in place.


In cases like that, I expose a separate suspend method like awaitConnected.

Depending on the details, the extra method may be next to connect or I might define something like

interface ConnectionJob : Job {
    suspend fun awaitConnected()
}

and return that.

You can use a MutableStateFlow to track the state. Start it as false, set it true when connected, and call first { it } to wait for it to be true.
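A rough sketch of that idea follows. The FlowConnection class, its member names, and the delay() calls standing in for real connection setup and receiving are all assumptions:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical class; names and delay-based "setup" are placeholders.
class FlowConnection(private val scope: CoroutineScope) {
    private val connected = MutableStateFlow(false)

    val isConnected: Boolean get() = connected.value

    // Does not suspend: setup and receive loop both run in the launched coroutine.
    fun connect(): Job = scope.launch {
        delay(50) // stands in for the suspending connection setup
        connected.value = true
        try {
            while (isActive) delay(10) // stands in for the receive loop
        } finally {
            connected.value = false // reset when the loop is cancelled
        }
    }

    // Suspends until the flag becomes true.
    suspend fun awaitConnected() {
        connected.first { it }
    }
}
```

A caller that needs the blocking behaviour would call connect() and then awaitConnected(); one that does not care can skip the second call.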


I am not sure if awaitConnected is a good idea. On one hand, it allows the caller to decide in what scope to wait for the connection to finish. On the other hand, it makes connect less intuitive, since once it finishes, the connection is in some sort of “half-connected” state. I suppose though that this can be explained away by referring to connect as an asynchronously connecting function…

How would you like the background task to be cancelled if nobody calls disconnect()?

If the answer is “it shouldn’t get cancelled at all”, then you can use GlobalScope.

Otherwise, the constructor for this object (the object that has connect, disconnect, and backgroundJob) should probably require a parent Job or CoroutineScope in a constructor argument to define the lifetime of the tasks it creates.
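As a sketch of the constructor-argument variant, assuming a made-up ScopedConnection class and delay() in place of real I/O:

```kotlin
import kotlinx.coroutines.*

// Hypothetical class: the owner supplies the scope that bounds the background work.
class ScopedConnection(private val backgroundScope: CoroutineScope) {
    private var backgroundJob: Job? = null

    val isReceiving: Boolean get() = backgroundJob?.isActive == true

    // Suspends only for the setup, then starts the loop in the supplied scope.
    suspend fun connect() {
        delay(20) // stands in for the suspending connection setup
        backgroundJob = backgroundScope.launch {
            while (isActive) delay(10) // stands in for the receive loop
        }
    }

    suspend fun disconnect() {
        backgroundJob?.cancelAndJoin()
        backgroundJob = null
    }
}
```

If nobody calls disconnect(), cancelling the scope passed to the constructor still stops the receive loop.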

Got the chance to work on this again. I mostly succeeded in designing the code such that functions that launch new coroutines do not suspend. There is one remaining issue though.

Sometimes it is necessary to launch a coroutine that regularly sends “ping” messages to keep the connection alive. This keep-alive mechanism is not always required, and in fact may lead to other problems if it is always on. Currently this is handled like in this pseudocode:

suspend fun someOperationThatStartsKeepAlive() {
    // [...]
    if (keepAliveJob == null) {
        keepAliveJob = backgroundScope.launch {
            while (true) {
                // keep-alive ping code
                delay(1000) // one ping every second
            }
        }
    }
}

suspend fun someOperationThatStopsKeepAlive() {
    // [...]
    if (keepAliveJob != null) {
        keepAliveJob!!.cancelAndJoin() // cancel and wait for the ping loop to finish
        keepAliveJob = null
    }
}

(backgroundScope is the scope that was passed to connect().)

This works, but is not exactly clean. Perhaps it would be possible to launch the keepAliveJob coroutine in the connect() call and “pause” that coroutine until keep-alive is actually needed. This partially works by launching with CoroutineStart.LAZY. The problem is that I cannot restart the coroutine, which is necessary if I had to turn off keep-alive and now need it again. Suggestions?
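One possible way to get a “pausable” keep-alive, sketched under assumptions: a single long-lived coroutine watches a MutableStateFlow flag, and collectLatest cancels the ping loop whenever the flag changes. The KeepAlive class, its parameters, and sendPing are hypothetical names, not part of the code above:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical helper; sendPing is a placeholder for the real ping code.
class KeepAlive(
    scope: CoroutineScope,
    private val intervalMillis: Long,
    private val sendPing: suspend () -> Unit
) {
    private val enabled = MutableStateFlow(false)

    init {
        // Launched once, e.g. from connect(); lives until the scope is cancelled.
        scope.launch {
            enabled.collectLatest { on ->
                // collectLatest cancels this block when a new value arrives,
                // which ends the ping loop as soon as the flag flips.
                if (on) {
                    while (true) {
                        sendPing()
                        delay(intervalMillis)
                    }
                }
            }
        }
    }

    fun start() { enabled.value = true }
    fun stop() { enabled.value = false }
}
```

Unlike cancel() followed by join(), stop() here does not wait for the ping loop to have actually ended, so this only fits if that guarantee is not needed.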