Coroutine async exception confusion when suspend function is involved

Say you have the following code (runnable in the Kotlin Playground):

import kotlinx.coroutines.*

fun main() {
    runBlocking {
        launch {
            launch {
                println(f())
            }
        }
    }
}

suspend fun f(): String {
    val scope = CoroutineScope(Dispatchers.IO)
    val r = scope.async {
        error("oh darn it")
        "oh boy!"
    }

    return try {
        r.await()
    } catch (e: Exception) {
        "oh maybe"
    }
}

A quote from a blog, which made me question everything:
“Attention: The exception is only encapsulated in the Deferred, if the async Coroutine is a top-level Coroutine. Otherwise, the exception is immediately propagated up the job hierarchy and handled by a CoroutineExceptionHandler or passed to the thread’s uncaught exception handler even without calling .await() on it”.

Does a suspend fun have some additional impact on exceptions in coroutines? The exception above doesn’t bypass the try/catch, yet that async and await are not top level. So, has something changed, or is the behavior different with suspend functions?

This is just one of many things after reading or listening to this or that about coroutines, which leads to:
Please consider not writing a blog post or making a YouTube video until you have used coroutines in the real world. Far too many of them get it wrong, and I have not found a good source yet. Recommendations welcome. We desperately need decent docs on this, and the official documentation is not sufficient. I know hard work went into it, thank you, but it falls short.

And this is true. Why do you think otherwise? How does your example show that the error doesn’t propagate in the job hierarchy? Maybe if you actually pointed a finger and provided some links, we would be able to tell if there are code examples that are incorrect.

Also, await() docs are very clear on the fact that it re-throws uncaught exceptions:

resumes when deferred computation is complete, returning the resulting value or throwing the corresponding exception if the deferred was cancelled.

Structured concurrency is defined as one of the very first things in the coroutines docs. Then, it is explained further with an example. Also, this isn’t a term specific to Kotlin, so I think it doesn’t have to be explained as if it were a new concept.

Thanks for the reply. The try/catch in the example handles the exception; it is not handled by a CoroutineExceptionHandler or passed to the thread’s uncaught exception handler. From:

“Attention: The exception is only encapsulated in the Deferred, if the async Coroutine is a top-level Coroutine. Otherwise, the exception is immediately propagated up the job hierarchy and handled by a CoroutineExceptionHandler or passed to the thread’s uncaught exception handler even without calling .await() on it”.
In the code above this never happens: the try/catch handles the exception, and this coroutine is not top level. So, I am at a loss as to why.

The docs say “if the deferred was cancelled”. What if it wasn’t cancelled? In this respect the await docs are not clear.

Regarding the linked article… well, actually I agree with you. I sometimes read articles posted e.g. on the Kotlin group on Reddit, and these articles are often of very bad quality. Not only do they fail to explain the topic well, they contain half-truths or are simply wrong.

The linked article stops making sense at the point (pretty much the beginning) where the author writes code like this: try { launch { error() } } and says “This is very unexpected and confusing” that try didn’t catch the error. This is in fact very much expected: launch only starts the coroutine and returns, so the error is thrown after the try block has already been left. Then the author “blames” coroutines for this behavior, but he doesn’t understand that this is not at all specific to coroutines. He fails to recognize what the true problem is here and then bases his whole article on this wrong assumption. I don’t say the author is a bad software developer, but it doesn’t seem he understands this specific topic, and still he decided to teach others about it.

My personal opinion is that the problem is caused by the fact that concurrent programming is by nature pretty complicated and hard to understand. Most developers never really understood it very well. Mobile platforms make development pretty easy and this attracts inexperienced developers, but on the other hand mobile apps require at least basic concurrency. People don’t understand it, they make wrong assumptions, they write bad-quality code and sometimes… teach about it. Kotlin coroutines make concurrency easier and more error-proof (personal opinion), but paradoxically this makes the situation even worse, because people use them without even basic concurrency skills, such as understanding what synchronous and asynchronous execution are.

Answering your question directly: async() does both. It propagates the error through the job tree (this is expected due to structured concurrency) and it throws from await() (this is expected by the consumer of its result).
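
A minimal sketch of “does both”, assuming the async is a real child of a launched coroutine (unlike f() in the original post). The name demoBoth and the use of a CoroutineExceptionHandler purely as a recorder are illustrative assumptions, not something from the thread.

```kotlin
import kotlinx.coroutines.*

// Records what each side observed, so both effects are visible together.
fun demoBoth(): List<String> = runBlocking {
    val seen = mutableListOf<String>()
    // Handler on a root scope, used here only to record the propagated failure.
    val handler = CoroutineExceptionHandler { _, e -> seen += "propagated to root: ${e.message}" }
    val scope = CoroutineScope(Job() + handler)

    val parent = scope.launch {
        val deferred = async<String> { error("oh darn it") } // child of `parent`
        try {
            deferred.await()
        } catch (e: Exception) {
            seen += "await threw" // the consumer of the result sees the failure
        }
    }
    parent.join()
    // Catching around await() did not save the parent: the child already failed it.
    seen += "parent cancelled: ${parent.isCancelled}"
    seen
}

fun main() = demoBoth().forEach(::println)
```

The try/catch runs, yet the parent still ends up cancelled and the root handler still receives the exception.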

Structured Concurrency:

  1. Jobs don’t finish until all children are finished
  2. Failed/cancelled jobs cancel all children
  3. A failed child job fails its parent (unless using supervisor)

The concept is simple yet the consequences of these simple rules can be tricky to realize.
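
The three rules can be seen in one small sketch. This is only an illustration under assumptions: demoRules is a made-up name, and IOException stands in for any non-cancellation failure.

```kotlin
import java.io.IOException
import kotlinx.coroutines.*

// Returns (message of the failure seen by the caller, whether the sibling ran to the end).
fun demoRules(): Pair<String?, Boolean> = runBlocking {
    var siblingFinished = false
    val failure: IOException? = try {
        // Rule 1: coroutineScope does not return until both children are done.
        coroutineScope {
            launch {
                delay(10_000)           // Rule 2: cancelled long before this elapses
                siblingFinished = true
            }
            launch { throw IOException("child failed") } // Rule 3: fails the enclosing scope
        }
        null
    } catch (e: IOException) {
        e
    }
    failure?.message to siblingFinished
}

fun main() {
    println(demoRules()) // prints (child failed, false)
}
```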

It does go up the Job hierarchy. If you add an invokeOnCompletion listener to your Job() you’ll see that it does fail. I’m guessing you think that Job() creates a child job, but it doesn’t; it’s completely disconnected from whatever coroutine called f. If you want to create a child job with that method, you have to pass the parent in as a parameter. The proper way to launch concurrent work in a suspend method is by calling coroutineScope.
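
A possible rewrite of f() along those lines, as a sketch rather than a definitive version. Two assumptions are mine: IOException replaces error(), because error() throws IllegalStateException, which is a supertype of CancellationException on the JVM, and the try/catch is placed around coroutineScope rather than around await().

```kotlin
import java.io.IOException
import kotlinx.coroutines.*

suspend fun f(): String = try {
    // coroutineScope makes the async a child of whoever calls f()
    // and rethrows the child's failure once all children are done.
    coroutineScope {
        val r = async<String>(Dispatchers.IO) { throw IOException("oh darn it") }
        r.await()
    }
} catch (e: IOException) {
    // Only the expected failure is caught; cancellation of the caller still gets through.
    "oh maybe"
}

fun main() {
    runBlocking {
        launch {
            println(f()) // prints oh maybe
        }
    }
}
```

The failure stays inside the coroutineScope block, so the coroutine that called f() is not cancelled by it.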

EDIT: I think maybe the original post changed? I could have sworn it had CoroutineScope(Job()). But still CoroutineScope(Dispatchers.IO) also creates a new top level Job unrelated to the caller of f(). CoroutineScope() adds Job() if the CoroutineContext doesn’t already have one.
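
To check this claim yourself, here is a small sketch (demoDetached is a hypothetical name) that inspects the Job created by CoroutineScope(Dispatchers.IO) and its relation to the caller.

```kotlin
import kotlinx.coroutines.*

fun demoDetached(): List<Boolean> = runBlocking {
    val callerJob = coroutineContext[Job]!!
    val scope = CoroutineScope(Dispatchers.IO)     // no Job passed in...
    val scopeJob = scope.coroutineContext[Job]!!   // ...yet one is there
    val deferred = scope.async<String> { error("oh darn it") }
    deferred.join()                                // wait without rethrowing
    listOf(
        scopeJob in callerJob.children, // not a child of the caller
        scopeJob.isCancelled,           // the failure went up to the detached Job
        callerJob.isCancelled           // the caller never hears about it
    )
}

fun main() {
    println(demoDetached()) // prints [false, true, false]
}
```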

Thank you, that I can wrap my head around. I changed my post above to have a complete example. If the exception goes up the job tree then why doesn’t the thread report an exception since no scope has an exception handler? Thanks again.
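
On why nothing reaches the thread’s handler: a root coroutine started with async keeps its failure in the Deferred, while a root started with launch reports it. A minimal sketch, in which demoRootBuilders is a made-up name and the CoroutineExceptionHandler is installed only to make the reporting observable:

```kotlin
import kotlinx.coroutines.*

fun demoRootBuilders(): List<String> = runBlocking {
    val reported = mutableListOf<String>()
    val handler = CoroutineExceptionHandler { _, e -> reported += "reported: ${e.message}" }

    // Root async: the exception is stored in the Deferred, the handler is not called.
    val viaAsync = CoroutineScope(Dispatchers.IO + handler).async<String> { error("from async") }
    viaAsync.join()

    // Root launch: there is no result object to hold the exception, so it is reported.
    val viaLaunch = CoroutineScope(Dispatchers.IO + handler).launch { error("from launch") }
    viaLaunch.join()

    try {
        viaAsync.await()
    } catch (e: IllegalStateException) {
        reported += "await threw: ${e.message}"
    }
    reported
}

fun main() = demoRootBuilders().forEach(::println)
```

Without the handler, the launch case is the one that would end up in the thread’s uncaught exception handler.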

My apologies, I wanted to make a complete example with a link to try it live. That creates a top level job even though it is nested in multiple launches? I guess it does.

Thank you, I was being a grumpy old man, this is great.

And this is not well shown or represented, thank you. I’ve updated my example. What I was after was doing an async thing and catching exceptions from the network call, while keeping cancellation.

Anyway, did you read through the official guide for coroutines? I think official documentation is sufficient to understand coroutines, it explains most of your concerns and you don’t have to worry about misinformation.

Idk, I learned coroutines by digging through kotlinx.coroutines source code, reading past KEEPs, reading Roman Elizarov’s Medium posts, and watching Kotlin Conf presentations. The official guide left me pretty lost too. It shows some examples, but doesn’t really explain when you should use which and doesn’t give examples of what not to do.

Ohh, interesting. Probably the only really confusing part to me was why we can’t invoke async()/launch() directly in a suspend function, and why it is so freaking hard to launch a coroutine inside a suspend function and return before it finishes. I was basically trying to split a long function with multiple async()/launch() calls into smaller functions and it turned out to be much harder than I expected. Only after reading Elizarov’s posts did I realize that a suspend function is meant to be cold; it is itself some kind of a “node” in the structured concurrency tree.

I agree there should be a section in the docs about how to jump into the coroutine world depending on the scenario (another question by OP). I missed a section on how to convert a callback API to a suspend API - I learnt this from SO. Also, I read Elizarov’s posts and the source code to understand coroutine internals and some advanced stuff, but this was mostly out of curiosity.

Maybe I’ve just already forgotten about initial problems and it wasn’t as smooth as I remember :slight_smile:

It is things like this, found in the official guide at “Asynchronous programming techniques” (Kotlin Documentation):

fun postItem(item: Item) {
    launch {
        val token = preparePost()
        val post = submitPost(token, item)
        processPost(post)
    }
}

suspend fun preparePost(): Token {
    // makes a request and suspends the coroutine
    return suspendCoroutine { /* ... */ }
}

So, what is processPost? It looks kind of like a callback, something coroutines are supposed to eliminate. Also, how is that function able to call launch? Did processPost get there from the closure? No, so a person who goes to write this code will run into many issues. It does not help them use coroutines as a replacement.

So, this is just one of many incomplete examples for the non Android coroutine developer. No, the official docs do not cut it, wish they did.

For me, the issue now has been returning a value. Bridging sync code with async: once you’re in a coroutine you feel locked in. For example, launch does not allow a return value, and Job has no result or data value.

Do a search and many people ask and get terrible answers. There are just poor to no examples I can find that are complete in this way.

Indeed, this part of the page about async programming is outdated; I reported it to the Kotlin team.

But I think broot was referring to the official guide for kotlinx.coroutines, and I agree with him: it’s a great guide, it’s up to date, and it explains all the concepts pretty well and shows best practices: Coroutines guide | Kotlin

Also, see the full doc section of kotlinx.coroutines if you want more information: GitHub - Kotlin/kotlinx.coroutines: Library support for Kotlin coroutines

Well, it’s true, the only way to bridge back is to use runBlocking.
It is explained in the first section of the “coroutines basics” guide:

runBlocking is also a coroutine builder that bridges the non-coroutine world of a regular fun main() and the code with coroutines inside of runBlocking { ... } curly braces.

It is also not a problem unique to coroutines; it’s true for any asynchronous code with callbacks, Rx, or even threads: either you continue to use async primitives all the way down, or you block a thread to wait for the result.
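
A minimal sketch of that bridge, assuming a hypothetical suspend API (loadGreeting) that stands in for a library call. The blocking wrapper is an ordinary function, so non-coroutine callers, including Java code, can use it.

```kotlin
import kotlinx.coroutines.*

// Hypothetical suspend API, standing in for a library call.
suspend fun loadGreeting(name: String): String {
    delay(10) // pretend to wait on I/O
    return "Hello, $name"
}

// Plain blocking function: it occupies the calling thread until the result is ready.
fun loadGreetingBlocking(name: String): String = runBlocking { loadGreeting(name) }

fun main() {
    println(loadGreetingBlocking("world")) // prints Hello, world
}
```

The trade-off is the one described above: the calling thread is blocked for the duration, so this belongs at the boundary with non-coroutine code, not inside other coroutines.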

For example launch does not allow return

There is a section in the coroutines guide about the async coroutine builder and Deferred, which allow you to return a value.
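
For illustration, a small sketch of async and Deferred returning values (sumOfTwo is a made-up example, not taken from the guide):

```kotlin
import kotlinx.coroutines.*

// Each async returns a Deferred<Int>; await() yields the value or rethrows the failure.
suspend fun sumOfTwo(): Int = coroutineScope {
    val a: Deferred<Int> = async { delay(10); 20 }
    val b: Deferred<Int> = async { delay(10); 22 }
    a.await() + b.await()
}

fun main() = runBlocking {
    println(sumOfTwo()) // prints 42
}
```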

jump into coroutine world depending on the scenario

It is mentioned in the readme of kotlinx.coroutines: a “read it first” guide which is universal for any use case, plus an additional guide about UI programming.

I missed a section on how to convert callback API to suspend API

It’s a part of coroutines design document, it was there from the very beginning, even before kotlinx.coroutines guide: https://github.com/Kotlin/KEEP/blob/master/proposals/coroutines.md#wrapping-callbacks
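
A minimal sketch of the callback-wrapping technique using suspendCoroutine. The callback API here (Callback, fetchAsync) is entirely hypothetical and invokes its callback immediately to keep the example self-contained.

```kotlin
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.suspendCoroutine
import kotlinx.coroutines.runBlocking

// Hypothetical callback-style API.
interface Callback {
    fun onSuccess(value: String)
    fun onError(error: Throwable)
}

fun fetchAsync(key: String, callback: Callback) {
    if (key.isEmpty()) callback.onError(IllegalArgumentException("empty key"))
    else callback.onSuccess("value for $key")
}

// Suspend wrapper: resumes the caller exactly once, with a value or an exception.
suspend fun fetch(key: String): String = suspendCoroutine { cont ->
    fetchAsync(key, object : Callback {
        override fun onSuccess(value: String) {
            cont.resume(value)
        }

        override fun onError(error: Throwable) {
            cont.resumeWithException(error)
        }
    })
}

fun main() = runBlocking {
    println(fetch("k")) // prints value for k
}
```

For an API that supports cancelling the underlying request, kotlinx.coroutines also offers suspendCancellableCoroutine, which has the same shape.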

the only really confusing part to me was why can’t we invoke async() / launch() directly in a suspend function

It’s a part of the “Hands-on” introduction to coroutines: Coroutines and channels − tutorial | Kotlin Documentation

realized a suspend function is meant to be cold

What do you mean? They are not colder than any other function. Cold is a term from reactive streams; it is hardly related to suspend functions.

Sorry for the many citations and links. I just want to say that most of the parts that people complain about and misunderstand are covered relatively well in the official documentation, and I agree that most third-party articles are of too low quality compared to the official guide.
I don’t say that the docs are perfect; for sure many things could be improved, as nothing is perfect in this world.

Regarding callback APIs in KEEP: I would still like to see it added to the main documentation as this is a very common need. But well, KEEP is linked from docs as additional resources, so I guess this isn’t that bad.

Regarding both runBlocking()/GlobalScope/CoroutineScope and “launch and return” problems - I think we really talk about different things here. You mean that all these items were mentioned in the docs. True, but this is not the same as explaining them. Yes, docs explain what runBlocking() does, what is GlobalScope, etc., but it doesn’t really help new developers to choose between them, it doesn’t explain what should be used in different scenarios. I think docs should really provide a separate article that would focus on how to use coroutines in practice, how to jump into suspending world depending on the case, how to integrate it with existing, non-suspendable code, etc. It should list ~5-10 most common, real use scenarios and explain how to handle this specific case. Callback APIs would also fit here nicely.

“launch and return” - again, docs explain that we use coroutine scope to launch new coroutines, that coroutineScope() could be used to acquire a scope inside a suspend function, blah blah blah - it really explains this all. Then you start working with the code, you have a function like this:

suspend fun foo() = coroutineScope {
    launch { doSomething() }
    launch { doSomethingElse() }

    // a lot of other code

    launch { doSomethingEvenElse() }
}

You would like to split it into multiple functions for code clarity, like you always do and you discover that you can’t. You understand general concurrency concepts, you basically understand all this stuff around scopes, etc., but then you don’t understand why whatever you do, this really simple and obvious task seems to be impossible without nasty workarounds. And I don’t think this is against generic concurrency programming concepts and/or against anything that was provided in the coroutine docs. It seems impossible just because. I managed to understand the problem only after reading Elizarov’s blog.
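
One convention for this kind of split, as I understand it from those posts, is to make the helpers extensions on CoroutineScope: they launch and return immediately, while the suspend function that owns the scope still waits for everything. A sketch with hypothetical names (startFirstHalf, startSecondHalf) and a log list in place of the real work:

```kotlin
import kotlinx.coroutines.*

// Extensions on CoroutineScope launch work and return immediately.
fun CoroutineScope.startFirstHalf(log: MutableList<String>) {
    launch { log += "something" }
    launch { log += "something else" }
}

fun CoroutineScope.startSecondHalf(log: MutableList<String>) {
    launch { log += "something even else" }
}

// foo() still does not return until everything the helpers launched has finished.
suspend fun foo(log: MutableList<String>) = coroutineScope {
    startFirstHalf(log)
    // a lot of other code
    startSecondHalf(log)
}

fun main() = runBlocking {
    val log = mutableListOf<String>()
    foo(log)
    println(log.size) // prints 3
}
```

The signature then tells the reader which behavior to expect: a suspend function has finished its work when it returns, whereas a CoroutineScope extension may leave work running in the scope it was called on.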

But anyway, as I said, I think this documentation is rather good and I think people have problems understanding coroutines mostly because they have limited experience with concurrency in general. They would be even more confused when trying to do the same using threads and with proper error handling, etc.

Thanks for your engagement and replies. An example of where the docs as they are fall apart is in the coroutine basics. I mean no disrespect to the author, but it is a lazy explanation full of delay and println calls, repeated over and over. This is not even close to the real world and does nothing to reinforce what coroutines are good at. If anything it makes them look silly. So, I can sleep code and print, hooray!

Again, I understand someone took time to make that doc and it is appreciated, but you only need to show delay and println once. Seriously, that is the easiest thing to grasp. How about a network call?

I’m in the position of taking a library that uses coroutines and integrating it with an existing Java codebase. None of the examples come close to helping with this task. This is what many others may encounter, and it is not a good story. Staying in the suspend function world is great for library authors, and I now see that as intentional in the design, but integrators need some help beyond calls to delay.

There is good instruction for scope, but it doesn’t go far enough.