To me this looks extremely confusing. I believe the underlying behaviour should not be influenced by the dispatchers used somewhere up the call stack, especially when you call suspending functions from other libraries that you know nothing about.
One real case where this led me to big issues (bad performance, the program halting) is calling Ktor HTTP client code with Dispatchers.IO. I guess it's because the client creates a lot of coroutines internally and gets stuck when each of those coroutines runs on its own thread.
It would be interesting to hear your opinion on whether this is OK or not. Perhaps I am using coroutines the wrong way. It would also be great to hear @elizarov's opinion.
This is intentional behavior. A function can decide to either use the dispatcher of the caller or specify its own if needed. If it performs I/O, it should switch to Dispatchers.IO; if it performs concurrent CPU-heavy computation, it can switch to Dispatchers.Default; it can also decide to limit its parallelism or concurrency, and so on. And if it doesn't have such special needs, it can use the dispatcher of the caller, which avoids thread switches and lets the caller partially control the execution.
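To illustrate, here is a minimal sketch (the function names are made up for the example): a function with a special need switches dispatchers itself with withContext, while a function without such needs simply inherits whatever dispatcher the caller runs it on.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import java.io.File

// Has a special need: it performs blocking I/O, so it switches to
// Dispatchers.IO itself, regardless of the caller's dispatcher.
suspend fun readConfig(path: String): String = withContext(Dispatchers.IO) {
    File(path).readText() // blocking call, safe on the IO pool
}

// Has no special needs of its own: it stays on the caller's dispatcher
// and lets readConfig handle its own switching internally.
suspend fun loadTrimmedConfig(path: String): String = readConfig(path).trim()
```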
Why does this concern you? Are you concerned that whenever you start a thread, its execution is influenced by the OS scheduler and by the number of CPU cores? With coroutines, threads become carriers of our coroutines, much like CPU cores are carriers of threads in classic code. We usually don't care too much which CPU core picked up our thread, and it is a similar story with the coroutine-to-thread association.
At the end of the day, in both cases your function did exactly what it was asked to do: it launched three concurrent coroutines, waited for them to finish, then returned. It doesn't matter much whether it executed them in parallel or sequentially, or in which order. If it needs such control, it should ask for it explicitly.
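In code, the situation might look like this (a sketch, not the original poster's code): the function's observable contract is "launch three coroutines and wait for all of them", whichever dispatcher the caller supplies.

```kotlin
import kotlinx.coroutines.*

// Launches three concurrent coroutines and returns only once all of
// them have finished; coroutineScope waits for its children.
suspend fun doThreeThings() = coroutineScope {
    repeat(3) { i ->
        launch { println("task $i on ${Thread.currentThread().name}") }
    }
}

fun main() = runBlocking {
    doThreeThings()                    // children run on the main thread
    withContext(Dispatchers.Default) { // children run on the Default pool
        doThreeThings()
    }
}
```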
Unfortunately, thread management is not transparent and can seriously affect execution, especially in high-performance code. One case where I ran into this was calling the Ktor client to read from two URLs in parallel with Dispatchers.IO, which simply led to it hanging. Without Dispatchers.IO it works normally, but it cannot be parallelized.
While I haven't had enough experience with Ktor specifically (and hence I'm not aware of implementation details that could be so heavily influenced by the execution context), I would point out that what you're concerned with is not about coroutines or coroutine dispatchers in general. Rather, it's about specific dispatchers such as Dispatchers.IO, which has a pool of threads and (if my memory serves me right) employs work-stealing. This is done so that blocking I/O operations (as is customary in Java) can happen concurrently (each on a different thread), transparently for the user (i.e., you don't have to explicitly specify which threads do what). The fact that each suspension point most often resumes on a different thread is necessary. Imagine a situation where coroutine A on thread 1 suspends, and coroutine B is dispatched to the now-idle thread 1. Before coroutine B completes, coroutine A resumes. Where can it resume? Not on thread 1, because it may be doing blocking work for coroutine B; that's why it will run on thread 2 (or any other idle thread). Basically, the IO dispatcher is optimized for maximum parallelism of blocking operations, which means having the largest feasible thread pool and distributing work evenly across it.
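A quick way to see this for yourself (a toy experiment, not Ktor-specific): run a coroutine on Dispatchers.IO and print the thread name around a suspension point; the resumption frequently lands on a different pool thread.

```kotlin
import kotlinx.coroutines.*

fun main() = runBlocking {
    withContext(Dispatchers.IO) {
        repeat(3) {
            println("before delay: ${Thread.currentThread().name}")
            delay(10) // suspension point; the original thread may be reused meanwhile
            println("after delay:  ${Thread.currentThread().name}") // often a different thread
        }
    }
}
```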
If you want a single thread, or a thread pool you provide, to run your coroutines, you can either implement your own dispatcher tailored to your specific situation or just create a Java Executor and a coroutine dispatcher from it (using Executor.asCoroutineDispatcher()).
In particular, from what I understand, if you want a number of tasks to each have a dedicated thread, making sure resumptions don't get intertwined, what you're looking for seems to be a single-threaded dispatcher for each of those tasks. This gives each thread a queue that the same task resumes on after suspension points; the dispatchers being unique to each task makes sure that tasks can't steal threads from each other, and the dispatchers being single-threaded ensures tasks don't move across threads (thus eliminating the possible concurrency issues you may be facing with Ktor).
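Something like the following sketch (pool sizes and names are illustrative, not a recommendation): one dispatcher built from an executor you own, plus a dedicated single-threaded dispatcher per task so each task's resumptions stay on its own thread.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors

fun main() = runBlocking {
    // A dispatcher backed by an executor you control (close it when done).
    val myPool = Executors.newFixedThreadPool(2).asCoroutineDispatcher()
    launch(myPool) { println("shared pool: ${Thread.currentThread().name}") }.join()
    myPool.close()

    // One dedicated single-threaded dispatcher per task: every resumption
    // of a task lands back on that task's own thread.
    (1..3).map { id ->
        val single = Executors.newSingleThreadExecutor().asCoroutineDispatcher()
        launch(single) {
            println("task $id before: ${Thread.currentThread().name}")
            delay(10)
            println("task $id after:  ${Thread.currentThread().name}") // same thread
        }.apply { invokeOnCompletion { single.close() } }
    }.joinAll()
}
```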
There might be a better solution that I'm not aware of, so it will be interesting to see what this conversation yields.
I just dumped my thoughts in a rather messy way. Feel free to ask about anything that wasn't clear.
Your opening question honestly reads like "why did this function run on the main thread when I executed it on the main thread, but run on different threads when I executed it in a thread pool?" If you have multiple threads and multiple pieces of asynchronous code, the code will execute on multiple threads. Think of launch as being like using CompletableFuture.supplyAsync.
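For instance, a rough side-by-side (the dispatcher and executor choices here are arbitrary): in both styles you hand work to a pool, and the pool decides which thread runs it.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CompletableFuture

fun main() = runBlocking {
    // Classic futures: the common fork-join pool picks the thread.
    CompletableFuture.supplyAsync {
        println("future on ${Thread.currentThread().name}")
    }.join()

    // Coroutines: the dispatcher picks the thread.
    launch(Dispatchers.Default) {
        println("coroutine on ${Thread.currentThread().name}")
    }.join()
}
```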
The way I see it, it's not only working as intended: it's meant to be like this to improve thread management. Whether it actually improves things, each developer can decide for themselves, but let me point out some useful tenets:
It is normal and expected that this feels weird to someone used to managing their own threads. This is, after all, a different paradigm.
Traditional thread management is evidently not a great way for humans to express themselves. Many programs are not threaded enough for their own good performance; those that are will most of the time use threads too heavily and in an unprincipled manner (e.g. devs randomly spinning up a thread to fix an issue with some blocking call because that's the fastest immediate fix; at least that's my experience, YMMV). Programs that need good multithreaded performance typically have to be written with that explicit goal in mind.
The dispatcher concept abstracts away the thread semantics. Traditional thread management doesn't help you with that; you'd have to realize you want it, then build it yourself.
It allows for a clear, explicit, and documented semantic contract between the dispatcher and its users, an API of sorts. When you say Dispatchers.IO, the reader knows that you want a pool, that you don't care which of its threads the coroutines use, and very likely that you're going to do blocking work or something with similar semantics. You can also write your own dispatcher with your preferred semantics (there's a sketch of this after the list).
It abstracts away the implementation of the dispatcher. With an API contract in place, you can improve the implementation of the dispatcher without breaking your clients (well, in theory at least).
It allows easier reuse of the dispatcher implementation.
Dispatchers also encourage sharing thread pools, which is a common use case. YMMV, but in my experience any medium-sized project eventually grows one thread pool per programmer who needs one, wasting resources. Large teams with shared thread pools tend to struggle to manage them; with traditional thread management, it's pretty difficult to share threads past your immediate team boundary.
This has the clear virtue of separating what processing has to be done from where it has to be done. Traditional thread management mixes the two together inextricably.
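As an example of writing your own contract (the name databaseDispatcher is made up, and limitedParallelism assumes a recent kotlinx.coroutines version): you can create a view of Dispatchers.IO that promises "at most two of these coroutines run at once", and the name itself documents the intent.

```kotlin
import kotlinx.coroutines.*

// A self-documenting contract: callers see that database work goes to a
// blocking-friendly pool and that at most two queries run concurrently.
val databaseDispatcher = Dispatchers.IO.limitedParallelism(2)

fun main() = runBlocking {
    repeat(4) { i ->
        launch(databaseDispatcher) {
            println("query $i on ${Thread.currentThread().name}")
        }
    }
}
```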
As I see it, these are the main reasons to build it this way. Whether this is superior in practice to traditional thread management, I guess time will tell. But the bar hasn’t been very high.
One can think the names of the dispatchers in Kotlin are not very clear (I do), but that would be a separate criticism. The debugging tools are also not there yet, but they’re improving.
Now, I don't have a lot of experience with dispatchers yet, but I have a good amount with traditional thread management, and I've suffered the same difficulties you have: dispatchers feel counterintuitive to me, I'm uncomfortable with the explicit bits being somewhere else, etc. But I feel like this about any unfamiliar paradigm that tries to improve on something I've been doing for a long time. Time will tell whether this is better.
Finally, about performance: most use cases do not need the very fine-grained thread management required for extreme performance. And as always, when you do need extreme performance, you will not be able to dispense with caring about the low-level details; having abstractions doesn't change that. I see it the same way as garbage collection: an app with tight memory constraints can't afford to ignore detailed memory considerations even under GC, but it's no worse off than with manual management (just different), while for all other apps things tend to be much easier.