Kotlin Coroutines and (upcoming) Java Loom

I have heard different opinions about that. Loom does things on a different level, but that does not mean it is “better”. Some argumentation is welcome. I am not a specialist in compiler internals, but I am interested to hear different opinions.

Kotlin is a;b + coloured methods, but really this is very reductive. Kotlin is also an extensive library that combines with language features to provide the whole structured concurrency system, with coroutine scopes, builders, contexts, dispatchers, etc. etc.

I granted initially that Loom’s continuation trick was better, but that’s not the whole story.

Yes. Nobody’s apps will suddenly become asynchronous when they upgrade Java with Loom. You will still have to write asynchronous code to make that happen.

You can write it in Java with fibers or you can write it in Kotlin with coroutines.

If you write it in Java, it will use Loom continuations to implement fibers. If you write it in Kotlin, it should use the same underlying continuation mechanism to implement coroutines, if you target a Loom JVM. The Kotlin Continuation implementations would wrap Loom continuations, and the suspend keyword would have no effect on generated method code.

(If the JVM were the only target environment, then Loom would eventually mean that the suspend keyword could simply be removed from the Kotlin language… but it’s not)

There are valid differences of opinion on coroutines vs fibers, but when it comes to the underlying continuation mechanism, Loom’s is just better.

The big difference is that Loom imposes no cost until you actually suspend, whereas Kotlin makes you pay whenever one suspending function calls another. This cost is the whole reason for introducing coloured functions instead of just making all functions suspendable.
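
To make the "pay on every call" point concrete, here is a rough Java sketch. It is not Kotlin's real generated code; `Cont`, `directDepth` and `cpsDepth` are hypothetical names invented for illustration. It only shows that in a continuation-passing style every nested call gets its own continuation object, even when nothing ever suspends, while the direct-style version allocates nothing.

```java
import java.util.function.IntConsumer;

// Hypothetical illustration only: not Kotlin's real generated code.
final class CpsCostSketch {
    static int continuationsAllocated = 0;

    // Stand-in for a continuation: remembers how to hand a result back to the caller.
    static final class Cont {
        final IntConsumer resumeWith;

        Cont(IntConsumer resumeWith) {
            this.resumeWith = resumeWith;
            continuationsAllocated++;
        }
    }

    // Direct style: plain nested calls, no object allocated per call.
    static int directDepth(int n) {
        return n == 0 ? 0 : 1 + directDepth(n - 1);
    }

    // CPS style: every call receives a fresh continuation object,
    // even though nothing in this chain ever suspends.
    static void cpsDepth(int n, Cont caller) {
        if (n == 0) {
            caller.resumeWith.accept(0);
            return;
        }
        cpsDepth(n - 1, new Cont(result -> caller.resumeWith.accept(result + 1)));
    }

    public static void main(String[] args) {
        int[] out = new int[1];
        cpsDepth(10, new Cont(r -> out[0] = r));
        System.out.println("direct=" + directDepth(10) + " cps=" + out[0]
                + " continuations=" + continuationsAllocated);
    }
}
```

Both versions compute the same value; the only difference the sketch measures is the number of continuation objects created along the way. Whether that difference matters in practice is exactly what needs measuring.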

Loom continuations should also end up faster when you do suspend, should use less memory, and of course produce more compact binaries.

But as I said, Kotlin can use this stuff, too.

I strongly suppose that you are mixing apples and oranges.
Loom’s Frame and Kotlin’s Continuation aren’t interchangeable.

Every Fiber requires a stack, it isn’t so free as you suppose.

This cost can be avoided in some cases, but in all cases the Thread’s stack is released.

It should and should not, it is not possible to test these considerations.
However, this kind of speculation is not useful without any valid measurements.

Loom is just unavailable.


The question is not whether or not they’re interchangeable. The question is whether or not you can implement Kotlin continuations using Loom’s native facilities instead of CPS rewriting, and whether or not it would be beneficial.

I think you can and I think it would. I hope Kotlin’s developers are open to that sort of thing instead of being focused on a Kotlin vs. Loom narrative. I don’t think it helps anyone to make this a contest.


I don’t think anyone here is focused on anything like Kotlin vs Loom. @darksnake asked you why you think that Loom is superior and you tried to back this up, but as @fvasco pointed out, some of your arguments don’t have any data to back them up. Maybe there are proper comparisons, but I couldn’t find any and so far you haven’t provided them either.
I think this part of fvasco’s answer sums up the problem quite nicely:

This reminds me of discussions about Project Valhalla and how it can be merged with inline classes. In theory this sounds like something that should maybe be considered, but right now there isn’t really anything that can be done about it.
We don’t know anything about what a final version of Loom or Valhalla will look like or when it will be released, so speculating about which is better or how they can be combined is just pointless.
Kotlin tries to give the best tools now. If there are better alternatives in the future, Kotlin will have to adapt.

I don’t have any real input for this discussion. I’m not a great user of Kotlin coroutines and my understanding of them is basic. My understanding of Loom is even less developed, but based on the discussion here, all I can conclude is that it’s too early to decide whether or not Loom would be a good fit/replacement for coroutines.


But you know it’s not that early, right? There’s enough written right here to basically understand how Loom continuations work (you have to scroll down to the implementation section): Main - Main - OpenJDK Wiki

AND, you can actually get a working prototype version right now AND you can look at the source.

It’s really plenty to develop a pretty good mental model of the costs, for anyone who really wants to do so.

Well, I’m pretty old, and what I hear is people rationalizing their emotional investments in their positions.

The same people who are perfectly happy to insist that a growing ArrayList is always better than a LinkedList are refusing to buy that a growing array stack, allocated on demand, along with the relaxation of some annoying restrictions, is better than a linked stack, allocated in advance (even when you won’t need it), that carries the same data.

If you care to read a little more into the subject, you will find that not everything about Loom is clear yet. Yes, it should allow running Java thread-based code in a coroutine style, but it has not been tested on real-life systems and there will definitely be problems there. As for dispatch itself, you did not provide any valid reference. From other discussions I understood that Loom will still copy the stack on each context switch, meaning it could add overhead compared to the simple reference copy in coroutines.

If you are saying that there are already buildable samples, could you please build one and write an article with the comparison?

An argumentum ad hominem is the best way to declare that you have no valid point to support your own argument.
Frankly it seems to me that your ideas are hard to understand.

The first time a Loom coroutine suspends, the portion of the call stack from the top down to the coroutine entry is copied out into an array (actually 2 arrays). In Kotlin, the corresponding data will already have been copied into the linked chain of Continuation objects that get built by suspending function calls. The cost is of the same order for the data that is actually copied, but Kotlin will have performed more allocations, and Kotlin will also have built and discarded these continuation objects for calls to suspending functions that didn’t actually suspend. This cost that you pay when you don’t suspend is the one that bothers me, but even neglecting that you can see that a Loom suspension will have a lower amortized cost.

When a Loom coroutine is resumed, at least its top-most frame needs to be copied back to the call stack. The return address of this frame will be set to a handler so that when the top-most call returns, it will return into some code that will copy back the next-topmost-frame, etc., incrementally as they are required. Kotlin doesn’t have a directly corresponding cost here, but of course any frame copied back to the call stack must have been copied out at some point, and it’s copied back only once, so we can count this in the amortized cost of suspension.

When a Loom coroutine suspends again, its stack arrays will still contain any frames that were not resumed, so these do not need to be re-copied. It will make space at the end of the arrays if required and copy in any new frames. Again, Kotlin will have made Continuation objects for these frames, etc., etc., so the operation will be cheaper in Loom.

However, the very top-most frame in this case may be a frame that was copied out and in before. Kotlin will only make a Continuation object for a frame once, so this one frame represents an extra cost for Loom that Kotlin doesn’t have. A single stack frame is not a big thing, however, especially since Java doesn’t have any big value types, so this is essentially a small constant cost per suspend/resume that is dwarfed by all the other constant costs involved in that.
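
The copy-out and lazy copy-back steps described above can be modelled with a small toy in Java. This is only a sketch of the scheme as described in this post, not Loom's actual implementation; the class and every method name (`call`, `ret`, `suspend`, `resume`, `thawOne`) are made up, and frames are just strings. It counts how many frames get copied in either direction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the copy-out / lazy copy-back scheme described above.
// All names are hypothetical; this is not Loom's actual implementation.
final class LazyFrameCopySketch {
    final List<String> live = new ArrayList<>();   // frames currently on the thread's call stack
    final List<String> saved = new ArrayList<>();  // frames parked in the growable heap array
    int framesCopied = 0;                          // total frames copied in either direction

    void call(String frame) {
        live.add(frame);
    }

    // Returning pops the top frame; if that empties the live part,
    // the next saved frame is copied back on demand.
    void ret() {
        if (live.isEmpty()) return;
        live.remove(live.size() - 1);
        if (live.isEmpty()) thawOne();
    }

    // Suspend: only live frames need copying; saved ones are already in place.
    void suspend() {
        framesCopied += live.size();
        saved.addAll(live);
        live.clear();
    }

    // Resume: copy back just the top-most frame.
    void resume() {
        thawOne();
    }

    private void thawOne() {
        if (saved.isEmpty()) return;
        live.add(saved.remove(saved.size() - 1));
        framesCopied++;
    }

    // Four nested calls, suspend, resume, one more call, suspend again.
    static int demo() {
        LazyFrameCopySketch s = new LazyFrameCopySketch();
        s.call("a"); s.call("b"); s.call("c"); s.call("d");
        s.suspend();   // copies a, b, c, d out
        s.resume();    // copies d back
        s.call("e");
        s.suspend();   // copies d (again) and e; a, b, c stay where they are
        return s.framesCopied;
    }

    public static void main(String[] args) {
        System.out.println("frames copied: " + demo()); // prints "frames copied: 7"
    }
}
```

In this run frame `d` is the one that gets copied out, in, and out again, which is the small extra per-suspend cost mentioned above; copying the whole stack on every suspend and resume would have moved 13 frames instead of 7.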

So, in terms of actual suspend/resume operations, Loom’s system is more efficient. Added to that, you have more efficient byte code, because it doesn’t have to indirect through a Continuation object, more compact byte code, and no red/blue function implementations.

The basic continuation mechanism in Loom is just better. This is not because the Kotlin guys made any mistakes, of course. It’s just the benefit of being able to mess with the VM.

But again, you know, Kotlin delivers more than just continuations. It’s a whole language that, among other things, provides a practical and easy to use coroutine model based on those continuations. Java has a long way to go before it matches Kotlin in that.


Which says, among other things:

While they are different constructs from the Loom fibers, Kotlin coroutines and Scala fibers will be able to leverage the native implementation. Custom mechanisms for scheduling multiple tasks will be most probably replaced by the native mechanisms. That way they’ll get access not only to better performance (as they’ll be using a native construct), but also to meaningful stack traces and other improvements.


Regarding performance, I saw a lot of discussions but no benchmark numbers, so I decided to do a simple benchmark with the same ‘concurrent prime sieve’ algorithm golang uses on its homepage to demonstrate its goroutine performance.
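
For readers who have not seen the algorithm: the benchmark code itself is not shown in this thread, so the following is only a sketch of the sieve's shape, written in plain Java with ordinary platform threads and `SynchronousQueue` standing in for goroutines and channels. It uses neither Loom nor coroutines, and the names `spawn`, `generate`, `filter` and `firstPrimes` are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Sketch of the concurrent prime sieve with platform threads and queues.
final class PrimeSieveSketch {
    // Daemon threads, so stages left blocked at the end never keep the JVM alive.
    static void spawn(Runnable body) {
        Thread t = new Thread(body);
        t.setDaemon(true);
        t.start();
    }

    // Sends 2, 3, 4, ... into out forever.
    static void generate(BlockingQueue<Integer> out) {
        try {
            for (int i = 2; ; i++) out.put(i);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Forwards every value from in that is not divisible by prime.
    static void filter(BlockingQueue<Integer> in, BlockingQueue<Integer> out, int prime) {
        try {
            while (true) {
                int n = in.take();
                if (n % prime != 0) out.put(n);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Builds a chain of filter stages, one per prime found so far.
    static List<Integer> firstPrimes(int count) {
        List<Integer> primes = new ArrayList<>();
        BlockingQueue<Integer> ch = new SynchronousQueue<>();
        BlockingQueue<Integer> first = ch;
        spawn(() -> generate(first));
        try {
            for (int i = 0; i < count; i++) {
                int prime = ch.take();
                primes.add(prime);
                BlockingQueue<Integer> in = ch;
                BlockingQueue<Integer> out = new SynchronousQueue<>();
                spawn(() -> filter(in, out, prime));
                ch = out;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(firstPrimes(10)); // prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    }
}
```

Every prime adds one more thread and one more queue, and each candidate number is handed from stage to stage, so the workload is dominated by hand-offs between tasks rather than by arithmetic. That is why it is popular as a scheduler stress test.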

The result shows kotlin coroutines (on JVM16) is ~2.5x slower than go(1.16.3), while java loom(ea jdk build 17-loom+6-225) is ~4.5x slower. That’s a little surprising to me, loom is not as performant as I was expecting. :frowning:

Could you please stop posting benchmark results without a proper benchmarking environment (JMH)? In this case, you also use bad coroutine code, which creates additional channels (channels are the most expensive primitive in coroutines).

I know it’s not optimal to use channel for such a task, but here all implementations try to use channel when possible, IMO the key point of a benchmark is to compare the same thing, not to use the best possible solution, why does that not make sense?

And can you plz elaborate on how to use jmh to do cross language benchmarks? Or do you think it is not valid to do so at all.

JVM Benchmarks without JMH are not valid at all due to warm-up time and unpredictable deoptimizations.

About channels, they mean different things in different languages. In your case it should be Flow, not Channel, and it should use a lazy mapping operation, not create a new channel. There are a lot of tricks to do it right. And it does not make any sense to compare “different libraries doing the same thing” because different libraries do different things and have different optimizations.

Also, one thing that the author of the benchmark-game is missing is that different implementations of the VM could do things differently.

In your case it should be Flow, not Channel, and it should use a lazy mapping operation

Good to know, will learn about it.

due to warm-up time and unpredictable deoptimizations.

I believe other langs with a JIT compiler have the same warm up penalty, and I don’t think JIT warm-up would contribute to 4-5x slowness and make the result invalid (comparing to go without JIT).

different implementations of the VM could do things differently

That is true, but at the same time ppl want to see comparisons of different aspects between languages or VMs, e.g. GC by binarytrees, I would just point out what is obviously wrong and suggest how to improve instead of saying ‘whatever, just don’t do it.’, which adds no real value to the problem.

It is not only warm-up, it is also deoptimizations.

What I am trying to explain is that you can’t draw any conclusions from micro-benchmarks. They are always wrong. To start with, different libraries and different runtimes optimize things differently, so you always end up comparing something that your runtime is optimized to do well with something another runtime should not do at all. Also, you need to remember to compare code written by people with a similar level of experience.

The good programmer will write good code in any language.

Comparing coroutines with the loom is another level of mistake, because coroutines are an API for asynchronous programming and loom is an implementation of parallel execution. It is like the proverbial oranges and apples. Coroutines could be used on top of the loom.


different libraries and different runtimes optimize things differently.

Totally agree, but how do you measure that? Just blindly take what is being advertised? Do micro-benchmarks not reflect any aspect of this?

loom is an implementation of parallel execution.

That’s sth new to me, AFAIK, ppl have been associating loom with java’s concurrency support (by connecting it to coroutines like this post), do you mean that parallelization is the only goal of project loom?

Coroutines could be used on top of the loom

That is true, but I don’t quite get the logic of why loom itself cannot be tested or benchmarked for concurrency tasks while being able to power a better coroutine implementation as its foundation.

Why not? It could be far more.

Some JVMs start off interpreting code; then, once a method has been called often enough (or has accounted for enough time), it gets compiled in a background thread and the compiled version is used once available. If it gets called much more often, the JVM may decide to recompile it with heavier optimisations. (It also keeps track of the assumptions made while optimising, in case they change. It can even de-optimise if appropriate.)

And since the difference between interpreted code and heavily-optimised compiled code could easily be a factor of 100 or more, I’d have no trouble believing in a factor of 4–5.
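
If you want to see the effect for yourself, a crude way is to time the same call several times inside one JVM and look at how the rounds differ. This is only a sketch: the workload, the `sink` field and the round count are arbitrary choices of mine, and it is no substitute for a real JMH benchmark, which handles forking, warm-up iterations and dead-code elimination properly.

```java
// Crude warm-up demonstration; not a replacement for JMH.
final class WarmupSketch {
    static volatile long sink; // written each round so the JIT cannot discard the result

    // Arbitrary deterministic workload: sum of i % 7 for i in [0, n).
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i % 7;
        return sum;
    }

    // Times the same call repeatedly in one JVM; returns nanoseconds per round.
    static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink = work(n);
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 5_000_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] / 1_000 + " us");
        }
    }
}
```

On a typical JIT-compiling JVM the first round or two tend to be noticeably slower than the later ones, but the exact numbers depend on the JVM, its flags and the machine, so treat any single run with suspicion.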