Avoid allocation of the Continuation object in tail recursive suspending functions

When a suspending function makes another suspending call, it allocates a Continuation to remember its state in case the called function suspends.

This happens even for tail calls, where maintaining the function's state is unnecessary: the function could pass its own continuation parameter through instead of allocating a new one. Presumably the tail-call Continuation is allocated just to produce a complete stack trace when an exception is thrown.

It would be great to be able to avoid this overhead in performance-sensitive code, even if that means the calling function might be missing from some stack traces. The tailrec keyword seems like it should enable exactly this kind of optimization at the cost of stack-trace omissions, but it doesn't work for this case. Is there another way?
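To make the question concrete, here is a minimal sketch of the kind of function I mean (the names and the use of yield() are illustrative, not from my actual code):

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

// Illustrative sketch: a tail-recursive suspending pump loop.
// Because the body contains suspending calls, the compiler generates
// a state machine for pump() and allocates a fresh Continuation on
// every invocation, even though the recursive call is in tail
// position and could, in principle, reuse the caller's continuation.
suspend fun pump(n: Int) {
    if (n == 0) return
    yield()        // stand-in for a real suspending operation
    pump(n - 1)    // tail call: N invocations, N Continuations
}

fun main() = runBlocking {
    pump(1_000)
}
```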

For context, I'm writing suspending replacements for Java IO streams, and one of the goals is performance – I'd like to be able to pump N items of data through a whole filter chain without allocating O(N) Continuations. I ended up finding a good way without this optimization, but it seems like this sort of optimization would still be useful in many other situations.


Hi @mtimmerm,
can you publish a simple reproducer, and use it to describe the issue and explain how your suggestion would apply?

Have you tried using Flows? As @fvasco suggested, try posting some code here to show us the problem better. You may be using an anti-pattern!


@lamba92,

Flows don’t seem to be appropriate for reading or writing bytes or text from files or sockets, or for implementing filters and transcoders on byte or text streams.

You could write these things in a pretty straightforward way using flows, but it would be slow, involving either a lot of buffer allocations or complicated interactions with a special element type and maybe custom object pools.

If you took Java's InputStream and OutputStream and just added a suspend modifier to the read and write functions, you would get closer, but the cost of implementing a filter chain this way would be much higher in Kotlin than in Java, because every time the suspending read(...) calls upstreamInputStream.read, it allocates a continuation.
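A sketch of what I mean, with invented names (this is "InputStream plus a suspend modifier", not any real library's API):

```kotlin
// Hypothetical suspending analogue of java.io.InputStream.
interface SuspendInputStream {
    suspend fun read(buf: ByteArray, off: Int, len: Int): Int
}

// A trivial filter stage. The call to upstream.read(...) is a
// suspending call site, so each read through the filter allocates a
// Continuation, even on the fast path where upstream has buffered
// data available and never actually suspends.
class CountingFilter(
    private val upstream: SuspendInputStream
) : SuspendInputStream {
    var total = 0L
        private set

    override suspend fun read(buf: ByteArray, off: Int, len: Int): Int {
        val n = upstream.read(buf, off, len) // Continuation allocated here
        if (n > 0) total += n                // work after the call makes it non-tail
        return n
    }
}
```

In Java, the equivalent delegating filter is allocation-free per read; here every stage in a chain pays a per-call allocation.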

It's pretty hard to do this in a way that is both as performant as Java and idiomatic. Kotlin coroutines are fast… but not that fast: the allocation overhead is there even when nothing suspends.

I expect this is a factor in why Kotlin doesn't provide suspending replacements for InputStream and OutputStream. I'm sympathetic, but I think it's a pretty big omission. I'll put mine up on GitHub pretty soon, since I think they're turning out really well. You can do this, for example, and it's fast:

(
    FileSource(file) +
    Decryption(AESCBC, key) +
    BytesToText(CharSets.UTF8) +
    DataTransformer(suspendingFunc) +
    TextToBytes(CharSets.UTF8) +
    FileSink(newfile)
).execute() // suspending

This thread is about tail recursion, though. I can’t post the idea that needed tail recursion, since I decided not to go that way.

Oh well, for that purpose a flow is not the right tool, and you are damn right about that!

The right tool is a Channel!

The entire Ktor library, both server and client, are built using channels instead of Java streams. Have a look at it!
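The general shape looks something like this minimal sketch, using plain Channels of byte arrays (Ktor's actual I/O layer uses its own byte-channel abstractions and object pools, so treat this only as an illustration of the pipeline style):

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Source -> filter -> sink pipeline built from channels instead of
// blocking Java streams. Each stage is a coroutine; chunks of bytes
// flow through bounded channels, giving backpressure for free.
fun main() = runBlocking {
    val source = Channel<ByteArray>(capacity = 4)
    val filtered = Channel<ByteArray>(capacity = 4)

    launch { // source stage
        source.send("hello".encodeToByteArray())
        source.close()
    }
    launch { // filter stage: uppercase each chunk
        for (chunk in source) {
            filtered.send(chunk.decodeToString().uppercase().encodeToByteArray())
        }
        filtered.close()
    }
    for (chunk in filtered) { // sink stage
        println(chunk.decodeToString())
    }
}
```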

Yeah, Ktor is very impressive, but if I correctly grok the underlying I/O layers from my shallow inspection, then I think the exchange of packets and the use of object pools make it difficult to do the low-level work of writing byte/char sources, sinks, and transformers. It feels like it's about midway between ordinary Java streams and Java NIO in terms of ease of use.

Also, I don't see any byte-filtering examples, but it looks like they would suffer from the problem I describe. If you have a filter chain with a source and M filters, then reading from the last one allocates M continuations for each source read. If you read N packets, that's M*N continuation allocations, which really eats into the savings you get from using ObjectPool for packets.
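Here is a self-contained sketch of that M*N cost (all names are invented for illustration, not from Ktor):

```kotlin
// A hypothetical suspending packet source.
interface SuspendSource {
    suspend fun readPacket(): ByteArray?
}

// Wrap a source in m pass-through filter stages. Each readPacket()
// from the tail of the chain walks all m suspending call sites, and
// each non-tail suspending call allocates its own Continuation, so
// n reads cost on the order of m * n Continuation allocations even
// when nothing ever suspends.
fun chain(source: SuspendSource, m: Int): SuspendSource {
    var s = source
    repeat(m) {
        val upstream = s
        s = object : SuspendSource {
            override suspend fun readPacket(): ByteArray? {
                val packet = upstream.readPacket() // Continuation allocated here
                return packet?.copyOf()            // stand-in for per-filter work
            }
        }
    }
    return s
}
```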