When a suspending function makes another suspending call, it allocates a Continuation to remember its state in case the called function suspends.
This happens even for tail calls, even though it is not necessary to maintain the function’s state in that case – it would be fine to pass through its continuation parameter instead of allocating a new one. Presumably the tail call Continuation is allocated just to produce a stack trace when an exception is thrown.
It would be great to be able to avoid this overhead in performance-sensitive code, even if that means the calling function might be missing from some stack traces. There is a tailrec keyword that seems like it should really enable these kinds of optimizations at the cost of stack trace omissions, but it doesn’t work for this case. Is there another way?
For context, I’m writing suspending replacements for Java IO streams, and one of the goals is performance – I’d like to be able to pump N data through a whole filter chain without allocating O(N) Continuations. I ended up finding a good way without this optimization, but it seems like this sort of optimization would still be useful in many other situations.
Flows don’t seem to be appropriate for reading or writing bytes or text from files or sockets, or for implementing filters and transcoders on byte or text streams.
You could write these things in a pretty straightforward way using flows, but it would be slow, involving a lot of buffer allocations, or complicated interactions with a special element type and maybe custom object pools.
If you were to take Java InputStream and OutputStream, and just add a suspend modifier to the read and write functions, then you would get closer, but the costs of implementing a filter chain like this would be much higher in Kotlin than they are in Java, because every time suspend read(...) calls upstreamInputStream.read, it would allocate a continuation.
It’s pretty hard to do this in a way that is both performant at Java’s level and idiomatic. Kotlin coroutines are fast… but not that fast. There is that allocation overhead even when nothing suspends.
I expect this is a factor in why Kotlin doesn’t provide suspending replacements for InputStream and OutputStream. I’m sympathetic but I think it’s a pretty big omission. I’ll put mine up on github pretty soon, since I think they’re turning out really well. You can do this, for example, and it’s fast:
Yeah, Ktor is very impressive, but if I correctly grok the underlying I/O layers from my shallow inspection, then I think the exchange of packets and use of object pools makes difficult to do the low-level work of making byte/char sources, sinks, and transformers. It feels like it’s about midway between ordinary java streams and java NIO in terms of ease of use.
Also, I don’t see any byte filtering examples, but it looks like they would suffer from the problem I describe. If you have a filter chain with a source and M filters, then reading from the last one would allocate M continuations for each source read. If you read N packets, that’s M*N allocations of continuations, which really eats into the savings you get from using ObjectPool for packets.