Opinions on how to implement efficient coroutine-based I/O

Hi everyone

A while ago I developed a platform-agnostic networking library and a WebSocket server on top of it. During this somewhat bumpy development, I never managed to get good performance for small payloads.

I’m thinking about redoing everything with a different approach and I’d like to get some opinions before trying it out.

The idea would be to make a Selector that is also a CoroutineDispatcher. Coroutines launched on it would carry a custom context element describing which event (read or write) should trigger their dispatch.
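Roughly what that could look like, as a minimal sketch that uses only the standard library's kotlin.coroutines primitives and java.nio (no kotlinx.coroutines, so it does not actually subclass CoroutineDispatcher). All names here (Interest, SelectorLoop, awaitReady, demo) are hypothetical. The Interest context element carries the channel and the ops to wait for; the selector loop plays the dispatcher's role by resuming a suspended coroutine on its own thread once its channel is ready. Interest is one-shot: it is cleared when the key fires, so a coroutine that does not ask again stops being polled.

```kotlin
import java.nio.ByteBuffer
import java.nio.channels.Pipe
import java.nio.channels.SelectableChannel
import java.nio.channels.SelectionKey
import java.nio.channels.Selector
import java.util.concurrent.ConcurrentLinkedQueue
import kotlin.coroutines.*

// Hypothetical context element: the channel and readiness ops
// (e.g. SelectionKey.OP_READ) that should trigger this coroutine's dispatch.
class Interest(val channel: SelectableChannel, val ops: Int) :
    AbstractCoroutineContextElement(Interest) {
    companion object Key : CoroutineContext.Key<Interest>
}

// Hypothetical selector loop acting as the "dispatcher": suspended coroutines
// are resumed on the selector thread once their channel is ready.
class SelectorLoop {
    private val selector: Selector = Selector.open()
    private val pending = ConcurrentLinkedQueue<Pair<Interest, Continuation<Unit>>>()

    private fun submit(interest: Interest, cont: Continuation<Unit>) {
        pending.add(interest to cont)
        selector.wakeup() // make a blocked select() pick up the new interest
    }

    // Suspends the caller until the channel named in its Interest element is ready.
    suspend fun awaitReady() {
        val interest = coroutineContext[Interest] ?: error("no Interest in context")
        suspendCoroutine<Unit> { cont -> submit(interest, cont) }
    }

    // Runs on a single thread until done() returns true.
    fun run(done: () -> Boolean) {
        while (!done()) {
            // Apply registrations on the selector thread only.
            while (true) {
                val (interest, cont) = pending.poll() ?: break
                val key = interest.channel.keyFor(selector)
                    ?: interest.channel.register(selector, 0)
                key.interestOps(interest.ops)
                key.attach(cont) // one waiter per channel in this sketch
            }
            selector.select(100)
            val keys = selector.selectedKeys().iterator()
            while (keys.hasNext()) {
                val key = keys.next()
                keys.remove()
                key.interestOps(0) // one-shot: the coroutine must ask again
                @Suppress("UNCHECKED_CAST")
                val cont = key.attachment() as? Continuation<Unit> ?: continue
                key.attach(null)
                cont.resume(Unit) // the coroutine continues on this thread
            }
        }
        selector.close()
    }
}

// One coroutine waits for a pipe to become readable, then reads from it.
fun demo(): String {
    val loop = SelectorLoop()
    val pipe = Pipe.open()
    pipe.source().configureBlocking(false)
    var received = ""
    var finished = false
    val body: suspend () -> Unit = {
        loop.awaitReady()
        val buf = ByteBuffer.allocate(16)
        pipe.source().read(buf)
        received = String(buf.array(), 0, buf.position())
    }
    // Start the coroutine with the Interest element in its context.
    val ctx = Interest(pipe.source(), SelectionKey.OP_READ)
    body.startCoroutine(Continuation<Unit>(ctx) { result ->
        result.getOrThrow()
        finished = true
    })
    pipe.sink().write(ByteBuffer.wrap("ping".toByteArray()))
    loop.run { finished }
    pipe.sink().close()
    pipe.source().close()
    return received
}

fun main() = println(demo())
```

Resuming on the selector thread keeps the sketch single-threaded, but it also means any blocking work inside such a coroutine stalls the selector, so a real version would probably need a way to hand heavy work to another dispatcher.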

The dispatcher could also detect when a user coroutine suspends, meaning the consumer is busy, and unregister the interest in the meantime, which would make backpressure management easy.
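The unregistering part maps directly onto SelectionKey.interestOps. A small sketch using a java.nio Pipe (the function name pauseResumeDemo is made up): while OP_READ is cleared, the selector ignores the channel even though data is waiting, and the data is reported again as soon as the interest is restored.

```kotlin
import java.nio.ByteBuffer
import java.nio.channels.Pipe
import java.nio.channels.SelectionKey
import java.nio.channels.Selector

// Returns how many keys selectNow() reports while paused and after resuming.
fun pauseResumeDemo(): Pair<Int, Int> {
    val selector = Selector.open()
    val pipe = Pipe.open()
    val source = pipe.source().apply { configureBlocking(false) }
    val key = source.register(selector, SelectionKey.OP_READ)
    pipe.sink().write(ByteBuffer.wrap("data".toByteArray())) // data is now waiting

    // Consumer is busy: drop OP_READ so the selector stops reporting the channel.
    // Unread bytes stay in the kernel buffer; once it is full, the writer stalls.
    key.interestOps(key.interestOps() and SelectionKey.OP_READ.inv())
    val whilePaused = selector.selectNow()

    // Consumer is ready again: restore OP_READ and the pending data is reported.
    key.interestOps(key.interestOps() or SelectionKey.OP_READ)
    val afterResume = selector.selectNow()

    selector.close()
    pipe.sink().close()
    source.close()
    return whilePaused to afterResume
}

fun main() {
    val (paused, resumed) = pauseResumeDemo()
    println("ready while paused: $paused, after resume: $resumed")
}
```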

The performance of the whole system depends on how efficiently I can register a coroutine and its interests with the selector. I really struggled with this last time; I find the JVM’s Selector implementation suboptimal.

I tried using a Channel to pass registration requests to the selector, but it was very inefficient. Then I tried a ConcurrentLinkedDeque, which was better but still slow. I finally settled on passing no data at all, using synchronization barriers between the caller and the Selector. That works fine but feels clumsy, so I might try implementing the barrier differently (maybe with a spinlock).
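One variation on the queue approach that might be worth measuring (an assumption, not something benchmarked here): keep a lock-free queue, but wake the selector at most once per batch, since calling Selector.wakeup() on every registration costs more than the enqueue itself (in common JDK implementations it synchronizes inside the selector and may make a system call). RegistrationQueue and its wakeups counter are hypothetical; the counter exists only to make the behaviour observable.

```kotlin
import java.nio.channels.Selector
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: hands registration work to the selector thread,
// waking the selector at most once per batch of submissions.
class RegistrationQueue(private val selector: Selector) {
    private val tasks = ConcurrentLinkedQueue<() -> Unit>()
    private val wakeupPending = AtomicBoolean(false)
    val wakeups = AtomicInteger(0) // instrumentation for this example only

    // Any thread: enqueue; only the first submitter of a batch calls wakeup().
    fun submit(task: () -> Unit) {
        tasks.add(task)
        if (wakeupPending.compareAndSet(false, true)) {
            wakeups.incrementAndGet()
            selector.wakeup()
        }
    }

    // Selector thread only, right before each select(): clear the flag first,
    // then drain, so a task added after the drain still triggers a wakeup.
    fun drain(): Int {
        wakeupPending.set(false)
        var n = 0
        while (true) {
            val task = tasks.poll() ?: break
            task()
            n++
        }
        return n
    }
}

fun main() {
    val selector = Selector.open()
    val queue = RegistrationQueue(selector)
    repeat(3) { i -> queue.submit { println("register #$i") } }
    println("wakeups after 3 submits: ${queue.wakeups.get()}")
    val ran = queue.drain()
    selector.selectNow() // consumes the pending wakeup
    queue.submit { println("register #3") }
    println("drained $ran, wakeups now ${queue.wakeups.get()}")
    selector.close()
}
```

The ordering in drain() is what keeps this safe: clearing the flag before draining means a task enqueued after the drain finds the flag false and calls wakeup(), so the following select() returns immediately instead of leaving the task stuck.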

I have no idea how such code would behave, so if anyone has opinions on this, I’d very much like to hear them.