Must-go-fast: Sequences, Flows, Bounded Channels?

#1

I have what I thought was a simple problem, but I’m finding out it’s very complex (thank you for your patience!)

I’d like an app to complete as fast as possible, without running out of memory. It takes in a video file, decodes the frames (thank you ffmpeg!), averages the frames, and writes them back out.

fun main() {
    MyImage.videoFileToFrames(sourceFileName = "molt.mp4")
        .chunked(30)
        .map { it.median() }
        .framesToVideoFile("molt_30x.mp4")
}

Yay Kotlin being so fun: Sequences make this VERY easy.

BUT - I’m not doing things very efficiently: the “median” step takes a lot of CPU time and isn’t spread across my desktop’s cores.

So I tried making them use “async { it.median() }” which of course blew up my memory as everything tried to load at once. I needed a pipe that wasn’t infinitely wide…

So I tried sending the async calls through a Channel<Deferred>(NUM_CORES), which stalls out, because I’m not sure whether I should have

  • main() = runBlocking(Dispatchers.IO) {...
  • async(Dispatchers.IO) { ...
  • runBlocking(Dispatchers.IO) { it.await() }

Then Flows came along, which sounded great, but not sure how (or IF) I should make use of them here…

What I really want: Some way to convert that initial code chunk to a way that makes use of all cores (as reasonable), but doesn’t blow up with an out of memory error…

#2

The basic way to do parallel collection evaluation via coroutines is this:

coroutineScope {
    yourData
        .map { async(Dispatchers.IO) { doYourWork(it) } }
        .map { it.await() }
}

This way you first create a number of deferred results on different threads, then collect them.
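For completeness, a self-contained sketch of that pattern, where doYourWork and the input list are placeholders for the real workload; awaitAll() is the idiomatic shortcut for the second map. Note that this version is unbounded: every element is in flight at once.

```kotlin
import kotlinx.coroutines.*

// Placeholder for the real CPU-bound work (e.g. the median step).
fun doYourWork(x: Int): Int = x * x

// Launches one async per element, then collects the results in order.
fun parallelMapAll(data: List<Int>): List<Int> = runBlocking {
    coroutineScope {
        data.map { async(Dispatchers.IO) { doYourWork(it) } }
            .awaitAll() // idiomatic equivalent of .map { it.await() }
    }
}

fun main() {
    println(parallelMapAll(listOf(1, 2, 3))) // prints [1, 4, 9]
}
```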

We are currently discussing parallel processing with flows, but there is no out-of-the-box support yet.

Another thing: if you are working on the JVM (not MPP), then a plain old Java parallel stream could serve you better than coroutines. It was designed for simple parallel processing.

#3

I started with that approach, but it blew out the memory, because I couldn’t figure out an easy way to say “limit to a max of myNumCores async jobs pending at any given time”. That is why I started mucking around with capacity-constrained Channels to get that sort of max-at-once limiter.
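One way to bake that cap into a sequence, assuming it is acceptable to fully await each batch before starting the next, is to chunk the sequence into core-sized windows. This is a sketch; heavyWork and the helper name mapParallelWindowed are made up here. The runBlocking is fine in this position because the sequence is consumed on a plain thread, not inside another coroutine.

```kotlin
import kotlinx.coroutines.*

// Hypothetical stand-in for the expensive median() step.
fun heavyWork(x: Int): Int = x + 1

// Caps concurrency at `parallelism`: each window of that size is fully
// awaited before the next window is even pulled from the source sequence,
// so at most `parallelism` results are in memory at once.
fun <T, R> Sequence<T>.mapParallelWindowed(
    parallelism: Int,
    transform: (T) -> R
): Sequence<R> =
    chunked(parallelism).flatMap { window ->
        runBlocking {
            window.map { async(Dispatchers.Default) { transform(it) } }
                .awaitAll()
        }.asSequence()
    }

fun main() {
    val out = (1..10).asSequence()
        .mapParallelWindowed(4) { heavyWork(it) }
        .toList()
    println(out) // order is preserved within and across windows
}
```

The trade-off is that a slow item stalls its whole window, so the cores are not kept perfectly busy, but memory stays strictly bounded.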

If there is an easier way to bake that into a sequence - wonderful!

#4

I think that the basic Java 8 stream().parallel() does exactly what you need. Of course, in this case you need to organize your data acquisition as a stream as well. The conversion between streams and sequences is handled by the stdlib-jdk8 functions.
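A sketch of that round trip using the stdlib-jdk8 converters (Sequence.asStream and Stream.asSequence), with a toy squaring step standing in for the real work:

```kotlin
import kotlin.streams.asSequence
import kotlin.streams.asStream

// Convert a Sequence to a parallel Stream, do the heavy step there,
// then come back to a Sequence for the rest of the pipeline.
fun squaresViaParallelStream(): List<Int> =
    (1..8).asSequence()
        .asStream()
        .parallel()      // work is spread over the common ForkJoinPool
        .map { it * it }
        .asSequence()    // iteration preserves the stream’s encounter order
        .toList()

fun main() {
    println(squaresViaParallelStream()) // prints [1, 4, 9, 16, 25, 36, 49, 64]
}
```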

#5

Wouldn’t Dispatchers.Default, rather than Dispatchers.IO, do the trick?

#6

You should not use Default for blocking tasks, and this is obviously such a case.

#7

OK, only it seemed that the OP wanted to maximise core usage for something both CPU and memory intensive. My understanding of the docs was that “CPU intensive” is exactly what the Default dispatcher is for.

I do admit to being relatively new to coroutines, however.

#8

The use of CPU is more or less the same. IO could create additional threads for new tasks, though. Default is not recommended for blocking tasks because you can accidentally block the whole coroutine framework. The correct way is to create a separate thread pool with a number of threads equal to the number of effective cores.
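A sketch of that separate-pool approach; the toy workload and the function name runOnOwnPool are illustrative:

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors

// A dedicated dispatcher backed by exactly `cores` threads, so blocking
// work on it can never starve the coroutine machinery’s own threads.
fun runOnOwnPool(inputs: List<Int>): List<Int> {
    val cores = Runtime.getRuntime().availableProcessors()
    val pool = Executors.newFixedThreadPool(cores).asCoroutineDispatcher()
    return try {
        runBlocking {
            inputs.map { n -> async(pool) { n * 10 } }.awaitAll()
        }
    } finally {
        pool.close() // shuts down the underlying executor
    }
}

fun main() {
    println(runOnOwnPool(listOf(1, 2, 3, 4))) // prints [10, 20, 30, 40]
}
```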

#9

OK, so I create the pool with the number of cores. But maybe I’m misunderstanding Sequences: can’t a sequence still race ahead if not constrained?

I got the following working, but it felt like a hack bridging Sequences and blocking Channels. If I were sure that the final “record” step would always stay ahead of the game, then sure, no worries. But if reading is medium-fast, the intermediate steps range from fast to VERY slow, and writing is kinda slow, then I think I need to be careful and have something like the capacity-bound blocking channel.

But, as you noticed - it feels kludgy.

runBlocking(Dispatchers.IO) {
    // 12 = cores, could be done dynamically.
    val rc: ReceiveChannel<Deferred<UByteArrayImage>> = produce(capacity = 12) { 
        UByteArrayImage.videoFileToFrames(sourceFileName = "molt.mp4")
            .chunked(30)
            .forEach {
                send(async { it.median() })
            }
        LOG.info { "Done with read. " }
    }

    rc.asSequence().map {
        runBlocking { it.await() }
    }.framesToVideoFile("out.mp4")
    LOG.info { "Done with recording." }
}
#10

You should read a bit more about how coroutines work. You should never call runBlocking inside a coroutine.

#11

I thought I had, drat. Without it, the compiler said I was outside of a coroutine body.

@elizarov - I saw your most recent post of Kotlin Flow mentioned backpressure. Is that this problem, or am I barking up the wrong tree thinking that Flow will solve this exact issue?

#12

That is a different problem. What you want is a parallel map with limited concurrency (so it will not run out of memory), and we don’t yet have nice primitives for that available in our libraries. As was pointed out, Java parallel streams can do that.

#13

A. Very clear answer; I’ll try parallel streams. (Tried them, maybe incorrectly.) They did limit the processing, but it was harder to do all the fun sequence things like “chunked”.

B. Drat!! I was really hoping back-pressure was what I was looking for. Oh well. Thank you for checking!

#14

I found https://github.com/Kotlin/kotlinx.coroutines/issues/1147 and think I am re-debating the same need, so I’m moving this conversation there. Thank you for the feedback, and I look forward to whatever Kotlin eng decides the canonical solution should be!

#15

The issue you are referencing is about multi-platform. If you are targeting the JVM, Streams already provide everything you need. It is really hard to do better.

#16

Sounds reasonable. Thank you!

#17

It is not a different problem. To limit concurrency, we need to limit the size of the thread pool’s input queue, and to limit the size of that queue we can employ backpressure.
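That is essentially what the bounded produce in post #9 does. A self-contained sketch under that reading (toy workload, illustrative capacity): send suspends once `capacity` deferreds are buffered, so backpressure caps the number of tasks in flight at roughly capacity + 1.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*

// The producer launches asyncs but suspends in send() when the buffer is
// full; the consumer awaits results in send order, so output order is
// preserved even though tasks may complete out of order.
fun boundedParallelSquares(n: Int, capacity: Int): List<Int> = runBlocking {
    val channel = produce(capacity = capacity) {
        (1..n).forEach { i ->
            send(async(Dispatchers.Default) { i * i }) // suspends when full
        }
    }
    val out = mutableListOf<Int>()
    for (deferred in channel) out += deferred.await()
    out
}

fun main() {
    println(boundedParallelSquares(10, 4)) // prints [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
}
```

The difference from the code in post #9 is that the channel is consumed directly in the same coroutine, so no inner runBlocking is needed.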