Why is this code executed faster in a thread pool with 1 thread than in one with more threads?

I’m just starting out with coroutines and am using them with Spring WebFlux. I implemented an IO-heavy endpoint, but it seemed to get slower after I switched to a reactive implementation. Because of that, I ran an experiment with coroutines that confused me even more.

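// Imports assumed for this snippet; parSequence is taken from Arrow Fx Coroutines.
import arrow.fx.coroutines.parSequence
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

val scope = newFixedThreadPoolContext(2, "exampleThreadPool")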

@OptIn(ObsoleteCoroutinesApi::class)
fun main() {
    runBlocking {
        val tasks: List<suspend () -> Unit> = (0..10_000).map {
            {
                runTask()
            }
        }

        val time = measureTimeMillis {
            tasks.parSequence(scope)
        }

        println("computation took $time ms")
    }
}

suspend fun runTask() = coroutineScope {
    println("running before call on  ${Thread.currentThread().name}")
    listOf("a", "b", "c").map { x ->
        delay(2500)
        println("running task  $x on ${Thread.currentThread().name}")
    }

    println("running after call on  ${Thread.currentThread().name}")
}

The runTask function is supposed to simulate a task that suspends multiple times. This task is run 10,000 times on the defined scope with the help of Arrow's parSequence function. What puzzles me is that on 1 thread it takes roughly 7700 ms, while on a thread pool with multiple threads it always takes longer: 7800 ms with 2 threads and 7800 ms with 4. When I increase the number of tasks to 100,000, the gap widens: 10100 ms with 4 threads versus 8800 ms with 1.

Shouldn’t the single-threaded pool be slower, because at some point it just blocks?
I guess I am missing something important about this concept. Thanks in advance for your help.


Keep in mind that creating and managing threads is not free. In general, you should parallelize only CPU-intensive tasks that actually benefit from running in parallel. In your case, the task does not use the CPU and does not even emulate CPU use (for that you would have to use Thread.sleep instead of delay). delay suspends the coroutine without blocking its thread, so a single thread can keep all tasks waiting concurrently, and every task finishes about 7500 ms (3 × 2500 ms) after the start. Everything beyond that is the overhead of creating the jobs and dispatching them. The more threads you use, the more of that overhead you pay.
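
To make the difference between delay and Thread.sleep concrete, here is a minimal sketch. It assumes kotlinx.coroutines is on the classpath; timeRun and its parameters are hypothetical names made up for this illustration, and the task count and wait times are scaled down so it runs in a few seconds. It uses plain launch instead of Arrow's parSequence.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import kotlin.system.measureTimeMillis

// Hypothetical helper: starts `tasks` coroutines on a pool of `threads` threads
// and returns the wall-clock time in ms. Each coroutine waits `stepMs` three times.
fun timeRun(threads: Int, tasks: Int, stepMs: Long, blocking: Boolean): Long =
    Executors.newFixedThreadPool(threads).asCoroutineDispatcher().use { dispatcher ->
        measureTimeMillis {
            runBlocking {
                List(tasks) {
                    launch(dispatcher) {
                        repeat(3) {
                            if (blocking) Thread.sleep(stepMs) // holds the thread while waiting
                            else delay(stepMs)                 // suspends and frees the thread
                        }
                    }
                }.joinAll()
            }
        }
    }

fun main() {
    for (threads in listOf(1, 4)) {
        val suspending = timeRun(threads, tasks = 20, stepMs = 50, blocking = false)
        val blocking = timeRun(threads, tasks = 20, stepMs = 50, blocking = true)
        println("threads=$threads  delay: $suspending ms  Thread.sleep: $blocking ms")
    }
}
```

With delay, the run should take roughly 3 × stepMs plus dispatch overhead regardless of the thread count, because waiting coroutines do not occupy a thread. With Thread.sleep, one thread has to work through all waits one after another, so only that variant should benefit from a larger pool.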
