Zippy parallel vector additions in Kotlin (seems like coroutines + locking = slower)

benjaminhill · November 20, 2018, 6:35pm

Disclaimer: this may not have a good answer.

I’m working with BufferedImages, which means slooooow code, but code that screams “you have more CPU cores, please run me in parallel!”

Sometimes I’m converting a BufferedImage’s pixels into a per-pixel hue IntArray, which means calling a function for each pixel:

(raster.dataBuffer!! as DataBufferInt).data.asIterable().map { pixel ->
    getHue(
        red = pixel shr 16 and 0xFF,
        green = pixel shr 8 and 0xFF,
        blue = pixel and 0xFF
    )
}
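For reference, a minimal sketch of the same mapping that stays in primitive arrays: asIterable().map on an IntArray produces a List<Int> of boxed values rather than an IntArray. The body of getHue isn't shown here, so hueDegrees below is a hypothetical stand-in (hue in whole degrees), and huesOf is an illustrative name, not an existing API.

```kotlin
// Hypothetical stand-in for getHue: hue in degrees, 0..359.
fun hueDegrees(red: Int, green: Int, blue: Int): Int {
    val max = maxOf(red, green, blue)
    val min = minOf(red, green, blue)
    val delta = max - min
    if (delta == 0) return 0 // gray has no defined hue; 0 is used here by convention
    val sector = when (max) {
        red -> {
            val s = (green - blue).toDouble() / delta
            if (s < 0) s + 6.0 else s // wrap negative values into 0..6
        }
        green -> (blue - red).toDouble() / delta + 2.0
        else -> (red - green).toDouble() / delta + 4.0
    }
    return Math.round(sector * 60.0).toInt() % 360
}

// Maps packed RGB pixels to hues without boxing: IntArray in, IntArray out.
fun huesOf(pixels: IntArray): IntArray =
    IntArray(pixels.size) { i ->
        val pixel = pixels[i]
        hueDegrees(
            red = pixel shr 16 and 0xFF,
            green = pixel shr 8 and 0xFF,
            blue = pixel and 0xFF
        )
    }
```

With the data buffer from the snippet above, this would be called as huesOf((raster.dataBuffer as DataBufferInt).data).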

Other times I’m performing some “vector math” on the RGB values to get running sums between frames:

(colorImage.raster.dataBuffer!! as DataBufferByte)
	.data.asIterable().chunked(if (colorImage.alphaRaster == null) 3 else 4)
	.forEachIndexed { pixelLocation, channels ->
	    // ignore alpha channel 3 if it exists
	    red[pixelLocation] += channels[2].toInt() and 0xFF
	    green[pixelLocation] += channels[1].toInt() and 0xFF
	    blue[pixelLocation] += channels[0].toInt() and 0xFF
	}

Kotlin is fantastic for this: nice chunked operators, simple map syntax. I’m very happy.

But images are big, so it is slow. I’ve got a large collection of stuff to iterate over, and I’ve tried making it go faster using parallel maps with coroutines, but that slows it way down. I’m guessing this is due to locking, or memory bottlenecks, or “you can’t beat a for (i in 0 until size) { ... } loop”.
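To make the plain-loop baseline concrete, here is a rough sketch of the running-sum step written as an indexed loop over the raw ByteArray, which avoids the per-pixel lists that asIterable().chunked(...) allocates. It assumes the same byte layout as the snippet above (blue, green, red, then an optional trailing alpha byte); accumulateFrame and its parameters are illustrative names, and whether it is actually faster would need measuring.

```kotlin
// Adds one frame's channel values into the running per-pixel sums.
// Assumption: bytes are laid out as blue, green, red (+ optional alpha) per pixel,
// mirroring the channel indices used in the original snippet.
fun accumulateFrame(
    data: ByteArray,
    bytesPerPixel: Int, // 3 without alpha, 4 with alpha
    red: IntArray,
    green: IntArray,
    blue: IntArray
) {
    val pixelCount = data.size / bytesPerPixel
    for (pixel in 0 until pixelCount) {
        val offset = pixel * bytesPerPixel
        // "and 0xFF" converts the signed byte to its unsigned 0..255 value
        blue[pixel] += data[offset].toInt() and 0xFF
        green[pixel] += data[offset + 1].toInt() and 0xFF
        red[pixel] += data[offset + 2].toInt() and 0xFF
        // any byte at offset + 3 (alpha) is ignored
    }
}
```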

Is there a way in Kotlin to burn up all my cores in such a way that would result in faster completion time? I’m fine with using 4x more electricity if it means the app goes 2x as fast.

lamba92 · November 20, 2018, 11:27pm

You may try:

// needs: import kotlinx.coroutines.*
runBlocking {
	(colorImage.raster.dataBuffer!! as DataBufferByte)
		.data.asIterable()
		.chunked(if (colorImage.alphaRaster == null) 3 else 4)
		.mapIndexed { pixelLocation, channels ->
			// Dispatchers.Default runs on a shared thread pool; a bare launch
			// would stay on runBlocking's single thread
			launch(Dispatchers.Default) {
				// ignore alpha channel 3 if it exists
				red[pixelLocation] += channels[2].toInt() and 0xFF
				green[pixelLocation] += channels[1].toInt() and 0xFF
				blue[pixelLocation] += channels[0].toInt() and 0xFF
			}
		}
		.forEach { it.join() }
}
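Launching one job per pixel means millions of tiny tasks for a large image, so scheduling overhead can easily outweigh the three additions each job does. A coarser split, one contiguous slice of pixels per core, is one alternative worth trying. The sketch below uses plain threads from the Kotlin standard library to stay dependency-free; sliceRanges, forEachSliceInParallel and addInto are hypothetical helper names. Because each index belongs to exactly one slice, no locking is needed. It is a sketch under those assumptions, not a measured result.

```kotlin
import kotlin.concurrent.thread

// Hypothetical helper: splits 0 until size into at most `workers` contiguous ranges.
fun sliceRanges(size: Int, workers: Int): List<IntRange> {
    require(workers >= 1) { "need at least one worker" }
    if (size == 0) return emptyList()
    val sliceSize = (size + workers - 1) / workers // ceiling division
    return (0 until size step sliceSize).map { start ->
        start until minOf(start + sliceSize, size)
    }
}

// Runs `body` once per slice, each on its own thread, and waits for all of them.
// Note: an exception thrown inside a worker is not rethrown to the caller here.
fun forEachSliceInParallel(
    size: Int,
    workers: Int = Runtime.getRuntime().availableProcessors(),
    body: (IntRange) -> Unit
) {
    sliceRanges(size, workers)
        .map { range -> thread { body(range) } } // thread { } starts immediately
        .forEach { it.join() }                   // join makes the writes visible to the caller
}

// Example use: element-wise running sum, target[i] += addend[i].
fun addInto(target: IntArray, addend: IntArray) {
    require(target.size == addend.size) { "arrays must have the same length" }
    forEachSliceInParallel(target.size) { range ->
        for (i in range) target[i] += addend[i]
    }
}
```

The same slicing should carry over to coroutines by launching one job per range on Dispatchers.Default instead of starting a thread per range.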
