As far as general design goes, I’m still not 100% sure about the proper way to parallelize IO operations.
At a basic level, we are building a heavyweight cache.
- We are pulling data via HTTP requests from external services
- We are storing that data in a database
- We are retrieving from the database and aggregating the data for our consumers
Each of those steps can be parallelized, but using the same Dispatcher for all of them has proven to be problematic. Typically we will have a few hundred simultaneous requests for external data.
One way to structure this is to run the “get and save” in parallel:
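coroutineScope {
    // One child coroutine per request; each one fetches and then saves its own result.
    allRequestParameters.forEach { parameters ->
        launch {
            val data = getDataFromService(parameters)
            repository.saveData(data)
        }
    }
}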
Another way is to parallelize all of the gets, then save the collected results in a batched database call:
coroutineScope {
    // Fetch everything concurrently, then write the combined result in one batch.
    val data = allRequestParameters.map { async { getDataFromService(it) } }.awaitAll().flatten()
    repository.saveData(data)
}
In general, the second is the more performant of the two because of the batched database requests.
However, we’ve run into some problems with a shared Dispatcher, because our repository save launches a few async tasks of its own:
// Data is a placeholder for the real element type.
suspend fun saveData(data: List<Data>) {
    coroutineScope {
        val saveDataTask = async { writeToDatabase(data) }
        val saveMetadataTask = async { writeMetadataToDatabase(data) }
        listOf(saveDataTask, saveMetadataTask).awaitAll()
    }
}
This ends up stalling the database writes for a while, until nearly all of the external service data has come back. Using that same dispatcher also means our consumers have to wait for all of these to complete before any endpoint can return data.
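To make the stall concrete, here is a minimal sketch of how I understand it. It assumes the service calls block their thread (simulated with Thread.sleep); with a client that truly suspends, threads are released while waiting and the effect can be much weaker. The single-thread pools and the completionOrder helper are made up for the demonstration and are not our real setup.

```kotlin
import java.util.Collections
import java.util.concurrent.Executors
import kotlinx.coroutines.*

// Hypothetical helper: records the order in which tasks finish when the
// fetches and the write run on the given dispatchers.
fun completionOrder(
    fetchDispatcher: CoroutineDispatcher,
    writeDispatcher: CoroutineDispatcher
): List<String> {
    val finished: MutableList<String> = Collections.synchronizedList(mutableListOf())
    runBlocking {
        repeat(3) { i ->
            launch(fetchDispatcher) {
                Thread.sleep(100) // stand-in for a blocking HTTP call
                finished.add("fetch-$i")
            }
        }
        // The write is submitted last, so on a shared pool it queues behind the fetches.
        launch(writeDispatcher) { finished.add("write") }
    }
    return finished.toList()
}

fun main() {
    // Tiny pools chosen only to make the queueing visible.
    val shared = Executors.newSingleThreadExecutor().asCoroutineDispatcher()
    val writer = Executors.newSingleThreadExecutor().asCoroutineDispatcher()
    try {
        println(completionOrder(shared, shared)) // write waits behind every fetch
        println(completionOrder(shared, writer)) // write no longer waits for the fetches
    } finally {
        shared.close()
        writer.close()
    }
}
```

In the shared case the write only runs once every queued fetch has released the thread, which matches the behavior described above.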
Because of this, does it make sense to have a different dispatcher for the database tasks than the http client tasks?
Likewise, I’m familiar with the “reader threadpool, writer threadpool” model for the database operations. Does it make sense to have a “reader dispatcher” and a “writer dispatcher”?
Just trying to make sense of the right models to use in Kotlin.
Right now it seems to make sense to use dedicated dispatchers for each of these types:
- a dispatcher for external service API calls (possibly a different dispatcher per service)
- a database reader dispatcher
- a database writer dispatcher
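Roughly what I have in mind, as a sketch only: the pool sizes, the String payloads, and the function bodies are placeholders I made up, and a fixed thread pool wrapped with asCoroutineDispatcher() is just one way to get a bounded dispatcher.

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.*

// Hypothetical pool sizes; they would need tuning against the real workload.
val serviceDispatcher = Executors.newFixedThreadPool(16).asCoroutineDispatcher()
val dbReadDispatcher = Executors.newFixedThreadPool(8).asCoroutineDispatcher()
val dbWriteDispatcher = Executors.newFixedThreadPool(4).asCoroutineDispatcher()

// Stand-in for the external HTTP call; always runs on the service pool.
suspend fun getDataFromService(parameters: String): List<String> =
    withContext(serviceDispatcher) { listOf("$parameters-a", "$parameters-b") }

// Stand-in for the batched write; always runs on the writer pool.
// Returns the number of rows "written".
suspend fun saveData(data: List<String>): Int =
    withContext(dbWriteDispatcher) { data.size }

// Stand-in for a consumer-facing read; always runs on the reader pool.
suspend fun readAggregated(key: String): String =
    withContext(dbReadDispatcher) { "aggregate for $key" }

// Fetch everything concurrently, then save the combined result in one batch.
suspend fun fetchAllAndSave(allRequestParameters: List<String>): Int = coroutineScope {
    val data = allRequestParameters.map { async { getDataFromService(it) } }.awaitAll().flatten()
    saveData(data)
}

fun main(): Unit = runBlocking {
    val saved = fetchAllAndSave(listOf("x", "y", "z"))
    println("saved $saved rows")
    println(readAggregated("x"))
    // The pools use non-daemon threads, so close them to let the JVM exit.
    serviceDispatcher.close()
    dbReadDispatcher.close()
    dbWriteDispatcher.close()
}
```

Because each function switches to its own pool with withContext, callers do not need to know which dispatcher applies, and a burst of service calls cannot occupy the threads that reads and writes depend on.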
Please let me know if I’m thinking about this completely wrong.