I have to do some web scraping, parsing, and storing to a DB. To improve the speed I'm using coroutines. By a fortunate accident, all my work can be expressed as calls to several functions that take just one argument, an Int,
like findNewsPaper(n: Int) or findSportPaper(n: Int).
This lets me pack the work into bundles of some size and repeat that operation as many times as needed. So I'm launching job arrays of a function that receives an Int:
suspend fun runInThreads(sizeblock: Int = 10, nblocks: Int = 10, maxnumber: Int = 100, lng: Lang, afun: (Int, Lang) -> Unit) {
    var ntotal = 0
    var totalTime = 0L
    val ctx = newFixedThreadPoolContext(40, "hello")
    // val ctx = CommonPool
    // val ctx = newSingleThreadContext("nsct")
    for (n in 0 until nblocks) {
        if (ntotal < maxnumber) {
            // Time one block: launch up to sizeblock jobs, then wait for all of them
            val time = measureTimeMillis {
                val jobs = arrayListOf<Job>()
                for (i in 0 until sizeblock) {
                    if (ntotal < maxnumber) {
                        ntotal++
                        jobs += launch(ctx) {
                            afun(i + n * sizeblock, lng)
                        }
                    }
                }
                jobs.forEach { it.join() }
            }
            totalTime += time
            logger.error("n threads ${Thread.activeCount().toString().padEnd(3)} step = $n ${time.milisToMinSecMilis()} ($time ms ) /Item = ${time / sizeblock} ms last = $ntotal time = ${totalTime.milisToMinSecMilis()} avg = ${totalTime / ntotal}")
        }
    }
    ctx.close()
    logger.error("threads ${Thread.activeCount()} blocks ${nblocks.toString().padEnd(3)} size ${sizeblock.toString().padEnd(3)} total items ${ntotal.toString().padEnd(3)} total time ${totalTime.milisToMinSecMilis()} ($totalTime ms ) /Item = ${totalTime / ntotal} ms")
}
You test the function by passing the number of jobs per block, the number of blocks, the maximum total number of jobs, and a function.
In this case the function just does a delay(500):
runBlocking {
    runInThreads(20, 20, 800, lng, { i: Int, LJ: Lang -> testThreads(i, LJ) })
}
I tested the speed for the different context types.
In the output below, "blocks 100 size 4" means 100 job arrays of 4 elements each, run one array after another.
// Calling fun with delay 500, thread test: CommonPool
threads 14 blocks 400 size 1 total items 400 total time 3 min, 20 sec, 559 ms /Item = 501 ms
threads 15 blocks 200 size 2 total items 400 total time 1 min, 40 sec, 289 ms /Item = 250 ms
threads 15 blocks 150 size 3 total items 400 total time 1 min, 7 sec, 211 ms /Item = 168 ms
threads 15 blocks 100 size 4 total items 400 total time 1 min, 40 sec, 266 ms /Item = 250 ms
threads 15 blocks 20 size 20 total items 400 total time 1 min, 10 sec, 194 ms /Item = 175 ms
threads 15 blocks 4 size 100 total items 400 total time 1 min, 8 sec, 149 ms /Item = 170 ms
threads 15 blocks 1 size 400 total items 400 total time 1 min, 7 sec, 145 ms /Item = 167 ms
// Calling fun with delay 500, thread test: newFixedThreadPoolContext(40, "hello")
threads 52 blocks 400 size 1 total items 400 total time 3 min, 20 sec, 559 ms /Item = 501 ms
threads 52 blocks 200 size 2 total items 400 total time 1 min, 40 sec, 301 ms /Item = 250 ms
threads 52 blocks 150 size 3 total items 450 total time 1 min, 15 sec, 228 ms /Item = 167 ms
threads 55 blocks 100 size 4 total items 400 total time 0 min, 50 sec, 172 ms /Item = 125 ms
threads 55 blocks 20 size 20 total items 400 total time 0 min, 10 sec, 65 ms /Item = 25 ms
threads 55 blocks 4 size 100 total items 400 total time 0 min, 6 sec, 51 ms /Item = 15 ms
threads 55 blocks 1 size 400 total items 400 total time 0 min, 5 sec, 71 ms /Item = 12 ms
// Calling on a single thread: newSingleThreadContext("nsct")
threads 13 blocks 20 size 20 total items 400 total time 3 min, 20 sec, 443 ms /Item = 501 ms
threads 13 blocks 1 size 400 total items 400 total time 3 min, 20 sec, 471 ms /Item = 501 ms
It seems obvious that for the kind of task I have to do I should definitely go for newFixedThreadPoolContext. I'm confused about CommonPool, though: are those results consistent, or am I doing something wrong?
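As a rough sanity check on the numbers, here is a back-of-the-envelope model in plain Kotlin. It is only a sketch under two assumptions that are not confirmed anywhere above: that each call blocks its thread for the full 500 ms, and that CommonPool was running 3 worker threads on this machine (a guess from the 167 ms/item figure, since CommonPool is sized from the number of CPU cores). The function estimateMillis is a hypothetical helper, not part of the measured code.

```kotlin
// Hypothetical model: estimated wall time of runInThreads when every job
// blocks one pool thread for taskMillis. Blocks run one after another; inside
// a block, jobs run in "waves" of at most `threads` jobs at a time.
fun estimateMillis(
    sizeblock: Int,
    nblocks: Int,
    maxnumber: Int,
    threads: Int,
    taskMillis: Long = 500L
): Long {
    var remaining = maxnumber
    var total = 0L
    for (n in 0 until nblocks) {
        if (remaining <= 0) break
        val jobs = minOf(sizeblock, remaining)      // jobs actually launched in this block
        remaining -= jobs
        val waves = (jobs + threads - 1) / threads  // ceil(jobs / threads)
        total += waves * taskMillis
    }
    return total
}

fun main() {
    // threads = 3 for CommonPool is an assumption, see the note above.
    println(estimateMillis(4, 100, 400, threads = 3))    // 100000 (measured: 1 min 40 s)
    println(estimateMillis(20, 20, 400, threads = 3))    // 70000  (measured: 1 min 10 s)
    println(estimateMillis(400, 1, 400, threads = 3))    // 67000  (measured: 1 min 7 s)
    // Fixed pool of 40 threads, as in newFixedThreadPoolContext(40, "hello").
    println(estimateMillis(20, 20, 400, threads = 40))   // 10000  (measured: 10 s)
    println(estimateMillis(400, 1, 400, threads = 40))   // 5000   (measured: 5 s)
    // Single thread, as in newSingleThreadContext("nsct").
    println(estimateMillis(20, 20, 400, threads = 1))    // 200000 (measured: 3 min 20 s)
}
```

If those assumptions hold, the measured times for all three contexts line up with this model to within a few hundred milliseconds, which would suggest the CommonPool numbers simply reflect a pool with few threads running blocking work.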
Thanks