Kotlin idiomatic implementation of

phill · March 7, 2023, 10:52am

I have written the following function to find a histogram median. The loop with an external running total looks ugly compared to other more kotlin-esque code. Is there a way to make it more ‘elegant’ without compromising performance?

I suspect that there is a function on collections that might do this, but I am not sure which. I would like to be told of it and how to use it… that is the way I learn.

/**
 * Assuming that the receiver is a histogram where the value at each index is the count of
 * occurrences of the index in the data set, return the median value of the dataset.
 */
public fun IntArray.findHistogramMedian(): Int {
    val midCount = sum() / 2
    var total = 0
    forEachIndexed { index, i ->
        total += i
        if (total >= midCount) return index
    }
    return lastIndex
}

Thanks to MikibeMiki for suggesting foldIndexed. I have never used folding before, perhaps because I have not perceived a need… A lot cleaner, and no slower:

/**
 * Assuming that the receiver is a histogram where the value at each index is the count of
 * occurrences of the index in the data set, return the median value of the dataset.
 */
public fun IntArray.findHistogramMedian(): Int {
    val midCount = sum() / 2
    foldIndexed(0) {index, total, count ->
        if (total >= midCount) return index
        total + count
    }
    return lastIndex
}

MikibeMiki · March 7, 2023, 11:03am

You can try replacing forEachIndexed with foldIndexed:

phill · March 7, 2023, 11:15am

Thanks - that helps. I have edited my question.

broot · March 7, 2023, 12:08pm

We can also do this:

fun IntArray.findHistogramMedian2(): Int {
    val sums = runningReduce(Int::plus)
    val midCount = sums.last() / 2
    return sums.indexOfFirst { it >= midCount }
}

It is shorter and at least to me easier to read and we don’t have to jump out of the fold operation. However, it creates an additional list.

Also, I’m not sure if this is the correct way to calculate the median. Note that for e.g. [1, 1, 1] it returns 0 which is clearly incorrect. I think you need to handle two cases separately: odd and even number of items.

phill · March 7, 2023, 12:48pm

Thanks Broot, Yes that is certainly shorter and clearer, but I worry about the extra list. It is pretty much what I did first. This code is invoked in a median filter in an image so the code is best being as fast as we can make it.

You may also be right about the odd/even in medians. I think that the median in a set of numbers of even size can fall between two dissimilar values and may not be an integer… But I need to assign the value to an integer array, so am going to have to round somewhere somehow. Seems to work better (and faster) than a gaussian in its present form but I am always open to doing things better.

phill · March 7, 2023, 1:40pm

Though you do have a point with your example @broot. Thanks. Dividing 3 by 2 gives 1, which then matches the zeroth index.
Either I need to divide by 2.0 to get 1.5 … matching index 1. Or I need to use the inequality operator.

martellilaurent · March 13, 2023, 8:14am

Yes that is certainly shorter and clearer, but I worry about the extra list

As in many cases, you’ ll have to tradeoff between memory and CPU : either you traverse the list twice and compute the sums twice, or you do it once and save stuff in memory to speed up afterwards. For the later, an improvement would be a dichotomy search on the sums array, since it is in ascending order by construction, which will give n+log2(n) instead of 2n. But I guess you’ll have to measure against your typical datasets, because the multiplicative constants for each term will probably make the difference.

Topic		Replies	Views
Reductions / cumulative sum	17	6732	July 5, 2018
Duration could support sum, sumBy Language Design	5	1651	May 19, 2021
Sum of vectors Support	15	2356	June 25, 2022
Kotlin: Idiomatic & Declarative way to find count of a loop without mutation Language Design	6	1526	January 30, 2020
`for` statement performance improvement (Kotlin/JVM) Language Design	31	6944	August 31, 2021

Kotlin idiomatic implementation of

Related topics