list.binarySearch


#1

Hi!

I was playing with extension functions and tried to do binarySearch function on lists. And I realized that there is an extension function already that is builtin in Kotlin. However, it is very slow: -

public fun <T: Comparable> List<T?>.binarySearch(element: T?, fromIndex: Int = 0, toIndex: Int = size): Int {
rangeCheck(size, fromIndex, toIndex)

var low = fromIndex
var high = toIndex - 1

while (low <= high) {
    val mid = (low + high).ushr(1) // safe from overflows
    val midVal = get(mid)
    val cmp = compareValues(midVal, element)

    if (cmp < 0)
        low = mid + 1
    else if (cmp > 0)
        high = mid - 1
    else
        return mid // key found
}
return -(low + 1)  // key not found

}

I have edited the function and removed the function call (compareValues) and the replacement function is way faster

test_buildin_binary_search	       8s35m
test_binary_search   		 584ms

^based on my test I will attach it down.

the new one: - (EDITED to match the same semantics of the original)

fun rangeCheck(size: Int, fromIndex: Int, toIndex: Int) {
    when {
        fromIndex > toIndex -> throw IllegalArgumentException("fromIndex ($fromIndex) is greater than toIndex ($toIndex).")
        fromIndex < 0 -> throw IndexOutOfBoundsException("fromIndex ($fromIndex) is less than zero.")
        toIndex > size -> throw IndexOutOfBoundsException("toIndex ($toIndex) is greater than size ($size).")
    }
}

fun <T : Comparable<T>> binarySearch(list: List<T?>, element: T?, fromIndex: Int = 0, toIndex: Int = list.size): Int {
    rangeCheck(list.size, fromIndex, toIndex)

    var low = fromIndex
    var high = toIndex - 1

    while (low <= high) {
        val mid = (low + high).ushr(1) // safe from overflows
        val midVal = list.get(mid)
        val cmp = element?.compareTo(midVal as T) as Int

        if (cmp < 0)
            low = mid + 1
        else if (cmp > 0)
            high = mid - 1
        else
            return mid // key found
    }
    return -(low + 1)  // key not found
}

It is not an extension function so far. But I believe that the inside function call is the reason of making it slow.

Q1. why compareValues is not an inline function? maybe due to design decisions? where it is not always a good idea to inline it?

Q2. Is there a keyword that gives the programmer the freedom to inline function “f1” for example where f1 is not inline function?

I mean why there is no keyword like “inlineit” and it is used as: -

 fun f1() {
    println("something")
 }

 fun main() {
     inlineit f1() // line 1
     f1() // line 2
 }

Where in this case f1() will be inlined only when we use the keyword inlineit in front of its name, in above code f1() in line 1 is inlined but in line 2 is not.

The junit test: -

class BinarySearchTests() {
    private val max = 10_000_000
    private val times = 10_000_000
    private val largNumbers = (0..max).toList()
    private val r = Random(System.currentTimeMillis())

    @Test
    fun test_binary_search() {
        var found = false

        repeat(times) {
            val item = r.nextInt(max)

            val r = binarySearch(largNumbers, item)

            if (r != null) found = true
        }

        assert(found)
    }


    @Test
    fun test_buildin_binary_search() {
        var found = false

        repeat(times) {
            val item = r.nextInt(max)

            val r = largNumbers.binarySearch(item)

            if (r != -1) found = true
        }

        assert(found)
    }

}

Thank you


#2

You actually did not implement the same semantics. The original/first function allows for null values, where your function doesn’t. You also omit the range checks (perhaps they are overpedantic, but it is a difference).

Q1. The idea is that the JVM will do the inlining if appropriate, but not if it isn’t.

Q2. A function determines whether it is inline or not, this is not determined at the call site (of course you can do an inline refactoring, but I appreciate that this is not the same).


#3

Yes pdvrieze, you are right, I have edited the new code to match the original semantics and still way faster

test_buildin_binary_search	8s35m
test_binary_search   		584ms

I don’t know how inline re-factoring will behave (would it replace the content of the called function to the caller or is it based on the IDE where it puts some notations beside the line of refactoring and replace it before build). I think a keyword like inlineit will be kind of cleaner and maybe fancier.

Thank tou for your replay.


#4

First, the comparison used in your binarySearch function still can’t handle null elements.
The expression element?.compareTo(midVal as T) as Int will fail if element is null or if midVal is null.

Second, even if we close our eyes to nulls for now the comparison is just plain wrong, because it compares element with midVal rather than the opposite (midVal with element) as the original function does.

The correct comparison that can handle nulls would be:

        val cmp = when {
            midVal == element -> 0
            midVal == null -> -1
            element == null -> 1
            else -> midVal.compareTo(element)
        }

I’ve corrected the comparison and benchmarked the tests with JMH and got the following results:

Benchmark                           Mode  Cnt  Score   Error  Units
BinarySearch.binary_search          avgt   30  0,898 ± 0,042  us/op
BinarySearch.builtin_binary_search  avgt   30  0,889 ± 0,040  us/op

Here is the benchmark code: https://pastebin.com/0uMNBRsJ