Remove repeating values

Hi! My question is not about removing duplicates from a collection. It’s about removing duplicates only when they occur in series in a collection.

For example:
[1,2,2,3,2,1][1,2,3,2,1]
["A","A","B","A","C","C","A","C"]["A","B","A","C","A","C"]

I want to do this in a recursive / functional way and have implemented this as so:

fun removeRepeats(list: List<Any>): List<Any>{
    val last = if(list.size>2) removeRepeats(list.drop(1)) else list.takeLast(1)
    val out = if(last.first() == list.first()) last else listOf(list.take(1),last) .flatten()
    return out
}

I’m wondering if this was the best way to write this function recursively. I don’t know if the if statements are really necessary.

Thanks!

I’m not sure if doing this with recursion is a good idea. Please be aware Kotlin doesn’t support linked lists that you can split into head and tail. In your implementation you make a full copy of the list twice with each iteration. I suggest using functions like filter, map, reduce, etc. instead. Example implementation:

list.take(1) + list.zipWithNext()
    .mapNotNull { (prev, curr) -> curr.takeIf { curr != prev } }

This implementation copies the data at the last step, to include the first item. This can be avoided by making the code less readable. We can either use sequences, so replace both list with list.asSequence() and then add toList() at the end. Or we can use fold with mutable list as an accumulator, but we then have to somehow access the previous item.

Some time ago I suggested a change to Kotlin stdlib to include something like runningReduceNotNull(). With such a function, this would be really easy to implement: https://youtrack.jetbrains.com/issue/KT-51251/runningFoldNotNull-and-runningReduceNotNull

2 Likes

Well, “best” is quite subjective - or better said dependent on the criteria you strive for.

Under the given condition that it has to be recursively (which I would normally not recommend for this kind of function), I have three improvement suggestions:

  1. The function should handle the empty list correctly. An early return is helpful for that and is great for readability.
  2. The function should return a list with more specific element type so that the element type does not get lost when calling the function. Generics can handle that.
  3. I would rename the variable “last” to “tail”.

Here is one solution that takes these three improvements:

fun <T> removeRepeats(list: List<T>): List<T> {
    if (list.size < 2) {
        return list
    }
    val head = list.first()
    val tail = removeRepeats(list.drop(1))
    return if (head == tail.first()) tail else (listOf(head) + tail)
}

Edit: broot was faster. :slight_smile:

Another additional simplification could be to define the function as extension function:

fun <T> List<T>.removeRepeats(): List<T> {
    if (size < 2) {
        return this
    }
    val tail = drop(1).removeRepeats()
    return if (first() == tail.first()) tail else (listOf(first()) + tail)
}

For academic purposes, here is another solution that prevents creating a lot of copies and is still recursive:

fun <T> MutableList<T>.removeRepeats(reversedIndex: Int = 0) {
    if (size < reversedIndex + 2) {
        return
    }
    val index = size - 1 - reversedIndex
    if (get(index) == get(index - 1)) {
        removeAt(index)
        removeRepeats(reversedIndex)
    } else {
        removeRepeats(reversedIndex + 1)
    }
}

The solution does the changes in-place, especially it also changes the original (mutable) list. If it has to work without changing the original list (that probably isn’t mutable), it can be wrapped to copy exactly once:

fun <T> removeRepeats(list: List<T>): List<T> {
    val result = list.toMutableList()
    result.removeRepeats()
    return result
}

However, this whole solution is technically still recursive, but it really is a loop that uses function call stack for the iterations - so it is actually more of an indexed loop than a classical recursion.
Therefore, it probably is only of theoretical interest as a solution that is recursive without creating a lot of copies of the list - but it is still inferior to a not-recursive solution that directly uses an actual loop or for clarity and readability uses predefined functions (that internally do the loop) like the one that @broot showed.

@tlin47 For academic purposes, it should be noted that removing items in the middle of a list may perform a lot of copies…

1 Like

Indeed, depending on the list implementation, any change on the list may perform copying. And removing in the middle of the list is one of the operations where it is quite likely - although for most common list implementations it only requires moving around some elements instead of copying the full list (creating a copy and refilling). This results e.g. in a difference in space vs. time complexity.

A solution that guarantees a minimal number of shifts would have to be tailored to the underlying list implementation - which would also take away the generality.
But like said, the given solution was not one to be actually used, more to prove a point about that it can be done recursively without having to copy in every step (it moves around elements only when needed) - but thereby getting a less-readable solution.
In any case preferable would be a solution that focuses on readability.