Head-tail splitting for sequences: avoid re-entering sequence generator without casting Sequence to Iterator?

Hi

let’s imagine a Sequence instance is given and we want to split it into a pair consisting of sequence’s head element and the tail rest of the sequence.

Q: Is it possible to implement such thing without casting Sequence to Iterator?

Q: If it is not currently possible purely in Sequences, what Sequence’s feature is missing?

The code to start with:

/**
 * could we avoid re-entering sequence generator without casting Sequence to Iterator?
 */

fun <T> headTail(seq: Sequence<T>) : Pair<T, Sequence<T>> {
    return Pair(seq.first(),seq.drop(1))
}

fun main() {
    val seq = sequence {
        println("...seq generator init...")
        for (i in 1..4)
          yield(i)
    }
    
    println("gonna call headTail...")
    val ht : Pair<Int, Sequence<Int>> = headTail(seq)
    println("headTail() returned. Gonna print ht.first...")
    println(ht.first)
    println("print ht.first done. Gonna print.second.toList()...")
    println(ht.second.toList()) // <--  sequence generator is re-entered here
}

Sequence is essentially something we can iterate over. All operations on it are implemented by iterating. It doesn’t understand the concept of iterating starting from the middle. We can’t get a “handle” to some point and iterate from there. So this is different to linked lists where the tail has its representation as an object.

If your main concern is that you wanted to consume the tail only once, but just because of splitting we need to consume the source sequence twice, then this can be easily achieved - as you said, by using an iterator:

fun <T> headTail(seq: Sequence<T>) : Pair<T, Sequence<T>> {
    val iter = seq.iterator()
    return Pair(iter.next(), iter.asSequence())
}

If your case is that you would like to consume the tail multiple times and do not have to skip the first item every time (or skip multiple items if you split multiple times), then I believe this is not possible without copying the data into a new list.

1 Like

“If your case is that you would like to consume the tail multiple times and do not have to skip the first item every time (or skip multiple items if you split multiple times), then I believe this is not possible without copying the data into a new list.”

yep, this is very close to my main case I came from:

basically, the main case was:

I have an ordered finite sequence of N objects.
and I would love to generate all N lists without i-th objects on the i-th list.
Each such a list should be generated in O(1).

E.g. for the input list:
a b c d e
The following lists should be generated:
b c d e
a c d e
a b d e
a b c e
a b c d

Actually those lists without i-th element are then fed to the same process, so the thing with list-slicing becomes even more crazy.

The key for creating each such list in O(1) is to be very flexible and effective on list-slicing.
(If it is possible at all)

I don’t think it has anything to do with sequences. Or with splitting into the head and tail - this is possible only for the first item, but not for items in the middle.

We can “remove” an item from a list with a constant time and space complexity, by creating a view. Simple implementation would be like this:

fun main() {
    val list1 = listOf(1, 2, 3, 4, 5)
    val list2 = list1.withDeletedIndex(2)

    println(list2) // [1, 2, 4, 5]
}

fun <T> List<T>.withDeletedIndex(index: Int): List<T> = DeletedItemListView(this, index)

private class DeletedItemListView<out T>(
    private val list: List<T>,
    private val deletedIndex: Int,
) : AbstractList<T>() {
    override val size get() = list.size - 1
    override fun get(index: Int): T = if (index < deletedIndex) list[index] else list[index + 1]
}

This is just a basic implementation. We should probably add some boundary checks, we can add support for deleted ranges, we can optimize deleting items from the DeletedItemListView etc. You can also check persistent collections in kotlinx.collections.immutable which probably do something similar.

This approach introduces another problem though. If we repeatedly remove items, we create a chain of views, so every access is linear to the number of removed items. I don’t think we can entirely solve this problem. If we assume we can remove any arbitrary items, then in the end of the day we get into situation where for example we picked 500 random items from the original list of 1000 items. There is no way we can find 200th item without copying the data (which is linear) or searching for it (also linear).

So it really depends on your specific case. How deep do you go. If you need to go through all possible cases or only some of them, etc. If you need to keep multiple such views at the same time or maybe you work with a single one at a time, etc. For example, if you need to create every possible subset of the original set, then this is actually possible.

1 Like

exactly my case

Again, I doubt it is possible to meet all these requirements at once:

  • Have a list from a random subset of items from another list.
  • Do not copy the data.
  • Constant-time access to a random index.

If we don’t need to work on multiple such sublists at once, we ever work on a single one, then I think it could be possible to do:

  • Everything except the random access - we won’t be able to ask for 100th item, only iterate over the sublist (linked list).
  • Maybe (but only maybe) meet all requirements, but within O(log n), not constant - by keeping some kind of an index as a tree.

Anyway, what’s the point of constant-time requirement when working with lists? We can’t even read such a sublist without going linear.