Feature request: Head And Tail (implementation provided!)

jacek.s.gajek · August 23, 2021, 10:19am

I’m really surprised it’s not in a standard library!

/**
 * Returns a pair (head, tail), where the head is the first element of the sequence and tail is a sequence containing the rest of it.
 */
fun <T> Sequence<T>.headTail(): Pair<T?, Sequence<T>> {
    val iterator = this.iterator()
    val head = if (iterator.hasNext()) iterator.next() else null
    return head to object : Sequence<T> {
        override fun iterator(): Iterator<T> = object : Iterator<T> {
            override fun hasNext(): Boolean = iterator.hasNext()
            override fun next(): T = if (!iterator.hasNext()) throw NoSuchElementException() else iterator.next()
        }
    }
}

fun <T> Iterable<T>.headTail(): Pair<T?, Iterable<T>> {
    val iterator = this.iterator()
    val head = if (iterator.hasNext()) iterator.next() else null
    return head to object: Iterable<T> {
        override fun iterator(): Iterator<T> =
            object : Iterator<T> {
                override fun hasNext(): Boolean = iterator.hasNext()
                override fun next(): T = if (!iterator.hasNext()) throw NoSuchElementException() else iterator.next()
            }
    }
}

al3c · August 23, 2021, 1:29pm

I don’t think I ever needed to have first and tail as a pair, however you can simplify those functions in:

fun <T>Sequence<T>.headTail() = firstOrNull() to drop(1)

And that should work the same for Iterables.

jacek.s.gajek · August 23, 2021, 2:00pm

This will throw exception for most sequences as they can be consumed only once. But there were bugs in my code which I’ve just fixed.

ilya.gorbunov · August 23, 2021, 6:44pm

How do you plan to work with the returned pair? If you’re going to split the tail again into head and tail recursively, it would introduce more and more iterators wrapping each other, thus turning a linear sequence iteration to a quadratic one.

jacek.s.gajek · August 24, 2021, 6:24am

I don’t plan to call it recursively, but that’s a good point. I used to code in Erlang and that would be a standard way to traverse a collection. (that’s why I was surprised that it wasn’t in a stdlib). I’ve fixed the code so the iterator doesn’t store the head

thumannw · August 24, 2021, 8:04am

The point Ilya wanted to make is that the idiomatic way of traversing a collection in procedural languages is different then in purely functional ones. You should post your usage of headTail and we can see whether it can be replaced by a more idiomatic/effective way.

jacek.s.gajek · August 24, 2021, 12:03pm

As I said, I do not use it for that. Please read my response

thumannw · August 24, 2021, 6:34pm

So, for what else do you use it then? The Kotlin designers wont add anything to stdlib if there is no frequent use case for it, thats the deal.

jacek.s.gajek · August 24, 2021, 6:39pm

Split a sequence of CSV rows which are streamed from a file into a header and data. In my scenario I need both of them. There is some kotlin csv lib which serves results as a nice Sequence<List<String>>.

val (header, records) = rows.headTail()

looks super clean to me.
I cannot use rows.first() because it’s a terminal operation and it closes the stream.

jacek.s.gajek · August 24, 2021, 6:48pm

By the way, it’s not a big deal. headTail could potentially be dangerous because some people might misuse it as a way to iterate a sequence.

However, as an academic exercise I’m curious if it’s possible to implement such operator in a way it could be used in a recursive function (without breaking a linear complexity).

broot · August 24, 2021, 7:21pm

I’m not sure if I see your point entirely. In Java/Kotlin we have two kinds of objects that could be iterated over. Iterator can only be iterated once. Iterable and Sequence are iterable multiple times (at least potentially, I know they may be not), because their API is basically a single method to start a new iteration.

Iterator is very simple and so getting head and tail is as easy as calling next() and the tail is the iterator itself. Doing the same for Iterable/Sequence is harder, because it requires to implement an object that will be able to instantiate new iterators, pointing at the right data. But this is also provided in stdlib with drop() function mentioned above.

It seems you are concerned that source Sequence can only be iterated once and for this reason you decided to intentionally degrade all sequences into once-only. Note that your tail can only be used once, even if the source iterable/sequence can be iterated multiple times.

I mean that if your case is:

Take the first item.
Iterate over the rest, but only once.
Source iterable/sequence won’t be used anymore.

Then for me it seems you are really dealing with iterator, not with a sequence. And you should use Iterator interface which would be much easier. Of course, if you then need to process the tail with sequence utils then you can just call iter.asSequence(). Of course 2, if you start with a sequence and you need to process the tail as a sequence, then it is useful to have some util that converts sequence to iterator and then back to sequence. Such operation is just not very natural for sequences and this is maybe why it is missing in stdlib.

jacek.s.gajek · August 24, 2021, 9:06pm

Looks reasonable. I’ll look if I can simplify the code as you suggest.

That recursive-capable (head, tail) is an interesting exercise though. I’ll try to solve that when I have time. For example, in Erlang a sum would be written like this:

export iterate/1

iterate(List) -> iterate(List, 0)

iterate([], Sum) ->
    Sum;
iterate([ Head | Tail], Sum) ->
    iterate(Tail, Sum + Head).

In kotlin I imagine

fun iterate(seq: Sequence<Int>) = iterate(seq, 0)

private tailrec fun iterate(seq: Sequence<Int>, sum: Int) {
   val (head, tail) = seq.headTail()
   return when (head) {
       null -> sum
       else -> iterate(tail, sum+head)
   }
}

Goal: implement headTail in a way it preserves linear complexity (given the sequence is once-only and we don’t use an iterator directly ofc)
Again, it’s just an exercise not a code which anybody should use

PS.: Pattern matching is really cool It’s literally the only thing I miss from Erl.

broot · August 24, 2021, 10:16pm

This is actually pretty easy:

fun <T> Sequence<T>.headTail(): Pair<T?, Sequence<T>> {
    val iter = iterator()
    val head = if (iter.hasNext()) iter.next() else null
    return head to iter.asSequence()
}

Or to make it easier to understand:

fun <T> Sequence<T>.headTail(): Pair<T?, Sequence<T>> {
    val iter = iterator()
    val head = if (iter.hasNext()) iter.next() else null
    return head to object : Sequence<T> {
        override fun iterator() = iter
    }
}

I may be wrong, but I believe this is linear.

There is one difference between this code and your own. Your headTail() for each step creates a new Sequence object that creates a new Iterator that references the previous one. When accessing 5th item we have 5 iterators already. iterator.hasNext() asks 5th iterator, then it delegates to hasNext() of 4th iterator and so on.

In my code we create a new Sequence for each step, but all sequences use exactly the same iterator. They all use the original iterator from the source sequence, so iterator.hasNext() no longer delegates.

But you need to understand such exercise does not make too much sense. Lists in Erlang were designed specifically to make possible to split them into head and tail. They were optimized for such use case. Sequences in Kotlin were not, so whatever you do, you’ll get bad performance. You can sum items with recursion by utilizing data structures that are suitable for this purpose. That means either Iterator, Deque/LinkedList (by modifying it) or Array/ArrayList and passing the current index. Or you can implement your own data structure similar to Erlang’s lists.

jacek.s.gajek · August 26, 2021, 12:10pm

Well but isn’t the result sequence useless if it’s constrained to be evaluated once? Each subsequent iterator() call will throw IllegalStateException. I wouldn’t create new iterator instances if there wasn’t a reason for that. You can test that by calling .constrainOnce(). Also .asSequence() calls iterator() too.

Edit: I tried it in a scratch fail and it didn’t throw exception. I don’t understand why

kyay10 · August 26, 2021, 1:22pm

Because there was only a single iterator call. The iterator instance simply was (re)used in the return sequence

broot · August 26, 2021, 5:23pm

It works like this:

You invoke: val (head0, tail0) = sourceSeq.headTail()
1. sourceSeq.iterator() is invoked, returning an iterator (sourceIter).
2. Sequence tail0 is created, storing a reference to sourceIter.
val (head1, tail1) = tail0.headTail()
1. tail0.iterator() - it returns sourceIter stored in tail0 earlier.
2. tail1 is created, storing sourceIter.
val (head2, tail2) = tail1.headTail()
1. tail1.iterator() → sourceIter.
2. tail2 is created, storing sourceIter.
And so on.

sourceSeq.iterator() is invoked only once, then sequences pass the same iterator object from one to another.

Of course, this is really one huge hack. If you start iterating over e.g. tail2 you will modify all tails at once. It is only usable for this specific purpose, but I think we can’t do better. If we assume we start with a sequence that is iterable only once then without copying the data we really have just a single iterator, we can’t acquire additional iterators, so we can only go from the beginning to the end.

jacek.s.gajek · August 26, 2021, 7:43pm

Thanks for the explanation. Well, I’ve learnt something now. That’s what exercises are for, even if they’ve no practical purpose

Thanks again for taking time to discuss. Definitely not going to push it as a feature request though.

By the way, maybe [ H | T ] in Erlang is optimized somehow but all collection operators like map or filter are eager (which I find weird).

broot · August 26, 2021, 8:56pm

One final note: it should be possible to implement headTail() to work with both single-use and multi-use sequences. And it would preserve single/multi property in generated tail sequence. Such implementation would be a much better candidate for stdlib.

Still, maybe this is just me, but it seems weird to have such function in stdlib. Sequences are not a general purpose data structures. They were added specifically to perform lazy transformations. Usually, we start by wrapping some collection, iterator, etc. with a sequence, we perform transformations and then convert it back to a collection. Splitting into head and tail seems like an operation typical for collections, not for sequences.

Of course, in your case of CSV file it makes sense to implement some utility functions to make the code cleaner.

Topic		Replies	Views
Head-tail splitting for sequences: avoid re-entering sequence generator without casting Sequence to Iterator? Libraries	5	874	September 22, 2023
Iterator help	15	229	August 14, 2024
Kotlin nullable types: dealing with ambiguous results? Language Design	6	3019	December 11, 2017
Which exception should `Iterator.next` throw when there are no elements left?	1	1683	January 20, 2023
Cannot use take/takeWhile on Sequence	2	1265	April 25, 2015

Feature request: Head And Tail (implementation provided!)

Related topics