Why Sequence instead of Iterator?

@orangy said:

Using iterators other than for iteration has been proven to be error-prone.
It is a mutable object, and if you try to use it like a lazy collection
it will inevitably break things if you are not really careful. That’s
why we dropped extension functions on iterators

Today Kotlin has Sequences. But they are basically a wrapper interface for Iterator. I can not understand why they are better than Iterator. Iterators are constructed easily with AbstractIterator. But Sequence construction is messy: one should override iterator() method and then (most commonly) create an Iterator (see Sequences.kt). This is one extra step, yes, but it seems useless.
It is also a pain to convert something form Iterator to Sequence on Java interop. Why should I convert Iterator to Sequence if I want just to filter values? What do I get from this?

Iterators are different from sequences in that iterators could be iterated only once.

Iterators can be created easily, and sequences are just one step away, just wrap (literally) an iterator into a sequence:
Sequence { createIterator() } or iterator.asSequence()

iterator.asSequence() creates an sequence which can only be iterated once. Sequence interface has zero information about how many times it can be iterated.

Moreover sequence is read-only.

What do you mean by read-only? If you iterate over iterator using forEach, you can not reassign a val.

Please, provide a minimum example, where sequence is “read-only” and Iterator is not.

Sorry, I missed MutableIterator interface.

Personally I prefer to build upon Iterator as well – to get something similar to Java’s streams.

I even built a library based on that notion:

It introduces an interface Stream (which you can convert to/from whatever your heart desire) with a single method fun next(): T? (which enables you to use kotlin facilities for nullable to great effect instead of using someting like hasNext()).

Since I’m lazy I’ll copy the description from the repo:

Streams are best described as iterators, but on which transformations (map,
filter, …) can be applied lazily. Usually the same transformations are
applied eagerly on the stream’s source (e.g. kotlin.List) or lazily on the
source (e.g. kotlin.Sequence). Violin streams are close to Java streams, but
(1) allow incremental (not all-at-once) consumption from the stream, and (2) do
not complect parallelization with stream processing.

Some sequences may be iterated only once, and other multiple times. For example, when you create a sequence like Sequence { createIterator() } it can be iterated multiple times.
Most sequence operations, like map, filter etc, preserve this property. We decided not to expose this property in the sequence type hierarchy, as it would complicate sequence operations significantly.

This is quite not correct: sequences are not wrappers around iterator, but rather a source of (a factory of) iterators.

I’ll try to explain this in more detail.
Iterator is a stateful object since it keeps the state of iteration. While it can be iterated only once, nothing protects you from possibly applying several operations on the same iterator instance. It would go unnoticed until you’ll get incorrect results.

On the other hand Sequence is stateless: each time you begin iteration, it creates a new fresh iterator. And if the sequence doesn’t support iterating multiple times (like those sequences that were obtained by wrapping an existing iterator), it just won’t allow you to get an iterator second time.

So we believe that the sequence is better than the iterator, because it provides more clear abstraction. The same reasoning applies to java 8 streams.

2 Likes

I’m using kotlin since two years now, and still run into troubles whenever I start using sequences.

It might be that the problematic abstraction starts around read-once-semantics and Iterable.

Whenever I get an Iterable (or a Sequence) I assume that I can use that to retrieve an Iterator.

But when that Iterable, due to some internal state, is crashing on calling iterator(), then it’s not an Iterable, because, well… it isn’t iterable (anymore).

It should never crash, but maybe return an empty Iterator.

I think that a new abstraction is necessary around stateful data-sources that lose/consume information everytime you iterate them. Just like InputStream for bytes.

1 Like

Like explained, that abstraction is Sequence (rather than Iterable). Note that a sequence can also wrap an infinite data source (which while possible for iterables is against it intent).

That Sequence fails-fast on multiple iteration is in theory a good thing.

Kotlin is awesome. I have used it for six projects, on the JVM, on Android, on the Browser and on Node.JS.
But our code features very few Sequences, because using them goes like this:

  1. Build a pretty pipeline using Sequence that relies on consume-once data somewhere.
  2. Have a Sequence-can-only-be-iterated-once error.
  3. Isolate the problem by throwing some .toList() in there
  4. Work backwards to regain performance, remove .toList() again
  5. Have back-pressure errors with Sequence not advancing output
  6. Introduce .toList() again to make Sequence process
  7. Need to peek on the Sequence without consuming it.
  8. Have no idea how you’re supposed to do that.
  9. Work around it by riddling the code with .sequenceOf(...) .toIterable() .asSequence()
  10. Realise that you have no idea if some method in the stdlib still copies your data and if so where.
  11. ~ lunch break ~
  12. Decide to rewrite the algorithm using Iterable to understand it better.
  13. Quickly remember that Iterable kinda sucks as well and decide to use Iterator directly.
  14. Finish the original task using less code, time and more peace-of-mind.
3 Likes

To do lazy list evaluations has advantages but also distinct disadvantages. In particular it requires a different way of thinking, and in cases may need “memory”. In your particular case it may be useful to introduce a Sequence.buffer()=toList().asSequence extension method to introduce buffers where needed.

Overall sequences are more abstract than lists. The biggest lesson would be that you don’t want to store sequences, but collections instead. The same holds for Java streams.

I agree that debugging experience is not ideal, i.e. it’s hard to isolate which sequence in a chain causes the problem, however, would the situation be better if the second iteration of these sequences resulted in an empty iterator as you propose? I doubt you’d be ever able to spot the error in that case until it went to the production.

I don’t follow since this point. Could you illustrate what the problem with backpressure is?

You might want use Sequence.onEach function for that. Another option is to introduce the buffer extension as @pdvrieze suggested.

That is a good idea, only that in my case I don’t really need a buffer.

You are right, that suggestion was a thought about Java original Iterable , which might have been less confusing if it’s contract where to never throw.

Some of my use-cases where parsers which take lines or character of a Sequence, go into different states and hand the sequence to the next parser which consumes parts and then returns.

For that to work I need to peek at the Sequence and sometimes remove just the first element. I tried various constructs like .take(1) which I incorrectly assumed to be like .pop().

Maybe it was just that. I had some bugs a couple of times where I assumed that MutableList<*>.drop(1) would drop an element from the list.

Still I haven’t found a concise notation for the problem above:

val myInput: Sequence<String> = ...
val first = myInput.take(1).toList()
when (first) {
   "a" -> handlerA(myInput) 
   else -> handlerB(myInput)
} 

It seems that for that what you want is some sort of a buffered sequence where you can basically split the sequence. You would need to write some custom classes that represent this buffering behaviour by remembering seen elements. Normal sequences don’t support peeking, but your PeekableSequence could by buffering peeked elements and replaying those before getting elements from the original stream.

I ended up implementing a PeekableIterator that simply caches one element. It works perfectly and is way easier to understand.

Maybe sequences are just not for me, I tried reactive-stream-programming in other languages a couple of times, and had the similar issues.