`Sequence.sorted*` methods


#1

I was surprised to find the following methods in the standard library in the kotlin.sequences package:

  • sorted
  • sortedBy
  • sortedByDescending
  • sortedDescending
  • sortedWith

These methods convert a sequence to a mutable list, sort the list, and then return a sequence based on the iterator of the sequence. This will cause performance/memory issues when used on large/infinite sequences. Can these be deprecated or even removed or am I missing something?

It seems to me that they should not belong and that if someone wants to sort a sequence they need to explicitly convert the sequence to a list and then sort it. Are there other methods on Sequence that won’t work on large/infinite instances?


#2

It should be noted that all these operations are lazy, i.e. the sequence isn’t materialized and sorted until an iterator is requested from it.

A plenty of them: toList, groupBy, all/any, partition, forEach, etc — all terminating methods that involve sequence iteration will not complete normally when invoked on an infinite sequence.


#3

Right, I guess I meant to ask whether or not there were other methods that took a Sequence and then return a Sequence that internally convert to a list. toList, groupBy, all/any, partition, etc. naturally exhaust the sequence as their names imply such and forEach may be infinite if the sequence is infinite, again, this is intuitive.

I find it odd though that sorted takes a Sequence and returns a Sequence as this is impossible. In order to sort a sequence you must convert it to a list but some may overlook this. Can Sequence.sorted be changed to return a List instead? This would be safer in my opinion as it would become clear from the signature the potential performance/memory impact of calling such a method and then it really just becomes a shortcut for sequence.toList().sorted().


#4

Another function with the similar behavior is Sequence.minus(Sequence). When an iterator is obtained from the resulting sequence the second sequence is materialized to set and then that set is used to filter lazily the first sequence.

Some time before release we had Sequence.sortToList or something like this, we had removed it as it was quite cumbersome and that naming didn’t scale well to cover all of sorted overloads.

Changing the return type is not an option now, after the release.


#5

Another sequence operation that can involve significant overhead is distinct (or distinctBy), which involves a Set internally.

Such operations can be described as stateful, as they retain state regarding earlier elements that is used when processing a given element (and therefore trigger consumption of the source of the sequence and potentially involve internal overhead). Stateless operations can process elements without regard to the number or value of any earlier elements (and therefore do not trigger consumption of the source of the sequence and do not involve significant internal overhead).

In the official documentation for the java stream APIs, all intermediate operations are categorised as stateful or stateless. This categorisation is important for the developer because it indicates whether the operation is cheap or potentially expensive (especially if the stream is being processed in parallel). This documentation also defines intermediate vs terminal operations, stateful vs stateless intermediate operations and the significant of them.

It is perhaps worth noting that the JDK 8 design includes stateful intermediate operations that return streams (similar to the design decision in kotlin to include stateful intermediate operations that return a sequence).

Could something similar be added to the documentation for the kotlin.sequences package?

For example, definitions of intermediate vs terminal operations and stateful vs stateless intermediate operations (as well as indicating in the method level documentation what type each operation is).


#6

Good idea, I think we could categorize each sequence operation as stateful, stateless, or terminal.


#7

We’ve improved docs for sequence operations in this regard in 1.1.2, see for example this section and linked pages: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/index.html#classification-of-sequences