Suppose you want to do the very common operation of applying a filter and a mapping to a list, and you want the code to be very efficient. The most obvious first solution is to write:
val processedItemNames = items
.filter { it.isProcessed }
.map { it.name }
This is the most concise code, but with a catch that both filter
and map
create new lists. After the inlining is done, the code becomes:
val processedItems = ArrayList<Item>()
for (element in items) if (element.isProcessed) processedItems.add(element)
val processedItemNames = ArrayList<String>(processedItems.collectionSizeOrDefault(10))
for (item in processedItems)
processedItemNames.add(item.name)
However, what if we used sequences instead?
val processedItemNames = items
.asSequence()
.filter { it.isProcessed }
.map { it.name }
.toList()
This will no longer create the intermediate list! However, the price is that the operations are no longer inlined, because the Sequence
object needs to be prepared to execute the lambdas at arbitrary times. Under the hood, this is what the operations actually look like:
val itemsSequence = Sequence { items.iterator() }
val processedItemsSequence = FilteringSequence(itemsSequence, true) { it.isProcessed }
val processedItemsNamesSequence = TransformingSequence(processedItemsSequence) { it.name }
val processedItemsNames = ArrayList<String>()
for (item in processedItemsNamesSequence) {
processedItemsNames.add(item)
}
Neither of these solutions reach what we would likely consider the most performant approach:
val processedItemNames = ArrayList<Item>(items.size.coerceAtMost(10))
for (item in items) {
if (item.isProcessed) {
processedItemNames.add(item)
}
}
Here, we create exactly one additional list, and there are no created lambda objects, and only a single Iterator
. However, the code doesn’t look nearly as nice to read or write, and is not as immediately understood to be correct.
And yes, we could use filterNotNull
:
val processedItemNames = items
.filterNotNull { item -> item.takeIf { it.isProcessed }?.name }
However, that applies only to the specific example of filter
followed by map
when there are otherwise no nulls, and not to the general case of sequence and list operations, of which there are many.
I think that one of the goals of the language should be to reduce the conflict between writing performant code and writing understandable code as much as possible. So, how could we solve this problem in Kotlin? It may require some new constructs.
Suppose we had the following:
inline interface Stream<T> {
fun forEach(operation: (T) -> Unit)
}
inline class IterableStream<T>(private val iterable: Iterable<T>): Stream<T> {
override fun forEach(operation: (T) -> Unit) {
iterable.forEach(operation)
}
}
inline class FilteringStream<T>(private val sourceStream: Stream<T>, private inline val predicate: (T) -> Boolean): Stream<T> {
override fun forEach(operation: (T) -> Unit) {
sourceStream.forEach { item: T ->
if (predicate(item)) {
operation(item)
}
}
}
}
inline class TransformingStream<T, R>(private val sourceStream: Stream<T>, private inline val transformer: (T) -> R): Stream<R> {
override fun forEach(operation: (R) -> Unit) {
sourceStream.forEach { item: T ->
operation(transformer(item))
}
}
}
inline fun <T> Iterable<T>.asStream() = IterableStream(this)
inline fun <T> Stream<T>.filter(predicate: (T) -> Boolean) = FilteringStream(this, predicate)
inline fun <T, R> Stream<T>.map(transformer: (T) -> R) = TransformingStream(this, transformer)
inline fun <T> Stream<T>.toList() {
val list = ArrayList<T>()
forEach { item -> list.add(item) }
}
So, it’s a complicated proposal, but what’s the result? Take an implementation by Stream
s:
val processedItemNames = items
.asStream()
.filter { it.isProcessed }
.map { it.name }
.toList()
The first thing we do is inline asStream
, filter
, and map
:
val processedItemNames = (TransformingStream(FilteringStream(IterableStream(items)) { it.isProcessed }) { it.name }).toList()
Then toList
:
val processedItemNames = ArrayList<String>()
(TransformingStream(FilteringStream(IterableStream(items)) { it.isProcessed }) { it.name }).forEach { item -> processedItemNames.add(item) }
Then TransformingStream
:
val processedItemNames = ArrayList<String>()
(FilteringStream(IterableStream(items)) { it.isProcessed }).forEach { item -> processedItemNames.add(item.name) }
Then FilteringStream
:
val processedItemNames = ArrayList<String>()
(IterableStream(items)).forEach { item -> if (item.isProcessed) processedItemNames.add(item.name) }
Then finally IterableStream
:
val processedItemNames = ArrayList<String>()
items.forEach { item -> if (item.isProcessed) processedItemNames.add(item.name) }
We now arrive at close to the optimal code! (The size information could be improved by adding a sizeRange
field, but that would complicate the example.) However, this did require inlining more than what is currently possible in Kotlin, so we would need some more language features.
The inline
operator on an interface indicates that all implementations must be inline
, and the operator is used on all methods that deal with inline classes that contain multiple values. This is because there would be no standard way to pass these objects around as objects, so instead, they don’t exist as objects at all. The FilteringStream
and TransformingStream
constructs exist only in Kotlin, and not in bytecode. This entire concept only works if all information about each Stream
is known at compilation time, so one may not pass any of these Stream
s as arguments or write extension methods on them, unless they’re inline
. Many of the names could be improved, but I think the general concept would be a major improvement towards how we currently deal with collection and sequence operators, to combine both lazy evaluation and inlining.