Language level support for bonafide slices


#1

Coming from the Python/Numpy/Scipy ecosystem, I'm thrilled with how easy it is to pick up Kotlin. The many carefully considered design decisions that have gone into the language make it a joy to program in and, indeed, prototype in (who knew that prototyping in a typesafe language could be fun!). It seems to me that has Kotlin has the potential to be a powerful platform for data science / scientific computing.

One suggestion:

With .. and the possibility of overloading this operator with rangeTo() member/extension functions, Kotlin provides language-level support for ranges. Unfortunately, Kotlin has no language-level support for slices. Such support would go a long way towards making array-oriented programming in Kotlin pleasurable. While it may seem like a slice is nothing but an IntRange, observe that slices can have implicit beginings and endings, whereas intRanges do not. For example, in python,

x=len(foo)/2
print foo[:x]+" “+foo[x:]


introduces a space at the halfway point of the string foo. It’s possible to do the same with an IntRange in Kotlin, but it’s more cumbersome. To wit,

val x = foo.length()/2
println(foo[0…x-1]+” "+foo[x…foo.length()-1])

The cumbersomeness arising from using an IntRange to do the job of a slice really becomes apparent when one engages in array-oriented programming. Here one often uses slices to refer to a subset of the data in a multidimensional array. For example, given a 2d Numpy array X, X[:5, :] is the first five rows of X, X[:, 5] is the sixth column of X, and X[:, ::-1] is X with the order of its columns reversed. A would-be author of “NumKot2d”, a hypothetical 2d Numpy equivalent library in Kotlin, can easily overload the get() and set() operators in Kotlin to implement a basic 2 dimensional array like so:

class FloatArray2d(val rows : Int, val cols : Int){
  val arr = FloatArray(rows * cols)

  fun get(r : Int, c : Int) : Float{
  return  arr[r * cols + c ]
  }

  fun set(r : Int, c : Int, value: Float) {
  arr[r * cols + c] = value
  }

Because Kotlin currently lacks language-level support for slices, NumKot2d might use IntRanges to support array oriented programming, which would lead to unfortunate expressions like X[0…4, 0…X.cols-1] to denote the first five rows of X, and X[0…X.rows-1, 5] to denote to the sixth column of X. Here is X with the order of its columns reversed: X[0…X.rows-1, X.cols-1 downTo 0]. Cumbersome and ugly, no? NumKot2d may, of course, sidestep the use of IntRanges altogether by providing the following class  

class Slice(val start : Int? = null, val end : Int? = null, val : incr : Int? = null)


This would allow for expression like X[Slice(end=5), slice()] for the first five rows of X, X[Slice(), 5] for the sixth column of X and X[Slice(), Slice(incr=-1)] for X with its columns reversed. There are two problems here:

  1. Slicing becomes a library level feature instead of a language level feature (different libraries may implement slicing differently, leading to confusion)
  2. The expressions above, while a step up, fall short of the succinctness and clarity of X[:5, :], X[:, 5] and X[: ,::-1]. For more points of comparison, visit https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet#indexing-and-slicing and compare the expressions in the Breeze column with the expressions in the numpy column.


My concrete suggestion: Language-level support for end-exclusive slicing with the : operator in python/numpy is pleasing to the eye, works superbly with 0 indexed arrays and 0 indexed multidimensional arrays, and I daresay, is a significant reason for the popularity of numpy/scipy/python amongst data scientists. Just copy it!

Two additional points:

  1. Given that Kotlin is a 0-based-indexing language the “end-inclusivity” of ranges seems quite problematic. When setting up loops over arrays I often find myself being off-by-one. If Kotlin implements language-level support for (end-exclusive) slicing it can have end-exclusivity where it’s important (in array indexing) and end-inclusivity where it’s needed (wherever that may be).
  2. I admire the restraint of Kotlin’s designers. So if it’s a choice between language level support for slicing and language level support for ranges, I hope you favor the former. Python does not have language-level support for ranges (hence the range() builtin), and seems to be none the worse for it.


#2

Hi,

I’m very glad you like Kotlin. Thanks.

Due to syntactic reasons it’s rather hard for us to copy Python’s slicing syntax: colon (":") is already used for types, and this is hard to change.
What we could do, probably, is introduce unary ranging operations, so that one could write “…5” and “5…”, and these are not IntRanges, but structures coveying the fact that only one end is fixed. And it will be “x[…5]” to get a prefix of an array. The bare “:” can be emulated by a constant of some sort, maybe we can even make bare “…” a constant, not sure abot this, but may be.

I can’t say that I fully understand your point about a language feature being better than a library feature. Building things into the language is irreversible and gives no room for improvement, unlike libraries.

Speaking of inclusivity and off-by one problems: it’s largely a matter of habit. So it may be that we were wrong choosing ranges to be inclusive, maybe we should reconsider that.

Also, please take a look at this example that shows how to write a library that emulates slicing in Kotlin.


#3

Here are some reasons why end-exclusivity (aka half open intervals) and 0-based indexing work well together: http://gestaltrevision.be/wiki/python/zerobased. And, FWIW,  here's what Guido has to say on the matter: https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi. Speaking as someone who works with multidimensional 0-based arrays day in and day out, I can attest that end-exclusivity makes my life a lot easier.

Regardless of whether Kotlin implements language level support for end-exclusive slicing, I think we can agree that .. is the wrong tool for the job given its conventional use in mathematics to express end-inclusive ranges. I think we can also agree that the … operator does not lend itself to concise expressions of the increment, especially when the increment is negative. This can make the slicing of multidimensional arrays using … rather unweildy.

Due to syntactic reasons it’s rather hard for us to copy Python’s slicing syntax: colon (":") is already used for types, and this is hard to change.

":" has a history of being used to represent slices http://en.wikipedia.org/wiki/Array_slicing and thus has a readymade audience. From a purely syntactic perspective, i.e. discounting implementational details, there’s no room for confusion if “:” is only interpreted as a slice operator when it occurs in an argument of a get() call. In Python, for example, list(1,2,3,4)[:2] works, whereas x=2:3 throws an error. I understand, of course, that the special handling of get() argments introduces a “wrinkle” into the language, and that wrinkles are to be avoided as far as possible. In this case, might the cost be worth the benefit?

I can’t say that I fully understand your point about a language feature being better than a library feature. Building things into the language is irreversible and gives no room for improvement, unlike libraries.

I guess I subscribe to the philosophy that when it comes to a feature as fundamental as slicing, there should be one–and preferably only one–obvious way to do it, and, if necessary, that way should be baked into the language.


#4

See also: http://devnet.jetbrains.com/thread/455158


#5

Why don't you just copy Ruby style for ranges. Ruby has range operators ".." and "...". .. stands for inclusive end item, and ... is exclusive on end item.

So one could write “…5” and “5…” and also “…5” and “5…”.
As this this might be good when using indexes, this could be enhanced by adding another operator that could use length, eg. “..", co one could write "..5” and “5._.” meaning first 5 and last 5 elements, respectively.


#6

It's one of the options we are considering


#7

Any update on changing ranges to be end-exclusive? Languages like Rust, Python, and Scala to name a few all have end-exclusive ranges (Scala also has end-inclusive, but they have to be explicitly specified). They fit much better with 0-based indexing, as well as with libraries written for Java.

The most obvious example is iterating over a collection’s size, and having to specify 0..size - 1 instead of just size.


#8

You can write collection.indices to solve that.


#9

Well they did add until for this as in

0 until size

is equivalent to:

0 .. size - 1

#10

Thanks @mikehearn. I do realize that, but I was also referring to the .. construct in general.

@dalewking Thanks, I didn’t know there as the until infix operator. I suppose that solves it then.

What about support for slicing, though? Because that’s where .. would come into play.


#11

I would say that the Kotlin way favors more using Sequence than creating something like slices at least for slices that you are only reading from.

For eaxample,

myArray.take(5)

Is a sequence of the first 5 elements of the array analagous to myArray[:5]

and

myArray.drop(5)

is a sequence of elements after the first 5, analgous to myArray[5:]

If you want an actual new array that is a copy of the orginal array you can use something like:

 myArray.copyOf(5)
 myArray.copyOfRange(5, myArray.size)

The copyOfRange really should default the end parameter to the size of the array.

But since Kotlin has to interoperate with the JVM, there is no way to create a slice that interoperates with Java as an array.

It would certainly not be that difficult to write some slice methods.


#12

Makes a lot of sense. Thanks!