Coming from the Python/NumPy/SciPy ecosystem, I'm thrilled with how easy it is to pick up Kotlin. The many carefully considered design decisions that have gone into the language make it a joy to program in and, indeed, prototype in (who knew that prototyping in a type-safe language could be fun!). It seems to me that Kotlin has the potential to be a powerful platform for data science / scientific computing.
One suggestion:
With the .. operator and the possibility of overloading it with rangeTo() member/extension functions, Kotlin provides language-level support for ranges. Unfortunately, Kotlin has no language-level support for slices. Such support would go a long way towards making array-oriented programming in Kotlin pleasurable. While it may seem like a slice is nothing but an IntRange, observe that slices can have implicit beginnings and endings, whereas IntRanges cannot. For example, in Python,
x = len(foo) // 2
print(foo[:x] + " " + foo[x:])
introduces a space at the halfway point of the string foo. It’s possible to do the same with an IntRange in Kotlin, but it’s more cumbersome. To wit,
val x = foo.length / 2
println(foo.substring(0..x-1) + " " + foo.substring(x..foo.length-1))
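For comparison, here is a small sketch of how the same split can be written with standard-library functions that already have an implicit beginning or ending. The function names splitWithTakeDrop and splitWithSubstring are made up for illustration; take, drop and the two substring overloads are ordinary Kotlin standard-library calls. This only helps for one-dimensional strings and does not address the multidimensional case discussed next.

```kotlin
// Hypothetical helper: take(x) plays the role of foo[:x],
// drop(x) plays the role of foo[x:].
fun splitWithTakeDrop(foo: String): String {
    val x = foo.length / 2
    return foo.take(x) + " " + foo.drop(x)
}

// Hypothetical helper: substring(0, x) has an exclusive end index,
// and substring(x) runs to the end of the string.
fun splitWithSubstring(foo: String): String {
    val x = foo.length / 2
    return foo.substring(0, x) + " " + foo.substring(x)
}
```

Both avoid the "- 1" bookkeeping, but they are method calls rather than indexing syntax, so they do not extend naturally to something like X[:5, :].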
The cumbersomeness of using an IntRange to do the job of a slice really becomes apparent when one engages in array-oriented programming. Here one often uses slices to refer to a subset of the data in a multidimensional array. For example, given a 2d NumPy array X, X[:5, :] is the first five rows of X, X[:, 5] is the sixth column of X, and X[:, ::-1] is X with the order of its columns reversed. A would-be author of “NumKot2d”, a hypothetical 2d NumPy-equivalent library in Kotlin, can easily overload the get() and set() operators in Kotlin to implement a basic two-dimensional array like so:
class FloatArray2d(val rows: Int, val cols: Int) {
    val arr = FloatArray(rows * cols) // row-major storage

    operator fun get(r: Int, c: Int): Float {
        return arr[r * cols + c]
    }

    operator fun set(r: Int, c: Int, value: Float) {
        arr[r * cols + c] = value
    }
}
Because Kotlin currently lacks language-level support for slices, NumKot2d might use IntRanges to support array-oriented programming, which would lead to unfortunate expressions like X[0..4, 0..X.cols-1] to denote the first five rows of X, and X[0..X.rows-1, 5] to denote the sixth column of X. Here is X with the order of its columns reversed: X[0..X.rows-1, X.cols-1 downTo 0]. Cumbersome and ugly, no? NumKot2d may, of course, sidestep the use of IntRanges altogether by providing the following class:
class Slice(val start: Int? = null, val end: Int? = null, val incr: Int? = null)
This would allow for expressions like X[Slice(end=5), Slice()] for the first five rows of X, X[Slice(), 5] for the sixth column of X and X[Slice(), Slice(incr=-1)] for X with its columns reversed. There are two problems here:
- Slicing becomes a library level feature instead of a language level feature (different libraries may implement slicing differently, leading to confusion)
- The expressions above, while a step up, fall short of the succinctness and clarity of X[:5, :], X[:, 5] and X[:, ::-1]. For more points of comparison, visit https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet#indexing-and-slicing and compare the expressions in the Breeze column with the expressions in the NumPy column.
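To make the library-level option concrete, here is a minimal sketch of how a NumKot2d-style class might resolve a Slice against a dimension. Everything beyond the Slice constructor and the basic get()/set() is an assumption for illustration: the indices() helper, the clamping of out-of-range bounds, and the choice to return copies rather than NumPy-style views are mine, not taken from any existing library. Negative start/end values are not supported.

```kotlin
// Hypothetical Slice: end is exclusive, and null means
// "use the default for the direction of travel".
class Slice(val start: Int? = null, val end: Int? = null, val incr: Int? = null) {
    // Resolve this slice against a dimension of length n into concrete indices.
    fun indices(n: Int): List<Int> {
        val step = incr ?: 1
        require(step != 0) { "incr must not be zero" }
        val result = mutableListOf<Int>()
        if (step > 0) {
            var i = (start ?: 0).coerceIn(0, n)
            val stop = (end ?: n).coerceIn(0, n)
            while (i < stop) { result.add(i); i += step }
        } else {
            var i = (start ?: n - 1).coerceIn(-1, n - 1)
            val stop = (end ?: -1).coerceIn(-1, n - 1)
            while (i > stop) { result.add(i); i += step }
        }
        return result
    }
}

class FloatArray2d(val rows: Int, val cols: Int) {
    val arr = FloatArray(rows * cols) // row-major storage

    operator fun get(r: Int, c: Int): Float = arr[r * cols + c]

    operator fun set(r: Int, c: Int, value: Float) {
        arr[r * cols + c] = value
    }

    // X[Slice(...), Slice(...)]: copies the selected rows and columns.
    operator fun get(rs: Slice, cs: Slice): FloatArray2d {
        val rIdx = rs.indices(rows)
        val cIdx = cs.indices(cols)
        val out = FloatArray2d(rIdx.size, cIdx.size)
        for ((i, r) in rIdx.withIndex()) {
            for ((j, c) in cIdx.withIndex()) {
                out[i, j] = this[r, c]
            }
        }
        return out
    }

    // X[Slice(...), c]: copies one column over the selected rows.
    operator fun get(rs: Slice, c: Int): FloatArray =
        rs.indices(rows).map { this[it, c] }.toFloatArray()
}
```

With this in place, X[Slice(end = 5), Slice()], X[Slice(), 5] and X[Slice(), Slice(incr = -1)] all compile, which illustrates the first problem above: the semantics (clamping, copy versus view, handling of negatives) are decided by each library author rather than by the language.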
My concrete suggestion: Language-level support for end-exclusive slicing with the : operator in Python/NumPy is pleasing to the eye, works superbly with 0-indexed arrays and 0-indexed multidimensional arrays, and, I daresay, is a significant reason for the popularity of NumPy/SciPy/Python amongst data scientists. Just copy it!
Two additional points:
- Given that Kotlin is a 0-based-indexing language the “end-inclusivity” of ranges seems quite problematic. When setting up loops over arrays I often find myself being off-by-one. If Kotlin implements language-level support for (end-exclusive) slicing it can have end-exclusivity where it’s important (in array indexing) and end-inclusivity where it’s needed (wherever that may be).
- I admire the restraint of Kotlin’s designers. So if it’s a choice between language level support for slicing and language level support for ranges, I hope you favor the former. Python does not have language-level support for ranges (hence the range() builtin), and seems to be none the worse for it.
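On the first of these two points, a short sketch of the off-by-one hazard may help. It assumes a Kotlin version whose standard library provides the until infix function and the indices property on arrays; the three function names are made up for illustration. All three loops visit exactly the valid indices, but only the first requires remembering the "- 1".

```kotlin
// End-inclusive range: forgetting the "- 1" reads one element past the end.
fun sumInclusive(a: IntArray): Int {
    var total = 0
    for (i in 0..a.size - 1) total += a[i]
    return total
}

// until builds a range that excludes its upper bound.
fun sumUntil(a: IntArray): Int {
    var total = 0
    for (i in 0 until a.size) total += a[i]
    return total
}

// indices is the range of valid indices for the array.
fun sumIndices(a: IntArray): Int {
    var total = 0
    for (i in a.indices) total += a[i]
    return total
}
```

These are library-level conveniences for loops; they do not provide the implicit beginnings and endings that slice syntax would.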