[PoC/library] Generic Primitive Arrays!

Jolanrensen · June 4, 2021, 7:56pm

On the JVM, it is oftentimes most preferable to use primitive arrays (IntArray, DoubleArray etc.) instead of “normal” arrays (Array, Array etc.) simply because of speed and memory overhead.
For example, on my laptop, reversing 10,000,000 sized Array takes ten times as long as reversing a similar sized DoubleArray.
However, say you want to create something like a new array reverse function, but you want any type of array to work with it. You could create something like:

fun <T> reverse(array: Array<T>) { ... }

However, this only accepts “normal” arrays, not primitive ones.
In fact, since primitive arrays don’t inherit from any other class, the only way to allow all types of primitive arrays in a function is to ask for Any and throw an exception when a non-primitive array is provided. Or you simply create a function for each type of primitive array (which is exactly what the Kotlin standard library is doing).
I thought that this had to be better, so I created this: GitHub - Jolanrensen/KotlinGenericPrimitiveArrays: Provides a wrapper around all primitive arrays using a sealed interface and value class as to provide a generic way to work with any type of generic array. All the 1.5.10 standard library functions are included as well.
It’s just a proof of concept for now and a lot of the documentation is missing (it’s not even a library, just a kotlin file), but bear with me.

PrimitiveArray<T : Comparable<T>> is a sealed interface where the only implementations are the value classes: PrimitiveDoubleArray, PrimitiveIntArray, etc.
Since these implementations are value classes (previously inline classes), you won’t have any object allocation after compile time and since they have one common ancestor, a function can simply ask for a primitive array of any type. These classes (and the interface) have all the functionality of arrays you’d expect and more; I ported over all Kotlin’s Array extension functions to PrimitiveArray (docs will follow) and since I could make the interface implement anything I wanted, I also made it implement Collection, this also makes them Iterable.

Small example:

fun <T : Comparable<T>> reverseArray(array: PrimitiveArray<T>) {...}

val myArray: PrimitiveDoubleArray = doubleArrayOf(1, 2, 3, 4).asPrimitiveArray()
val myOtherArray: PrimitiveBooleanArray = booleanArrayOf(true, false, false, false).asPrimitiveArray()

reverseArray(myArray)
reverseArray(myOtherArray)

It needs to be tested and maybe I’ve missed some functions, so let me know if you find anything that’s missing or if you have any tips!

kyay10 · June 4, 2021, 11:04pm

I honestly really, truly admire the idea, and it’ll absolutely work in the future. But currently, whenever you cast a @JvmInline value class as its superinterface, it actually gets automatically boxed . This’ll absolutely be optimised in the future (I hope), but right now your best bet is to find (or create) a compiler plugin that auto-optimises that for you. From the docs:

As a rule of thumb, inline classes are boxed whenever they are used as another type.

interface I

@JvmInline
value class Foo(val i: Int) : I

fun asInline(f: Foo) {}
fun <T> asGeneric(x: T) {}
fun asInterface(i: I) {}
fun asNullable(i: Foo?) {}

fun <T> id(x: T): T = x

fun main() {
    val f = Foo(42)

    asInline(f)    // unboxed: used as Foo itself
    asGeneric(f)   // boxed: used as generic type T
    asInterface(f) // boxed: used as type I
    asNullable(f)  // boxed: used as Foo?, which is different from Foo

    // below, 'f' first is boxed (while being passed to 'id') and then unboxed (when returned from 'id')
    // In the end, 'c' contains unboxed representation (just '42'), as 'f'
    val c = id(f)
}

If you’re curious, I’m currently working on a compiler plugin for a partially-related reason (some advanced lambda inlining that can lead to zero-cost idiomatic structs and other niceties KT-44530) and part of the goal of that plugin requires optimising value classes that are casted as their interfaces, and so I’ll be implementing that very soon. Stay on the lookout if you want, but yeah I’m very sorry to be the one to break the news about this unfortunate missing optimisation…

darksnake · June 5, 2021, 12:25pm

KMath uses similar approach. Still it is not a panacea for several reasons:

The drop of performance you observe is due to boxing of primitives in an array. Your approach does not actually help with boxing, it just allows JVM to use aggressive unboxing more easily. It will work fast only if the type of generic is known in compile-time.
What you need to be wrapped is not array, but a List interface since Array is mostly discouraged in Kotlin outside of platform interop. Again, you can simplify aggressive unboxing, but it won’t work in general case and does not provided truly generic behavior.

There were a number of studies of this problem by JB team and our team as well. Right now there are not universal solutions. There are several possible ways to solve this. For example this issue.

darksnake · June 5, 2021, 12:27pm

As always, the discussion is welcome in mathematics channel in Kotlin slack.

Jolanrensen · June 5, 2021, 1:17pm

It’s a value class of a primitive array I’m using, not a value class of a primitive type, so boxing is not happening inside the array. The speed benefits of my approach are actually measurable, so it does make a difference!
It is interesting to see your work as well though! I did not know this is what happened when you create a value class of a primitive type, so good to know!

Jolanrensen · June 5, 2021, 1:24pm

I see! The DoubleBuffer, IntBuffer etc are almost exactly the same as mine, although mine has some more helper functions.
You’re right that it is not truly generic, for instance a PrimitiveArray<String> won’t work. But that’s okay, since PrimitiveArray is a sealed interface, no one can create a PrimitiveArray<String> implementation. Only the instances I created are possible.

darksnake · June 5, 2021, 1:54pm

Buffers in KMath do not implement Iterable by design because there is a plus operator already present on iterable and it is not what we want. But my point was not about String, but about PrimitiveArray. The thing is that as soon as you don’t know what specific type is there inside during compile time, you won’t get performance benefits anymore. So basically you just change DoubleArray to PrimitiveArray<Double> and it works the same.

The inheritance hierarchy for arrays is one of the few design flows in kotlin. Array should have inherited read-only lists. Still, it could not be solved by a library.

By the way, have you tried to check performance with specialized ArrayLists (like ArrayList<Double>)?

kyay10 · June 5, 2021, 4:50pm

Oh so your concept is to ensure that the primitives inside the array aren’t boxed, I thought you needed the actual array object itself to not be boxed, too. Well fair enough then, I love this approach to it, but yeah as others said maybe making a List equivalent could be useful

Jolanrensen · June 5, 2021, 7:37pm

Yes! I’ll probably change it to implement List as List also implements Collection and Iterable right?

Jolanrensen · June 7, 2021, 11:33am

Alright, it now implements List as well. I did choose to not make it a mutable list for now, because for that the array would have to shrink and grow, like an ArrayList. So no add() and remove() functions, but you can still get() and set() just like any other array.

Jolanrensen · June 7, 2021, 11:50am

Can you give an example where this would happen? Because in all cases I can come up with, since PrimitiveArray is a sealed, you never lose the performance benefits. It can only be created with a single type where a primitive array is used to store and access the data.
Like my reverse function example, you still get the performance benefits, even if you use a generic type.
Aside from more easily breaking the JVM heap space, ArrayLists seem to be just as slow as Object Arrays I found, so yes, PrimitiveArray is still a lot faster.

darksnake · June 7, 2021, 1:01pm

Use non-specialized PrimitiveArray like val array: PrimitiveArray<Number> = doubleArrayOf()...
Check the generated bytecode. If you have Double.valueOf() in it, you still have everything boxed, VM just optimizes it away.

Jolanrensen · June 7, 2021, 1:23pm

Alright! First of all, typing PrimitiveArray<Number> is currently not possible at all, because Number does not implement Comparable, which PrimitiveArray currently requires. If I take out that requirement I get this:

val base: DoubleArray = doubleArrayOf(1.0, 2.0, 3.0)
val primitiveArray: PrimitiveArray<out Number> = base.asPrimitiveArray()
val test: Number = primitiveArray[0]

(out seems to be needed to make it compile)
Compiled and decompiled this looks like:

double[] base = new double[]{1.0D, 2.0D, 3.0D};
int $i$f$asPrimitiveArray = false;
PrimitiveArray primitiveArray = PrimitiveDoubleArray.box-impl(PrimitiveDoubleArray.constructor-impl(base));
Number var8 = (Number)primitiveArray.get(0);

So it seems that when you use a non-specialized one and you want to take a supertype of a value out of the array, it takes the current one out and upcasts it. So no Double.valueof().

darksnake · June 7, 2021, 1:27pm

I am not saying create a number array but use a generic one. The (Number) cast actually performs boxing. It is better, than ArrayList since it boxes the value on read, not on write and this box could be optimized by the JVM more easily, but it still is VM-level optimization.

The problem is that in general, you need generic ancestor in order to do generic operations, but once they are generic, they are no longer optimized. I agree, that all array should have a common ancestor though.

Jolanrensen · June 7, 2021, 1:57pm

Well yes, that is true, but you don’t always have to cast to a generic ancestor right?

Small example, I was using a Longest Common Substring implementation, but I needed it both for Chars and Ints. The implementation uses an array for its data storage and the user needs to supply this array (since it’s calculated in place). It used to be just a CharArray, but I didn’t want to change it to Array<T> since that would create an Object array when running and thus be slower. With PrimitiveArrays I can simply provide the LCS implementation a PrimitiveCharArray or with a PrimitiveIntArray and it works as fast as creating 2 separate LCS implementations, one with a CharArray and one with an IntArray. No casting occurs since LCS<Char>() only accepts PrimitiveArray<Char> as arguments (where thanks to the sealed interface PrimitiveCharArray is the only option) and returns a PrimitiveArray<Char> as result.

This means you can reuse the same code for any type of primitive array, since each of the primitive arrays thus now have a common ancestor, the sealed interface PrimitiveArray.

elizarov · June 8, 2021, 12:00pm

The solution that we could implement in a foreseeable future is described here KEEP/value-classes.md at master · Kotlin/KEEP · GitHub

Jolanrensen · June 8, 2021, 12:51pm

Very interesting read!
Excited to see what Project Valhalla will bring onto the table.
Until then, my PrimitiveArray interface gets very close to what is described as VArray in the KEEP. The big difference being that VArray<Color> could compile to int[] which I cannot do due to the way value classes currently work.
I also noticed in my version, where for example filterTo does not use a reified T, the bytecode shows a .doubleValue() meaning it still gets boxed. So like the KEEP says, it should only accept reified values and thus ideally, classes should also allow reified types.

pdvrieze · June 8, 2021, 1:03pm

It may be that it is possible to use invokedynamic with on-demand code generation to alleviate some of the issues with value type arrays. (of course invokedynamic requires at least JVM 7). Of course arrays are broken/limited in many ways on the JVM (even non-primitive ones).
The ability to have inline secondary constructors would be a nice syntactic sugar, but in the meantime it is easy enough to implement them as operator invoke on the companion (which the implementation would have to look like anyway), those also have the added benefit that they don’t need to invoke the constructor as first part of the invocation. The only place where it would be different is in inheritance - separating out creation and initialisation.

Jolanrensen · June 8, 2021, 1:59pm

Alright, all functions now use reified T’s as well

Topic		Replies	Views
Use primitive arrays in place of Array<T> Support	22	3241	July 12, 2022
Proposal for primitive generics Language Design	19	5925	June 17, 2018
Why do the primitive array classes need to be exposed? Language Design	6	4499	February 24, 2016
Support mapNotNull for primitive arrays? Libraries	3	1198	May 18, 2019
How do you extend java methods with primitive types (as parameter or return type)? Support	5	3671	April 3, 2018

[PoC/library] Generic Primitive Arrays!

Related topics