Proposal for a future release: GC-friendly structures

Regarding to the "structure" proposal, I just remembered how I think about it when I was designing my old language. Maybe in kotlin 2.0 or 3.0, could be planed something like this.

In games, the GC pressure is pretty bad, because it can break the user experience so bad. So you have to do lots of optimizations, and in critical paths you will probably totally avoid using overrided operators that creates objects.  For example to add a displacement to a point, and you will end doing it component by component.

Here is a proposal of how it is possible to create a struct concept similar to C# that would totally avoid creating objects for structures.
C# structures have the following properties: constant size, can’t inherit, their fields types are limited to primitives and other structures, so no pointers/reference types at all, are not part of the GC, because you pass always by value instead of by reference.

In order to be able to use it, you need to solve some problems:

  • How to store it inside a class
  • How to store it as a method local
  • How to pass it to functions
  • How to return it from functions and how to store them in arrays

And always without allocations or just with constant allocations per thread.

------------------------------------------------------------------------------------------------------------------
Example
struct Point(x:Int, y:Int) {
  // pass struct to function, and return from functions
  operator fun plus(that:Point) = Point(this.x + that.x, this.y + that.y)
}

class View {
  var position = Point(0, 0) // store struct inside a class

  fun setPosition(x:Int, y:Int) {
  this.position = Point(x, y) // mutate field
  this.position.x++ // mutate field in struct
  }
}

fun test() {
  // Calling and storing in “a” local
  val out = Point(1, 2) + Point(3, 4)

  // Array storage
  val arrayOfPoints = arrayOf(out, Point(5, 6))
}


Transformation:

// Helper
object StructHelper {
  // Optimizable pseudocode example (in thread storage, a ByteBuffer for a stack or a object pool):
  //@JvmStatic val structStack = ThreadLocal<ByteBuffer> { }
  //@JvmStatic val structStackIndex = ThreadLocal<Int> { }
}

// pass struct to function, and return from functions:
// :: each struct argument expand its fields recursively (including other structs). That means that you cannot use vararg with structs.
// :: instead of returning an object, stores it in the ByteBuffer or extracts a mutable Point object from the TLS stack, mutates it and the caller handles it.
// with the ByteBuffer approach you are just creating one object, with the ObjectPool you are just creating as many objects as the max call depth returning Points.
fun Point_plus(this_x:Int, this_y:Int, that_x:Int, that_y:Int):Point {
  val temp_x = this_x + that_x
  val temp_y = this_y + that_y
  val out = tlsPointPool.pull()
  out.x = temp_x
  out.y = temp_y
  return out
}

class View {
  // store struct inside a class
  // expanded fields
  var position_x = 0
  var position_y = 0

  fun setPosition(x:Int, y:Int) {
  // mutate field
  this.position_x = x
  this.position_y = y
  this.position.x++ // mutate field in struct
  }
}

fun test() {
  // Calling and storing in “a” local
  val temp1_x = 1
  val temp1_y = 2
  val temp2_x = 3
  val temp2_y = 4
  val temp = Point_plus(temp1_x, temp1_y, temp2_x, temp2_y) // Or just inlined instead of using temps
  // We are moving data to locals from the returned object (or reading memory in the tls ByteBuffer stack case)
  val out_x = temp.x
  val out_y = temp.y
  // We are restoring the object to the pool just after copying it to locals, in the case of the ByteBuffer stack we would increment the stack pointer again
  tlsPointPool.push(temp)

  // Array storage
  // Arrays here are just ByteBuffer internally
  // Store them means to
  class GenericStructArray(var length:Int, val elementSize:Int) {
  val data = ByteBuffer(length * elementSize)
  fun getOffset(offset:Int):Int = offset * elementSize
  }
  fun Point_WriteArray(array:GenericStructArray, index:Int, x:Int, y:Int) {
  val offset = arraygetOffset(index)
  array.data.setInt(offset + 0, x)
  array.data.setInt(offset + 4, y)
  }
  val arrayOfPoints = GenericStructArray(1, elementSize = 8) // two 32-bit ints: 8 bytes
  Point_WriteArray(arrayOfPoints, 0, out.x, out.y)
  Point_WriteArray(arrayOfPoints, 1, 5, 6)
  // Reading from the array would be the same as calling a function returning a structure (using the pool or the bytebuffer stack)
}
------------------------------------------------------------------------------------------------------------------

A java-friendly proposal:

In order to simplify the callee, making it unaware of the object pool, and making it compatible with java, we could do this instead:

fun Point.plus(this_x:Int, this_y:Int, that_x:Int, that_y:Int, out:Point = Point()):Point {
  val temp_x = this_x + that_x
  val temp_y = this_y + that_y
  out.x = temp_x
  out.y = temp_y
  return out
}

or

// Last argument is a mutable memory holder that will hold the result, if not specified it will alloc a new object, if specified a preallocated mutable object it will use it
fun Point.plus(a:Point, b:Point, out:Point = Point()):Point {
  out.x = a.x + b.x
  out.y = a.y + b.y
  return out
}


In as3 we used to do this: an immutable-like API, that allowed us to use a static preallocated temp object, or a per instance preallocated temp object depending on the life of the object.
In this case the language could use a Pool instead, and provide that preallocated object internally. And make it available to Java so in Java anyone could provide an object to holder t he result or a null that would allocate the object.
So the caller would be something similar to this:

val temp1 = TlsPointPool.pull()
val temp2 = TlsPointPool.pull()
val temp3 = TlsPointPool.pull()
temp1.setTo(1, 2)
temp2.setTo(3, 4)
Point.plus(temp1, temp2, temp3)
val temp3_x = temp3.x
val temp3_y = temp3.y
TlsPointPool.push(temp3)
TlsPointPool.push(temp2)
TlsPointPool.push(temp1)

Building pools into the language sounds like a risky thing to do, otherwise it's an interesting approach to consider.

Great :) For structures is possible to totally avoid the ObjectPool, but you will still need one instance per thread:

Since you can do field expansion:

// Callee
fun Point.add(this_x:Double, this_y:Double, that_x:Double, that_y:Double, out:Point):Point {
  out.x = this_x + that_x
  out.y = this_y + that_y
  return out
}

// Caller
val temp = Point.add(1, 2, 3, 4, Point.$tlsInterchangeInstance$)
val temp_x = temp.x
val temp_y = temp.y

val temp2 = Point.add(1, 2, 3, 4, Point.$tlsInterchangeInstance$)
val temp2_x = temp2.x
val temp2_y = temp2.y


Just after every call, the next thing you do is to copy that temp instance fields into locals. And since that object is per thread, anybody else is going to change it before you preserve it into your own locals. So it will work even with several recursive calls using that single tls instance.
In Javascript it would be just a static access, since webworkers have their own heap and it is single-threaded.

For games, being able to use structures with operator overloading without creating garbage is just great. Also with SIMD.js it would be possible to detect some operations and convert into SIMD operations.

Inlining those operations would totally avoid the requirement of accessing that tls instance, and probably it would enable automatically SIMD operations in the case JVM supports them, or when converting to C++ using robovm, that uses LLVM internally and has SIMD optimizations:

// Callee
inline fun Point.add(this_x:Double, this_y:Double, that_x:Double, that_y:Double, out:Point) {
  out.x = this_x + that_x
  out.y = this_y + that_y
}

// Caller
////// INLINED: val temp = Point.add(1, 2, 3, 4, Point.$tlsInterchangeInstance$) // inlined
val temp_x = 1 + 2
val temp_y = 3 + 4

////// INLINED: val temp2 = Point.add(1, 2, 3, 4, Point.$tlsInterchangeInstance$) // inlined
val temp2_x = 1 + 2
val temp2_y = 3 + 4

Bear in mind that Java 10 (hopefully) will support value types for things like Point, Complex, Pair, etc.

By then there will be still android devices with android 4.4 running dalvik with a java 6 api. Maybe by then, new devices are running dart. One of the greatest features of kotlin is being pragmatic and being able to target java6, the most widespread java version, while bringing features like lambda expressions.

Yes, my point is that any mechanism should ideally involve emulating Valhalla value types on older JVMs rather than having an entirely separate mechanism. Otherwise Kotlin would end up with two different value type-esque mechanisms, which would suck.

I suspect that rather than a Kotlin specific feature, the right way to tackle this is a global program optimisation, as on Android there’s no dynamic code loading so you can do that sort of thing.

For instance, every time you see a type that is tagged as a value type, the bytecode rewriter simply inlines all the entries as individual variables into whatever context it’s used, and rewrites all the method prototypes as well. No need for TLS pools or object re-use as it’s all on the stack. Arrays could likewise be decomposed into several arrays for each fields.

Of course, you lose the moment you put such a type into a generic container, which is a big part of what Valhalla is fixing. But for Point3D type structs, perhaps it does not matter much.

The nice thing about doing it all at the bytecode level is … works for other languages too. Not just Kotlin.

I see. I understand your point. I don't like too much post-processing bytecode stuff, but that could probably work. Do you know if there are efforts in that direction? Also that optimizer won't be that safe, since ref types behave different that value types. How would that know that you won't modify that instance in other place?

Value types would have to be checked to ensure they have only final fields.

I am unaware of any attempts to do this. It may be an interesting research project. Bear in mind that escape analysis (and partial escape analysis, in Graal) do auto-scalarisation: they convert objects into variables on the stack, effectively, when the object can be proven to not escape. So another approach would be to try and re-use the PEA code from Graal (it’s written in java) and figure out what the compiler would have scalarised, then apply the same thing at the bytecode level. Thus teaching escape analysis to older JVMs that don’t know how to do it themselves, like Dalvik. But that’d be an advanced project.

1 Like