Mapping immutable data class instance


#1

Hi all,

Let’s say I have:

data class Person(val firstName:String, val lastName:String)

val p = Person("Dart", "Vader")

and say I want to make a copy of ‘p’ but with lowercase letters:

val p1 = p.copy(firstName = p.firstName.toLowerCase(), lastName = p.lastName.toLowerCase())

But what if I want to write a universal function or framework that does not know the particular type or the names of its fields? I’d like to have something like:

p.map { k, v ->
 v.toLowerCase()
}

Another interesting case (but having vital difference from previous):

data class Vector<out T>(val x: T, val y: T)

val vDouble = Vector(1.0, 2.0)
val vInt = toIntegerVector(vDouble)

fun toIntegerVector(v: Vector<Double>) = Vector<Int>(v.x.toInt(), v.y.toInt())

To convert a vector of doubles to a vector of integers we could, of course, implement a specific conversion function. But if it were possible to map over the type wrapped by another type, the code would be far more generic, and it would give us another way to build generalized frameworks. Example:

data class Vector<out T>(val x: T, val y: T)
val vDouble = Vector(1.0, 2.0)
val vInt = vDouble.mapT(::toInt)

This feature may sound strange and too specific, but in some languages it is a cornerstone (see Functor in Haskell).
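
Until the language generates such a function, it can be written by hand as an ordinary extension function. The following is only a minimal sketch of what the requested mapT might look like for the Vector class above; the function is hand-written here, not something Kotlin generates:

```kotlin
// Vector from the post, using Kotlin's declaration-site variance syntax.
data class Vector<out T>(val x: T, val y: T)

// Hand-written stand-in for the proposed generated mapT:
// applies transform to every T-typed property and builds a new Vector.
fun <T, R> Vector<T>.mapT(transform: (T) -> R): Vector<R> =
    Vector(transform(x), transform(y))
```

With this in place, Vector(1.0, 2.0).mapT(Double::toInt) yields Vector(x=1, y=2). The drawback is the one the post points out: the function has to be repeated for every such class.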


#2

You can use reflection to implement all of those transformations in Kotlin.
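
For the first case, a reflection-based version might look roughly like the sketch below. It assumes the kotlin-reflect library is on the classpath, that the class has a primary constructor whose parameters are all public properties (as in a typical data class), and that only String values should be transformed. The name mapStringProperties is made up for this illustration.

```kotlin
import kotlin.reflect.full.memberProperties
import kotlin.reflect.full.primaryConstructor

data class Person(val firstName: String, val lastName: String)

// Hypothetical helper: rebuilds the receiver through its primary constructor,
// passing every String property through transform and copying the rest as-is.
fun <T : Any> T.mapStringProperties(transform: (name: String, value: String) -> String): T {
    val kClass = this::class
    val constructor = requireNotNull(kClass.primaryConstructor) {
        "${kClass.simpleName} has no primary constructor"
    }
    val propertiesByName = kClass.memberProperties.associateBy { it.name }
    val arguments = constructor.parameters.associateWith { parameter ->
        val name = requireNotNull(parameter.name) { "Unnamed constructor parameter" }
        val property = requireNotNull(propertiesByName[name]) {
            "Constructor parameter $name is not a property"
        }
        // Reflective read; this fails for non-public properties.
        val value = property.getter.call(this)
        if (value is String) transform(name, value) else value
    }
    return constructor.callBy(arguments)
}
```

Usage would then be close to the p.map { k, v -> ... } form from the first post: Person("Dart", "Vader").mapStringProperties { _, v -> v.lowercase() }. Every call goes through reflective lookups and allocates an argument map, which is the overhead discussed in the following posts.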


#3

Yes, of course, but I have always considered reflection a last resort. I believe this feature would become popular enough; would you really suggest that each Kotlin user write their own implementation of it? OK, it could be implemented in some library (which nobody will know about, I guess), but I am also worried about performance. Am I wrong in thinking that a reflection-based implementation adds overhead? (Hmm… maybe I am just not the best programmer when it comes to reflection.)


#4

If you worry about performance overhead, you can use bytecode instrumentation to generate the mapping code at compile time, which will be as efficient as what the Kotlin compiler could potentially generate.

And by the way there is already a library that does something quite similar: http://dozer.sourceforge.net


#5

That is an interesting new use-case for Kotlin Serialization. See general discussion here: Kotlin Serialization

In fact, the existing Kotlin Serialization prototype is powerful enough that what you are describing can be implemented in a few lines of code. The following code actually works in the prototype compiler with serialization support:

@KSerializable
data class Person(val firstName:String, val lastName:String)

object LowercaseTransformer : ElementValueTransformer() {
    override fun transformStringValue(desc: KSerialClassDesc, index: Int, value: String): String =
            value.toLowerCase()
}

fun main(args: Array<String>) {
    val p = Person("Dart", "Vader")
    println("Original person: $p")
    val q = LowercaseTransformer.transform(p)
    println("Transformed person: $q")
}

Produces

Original person: Person(firstName=Dart, lastName=Vader)
Transformed person: Person(firstName=dart, lastName=vader)

It already works for quite complex structures inside the Person class, including nested data objects, lists, maps, arrays, etc. ElementValueTransformer is a helper class that serializes the object into an internal representation (a flattened array list of elements) and deserializes it back, applying the specified transformation along the way. It can also be used to clone serializable objects.

It is just a prototype, though. A usual disclaimer is that, of course, everything is subject to change (and will change).


#6

Yes, it is an interesting suggestion. But there are a couple of issues with this approach. The first one is performance:

It seems additional garbage will be produced for the GC (I am fed up with GC on Android).

The other issue is that it does not seem to cover mapping Vector<Double> <--> Vector<Int>, i.e. mapping that changes the type of the values.

BTW, thank you for the reference!


#7

Yes, the current prototype is going to produce additional garbage. It might be possible in the future to chain the serializer and deserializer so that they transfer data directly to one another, without an intermediate data structure that stores all the values. Thank you for the idea.

This particular example shows the transformation of a class instance into an instance of the same class, but the serialization infrastructure could also be used to transform one class into another, as long as their serial representation is the same or you also provide a transformation that adapts it. For example, one class has Double fields while the other has BigDecimal fields, and you provide the corresponding transformation.

I cannot prototype your example with a generic Vector class yet, because generic user-defined classes are not yet supported in the current prototype, but I’ll keep it as a use-case.


#8

Reflection has been quite heavily optimised in HotSpot, at least. For Android I don’t know, but in any case you cannot assume that a reflection-based approach is always slow, especially when using a specialised third-party library such as ReflectASM.


#9

Rest assured, I’ll implement this feature for myself one way or another. But I posted this feature request in the “Language design” category rather than “Library” because I am sure a data class could be convenient in more ways than the scarce set of generated features (copy(…), componentN). For example, why not also treat the properties of a data class as a Map, that is, a set of key-value pairs? And hence, why not have the ability to do a copy-transform operation that maps over the key-value pairs and produces the same data class with other values? I believe such a feature deserves to be a “first-class citizen” of the Kotlin language. We already have the keyword ‘data’ in a data class, which gives us special benefits that could be extended in some sensible way.


#10

What you are describing exactly fits the description of serialization, i.e. transforming an object into some external representation (in your case, a map) and back. Of course, “mapping a data class” could be the language feature on top of which serialization is built. However, it is hard to recover a type-safe serialization mechanism from a mapping transformation, so we are working to integrate type-safe serialization into the language instead; you can then use it to map over objects and for similar transformations.

With the existing serialization prototype, the actual transformation of a class into a map is implemented in just a few lines of code:

class MapOutput(val map: MutableMap<String, Any> = mutableMapOf()) : NamedValueOutput() {
    override fun writeNamed(name: String, value: Any) {
        map[name] = value
    }
}

It is all static and type-safe (no reflection). The reverse transformation from a map into a class is defined like this:

class MapInput(val map: Map<String, Any>) : NamedValueInput() {
    override fun readNamed(name: String): Any {
        return map[name]!!
    }
}

Now if you have a serializable ‘Person’ data class you can just do

val person = Person(...)
val out = MapOutput()
out.save(Person, person)
val map = out.map // that is your resulting map

and back with

val inp = MapInput(map)
val deserializedPerson = inp.load(Person)

without ever touching reflection in the process.


#11

Roman, your serialization framework effort is really cool and valuable! But don’t take me literally. When I say that a data class could be considered a Map, I do not mean constructing a real map (garbage). It is like some data structure implementing the Collection, Set or Map interface: it does not have to keep a real map internally. I would expect a data class to be treated as a Map or a collection of key-value pairs in some more lightweight sense. But! If your serialization framework covered such a thing, it would be great.

And once again, I thought the keyword ‘data’ in a ‘data class’ meant that it is maximally friendly for handling program data. For example, if I want to store data in a DBMS, then one way or another I have to enumerate the fields of my object.


#12

As you can see from the code, the serialization framework does not force anyone to use a map or any other data structure for that matter. It does not care how you actually represent the object. It just lets you enumerate the object’s elements. You give it an array, and it’ll iterate over all the array elements for you; you give it a data object, and it’ll iterate over all its properties for you. I’m really sorry if I’m missing something in your use-case, but I don’t see how it is different from what you are asking for.

For example, you can take the code snippet I gave in the answer above and write your own implementation of the NamedValueOutput class that does whatever you want in its writeNamed function. The serialization framework just emits the data elements (i.e. it converts the object tree into a serial sequence of writeXXX invocations), and it is entirely up to you what to do with them.


#13

My point is that my use-case is not that far from the data class ‘copy’ function that Kotlin currently generates. Why not treat the copy function as a special case of your serialization framework as well? Or even of the Dozer library (as some desperate guy advised above)?


#14

Providing map access to the properties in a “data” class is certainly possible, but has a number of issues:

  • An efficient implementation would need some efficient string-matching approach. Probably the best would be a sorted array of property names: a binary search would find the index, and the get and set operators would then use that index in a big switch table to access the actual properties. Overall this would be quite heavyweight to include in all data classes.
  • Accessing data members as map elements is fundamentally unsafe from a type perspective.
  • This is clearly a use case that, if applicable, would apply to a minority of data classes.
  • Looking at your original question, what you would need is a copy constructor that applies a transformation to certain types only but doesn’t create garbage. There are only two ways of doing so: either use reflection (with its own garbage), or generate such a constructor (or a free function with the class name) as bytecode. For the generated version you’d still want to avoid type interrogation and intermediate wrappers, so the generation code needs to be very flexible. Overall it seems a better fit for a library.
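
To make the first two points concrete, here is a hand-written sketch of what such generated map-style access might look like for a single class. The get operator and PROPERTY_NAMES are hypothetical and written by hand for this illustration; Kotlin generates nothing of the kind.

```kotlin
data class Person(val firstName: String, val lastName: String) {
    // Map-style read access: binary search over the sorted property names,
    // then a switch on the resulting index. The return type is Any?,
    // so the static type of each property is lost.
    operator fun get(name: String): Any? =
        when (PROPERTY_NAMES.binarySearch(name)) {
            0 -> firstName
            1 -> lastName
            else -> throw NoSuchElementException("No property named '$name'")
        }

    private companion object {
        // Must be kept in sorted order for binarySearch to work.
        val PROPERTY_NAMES = listOf("firstName", "lastName")
    }
}
```

A caller has to cast the result, e.g. person["lastName"] as String, and a misspelled name fails only at run time, which is the type-safety concern raised in the second point.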

#15

That’s why I prefer the functor-like way:

data class Vector<out T>(val x: T, val y: T)
val vDouble = Vector(1.0, 2.0)
val vInt = vDouble.mapT(::toInt)

but this requires language support to generate the extension function mapT. An even more interesting case is:

data class FieldPoint2D<out T, out V>(val x: T, val y: T, val value: V)
val f1 = FieldPoint2D(1.0, 2.0, true)
val f2 = f1.mapT(::toInt)
val f3 = f1.mapV(::toInt)

That is, a mapX function generated for each generic type parameter.
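
Written out by hand, the functions I would like the compiler to generate might look like the sketch below (one map function per type parameter; nothing here is generated by Kotlin today):

```kotlin
data class FieldPoint2D<out T, out V>(val x: T, val y: T, val value: V)

// Maps only the T-typed properties (x and y); value is carried over unchanged.
fun <T, V, R> FieldPoint2D<T, V>.mapT(transform: (T) -> R): FieldPoint2D<R, V> =
    FieldPoint2D(transform(x), transform(y), value)

// Maps only the V-typed property (value); x and y are carried over unchanged.
fun <T, V, R> FieldPoint2D<T, V>.mapV(transform: (V) -> R): FieldPoint2D<T, R> =
    FieldPoint2D(x, y, transform(value))
```

For example, FieldPoint2D(1.0, 2.0, true).mapT(Double::toInt) gives FieldPoint2D(x=1, y=2, value=true), and .mapV { if (it) 1 else 0 } on the same point gives FieldPoint2D(x=1.0, y=2.0, value=1).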

As for the claim that this would apply only to a minority of data classes: I am sorry, but that is clearly a subjective statement. Sometimes a minority is the determining factor.


#16

That’s an interesting use-case. I’ll see if this can be supported, too, as part of the serialization framework. Don’t be scared by the name “serialization”. It is just a type-safe data mapping mechanism; it is called serialization because that is what you typically use it for. Hopefully, it will become part of the language one day and will be as efficient as other language constructs. Maybe some other language primitives will one day make this kind of thing easy to implement, too, without any reflection. Maybe serialization itself could even be implemented efficiently on top of some more general data mapping/lensing mechanism one day.