We don’t currently have an elegant idea of how alternative representations might work. The best we can do in this respect is to support optional fields with defaults that will be used if a field is absent in the serialized representation (if the serialization format is flexible in that respect, like JSON). We also plan to give serializers access to all the annotations that are defined on serial fields, so if you annotate your fields (properties? elements?) with something like @SinceVersion(3), then you can have your serializer implementation check what version is currently being read/written and skip fields that should not be present in that version.
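As a rough illustration of the @SinceVersion idea, here is a minimal sketch that uses plain JVM reflection rather than any generated code. The annotation's target and retention, the User class, and the writeFields helper are all assumptions made for this example; they are not part of the planned API.

```kotlin
import java.lang.reflect.Modifier

// Hypothetical annotation: the name comes from the post above,
// the target and retention are assumptions for this sketch.
@Target(AnnotationTarget.FIELD)
@Retention(AnnotationRetention.RUNTIME)
annotation class SinceVersion(val version: Int)

// Example class (hypothetical): email was added in format version 3.
data class User(
    val name: String,
    @field:SinceVersion(3) val email: String = ""
)

// Collects only those fields that exist in the requested format version.
// Fields without the annotation are treated as present since version 1.
fun writeFields(value: Any, formatVersion: Int): Map<String, Any?> {
    val out = LinkedHashMap<String, Any?>()
    for (field in value.javaClass.declaredFields) {
        if (field.isSynthetic || Modifier.isStatic(field.modifiers)) continue
        val since = field.getAnnotation(SinceVersion::class.java)?.version ?: 1
        if (since > formatVersion) continue // field is too new for this version
        field.isAccessible = true
        out[field.name] = field.get(value)
    }
    return out
}
```

Calling writeFields(User("Ann", "ann@example.com"), 2) would leave out email, while version 3 would include it. A serializer with pre-compiled code could make the same decision without reflection if the annotations were handed to it.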
One feature that is really necessary for using serialisation in secure contexts is ensuring objects can only be resurrected via their constructor and/or public setters. Is the idea that the generated code works this way, or can be made to?
Let me elaborate on security a little bit. There are two kinds of serialization.
- In static serialization you invoke something like MyClass.load(someInput) and only classes explicitly and statically referenced by MyClass get loaded. There is no reflection or loading classes by name.
- In dynamic serialization, like Java Serialization, you invoke something like someInput.readObject(). Any class name can appear on the stream, and it will be found dynamically on the class path at run time and loaded.
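The difference between the two kinds can be sketched on the JVM as follows. The Point class and both round-trip helpers are hypothetical and only illustrate the distinction; the dynamic half uses standard Java Serialization.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInput
import java.io.DataInputStream
import java.io.DataOutput
import java.io.DataOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

class Point(val x: Int, val y: Int) : Serializable {
    fun save(out: DataOutput) {
        out.writeInt(x)
        out.writeInt(y)
    }

    companion object {
        // Static: the caller names the type, so nothing in the stream
        // can select which class gets instantiated.
        fun load(input: DataInput): Point = Point(input.readInt(), input.readInt())
    }
}

fun staticRoundTrip(p: Point): Point {
    val buffer = ByteArrayOutputStream()
    p.save(DataOutputStream(buffer))
    return Point.load(DataInputStream(ByteArrayInputStream(buffer.toByteArray())))
}

// Dynamic: the stream carries class names, and readObject() looks them up
// on the class path at run time.
fun dynamicRoundTrip(value: Any): Any {
    val buffer = ByteArrayOutputStream()
    ObjectOutputStream(buffer).use { it.writeObject(value) }
    return ObjectInputStream(ByteArrayInputStream(buffer.toByteArray())).use { it.readObject() }
}
```

Note that dynamicRoundTrip returns Any: the reader learns the type only from the stream, which is exactly what makes it convenient and what makes it risky with untrusted input.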
Any dynamic serialization scheme is inherently insecure. There is no way to make it secure by limiting resurrection to constructors and/or public setters only, since in a big application there is always a chance of a class somewhere on your classpath that does something weird, and even if you limit loaded classes with a whitelist, there are still issues. You can google Java Serialization security issues.
However, dynamic serialization is extremely useful in closed-world settings. Every modern JVM-based big-data distributed-computing framework uses it.
We plan to support both static and dynamic serialization in Kotlin.
Yes, I am familiar with serialisation security, thanks. You obviously need to pair ‘dynamic deserialisation’ with a whitelist; some frameworks like Kryo support that already, and the default Java framework is getting support for that in Java 9.
Sometimes you don’t know ahead of time exactly what classes might be deserialised; any plugin architecture where plugins can serialise data into a stream is an example of that. If you allow plugins to extend the whitelists and take other precautions to prevent invalid streams being deserialised, the security of the two approaches ends up similar. Put another way, dynamic deserialisation is writing the same code that static deserialisation would, but it’s generated just in time instead of ahead of time.
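For Java Serialization, a whitelist of the kind described here is commonly implemented by overriding ObjectInputStream.resolveClass. The sketch below assumes that approach; the class and helper names are made up for illustration, and a plugin system could extend the allowed set before reading.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream
import java.io.InvalidClassException
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.ObjectStreamClass

// Rejects any class descriptor whose name is not explicitly allowed,
// before the class is resolved and instantiated.
class WhitelistObjectInputStream(
    input: InputStream,
    private val allowed: Set<String>
) : ObjectInputStream(input) {
    override fun resolveClass(desc: ObjectStreamClass): Class<*> {
        if (desc.name !in allowed) {
            throw InvalidClassException(desc.name, "class is not whitelisted")
        }
        return super.resolveClass(desc)
    }
}

fun toBytes(value: Any): ByteArray {
    val buffer = ByteArrayOutputStream()
    ObjectOutputStream(buffer).use { it.writeObject(value) }
    return buffer.toByteArray()
}

fun readWhitelisted(bytes: ByteArray, allowed: Set<String>): Any =
    WhitelistObjectInputStream(ByteArrayInputStream(bytes), allowed).use { it.readObject() }
```

Superclass descriptors also pass through resolveClass, so a real whitelist has to list every serializable class in the hierarchy (for example java.lang.Number alongside java.lang.Integer).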
We are a bit into uncharted terminology here. I’ve labeled as static serialization anything where you explicitly know what type you are reading at every point. This is typically how you deserialize JSON into type-safe form, for example. It can be implemented via runtime reflection (with Jackson, for example), but, out of the box, Jackson still does static deserialization as it is fully driven by the type definitions in your code.
Deserialization is dynamic if you don’t need to know your types in advance. Out of the box, Kryo is fully dynamic unless you explicitly configure a whitelist. It is extremely convenient for closed-world applications. It makes Kryo a fine replacement for Java serialization in Spark, for example.
Whitelists do blur the line between the two approaches. Protobuf’s Any is also borderline, even though I’d still consider it a fully static deserialization approach, because the Any type does not get deserialized by the protobuf framework, but is kept as an array of bytes for application code to deserialize if needed.
The approach that I currently pursue with respect to security is to default to the safe side, i.e. make the serialization fully static by default, but still support both pre-compiled deserialization code and run-time (reflection-based) deserialization for third-party library classes that you statically reference.
Dynamic serialization will be supported as an opt-in, and you could use full-world, blacklist and/or whitelist approaches, so in Kotlin serialization a whitelist will be considered a variant of dynamic serialization. Both classes with pre-compiled deserialization code and run-time (reflection-based) deserialization shall be supported.
Of course, reflection will be supported only on the platforms that support reflection, and reflection always has adverse performance effects, so the primary effort is going to be focused on producing pre-compiled serialization/deserialization code for all serializable classes.
That sounds like a very reasonable approach.
I’ve been thinking a bit about the problem of flexible/multiple serialization output/input formats. Probably the best way to do that would be to leave the specifics to a user-defined handler that could support multiple formats. For simple POJOs (or annotation-only serialization) this would mean that the handler would just get the fields to store (name, type, value) and would do whatever it wants (format v1, format v2, JSON, XML, …).
As long as there is the ability to do this, no specific support for multiple formats needs to be built into the system. There will probably also be serialization formats that are incompatible with others, so they would not support multi-format.
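A minimal sketch of such a handler, assuming a hypothetical FieldHandler interface that receives (name, type, value) triples. The describeTo function is written by hand here and merely stands in for whatever code would be generated; neither handler escapes special characters.

```kotlin
import kotlin.reflect.KClass

// Hypothetical callback: receives each field as (name, type, value).
interface FieldHandler {
    fun field(name: String, type: KClass<*>, value: Any?)
}

// One possible format: a flat JSON object (no escaping, for brevity).
class JsonHandler : FieldHandler {
    private val parts = ArrayList<String>()
    override fun field(name: String, type: KClass<*>, value: Any?) {
        val text = if (type == String::class && value != null) "\"$value\"" else value.toString()
        parts += "\"$name\":$text"
    }
    fun result(): String = parts.joinToString(",", "{", "}")
}

// Another format fed by the same field stream: HTTP form parameters.
class FormHandler : FieldHandler {
    private val parts = ArrayList<String>()
    override fun field(name: String, type: KClass<*>, value: Any?) {
        if (value != null) parts += "$name=$value"
    }
    fun result(): String = parts.joinToString("&")
}

class Person(val name: String, val age: Int) {
    // Stand-in for generated code: reports every field to the handler.
    fun describeTo(handler: FieldHandler) {
        handler.field("name", String::class, name)
        handler.field("age", Int::class, age)
    }
}
```

The class itself knows nothing about JSON or form encoding; supporting another format (or another version of a format) only means writing another handler.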
A use-case that I’ve run into is in HTTP (POST) request handling: creating an instance of a form backing bean and populating it with values of the POST request (and applying validation rules), and serializing a form backing bean into HTTP form parameters for making a POST request.
I don’t know if this is a kind of use-case that is considered for the Kotlin serialization framework?
Especially in the former case there will be many cases of receiving invalid data, where it is extremely useful to have control over the error handling, for instance in order to be able to report back to the user which fields were invalid and for what reason.
Moreover, it is useful to have access to the invalid object in code.
Right now my form backing classes have nullable fields with @NotNull validation annotations, which looks weird next to other annotations regarding what data is considered validly formatted.
Controller code handles the invalid form, Spring has a binding-result instance which is populated with the errors and these are rendered in HTML. The controller code however also has an instance of the (invalid) form bean, which is part of the Model, and which is also used to populate the HTML with previously entered values (in same template as a valid instance would be rendered).
There is a disconnect here between the compiler-valid and application-valid state of objects, where one needs to make application-invalid states valid for the compiler in order to report on those states in a user-friendly way.
It is not a huge issue to me, especially since it so far has come up for me only in HTTP form handling.
I do not see a clean solution for such use cases - but if Kotlin serialization could / would address this in a clean way without impacting the application (Spring) error handling, it would be nice.
There might also be other cases where control over the error handling, and having (partially) invalid objects available to application code during error handling, could be useful.
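One possible shape for this, sketched without any framework: binding returns either a fully valid, non-nullable form object, or the raw input together with per-field errors. All names here (SignupForm, Binding, bindSignup) are hypothetical and are not tied to Spring or to the planned Kotlin serialization API.

```kotlin
// The application-valid form: no nullable fields needed.
data class SignupForm(val email: String, val age: Int)

sealed class Binding<out T> {
    data class Valid<T>(val value: T) : Binding<T>()

    // Keeps the raw input so a template can re-render what the user typed,
    // plus one message per invalid field.
    data class Invalid(
        val raw: Map<String, String>,
        val errors: Map<String, String>
    ) : Binding<Nothing>()
}

fun bindSignup(params: Map<String, String>): Binding<SignupForm> {
    val errors = LinkedHashMap<String, String>()

    val email = params["email"]
    if (email.isNullOrBlank()) errors["email"] = "must not be empty"

    val age = params["age"]?.toIntOrNull()
    if (age == null) errors["age"] = "must be a whole number"

    return if (email != null && age != null && errors.isEmpty()) {
        Binding.Valid(SignupForm(email, age))
    } else {
        Binding.Invalid(params, errors)
    }
}
```

The invalid state lives in Binding.Invalid rather than in a half-populated SignupForm, so the form class itself does not need nullable fields with @NotNull annotations.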
Is there any update to this effort?
Will we see a preview soon?
Thank you
I can confirm that we still plan to implement cross-platform Kotlin Serialization as a part of the overall effort on cross-platform (JVM/JS/Native) Kotlin. There is no update yet, nor a timeline we’ve committed to. Stay tuned.
You can play with the prototype implementation as explained here: GitHub - elizarov/KotlinSerializationPrototypePlayground
See README.md in the repo on how to get started and what the limitations are. It is very far from being feature-complete and supports only the JVM backend, but it shows the general direction.
I’m using the serialization prototype in a sandbox project of mine. I found it fairly easy to just use Gradle with the installed Kotlin 1.1.2-2, instead of installing a substitute plugin, and add the Gradle plugin you provide on the prototype site to the compiler.
There’s just one issue with that: I had to add the Gradle plugin to the buildscript dependencies as well to get it to compile. This is not listed in the readme.md, which might confuse people.
Aside from that I think it’s pretty neat to have most of the repetitive stuff generated, implementing the binary output was straightforward!
It seems to be related to the fact that in my project I’m configuring Kotlin via plugin DSL:
plugins {
    id "org.jetbrains.kotlin.jvm" version "1.1.2"
}
If you do it this way, then you don’t need to have a buildscript section at all.
I’ve added that info to the readme. Thanks a lot.
I’m wondering, is there a generic way to obtain the KSerializer? I’m currently detecting it through some magic (basically checking for primitives and for the companion object to be a KSerializer), but this way I can’t get to the serializers for Lists and Maps.
There is no direct way of obtaining serialisers for generic classes like List<T> yet, and serialisation of user-defined generic classes is not supported yet either. Here is the planned approach. Assume that you have a generic class and some other serializable class:
@Serializable class MyBox<T>(val value: T)
@Serializable class MyData(val a: Int, val b: Int) // whatever
To obtain the serialiser of MyBox for a particular type substitution you’d use a plugin-generated function serializer on its companion object, like this: MyBox.serializer(MyData)
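To make the planned call shape concrete, here is a hand-written sketch of what such companions could look like. The KSerializer interface below is a simplified stand-in defined only for this example (string-based save/load); it is not the prototype's actual interface, and the @Serializable annotation is omitted because nothing is generated here.

```kotlin
// Simplified, hypothetical stand-in for the real serializer interface.
interface KSerializer<T> {
    fun save(value: T): String
    fun load(text: String): T
}

class MyData(val a: Int, val b: Int) {
    // A non-generic class can expose its serializer as the companion itself.
    companion object : KSerializer<MyData> {
        override fun save(value: MyData): String = "${value.a},${value.b}"
        override fun load(text: String): MyData {
            val (a, b) = text.split(",").map { it.toInt() }
            return MyData(a, b)
        }
    }
}

class MyBox<T>(val value: T) {
    companion object {
        // A generic class needs the serializer of its type argument,
        // so the companion offers a function instead.
        fun <T> serializer(element: KSerializer<T>): KSerializer<MyBox<T>> =
            object : KSerializer<MyBox<T>> {
                override fun save(value: MyBox<T>): String =
                    "box(" + element.save(value.value) + ")"
                override fun load(text: String): MyBox<T> =
                    MyBox(element.load(text.removePrefix("box(").removeSuffix(")")))
            }
    }
}
```

With these definitions, MyBox.serializer(MyData) type-checks as a KSerializer<MyBox<MyData>>, because the expression MyData refers to the companion object of MyData.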
Ideally, we’d like the following to work, too:
val box: MyBox<MyData> = JSON.parse(s)
However, the latter requires quite complex changes to the inner workings of Kotlin inline functions with reified type parameters. See also https://youtrack.jetbrains.com/issue/KT-15992
I shared some ideas about serialisation here: Generated JSON-Serialisation for Kotlin | by Fabian Zeindl | Medium
It’d be great to be able to combine deserialization with immutability and non-nullability in an elegant way. This gets a little ugly at the moment when you add properties to a class over time but need to be able to deal with objects that were serialized before the new properties were added. For example:
class MySerializableClass : Serializable {
    // We start off with just this property.
    var someValue: String? = null
    // Later we add an immutable, non-null property.
    val listOfThings: List<Thing> = LinkedList()
    // Deserializing an older serialized object without a listOfThings property
    // sets it to null by default. The way we're supposed to initialize it is
    // in a readResolve() method, but we can't.
    fun readResolve(): Any {
        if (listOfThings == null) { // Warning - listOfThings isn't nullable
            listOfThings = LinkedList() // Error - listOfThings is immutable
        }
        return this
    }
}
We end up having to make all newly-added properties nullable and mutable because readResolve() needs to be able to do a null test and initialize them. This means sprinkling our code with ?. even though we know the property will, in reality, never be null.
It gets a little uglier: in production code, in order to make it harder for ourselves to forget to deal with older objects as well as to get rid of code duplication, we don’t auto-initialize the property but instead only initialize it in our readResolve() method, which is also called from our init {} block.
class MySerializableClass : Serializable {
    var someValue: String? = null
    var listOfThings: List<Thing>? = null
    init {
        readResolve()
    }
    fun readResolve(): Any {
        if (listOfThings == null) {
            listOfThings = LinkedList()
        }
        return this
    }
}
One possible cleaner solution would be some way to mark a method as a post-deserialization initializer, and have the compiler automatically insert code to initialize any properties that don’t already have values. So we’d end up with
class MySerializableClass : Serializable {
    var someValue: String? = null
    val listOfThings: List<Thing> = LinkedList()
    @PostDeserialization
    fun readResolve(): Any = this
}
Alternately, or in addition, this could cause the method in question to be treated as an initializer or constructor in cases where you want to do something other than just initialize a property with a simple default value, e.g., if you need to compute the value of a non-nullable transient property that would normally be computed in a constructor.
This is off-topic here (since the thread is about “Kotlin Serialization”), but there is also an open issue on better support of “Java Serialization” in Kotlin that is supposed to somewhat address the problem you’ve outlined https://youtrack.jetbrains.com/issue/KT-14528 Feel free to add your comments there.
Admittedly I used Java-style serialization examples in the hopes of making my comment more concrete, but I actually don’t think the things I’m talking about are specific to Java serialization at all. These seem like they apply generally to any Kotlin serialization system, even when running in JavaScript or native environments:
- You can encounter a serialized representation of an object that lacks one or more non-nullable properties, e.g., because the properties were added recently. They need to be initialized to some non-null value.
- If the serialization system doesn’t call constructors (reasonable since constructors can have side effects) and supports some notion of transient properties, then transient vals need to get their values from somewhere.
- Ideally you’d like to solve both problems with a minimum of code duplication or boilerplate.
In Kotlin serialization we are generating a “deserializing constructor” that takes care to properly initialize missing fields with the corresponding initializer from the source code.
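The effect of such a constructor can be approximated by hand. In the sketch below, the Settings class and its fromFields factory are hypothetical stand-ins for generated code: when a field is missing from the input, the property's own default initializer runs, so the property can stay immutable and non-nullable.

```kotlin
class Settings(
    val someValue: String?,
    // Added later; older serialized objects will not contain it.
    val listOfThings: List<String> = emptyList()
) {
    companion object {
        // Hand-written stand-in for a generated "deserializing constructor":
        // 'fields' holds whatever was actually present in the serialized form.
        fun fromFields(fields: Map<String, Any?>): Settings {
            val someValue = fields["someValue"] as String?
            return if ("listOfThings" in fields) {
                val things = (fields["listOfThings"] as List<*>).map { it as String }
                Settings(someValue, things)
            } else {
                // Field is absent: fall back to the initializer from the source code.
                Settings(someValue)
            }
        }
    }
}
```

Because the default lives in one place (the property declaration), there is no need for a nullable var, a readResolve() method, or a duplicated initialization in init {}.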