Kotlin Design decision for having duplicate fields/primitive in decompiled version

I recently stumbled onto a subtle potential Kotlin problem. Let me play out the problem for you.

Imagine my code goes like this:

sealed class SuperClassOrchestrator(
    open val medicineName: String,
    protected val dummyField: Int = 0
){
    fun orchestrate() {
        println("SuperClassOrchestrator: ${this.medicineName}")
        orchestrateComponent()
    }
    abstract fun orchestrateComponent()
}

data class SubClassComponent(override val medicineName: String) : SuperClassOrchestrator(medicineName+"__SUFFIX"){
    override fun orchestrateComponent() {
        println("SubClassComponent: $medicineName")
        println("SubClassComponent: ${super.medicineName}")
    }
}

If I try running this statement:

SubClassComponent("Complex Medicine Name").orchestrate()

it prints the following:

SuperClassOrchestrator: Complex Medicine Name
SubClassComponent: Complex Medicine Name
SubClassComponent: Complex Medicine Name__SUFFIX

This suggests that Kotlin maintains two copies of the common field. The decompiled class files below support that hypothesis. [Unrelated code such as metadata and imports removed for brevity.]

public abstract class SuperClassOrchestrator {
  @NotNull
  private final String medicineName;
  
  private final int dummyField;      //why is this always private? & not protected when explicitly mentioned?
  
  private SuperClassOrchestrator(String medicineName, int dummyField) {
    this.medicineName = medicineName;
    this.dummyField = dummyField;
  }
  
  @NotNull
  public String getMedicineName() {
    return this.medicineName;
  }
  
  protected final int getDummyField() {
    return this.dummyField;
  }
  
  public final void orchestrate() {
    String str = "SuperClassOrchestrator: " + getMedicineName();
    boolean bool = false;
    System.out.println(str);
    orchestrateComponent();
  }
  
  public abstract void orchestrateComponent();
}

& the decompiled subclass version goes like this:

public final class SubClassComponent extends SuperClassOrchestrator {
  @NotNull
  private final String medicineName;  //why do we need this? It's supposed to be inherited
  
  public SubClassComponent(@NotNull String medicineName) {
    super(medicineName + "__SUFFIX", 0, 2, null);
    this.medicineName = medicineName;               //why do we need this? It's supposed to be inherited
  }
  
  @NotNull
  public String getMedicineName() {
    return this.medicineName;
  }
  
  public void orchestrateComponent() {
    String str = "SubClassComponent: " + getMedicineName();
    boolean bool = false;
    System.out.println(str);
    str = "SubClassComponent: " + super.getMedicineName();
    bool = false;
    System.out.println(str);
  }
}

Why it did not sit well with me:

  1. In the subclass, even though I explicitly say override, Kotlin still creates a new private field. That feels like an oxymoron to me: overriding should not require a new private field.

  2. If Kotlin maintains two copies of the same field when it should not, there is a slim chance of inconsistency, as the example above shows, which, to my knowledge, was never the case in Java or other OO languages.

  3. Even if we declare a field as protected val in the superclass, Kotlin still creates a private final field and makes only the getter protected. This feels counter-intuitive: the code gives the impression of field overriding, but what Kotlin actually does is getter method overriding.

How it hurts us:

Any library we use that relies heavily on reflection won’t be able to tell that Kotlin maintains copies of common fields. For example, when we tried this with Swagger, the common fields showed up in both the superclass and all subclasses.

There’s another scenario where I can’t easily reason about the design choices: when using Long, I expected the Long wrapper class to appear in the decompiled version, but it didn’t.

Screenshot 2022-02-23 at 4.31.01 PM

That is not the case if we use Long? in Kotlin. Then it falls back to the Long wrapper class in the decompiled version, as depicted in the screen capture above.
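A minimal sketch of how to check this without a decompiler, using plain Java reflection from Kotlin. The Record class and the fieldType helper are hypothetical names introduced only for this illustration.

```kotlin
// Hypothetical class, used only for illustration.
class Record(val id: Long, val parentId: Long?)

// Returns the JVM type of the backing field behind a property.
fun fieldType(type: Class<*>, name: String): Class<*> =
    type.getDeclaredField(name).type

fun main() {
    // Non-nullable Long is stored as the JVM primitive.
    println(fieldType(Record::class.java, "id"))       // long
    // Nullable Long? needs a reference type so that it can hold null.
    println(fieldType(Record::class.java, "parentId")) // class java.lang.Long
}
```

Any reflection-based library that looks at field types sees the same two types that this sketch prints.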

This leads to another interoperability problem, e.g. an API contract in Spring Boot:

  1. accepts a null value, even though the Kotlin type is Long, which is not nullable
  2. the Spring deserializer converts it to zero, since it relies heavily on reflection and finds the long primitive type instead of the wrapper class [IMHO]

Now in these scenarios, a possible solution could be to use Kotlin flavors of the 3rd-party libraries. But that feels like a half measure. Anyway, these are the questions I’m looking to have answered:

What was the reasoning behind these design choices [explained in the scenarios above]? What were the trade-offs? Was it a technical limitation that forced Kotlin down that path?

I think all your confusion originates from a single misconception: you assumed properties are fields. But in fact, they are methods. Fields are involved only optionally, as a “side-effect” of using properties with backing fields.

This is why override overrides methods, not fields. This is also why protected affects the getter, not the field. In Kotlin, we are not really supposed to share fields between classes, but getters/setters.
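A small sketch of the distinction, assuming a hypothetical Medicine class (not from the original post): one property is stored and gets a backing field, the other is computed and exists only as a getter.

```kotlin
// Hypothetical class, used only for illustration.
class Medicine(name: String) {
    // Stored property: the compiler generates a private backing field plus a getter.
    val name: String = name

    // Computed property: only a getter is generated, no field.
    val label: String
        get() = "Medicine: $name"
}

// Names of the JVM fields declared directly on the class (compiler-synthetic ones skipped).
fun declaredFieldNames(type: Class<*>): List<String> =
    type.declaredFields.filterNot { it.isSynthetic }.map { it.name }.sorted()

fun main() {
    println(declaredFieldNames(Medicine::class.java)) // [name]
    println(Medicine("Aspirin").label)                 // Medicine: Aspirin
}
```

Both name and label are properties from the caller's point of view, but only name shows up as a field in the class file.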


You are explicitly telling it to create a new private field by declaring the val without a get(). That’s the default getter implementation.

It’s not inconsistent and you can do the exact same thing in Java. Consider this example where SubFoo.getMyField() returns “sub” and SubFoo.getSuperMyField() returns “super”.

    public static class SuperFoo {
        private String myField = "super";
        public String getMyField() { return myField; }
    }
    public static class SubFoo extends SuperFoo {
        private String myField = "sub";
        @Override
        public String getMyField() { return myField; }
        public String getSuperMyField() { return super.getMyField(); }
    }

It’s not possible to override fields or change the code that accesses them. They are a simple, fast value access. That’s why Java best practice is to never expose them publicly and to wrap them with getters. That’s also why Kotlin’s properties exist: you are forced to work only with the getter, and the field, if one is needed, is never directly accessible outside the get/set definition of the property.

The same thing can happen in Java if you declare private fields with the same name in the super and sub classes.

I don’t believe it’s possible to represent a non-nullable boxed primitive type in Kotlin. You may want to ask about this Spring Boot/Long issue in its own post, and then someone with some Spring Boot experience can maybe help you out.


The official Kotlin website claims:

“Compatible with the Java ecosystem. Use your favourite JVM frameworks and libraries”

which led me to believe it’s a drop in replacement i.e. if I have a Kotlin code base & Spring/Java libraries, I won’t encounter any behavioural difference.

Extending that thought, is it okay to summarize that we need to live with following limitations if we have to implement a class diagram like this using sealed class,

  1. Kotlin internally ends up maintaining N copies of fields, albeit redundant, in the parent as well as the subclasses. We should treat these fields as implementation details under the hood of Kotlin and focus only on properties/getters/setters?

  2. Kotlin and Java libraries are not drop-in replacements in the above sense of maintaining two fields under the hood. If a Java library relies on the Java notion of a protected field, and hence looks for only one parent field, we’ll run into these kinds of issues, e.g. seeing multiple copies of the same field in the Swagger UI.

  3. To get around these problems, we should wait for Kotlin flavours of those libraries, which should bridge the gap.

P.S. I’ve kept sealed class constraint deliberately, since that construct gives us the power of ADTs.

The line above should read (split for readability):

data class SubClassComponent(medicineName: String) :
      SuperClassOrchestrator(medicineName+"__SUFFIX") {

Now the SubClassComponent does not declare a property (so there will not be a field). The constructor simply passes its argument (after a transformation) to the constructor of SuperClassOrchestrator.


data class SubClassComponent(medicineName: String) :
SuperClassOrchestrator(medicineName+"__SUFFIX")

Is it supported? I thought it would fail with the compilation error: “Data class primary constructor must have only property (val / var) parameters”

We need data class to leverage its equals & hashCode method.

You are right. Sorry, I almost never use data classes.

I think that properties expressed in primary constructor will hold a field by default. But modifying SuperClassOrchestrator to use an abstract property instead should solve the problem.

Something like this :

sealed class SuperClassOrchestrator(protected val dummyField: Int = 0) {
    abstract val medicineName: String
    ...
}

Decompiles to this :

public abstract class SuperClassOrchestrator {
   private final int dummyField;

   @NotNull
   public abstract String getMedicineName();
}

And I think this is clearer. It makes explicit that the base class won’t hold any value; it is the role of the implementations to provide it.

With that, the subclass definition becomes simpler:

data class SubClassComponent(override val medicineName: String) : SuperClassOrchestrator() {
    override fun orchestrateComponent() {
        println("SubClassComponent: $medicineName")
    }
}
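For anyone who wants to verify the field layout of this variant without decompiling, here is a self-contained sketch. It reuses the class names from this thread in simplified form; the fieldNames helper and the String return type of orchestrate are assumptions made only for this illustration.

```kotlin
sealed class SuperClassOrchestrator(protected val dummyField: Int = 0) {
    // Abstract property: compiles to an abstract getter only, no field in the parent.
    abstract val medicineName: String

    fun orchestrate(): String = "SuperClassOrchestrator: $medicineName"
}

// The data class supplies the only backing field for medicineName.
data class SubClassComponent(override val medicineName: String) : SuperClassOrchestrator()

// Names of the JVM fields declared directly on a class (compiler-synthetic ones skipped).
fun fieldNames(type: Class<*>): List<String> =
    type.declaredFields.filterNot { it.isSynthetic }.map { it.name }.sorted()

fun main() {
    println(fieldNames(SuperClassOrchestrator::class.java)) // [dummyField]
    println(fieldNames(SubClassComponent::class.java))      // [medicineName]
    println(SubClassComponent("Complex Medicine Name").orchestrate())
}
```

With this layout there is exactly one medicineName field in the hierarchy, and it lives in the subclass.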

This attacks the problem from a different angle & solves it. It feels like pushing down the field to sub classes. If you look at any inheritance class diagram, it gives the idea of

  • the parent class owns the parent/shared fields
  • subclasses inherit them as their own, but they don’t keep a copy of the same field.

If we follow abstract property approach, are we not drifting away from the semantics of inheritance class diagram?

The problem is that data classes, like records in java, are not designed to share state with a parent. They’re designed to represent an “independent” data structure / data model.

You can see that clearly in the decompiled equals method of your sub-class component: it does not use dummyField, and it does not call the super.equals method (even if you implement it), so testing equality of two data classes completely ignores state inherited from parents (and can produce very surprising/wrong behavior).

How data classes/records should be represented in a UML class diagram is not very clear to me, but I reckon this is puzzling.
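A minimal sketch of that equality behavior, assuming hypothetical Tracked and Dose classes (not from the original post). The generated equals and hashCode only look at the properties declared in the data class's primary constructor.

```kotlin
// Hypothetical parent holding state outside the data class's primary constructor.
sealed class Tracked {
    var revision: Int = 0
}

// Generated equals/hashCode/toString use only medicineName.
data class Dose(val medicineName: String) : Tracked()

fun main() {
    val a = Dose("Aspirin").apply { revision = 1 }
    val b = Dose("Aspirin").apply { revision = 2 }

    println(a == b)                       // true: inherited revision is ignored
    println(a.hashCode() == b.hashCode()) // true
    println(a.revision == b.revision)     // false
}
```

Two instances that differ only in inherited state compare as equal, which is the surprising behavior described above.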

But, if we step back a little, we’ll see that the main reason to inherit state in languages like java and C++ is mostly to avoid duplicating field definition code, but with kotlin syntax, this kind of burden is greatly reduced, and we can stay closer to the “inherit behavior, not state” credo (if you consider getters a behavior, and backing fields state, which is not very intuitive).

Sometimes, I wonder if classes diagrams should be tweaked/derived to something closer to an “API interaction diagram”.

Edit: when you look closer, you’ll notice that the property is still declared on the super class, even if its implementation is fixed in the data class. In a sense, it does not really change the UML definition : subclass still inherit from parent property, it just provides its own implementation for it.


Honestly, I don’t understand your point about duplicated fields. I never had to do it in my work. Could you provide any use case where this is needed? Because from my experience there are mostly 3 cases:

  1. We implement the property in base class and only use it in the subclass (but not override its implementation).
  2. We provide abstraction of the property in the base class and implement it in the subclass.
  3. We implement the property in the base class and we would like to somehow change its behavior in the subclass, e.g. print some logs on prop access, change the value on the fly, etc.

These are typical use cases no matter if we use Kotlin or Java. And all of them are achievable in both languages. You try to do something different: store the field in both base class and subclass. This is also doable in both languages, but what is the point in it? Besides some really rare cases.

Regarding the compatibility with Java. Kotlin has its own “ways” to do things, so in some cases what is natural in Kotlin will be unnatural in Java and vice versa. Notable examples are no static members and no checked exceptions in Kotlin. Therefore, Kotlin is not fully compatible with Java implicitly. But it provides tools to explicitly tune things up to make it more compatible with Java (but less Kotlin-ish).

For example, we can annotate a property with @JvmField to make it work like a field - getters and setters won’t be generated and protected will affect the visibility of the field. Similarly, we can use @JvmStatic to create static members. And so on. Still, I doubt we can produce every and possible bytecode that is doable using Java.
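A rough sketch of the @JvmField case, assuming a hypothetical Orchestrator class and two small reflection helpers introduced only for this illustration. It relies on the property being final and non-private, which is what @JvmField requires here.

```kotlin
import java.lang.reflect.Modifier

// Hypothetical class, used only for illustration.
// @JvmField exposes the property as a plain JVM field with the property's visibility.
open class Orchestrator(@JvmField protected val dummyField: Int = 0)

// True if the named field is declared protected on the JVM.
fun isFieldProtected(type: Class<*>, name: String): Boolean =
    Modifier.isProtected(type.getDeclaredField(name).modifiers)

// True if the class declares a method with the given name.
fun hasMethod(type: Class<*>, name: String): Boolean =
    type.declaredMethods.any { it.name == name }

fun main() {
    println(isFieldProtected(Orchestrator::class.java, "dummyField")) // true
    println(hasMethod(Orchestrator::class.java, "getDummyField"))     // false
}
```

Without the annotation, the same declaration would produce a private field and a protected getDummyField() method, as in the decompiled code earlier in the thread.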

Regarding frameworks: well, even in Java there was always (at least for last 15 years) a recommendation to not share fields, but getters and setters. We create private fields and public methods for looong time. For frameworks like dependency injection, serialization, etc. it is also discouraged to use fields directly, but getters/setters or constructor params. I believe most modern frameworks can be configured to use accessor methods instead of fields. If we configure them like this, they should be fine with the Kotlin code.


I was not aware of this. But I can totally relate to that thought process of streamlining everything through accessors instead of fields. Still, the following points baffle me a bit:

  1. What did Kotlin stand to lose if my “SubClassComponent” didn’t have the shared field along with its accessors altogether? it’ll still follow the principle of inheriting behaviors/accessors, right?

    In other words, what could’ve triggered Kotlin to have an extra backing field in subclass? Because the subclass definition looks like:

    override val medicineName: String

which explicitly says to override a field accessor, not to keep a separate copy of the field in the subclass. I believe “val” instructs Kotlin to create a field with an accessor, but then the two keywords contradict each other, don’t they? If I’m overriding it, why would I need to copy the backing field and have another set of accessors in the subclass? Also, what use case does a separate backing field/accessor combo serve, apart from independence?

If we follow the principle of data classes being independent, shouldn’t Kotlin report a compilation error when “data class”, “override” and “open val” are used in combination? After all, override means we’ve introduced a dependency on a parent component.

Or should I interpret override a little differently in the Kotlin world? The “override” keyword dictates overriding the accessors, and since field sharing is prohibited in Kotlin, our subclass has to introduce a backing field as well.

Defining a property explicitly requested this. When you define a non-abstract property and don’t implement getters/setters manually, this is like saying:

Hi Kotlin, I would like to have a getter (setter) for my data, so please generate it for me and store the data somewhere in the class (in practice - in a field)

When you override such property, again without providing getters/setters, you again ask to generate getters/setters and place the data somewhere in the class. So this is like saying:

Hey Kotlin, I know the base class already stores this data somewhere, but I would like to override this behavior and store the data differently and separately from it.

In other words: fields in Kotlin are considered internal, implementation details of the class. Other classes should not be interested in them.
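To illustrate the other side of this, here is a sketch of an override that does not ask for new storage because it supplies its own getter. Drug, BrandedDrug and ownFieldNames are hypothetical names; note this uses a regular class, since a data class requires val/var parameters in its primary constructor.

```kotlin
// Hypothetical base class: stores medicineName in its own backing field.
open class Drug(open val medicineName: String)

// The constructor parameter is not a property, and the override has a custom getter,
// so the compiler generates no second backing field in the subclass.
class BrandedDrug(name: String) : Drug(name) {
    override val medicineName: String
        get() = super.medicineName + "__SUFFIX"
}

// Names of the JVM fields declared directly on a class (compiler-synthetic ones skipped).
fun ownFieldNames(type: Class<*>): List<String> =
    type.declaredFields.filterNot { it.isSynthetic }.map { it.name }.sorted()

fun main() {
    println(BrandedDrug("Aspirin").medicineName)     // Aspirin__SUFFIX
    println(ownFieldNames(Drug::class.java))         // [medicineName]
    println(ownFieldNames(BrandedDrug::class.java))  // []
}
```

Here the subclass changes the behavior of the accessor while the data stays in the single field owned by the base class.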

This is exactly my question. I don’t see many use cases for this, which is why it is very rare to override properties like you did above. Even open non-abstract properties are rather rare. Usually, props are either final or abstract (by “final” I mean non-overridable).

I think not. Data classes are a specific case. Usually, it is better to keep them simple, so we should be careful with adding too much behavior or extending from other classes. But I think it shouldn’t be entirely forbidden. Extending could be still useful. We should just not overuse it.


Okay. Now I see why things are the way they are in Kotlin. Let me take a stab at summarizing the key takeaways from the entire discussion:

So, ideally sealed classes are supposed to have abstract properties defined as mentioned in another discussion:

  1. Kotlin Conversions [Kotlin → bytecode] :

    “Long” → “long”, “Long?” → “Long”
    “String” → “String”, “String?” → “String”

    A non-nullable Long gets converted to long, since it doesn’t need to hold null values, and the Java primitive type long is the exact candidate for that. For other classes we don’t have that option, hence they convert to the same class.

  2. The problem of swagger (java lib) reporting same fields both in super & sub class still might persist even when we’re using “abstract val” in base classes. But that’s the price we’ve to pay because:

  1. Extending the previous point, that’s exactly why the Spring deserializer [another Java library] sees “long” in the bytecode and erroneously transforms an incoming “null” value to the default of “long”, whereas the expected behaviour was to fail, since the Kotlin code uses a non-nullable type.

  2. Using “abstract val” in sealed class is most recommended, to stay on the right side of things. Otherwise you’ll end up having two backing fields which won’t have much use case in real life scenarios.

  3. Kotlin doesn’t encourage sharing fields. It promotes sharing properties/accessors


if we want to address those java lib related problems, either

  1. Wait for/use kotlin flavor of those libraries
  2. Hook into the extension points of those Java libraries, & place your own workaround.

Thank you folks for all the insights. Learnt a few intricacies of Kotlin/java. If you want to add/modify something to the list of takeaways OR have some other concerns, please feel free to post.


I don’t really see how this is related to sealed classes. Or even to Kotlin. This is a generic OOP thing.

No matter if we use Kotlin, Java or C++, depending on the use case we may prefer to create abstract accessors in the base class and implement them in the subclass. Or implement them in the base class and use them (but not override) in the subclass. Both these cases are doable in Kotlin and in Java. Kotlin doesn’t force and doesn’t encourage you to use one over another. Also, in both Kotlin and Java we can duplicate fields if we want. We just don’t do this very often. I don’t think Kotlin differs in that matter in any way from other OOP languages.

There are more cases when non-nullable long will be boxed, e.g. generics.
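A short sketch of the generics case, assuming a hypothetical Holder class and erasedFieldType helper. Because of type erasure the field is declared as Object, so a non-nullable Long stored in it has to be boxed.

```kotlin
// Hypothetical generic wrapper, used only for illustration.
class Holder<T>(val value: T)

// Returns the JVM type of the "value" field after erasure.
fun erasedFieldType(): Class<*> =
    Holder::class.java.getDeclaredField("value").type

fun main() {
    val holder = Holder(42L) // T is inferred as non-nullable Long
    println(erasedFieldType()) // class java.lang.Object
    println(holder.value + 1)  // 43
}
```

The Kotlin type is still the non-nullable Long, but there is no primitive long anywhere in the Holder class file.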

Why is that? Abstract val doesn’t generate any fields.

Or see if you can reconfigure them to use accessors instead of fields. Many such libraries actually do this by default.

Thank you for pulling me in right direction. I see your point now.

What I meant was that a Kotlin “Long” gets compiled to bytecode containing “long”. In a broad sense, the unboxing happens at compile time.

The reason is that, while it alleviates the problem of reporting duplicate fields in both the parent and the subclass, it does so by pushing the fields into the subclasses.

In other words, the Swagger schema UI removes the common field from the parent construct and pushes it into all of its subclasses. While this has no functional implications, I was coming from the thought process of a UML class diagram, which correctly depicts the generalized fields in the parent class and the specialized fields in the subclass.

When we have abstract fields in the parent class, I feel the Swagger library won’t be able to find that field in the parent class (bytecode) anymore, and hence will report zero fields in the parent class.

Swagger can have different artefacts out of it. We can create Postman collection out of it, as well as generate some code stubs/client generator. So, I was focussing on client generator part of it & was under the following assumptions:

  1. Swagger should give mirror image semantics of what is represented by class diagram.
  2. When someone generates client out of it, client code should mimic exactly same object hierarchy with generalized fields in parent class & so forth.