NPEs when overriding vals, but not in constructor / init block

While researching this problem, I found out that it can occur when you try to access an abstract property in the constructor (or the init block) of the abstract superclass. However, in my case, the NPE occurred in a regular function call (from the superclass).

So, here’s the problem:

I’m currently working on a custom KXS format called NBT, which is a binary format that also has a stringified version called sNBT. The encoding / decoding should be able to handle both NBT and sNBT.

I decided that the architecture should consist of three categories:

  • Nodes: classes that can hold names and values of different types
  • Encoders: object declarations that can encode Nodes to ByteArray
  • Decoders: object declarations that can decode ByteArrays to Nodes

The NBT format specifies 13 so-called tag types, and for each one there’s a node class, an encoder and a decoder. All three have a property called type, which is an enum for all 13 different tag types.

The nodes, encoders and decoders all have one sealed superclass defining type as a public abstract val. Each sub-class or -object overrides it with its corresponding type.

One intricacy: The decoding functions are defined in the superclass, because payload decoding is abstracted away and the rest can be common code which refers to some abstract vals.

Now, when I call decode() on, for example, a CompoundTagDecoder, the superclass implementation will be executed, but it yields an NPE when trying to access type (which is NOT nullable).

Here’s some relevant code:

  1. The Decoder superclass, NBTTagDecoder<T>:
public sealed class NBTTagDecoder<T> {
    internal companion object {
        internal val snbtRegex = "([a-zA-Z0-9\\s]+):(.*)".toRegex()
    }

    public abstract val type: TagType // *1
    public abstract val payloadDecoder: TagPayloadDecoder<T>

    public open fun decode(bytes: ByteArray, offset: Int): OffsetResult<out NBTTagNode<T>> {
        val (name, intermediateOffset) = getName(bytes, offset) // *3
        val (data, newOffset) = payloadDecoder.decode(bytes, intermediateOffset)
        return OffsetResult(nodeSupplier(name, data), newOffset)
    }

    public open fun decodeSNBT(str: String, offset: Int): OffsetResult<out NBTTagNode<T>> {
        val res = snbtRegex.find(str, offset) ?: throw SNBTDecodingException(type, str.substring(offset))
        val (data, newOffset) = payloadDecoder.decodeSNBT(res.groupValues[2], 0)
        return OffsetResult(nodeSupplier(res.groupValues[1], data), res.range.first + newOffset)
    }

    private fun getName(bytes: ByteArray, offset: Int): OffsetResult<String> {
        if (bytes[offset] != type.id) throw NBTDecodingException(type, bytes, offset) // *2
        val stringLen = (bytes[offset + 1].toInt() shl 8) or bytes[offset + 2].toInt()
        val intermediateOffset = offset + 3 + stringLen
        return OffsetResult(
            bytes.sliceArray(offset + 3 until intermediateOffset).decodeToString(),
            intermediateOffset
        )
    }

    internal abstract val nodeSupplier: (String, T) -> NBTTagNode<T>
}

Markers:
*1: Here, the abstract val is defined
*2: In this private member function, an NPE is thrown by the expression type.id
*3: Here’s where that private member function gets called

  2. The entry call:
CompoundTagDecoder.decode(compound, 0)

Note that CompoundTagDecoder doesn’t override decode(); the call simply resolves to the superclass implementation. The compound variable, in this case, is of the type CompoundTagNode.

  3. The compound decoder object, CompoundTagDecoder:
public object CompoundTagDecoder : NBTTagDecoder<Collection<NBTTagNode<*>>>() {
    override val type: TagType = TagType.COMPOUND
    override val payloadDecoder: CompositePayloadDecoder = CompositePayloadDecoder
    override val nodeSupplier: (String, Collection<NBTTagNode<*>>) -> NBTTagNode<Collection<NBTTagNode<*>>> = ::CompoundTagNode
}

As you can see, the type property is initialized statically with the TagType.COMPOUND enum value.

Intuitively, all of this just feels so wrong. I have objects which should be initialized very early on. They themselves have overridden properties which are initialized with constant terms. And then, somewhere after some code already ran (like the creation of the compound node), one of those overridden properties is null, even though it is not nullable?

I don’t know whether that matters, but TagType also refers back to the encoders and decoders (for various complicated reasons):

public enum class TagType(
    public val id: Byte,
    public val displayName: String,
    public val encoder: NBTTagEncoder<*>,
    public val decoder: NBTTagDecoder<*>,
    internal val heuristicSize: Int,
) {
    END(0, "TAG_End", EndTagEncoder, EndTagDecoder, 0),
    BYTE(1, "TAG_Byte", ByteTagEncoder, ByteTagDecoder, 1),
    SHORT(2, "TAG_Short", ShortTagEncoder, ShortTagDecoder, 2),
    INT(3, "TAG_Int", IntTagEncoder, IntTagDecoder, 4),
    LONG(4, "TAG_Long", LongTagEncoder, LongTagDecoder, 8),
    FLOAT(5, "TAG_Float", FloatTagEncoder, FloatTagDecoder, 4),
    DOUBLE(6, "TAG_Double", DoubleTagEncoder, DoubleTagDecoder, 8),
    BYTEARRAY(7, "TAG_Byte_Array", ByteArrayTagEncoder, ByteArrayTagDecoder, 128),
    STRING(8, "TAG_String", StringTagEncoder, StringTagDecoder, 128),
    LIST(9, "TAG_List", ListTagEncoder, ListTagDecoder, 256),
    COMPOUND(10, "TAG_Compound", CompoundTagEncoder, CompoundTagDecoder, 1024),
    INTARRAY(11, "TAG_Int_Array", IntArrayTagEncoder, IntArrayTagDecoder, 128 * 4),
    LONGARRAY(12, "TAG_Long_Array", LongArrayTagEncoder, LongArrayTagDecoder, 128 * 8);

    // ...
}

Also, here’s the NPE message, although it’s quite trivial:

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "msw.extras.nbtlin.tree.TagType.getId()" because the return value of "msw.extras.nbtlin.tree.decoding.tag.NBTTagDecoder.getType()" is null
	at msw.extras.nbtlin.tree.decoding.tag.NBTTagDecoder.getName(NBTTagDecoder.kt:31)
	at msw.extras.nbtlin.tree.decoding.tag.NBTTagDecoder.decode(NBTTagDecoder.kt:19)
	at TestKt.main(test.kt:68)
	at TestKt.main(test.kt)

Note that there’s no mention of CompoundTagDecoder in the stack trace, since all called functions are defined in the superclass, NBTTagDecoder.

Can anyone help me to understand this behaviour? For me, it just makes no sense at all. And much more importantly: How do I fix this? Currently, I am not able to decode anything because of this problem.

Thanks in advance!

Info: The entire codebase is common multiplatform code, compiled with Kotlin 1.4.30, with language level 1.5 enabled (preview features, like package-wide sealed class hierarchies). The (NPE-yielding) test was run on the JVM, using OpenJDK 15.

Yes, it does matter, and your best bet is to redesign it :wink:
Consider this simplified runnable example:

enum class SomeEnum(val something: Any) {
    ITEM(SomeObject)
}

object SomeObject {
    val item = SomeEnum.ITEM
}

fun main() {
    println(SomeEnum.ITEM.something)
    println(SomeObject.item)
}

Simplifying, the initialization logic on the JVM goes as follows:
SomeEnum is initialized first. It holds a reference to SomeObject, so SomeObject has to be initialized while the initialization of SomeEnum is still in progress.
SomeObject, however, holds a reference to SomeEnum, which would need to be initialized first. That initialization is already in progress and cannot be started again, so SomeObject.item reads the not-yet-assigned ITEM value and ends up as null.
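To connect this to the question, here is a minimal sketch of the same cycle with an abstract val in a superclass. The names Kind, Decoder and ADecoder are made up for illustration and are not from the original code. It assumes the JVM and that the enum is touched before the object; with the opposite access order the result can differ.

```kotlin
// Hypothetical, reduced version of the TagType / NBTTagDecoder cycle.
enum class Kind(val decoder: Decoder) {
    A(ADecoder) // creating A forces ADecoder to be initialized
}

abstract class Decoder {
    abstract val type: Kind

    // A regular member function, not a constructor or init block.
    fun describe(): String = "type=$type"
}

object ADecoder : Decoder() {
    // Evaluated while Kind is still being initialized, so Kind.A is not assigned yet.
    override val type: Kind = Kind.A
}

fun main() {
    // Touching the enum first starts the cycle: Kind -> ADecoder -> Kind.
    val first = Kind.A
    // On the JVM this prints "type=null", although type is declared non-nullable.
    println(first.decoder.describe())
}
```

Calling a member on the stored value (as type.id does in the question) is then what raises the NPE, long after initialization has finished.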

This is a known problem - see some related issues:
https://youtrack.jetbrains.com/issue/KT-44634
https://youtrack.jetbrains.com/issue/KT-10455

Recent article about circular references:
https://blog.haroldadmin.com/circular-refs-kotlin/
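One possible shape for such a redesign, as a rough sketch only: keep the enum free of references to the decoder objects and resolve the decoder through a lookup that runs on demand. All names here (Kind, Decoder, ByteDecoder, CompoundDecoder, the decoder extension property) are hypothetical and only two tag types are shown.

```kotlin
// The enum no longer mentions any decoder, so initializing it cannot start a cycle.
enum class Kind(val id: Byte) {
    BYTE(1),
    COMPOUND(10)
}

abstract class Decoder {
    abstract val type: Kind
    fun describe(): String = "decoder for ${type.name}"
}

object ByteDecoder : Decoder() {
    override val type: Kind = Kind.BYTE
}

object CompoundDecoder : Decoder() {
    override val type: Kind = Kind.COMPOUND
}

// The lookup is evaluated on each call, outside of any class initializer.
val Kind.decoder: Decoder
    get() = when (this) {
        Kind.BYTE -> ByteDecoder
        Kind.COMPOUND -> CompoundDecoder
    }

fun main() {
    println(Kind.COMPOUND.decoder.describe())                 // decoder for COMPOUND
    println(CompoundDecoder.type.decoder === CompoundDecoder) // true
}
```

The trade-off is that the mapping lives in a when expression instead of the enum constructor, but the compiler still checks that every entry is covered.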


Try

override val type: TagType get() = TagType.COMPOUND

This moves the reference to TagType.COMPOUND out of initialization.
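A small sketch of why the getter helps, using made-up names (Kind, Decoder, ADecoder) rather than the real classes: the property has no backing field, so nothing is read from the enum while the object is being initialized.

```kotlin
// Hypothetical, reduced version of the TagType / decoder relationship.
enum class Kind(val decoder: Decoder) {
    A(ADecoder)
}

abstract class Decoder {
    abstract val type: Kind
    fun describe(): String = "type=$type"
}

object ADecoder : Decoder() {
    // No stored value: Kind.A is looked up each time type is read,
    // which happens only after both classes are fully initialized.
    override val type: Kind get() = Kind.A
}

fun main() {
    println(Kind.A.decoder.describe()) // type=A
    println(ADecoder.describe())       // type=A
}
```

The enum still references the object, but the object no longer touches the enum during its own initialization, so the cycle is broken on one side.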

Thank you so much for your helpful and comprehensive answer! It makes sense to me now.

I will attempt a redesign, probably the way @nickallendev suggested (thank you for that).

Do you know whether this current behaviour of circular references is intended behaviour or whether it is a bug? I think at least having a compiler warning / error would be helpful at this point in order to not break null safety. Also, it would theoretically be possible to initialize circular references properly, but the real question is whether this should even work…

Does anyone have any insight on the design choices around this issue?