JSON serialization with limited memory consumption

I have a simple decoder

    val json = Json {
        ignoreUnknownKeys = true
    }

    return jsonString?.let { json.decodeFromString<List<T>>(it) }

that worked well until I got an error:

Fatal Exception: java.lang.OutOfMemoryError
Failed to allocate a 33554448 byte allocation with 6291456 free bytes and 18MB until OOM, target footprint 37003824, growth limit 50331648

So, I tried to use a stream:

        jsonString?.let {
            ByteArrayInputStream(it.toByteArray()).use { inputStream ->
                return json.decodeFromStream<List<T>>(inputStream)
            }
        }

But the JSON can't be parsed partially; decoding from the stream fails with:

kotlinx.serialization.json.internal.JsonDecodingException: Unexpected JSON token at offset 8046: Expected quotation mark '"', but had 'e' instead

As far as I can see, decodeFromStream is a built-in method of Json. So, how do I use it properly?

I don’t see how using an InputStream would be any better here. How big is the JSON? If it is really big, like e.g. hundreds of megabytes, then you may need to process one item at a time.

My app’s minSDK is 21, so a lot of low-end Android devices could be used. The error I got is from an autotest with this device params:

RAM free: 1.21 GB
Disk free: 630.43 MB

And my biggest JSON is about 30 MB for now. Is it possible to process it partially without splitting it into smaller files? What about other formats like YAML etc.?

Yes, it is possible, but this is usually less convenient and requires more work than simply mapping to an object. You need to look for a "streaming parser"; for example Jackson supports this: https://www.baeldung.com/jackson-streaming-api . Maybe it would be possible to stream subsequent JSON objects one at a time while mapping them to a class - but I don't know, I never tried that. Also, it would probably make sense not to store the JSON string itself in memory, but to parse it while reading from a file or the network.

Before trying a streaming parser, you could also try to parse to a native JsonArray/JsonObject or something similar - not to your own class. Maybe it is lighter on memory, e.g. it only keeps pointers to the data in the JSON string.

Does it mean Kotlin's serialization lacks such functionality? What does decodeFromStream do then? My file is a compressed JSON, so I need to decompress it first - that's why it has to be stored in memory anyway. Yes, it's possible to save it into a temp file and free some memory before deserializing, but is it worth it?

Since you’re producing a list, I’m assuming that the JSON you’re parsing is an array. If so, I believe Json.decodeBufferedSourceToSequence is what you’re looking for (you’d have to use a minimal amount of okio to create a buffered source, but that’s trivial). From the docs:

Transforms the given source into lazily deserialized sequence of elements of type T using UTF-8 encoding and deserializer. Unlike decodeFromBufferedSource, source is allowed to have more than one element, separated as format declares.

Elements must all be of type T. Elements are parsed lazily when resulting Sequence is evaluated. Resulting sequence is tied to the stream and can be evaluated only once.
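
For reference, a minimal sketch of how that could look with okio, assuming the kotlinx-serialization-json-okio and okio artifacts are on the classpath (the Item class and the file contents are made up for illustration):

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.okio.decodeBufferedSourceToSequence
import okio.buffer
import okio.source
import java.io.File

@Serializable
data class Item(val id: String, val name: String)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    // A temp file standing in for the downloaded JSON array.
    val file = File.createTempFile("items", ".json").apply {
        writeText("""[{"id":"a","name":"First"},{"id":"b","name":"Second"}]""")
    }

    // Wrap the file in an okio BufferedSource and decode lazily,
    // one element at a time, instead of building the whole List<Item>.
    file.source().buffer().use { source ->
        Json.decodeBufferedSourceToSequence<Item>(source)
            .forEach { println(it.name) }
    }
}
```

The only okio involvement is `file.source().buffer()`, which is what the "minimal amount of okio" remark refers to.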


What do you mean?

(you’d have to use a minimal amount of okio to create a buffered source, but that’s trivial)

The JSON file is already downloaded and can’t be split.

I suppose they meant that you should look into the okio documentation and figure it out yourself. Come on, it takes 5 minutes of reading to find out how to do it :slight_smile:

And okio works with both network and files, so it should probably fit here.


How is it compressed? You can decompress zip and gzip formats while streaming (using ZipInputStream and GzipInputStream), without storing either the compressed or decompressed data fully in memory.
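
As a concrete stdlib-only sketch of that idea, here is gzip decompression done while streaming: the data is read through GZIPInputStream a small chunk at a time, so neither the compressed nor the decompressed payload has to be materialized up front when reading from a real file or network stream (the in-memory byte arrays here only make the example self-contained):

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

fun main() {
    // Prepare some gzip-compressed sample data in memory.
    val original = "hello streaming world".toByteArray()
    val compressed = ByteArrayOutputStream().also { baos ->
        GZIPOutputStream(baos).use { it.write(original) }
    }.toByteArray()

    // Decompress while streaming: read a 1 KB chunk at a time instead of
    // inflating everything into one big buffer first.
    val out = ByteArrayOutputStream()
    GZIPInputStream(ByteArrayInputStream(compressed)).use { gz ->
        val buf = ByteArray(1024)
        while (true) {
            val n = gz.read(buf)
            if (n == -1) break
            out.write(buf, 0, n)
        }
    }
    println(String(out.toByteArray()))
}
```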

Yes, I looked into the manual, but I was not sure whether it was about HTTP requests, because of the okio lib. kotlinx.serialization has decodeToSequence - is it useless here?

I use Inflater/Deflater. The problem is not with the decompression but with the deserialization of the final JSON string. I tried to deserialize it buffered like in the first post, but the problem is that the JSON format becomes broken when you take a random number of chars as a buffer.
Currently I changed the input from String to ByteArray; maybe it'll give slightly lower memory consumption.

Ahh, right, as the kotlinx.serialization has a similar decodeToSequence function, then I guess this is the way to go - we don’t need okio.

Regarding the compression: you can't just cut the data into random pieces and hope it will work. You need to decompress by streaming. I believe Inflater alone can't do that; you need to use InflaterInputStream or other similar utils (as mentioned by @gidds), depending on your compression format.

I didn’t notice before that decodeToSequence existed. That’s the way to go then, since it’s clear that it uses the same mechanism that decodeBufferedSourceToSequence uses. Of course, if you ever wanna go multiplatform, okio would be the way to go. Okio is just a general IO library, and so it has nothing to do with HTTP or networking.


I already do that. But still it’s not possible to decode a random chunk. I thought, maybe decodeToSequence / decodeBufferedSourceToSequence can do it?

        // `inflater` is a java.util.zip.Inflater already fed via setInput(compressedBytes)
        val output = ByteArrayOutputStream()
        val buffer = ByteArray(1024)

        while (!inflater.finished()) {
            val count = inflater.inflate(buffer)
            output.write(buffer, 0, count)
        }
        inflater.end()
        return output

Sorry, I think I'm incompatible with you :wink: I tell you you can't pick random chunks and you can't use Inflater, but should instead stream using InflaterInputStream. You say you already do this, and then you show sample code where you use Inflater and say something about picking random chunks.

Also, if I read the above code correctly, it is not really processing chunk by chunk - it decompresses the whole file into memory before deserializing it.

We mean something along the lines of:

    Json.decodeToSequence<T>(InflaterInputStream(FileInputStream(filename)))

It provides a sequence of items. You can process them one by one and it will read the data straight from the file, decompressing on-the-fly while you consume items. It should keep a very low memory profile, assuming you don’t accumulate these items in the memory, and assuming there is a big number of small items and not a small number of huge items.

For this to work, the file has to be just JSON compressed with the deflate algorithm - it won't work with zip, gzip, etc. I mention this because storing deflated files isn't a usual way to store data.

Of course, the above example is just an example - you should close streams, etc.

Alternatively, you can read the data in chunks, but you would have to design some kind of a chunked file format and then chunk properly while writing the file.
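
Putting the pieces together, here is a self-contained sketch of the above (the Item class is hypothetical and the temp file stands in for the real downloaded file; assumes kotlinx-serialization-json on the classpath):

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.decodeToSequence
import java.io.File
import java.io.FileInputStream
import java.io.FileOutputStream
import java.util.zip.DeflaterOutputStream
import java.util.zip.InflaterInputStream

@Serializable
data class Item(val id: String, val name: String)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    // Write a deflate-compressed JSON array to a temp file (raw zlib/deflate,
    // matching InflaterInputStream's default -- not zip or gzip).
    val file = File.createTempFile("items", ".json.deflated")
    DeflaterOutputStream(FileOutputStream(file)).use { out ->
        out.write("""[{"id":"a","name":"A"},{"id":"b","name":"B"}]""".toByteArray())
    }

    // Decompress and decode on the fly; items are materialized one at a time.
    InflaterInputStream(FileInputStream(file)).use { input ->
        Json.decodeToSequence<Item>(input).forEach { println(it.name) }
    }
}
```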


I didn't remember that it has InflaterInputStream. But I found that I also used this before:

    fun decompress(bytes: ByteArray): ByteArrayOutputStream {
        val os = ByteArrayOutputStream()
        InflaterOutputStream(os).use { it.write(bytes) }
        return os
    }

Anyway, it still can’t parse on the fly:

    @OptIn(ExperimentalSerializationApi::class)
    suspend inline fun <reified T> getItems(filename: String, dir: String): Sequence<List<T>>? {
        val file = getItemsFile(filename, dir)
        return file?.let { json.decodeToSequence<List<T>>(decompressStreamed(it)) }
    }

    fun decompressStreamed(file: File) = InflaterInputStream(FileInputStream(file))

kotlinx.serialization.json.internal.JsonDecodingException: Unexpected JSON token at offset 0: Expected start of the array '[', but had '[' instead at path: $
                 JSON input: [{"id":"XveARg0A","name":"Dijo.....

The method should work somehow, otherwise it would not have been implemented, right? Maybe I need to change how I use the sequence? Currently I do a simple conversion:

    items = getItems(filename, dir)?.toList()?.flatten()

And yes, I need the items in a list to show on a map, so I'm not sure it's OK to receive them one by one.

P.S. Tried it with stream closing - same effect:

    return file?.let {
        InflaterInputStream(FileInputStream(it)).use { inputStream ->
            json.decodeToSequence<List<T>>(inputStream)
        }
    }

Two things:

  1. According to the JSON we see, you don’t have a list of lists, but just a list. So you should not use Sequence<List<T>>, but just Sequence<T>. Similarly: decodeToSequence<T>.
  2. You can’t do toList(). It defeats the purpose of what we are trying to do. We use all these streams and sequences specifically so we don’t have to keep a list of all items, because we can’t keep them all in the memory at the same time. By using a sequence, you can process them one by one, for example by using forEach(). But again, you can’t use forEach() and e.g. add items into a mutable list, because you can’t hold them all in the memory.
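
To illustrate point 2, here is a minimal stdlib-only sketch: a lazy Sequence consumed element by element with forEach, keeping only an aggregate and never collecting the elements into a list (plain Ints stand in for the decoded objects):

```kotlin
fun main() {
    // A lazy sequence: elements are produced on demand, one at a time.
    val items: Sequence<Int> =
        generateSequence(1) { prev -> if (prev < 100_000) prev + 1 else null }

    // Consume with forEach and keep only the aggregate, not the elements.
    var count = 0
    var sum = 0L
    items.forEach { item ->
        count++      // each item is processed and then becomes garbage
        sum += item
    }
    println("count=$count sum=$sum")
}
```

The same pattern applies to a `Sequence<Item>` from decodeToSequence: process each item (e.g. hand it to the map layer) instead of calling `toList()`.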

But the good news is that at least the decompression seems to work fine, because we can see correct JSON content.


Yes, the extra List was added by mistake. There were no visible problems with decompression after I started to use a buffer: val buffer = ByteArray(1024). The problem is in the deserialization method. I have stored all the items in memory after it completed, multiple times, and all was fine.
So, if I can't collect the items into a list, it looks like Sequence is unnecessary here. It could consume extra resources instead, and I also started to get

 java.io.IOException: Stream closed
                 	at java.util.zip.InflaterInputStream.ensureOpen(InflaterInputStream.java:84)

using it this way:

    fun decompressStreamed(file: File) = InflaterInputStream(FileInputStream(file)).use { it }
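
For what it's worth, the "Stream closed" error is consistent with how `use { it }` behaves: the lambda returns the stream itself, then `use` closes it on exit, so the caller receives an already-closed stream. A stdlib-only sketch demonstrating this, and the fix of keeping all reading inside the `use` block:

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.DeflaterOutputStream
import java.util.zip.InflaterInputStream

fun main() {
    // Some deflate-compressed sample data in memory.
    val compressed = ByteArrayOutputStream().also { baos ->
        DeflaterOutputStream(baos).use { it.write("[1,2,3]".toByteArray()) }
    }.toByteArray()

    // use { it } runs the block, closes the stream, then returns it -- closed.
    val closed = InflaterInputStream(ByteArrayInputStream(compressed)).use { it }
    val failed = try {
        closed.read()
        false
    } catch (e: java.io.IOException) {
        true   // InflaterInputStream throws IOException: Stream closed
    }
    println("read after use { it } failed: $failed")

    // Instead, keep all reading inside the use block:
    InflaterInputStream(ByteArrayInputStream(compressed)).use { input ->
        println(input.readBytes().decodeToString())
    }
}
```

The same reasoning applies to decodeToSequence: the sequence must be fully consumed inside the `use` block, since it reads from the stream lazily.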