Understanding why my program is consuming a lot of memory

I'm learning Kotlin and I'm writing a program that fetches data from Elasticsearch, does some processing, and then saves the result to a file. Here is the code:

    private fun handleMutation(
        data: FileOutputStream,
        mutation: Mutation,
        town: Town,
        converter: Converter
    ) {
        data.write(mapper.writeValueAsBytes(converter.convert(mutation)))
        data.write("\n".toByteArray())
        logger.info(
            "Finished Handling mutation with ID {} for town {}",
            mutation.idMutation,
            town.code
        )
    }

    val file = File("data/data.json")
    val data = FileOutputStream(file)
    file.createNewFile()
    val externalData = ExternalData()
    for (town in externalData.fetchTowns("31")) {
        logger.info("Handling town " + town.code)
        runBlocking {
            val converter = Converter()
            val mutations = esFetcher.fetchMutationsByTowns(town.code)
            runBlocking {
                mutations.hits.collect {
                    handleMutation(data, it.second, town, converter)
                }
            }
        }
    }

There are around 450K records in Elasticsearch. When I run this, I see memory usage grow quickly; the program reaches about 6 GB within a few minutes and then crashes with an OutOfMemoryError.

I took a few heap dumps, but I can't tell whether there is a memory leak or whether the program simply consumes a lot of memory because I'm creating a lot of coroutines.

Do you see anything wrong with this code that could cause a memory leak?

Try changing how many threads your program uses. It can also help to install 32-bit Java, since a 32-bit JVM can't use more than 4 GB of memory.

  1. If your mapper object allows it, write directly into the OutputStream instead of serializing each mutation to a byte array first (see the first sketch below).
  2. I am no coroutine expert, but using runBlocking inside a loop should be avoided.
  3. If externalData.fetchTowns("31") returns an in-memory list, your problem may come from there. If you can make it return a flow, you should be able to stream the whole operation pipeline end to end, which should reduce memory usage (see the second sketch below).
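
For point 1, here is a minimal sketch of what that could look like, assuming mapper is a Jackson ObjectMapper (which writeValueAsBytes suggests, but that is an assumption). Jackson's SequenceWriter writes each value straight into the stream and inserts the separator for you, so no per-record byte array is allocated:

    import com.fasterxml.jackson.databind.ObjectMapper
    import java.io.OutputStream

    // Sketch only, assuming a Jackson ObjectMapper: stream newline-delimited
    // JSON without building an intermediate byte array per record.
    fun <T> writeNdjson(mapper: ObjectMapper, out: OutputStream, values: Sequence<T>) {
        mapper.writer()
            .withRootValueSeparator("\n")  // replaces the manual "\n" write
            .writeValues(out)              // returns a SequenceWriter
            .use { seq ->
                values.forEach { seq.write(it) }
            }
    }

Note that closing the SequenceWriter also closes the underlying stream by default (AUTO_CLOSE_TARGET), so do it once at the end of the run, not once per town.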
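For points 2 and 3 together, here is a rough sketch of a fully streamed version. It reuses the declarations from the question (so it is not self-contained) and assumes fetchTowns can be changed to return Flow<Town>, which is hypothetical. There is then a single runBlocking at the entry point instead of one per iteration:

    import java.io.File
    import java.io.FileOutputStream
    import kotlinx.coroutines.runBlocking

    fun main() = runBlocking {                  // one runBlocking at the edge
        FileOutputStream(File("data/data.json")).use { data ->
            val converter = Converter()         // reused across towns
            // Hypothetical change: fetchTowns returns Flow<Town>, so towns
            // are processed one at a time instead of buffered in a list.
            externalData.fetchTowns("31").collect { town ->
                logger.info("Handling town " + town.code)
                esFetcher.fetchMutationsByTowns(town.code)
                    .hits
                    .collect { handleMutation(data, it.second, town, converter) }
            }
        }
    }

The use block also guarantees the stream is flushed and closed when the pipeline finishes, which the original snippet never does.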

Thank you for the answers. It turned out that the issue was in the wrapper I used around the Elasticsearch client. I'm now using the client directly, without the wrapper, and memory usage stays at around 450 MB.