Keeping data in memory instead of using databases

Most of the time you give user sessions as an example. That is a very specific case, much less complicated than typical database needs. When working with sessions, we modify only a single “row” of a single entity at a time, so we don’t have to keep data consistent across multiple rows/entities. In many cases we don’t need persistence at all, and we don’t care about corrupted data because we can pretty much trash everything and start from scratch. This is one of the reasons why sessions are often stored in a separate DB like Redis or Memcached, not in the “main” DB of the application.

We don’t really have to discuss banking applications, where data inconsistency/corruption could mean a money transfer was performed only partially, deducting money from account A without adding it to account B. Even for something trivial like a discussion forum, we still need guarantees about atomicity of changes, data consistency, etc. For example, we first create a new topic and then add the first post to it (which should happen together), but between creating the topic and creating the post, other requests could see a topic without any posts, which is considered an error. If we write to a file during that time window, we have actually stored a corrupted data state, and if we restore from it later, it will be corrupted forever. If two users add a post to the same topic at exactly the same time, then depending on the implementation it might work correctly, but one of the requests may crash or one of the posts may be overwritten by the other.

We can fix such problems by synchronizing threads, but doing that properly is very hard, and DBs provide ready-to-use solutions for these problems. Additionally, using mutexes would probably be much less efficient than what DBs do. Also, if you modify data while it is being serialized for writing to a file, the serialization process itself can crash. So we would probably have to block all write operations every 10 s for, say, 1 s. That also adds complexity to the code, because such a global write lock won’t be easy to implement if everyone has direct access to the data structures and can modify them at any time.
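To make the lost-update scenario concrete, here is a minimal sketch (Topic and Post are hypothetical classes, not anyone’s real code): two request threads appending to a plain MutableList can race and silently drop a post.

data class Post(val author: String, val body: String)

class Topic(val title: String) {
    val posts = mutableListOf<Post>()   // plain ArrayList, not thread-safe
}

fun main() {
    val topic = Topic("Welcome")
    val t1 = Thread { topic.posts.add(Post("alice", "First!")) }
    val t2 = Thread { topic.posts.add(Post("bob", "Me too")) }
    t1.start(); t2.start()
    t1.join(); t2.join()
    // Unsynchronized ArrayList.add can race: both threads may write the same
    // slot, so this can print 1 instead of 2, or even throw an exception.
    println(topic.posts.size)
}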

So the question is: when you say this approach has worked for you so far, what size of application do you mean? Is this a publicly available site with multiple users using it at the same time? Or does it work in your tests, but you haven’t used it in practice, or you used it on sites that are rarely used by multiple users at once? Also, do you already use this for the main data of the application, or only for user sessions?


Yeah, that’s right. We would need to require Topic to receive an opening post.

Wouldn’t it work fine with a ConcurrentList?

Interesting. Does that apply to jacksonObjectMapper().writeValueAsString(userSessions)? If the object is modified during the write process, does it crash?

My apps are used only by a few users. All the data that needs persistence is stored in the userSessions map, and there is no interaction between users. Other data is constant, either stored in code or in CSV files.

Either I overcomplicate things or you oversimplify :wink: Web applications are not just a single list or map. In web applications we have complicated graphs of correlated data, and usually we don’t just add a new item somewhere; we do much more. Concurrent data structures are not magic bullets that solve all concurrency problems.

But well, maybe this approach will work for you :slight_smile:

I don’t know whether writeValueAsString() could crash or not. If the documentation doesn’t explicitly say it can handle the data being modified while it is serialized, I would definitely be careful.
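For what it’s worth, Jackson iterates plain collections while serializing them, so a HashMap or ArrayList that is mutated mid-write can fail with a ConcurrentModificationException. One way to guard against that is a read-write lock; the sketch below is an assumption about how your code could be structured (UserSession is a made-up type), not a description of Jackson’s API.

import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import java.util.concurrent.locks.ReentrantReadWriteLock
import kotlin.concurrent.read
import kotlin.concurrent.write

data class UserSession(var lastPage: String = "/")   // hypothetical session type

val lock = ReentrantReadWriteLock()
val userSessions = HashMap<String, UserSession>()

// Mutations happen only under the write lock...
fun touch(id: String, page: String) = lock.write {
    userSessions.getOrPut(id) { UserSession() }.lastPage = page
}

// ...so the periodic serializer sees a stable view under the read lock.
fun snapshotJson(): String = lock.read {
    jacksonObjectMapper().writeValueAsString(userSessions)
}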

It works perfectly for now, but let’s see if I’m able to get more users for my apps and how it goes then.

If concurrency is a problem, with different users editing at the same time, in many cases it would be possible to queue up the requests and handle only one request at a time. Example: a SaaS for smaller veterinary clinics. It should work completely fine to queue up other requests while the first one is handled. Maybe this sounds crazy, but if handling a request takes 100 ms on average, it won’t happen often that two of the 10 veterinarians at a clinic have a request delayed. And if it happens, the request takes maybe 200 ms instead of 100 ms. No big deal. So a possible solution is to queue up the requests. If serialization of the data is a problem with regard to concurrency, that task could also make the other requests wait in the queue. Of course this solution won’t work for apps where thousands of users are communicating, but for SaaS apps for smaller companies, where each company can be completely separated (Clinic A doesn’t communicate with Clinic B), this could be a possible solution.
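Sketched with coroutines, the “one request at a time” idea can be as small as a shared Mutex. This assumes Ktor 2.x and a lock per clinic instance; the route and names are made up for illustration.

import io.ktor.server.application.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

val clinicLock = Mutex()   // one instance per clinic, so one lock suffices

fun Route.appointments() {
    post("/appointments") {
        // Later requests suspend here until the current one finishes, so the
        // handler can mutate the in-memory model without further locking.
        clinicLock.withLock {
            // ... read and modify the in-memory model ...
            call.respondText("booked")
        }
    }
}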

Just throwing it out there: if you really want to reduce the cost of implementing a backend, a BaaS would likely be more effective than a custom solution if you are planning on putting your app into production.

Here’s a survey of a few.

Kotlin should be compatible with most of them thanks to the language interop. I believe Appwrite and Firebase (maybe AWS Amplify as well?) have idiomatic Kotlin APIs.

I have a “backend”, if by that you mean server-side code. I have very little JavaScript; almost all the logic is done on the server. I don’t use React or any other JS framework, but generate the HTML on the server (like we all successfully did until things got extremely complicated from 2013 onward).

Seems like Firebase and other BaaS offerings are more for JS apps.

what happened in 2013?

The pattern where you have an SPA on the front end (React, Svelte, Vue) and a JSON API on the server (plus a lot of microservices if you want to make it even worse).

Not quite what you are proposing, but several years ago (before cloud was pervasive) I worked on applications that used Apache Ignite as the backing in-memory database. It allows you to run in-JVM or externally, as a collection of nodes, so you don’t end up with gigantic JVMs. Although it supports persisting data to disk, in our particular case the data was fetched by another process, and we used Kafka for distribution, which also gave us a nice way to re-hydrate the data upon startup.

Our data was stored with a few select elements that we used for querying purposes, and the bulk of the payload was serialized using Protobuf, which is faster and more compact than JSON serialization. Note that Ignite shines as a distributed key-value map. Maybe this has changed, but although it supports SQL-style queries, performance was not ideal for very large data sets; we ended up implementing lightweight support for where clauses.
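For reference, the basic key-value usage looks roughly like this. This is a from-memory sketch (we actually ran the nodes externally rather than embedded, and the cache name and key are made up):

import org.apache.ignite.Ignition

fun main() {
    val ignite = Ignition.start()   // embedded node, just for the demo
    val cache = ignite.getOrCreateCache<String, ByteArray>("payloads")
    // The value is an opaque blob (Protobuf in our case); the few queryable
    // fields lived alongside it or in the key.
    cache.put("order-42", "protobuf-encoded payload".toByteArray())
    println(cache.get("order-42").size)
    ignite.close()
}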

Anyway, my 2 cents.


I will try to explain with a SaaS app as an example. We run one instance for each customer (company). We load at startup:

import com.fasterxml.jackson.core.type.TypeReference
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import java.io.File
import kotlin.system.exitProcess

// Company is our own model class; logError is our own logging helper.
val company: Company = try {
    jacksonObjectMapper().readValue(
        File("src/main/resources/company_${System.getenv("COMPANY_ID")}.json").readText(),
        object : TypeReference<Company>() {}
    )
} catch (e: Exception) {
    logError("Exception while loading company JSON: $e")
    exitProcess(1)
}

Adding a new user to the company:

company.users.add(User(email, hashedPassword, name))

Editing a user:

val user = company.users.first { it.id == params["id"] }
user.email = params["email"]
//and so on

Deleting a user:

company.users.removeIf { it.id == params["id"] }

No need to save explicitly. The changes are in memory and get saved when the app is restarted, plus every 10 seconds as a backup:

val json = jacksonObjectMapper().writeValueAsString(company)
File("src/main/resources/company_${company.id}.json").writeText(json)

I had the displeasure of working with https://prevayler.org for two years; it is an in-memory database based on Java serialization. Server startup took about 20 minutes, same for shutdown. The servers required a special kind of very expensive RAM to ensure data was not lost. Also there was no CI (what are tests? code reviews?), no CD (copy .java files to the server with ssh, compile them there and manually run a jar), nor any kind of task management like Jira (e-mails are the way), but that’s a different story. Never again.


That being said, it was 10000 times faster than SQL-based databases. With current caching techniques the difference probably isn’t so big.

Thanks for sharing this experience. I’m not talking about an in-memory database, though, but about keeping the data in regular Kotlin code (in constants and variables), with no DB layer.

This is exactly how Prevayler works. You have a singleton “Repository” class which contains just collections, objects etc. You can access it for reading just like regular Java/Kotlin objects; that’s why it is so fast. It uses a command pattern to modify data, which somehow ensures ACID because commands are immediately serialized. This is actually quite smart.
Like this:

object Repository {
    // must be initialized in Kotlin; User is the application's own model class
    val users: MutableList<User> = mutableListOf()
}
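Hand-rolled, the command-journaling idea looks roughly like this, building on the Repository object above. This is a conceptual sketch only, not Prevayler’s actual API, and User is a made-up model:

data class User(val name: String)

// Every mutation is a command that is appended to a durable journal *before*
// it is applied in memory; replaying the journal at startup rebuilds the state.
sealed interface Command {
    fun apply()
    fun encode(): String
}

data class AddUser(val name: String) : Command {
    override fun apply() { Repository.users.add(User(name)) }
    override fun encode() = "AddUser:$name"
}

val journal = java.io.File("journal.log")

fun execute(cmd: Command) {
    journal.appendText(cmd.encode() + "\n")    // durability first
    synchronized(Repository) { cmd.apply() }   // then the in-memory change
}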

I see. Then it’s similar to my solution.

I agree, this is a silly idea that will bite you in the ass very soon.


We will see. It’s pretty easy anyway to switch to a document database and save the objects as JSON there. I will let you know (if my apps grow (praying to Jesus)) how it goes.

I don’t think this is true. The problem here is not how you store the data underneath. It doesn’t really matter whether you use JSON or some binary format, a document database or local files. The main problem is that you don’t control when you save/load data. Normally, when using databases, we make changes inside transactions, which provide isolation, atomicity and data consistency. But the code has to cooperate with that process: it has to know when to start and commit the transaction, and all data loading/saving has to be explicit. To fix the problem you would need to rework all the code that touches the data in the DB.
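Concretely, “the code has to cooperate” means transaction boundaries are explicit in the code. A plain JDBC sketch, reusing the forum example from earlier (the connection string, schema, and Postgres-flavored RETURNING clause are all made up for illustration):

import java.sql.DriverManager

fun createTopicWithFirstPost(title: String, body: String) {
    DriverManager.getConnection("jdbc:postgresql://localhost/forum").use { conn ->
        conn.autoCommit = false                  // explicit transaction boundary
        try {
            var topicId = 0L
            conn.prepareStatement("INSERT INTO topics(title) VALUES (?) RETURNING id").use { st ->
                st.setString(1, title)
                st.executeQuery().use { rs -> rs.next(); topicId = rs.getLong(1) }
            }
            conn.prepareStatement("INSERT INTO posts(topic_id, body) VALUES (?, ?)").use { st ->
                st.setLong(1, topicId)
                st.setString(2, body)
                st.executeUpdate()
            }
            conn.commit()       // topic and first post become visible together
        } catch (e: Exception) {
            conn.rollback()     // nobody ever sees a topic without its post
            throw e
        }
    }
}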

1 Like

This is really a good reason to apply dependency inversion to your persistence. Even if you’re not going full Clean Architecture, a simple Repository-pattern interface would protect you when switching to another storage solution.
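For instance (a hypothetical interface; User stands in for whatever model class the app has):

interface UserRepository {
    // Callers depend only on this contract; whether it is backed by JSON files,
    // Properties, or a real database is a detail they never see.
    fun findById(id: String): User?
    fun save(user: User)
    fun delete(id: String)
}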

This is the part that makes red flags go up for me. I would argue that creating a quick and straightforward custom storage solution and skipping databases is fine… but only if you protect your code from knowing about it. I take this quote to mean you allow your code to know that it’s loading/saving JSON.

Another example of a simple and quick storage solution: Java’s Properties class supports reading and writing files, and it is essentially just a Map<String, String>. I see no problem using it as your backing persistence in the beginning, because it’s hidden behind the interface and is interacted with only via model classes.
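A sketch of that idea plugged in behind the interface above, assuming a minimal data class User(val id: String, val email: String):

import java.io.File
import java.util.Properties

class PropertiesUserRepository(private val file: File) : UserRepository {
    private val props = Properties().apply {
        if (file.exists()) file.inputStream().use { load(it) }
    }

    override fun findById(id: String): User? =
        props.getProperty(id)?.let { User(id, it) }

    override fun save(user: User) {
        props.setProperty(user.id, user.email)
        flush()
    }

    override fun delete(id: String) {
        props.remove(id)
        flush()
    }

    // Rewrites the whole file on every change; fine for small data sets.
    private fun flush() = file.outputStream().use { props.store(it, "users") }
}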

I’d be concerned to see any mention of the underlying persistence choice infecting much of the code (which could also happen when using ORM libraries). I’m not as concerned about making use of DB features or saving on an interval.

It should be pretty easy with my use case:

get("/") {
  val userSession = getUserSessionFromDb(call)
  userSession.doWhateverYouWantAndItWillAllGetStoredAutomatically()
  saveUserSessionToDb(userSession)
}

For SaaS apps it should also be pretty easy: users can simply wait their turn; if two requests come in at the same time, the second can wait until the first finishes.