Feature request: Save variable values to disk rather than in memory (or: Why we don't need databases anymore)

Databases are a huge pain. You need to validate the data, transfer it to a database object, insert it into the database (probably with an SQL query), select it from the database with another SQL query, handle SQL exceptions, etc. But why do we need this? Why can’t I just use a simple list, like repository.users.add(user) when I save a user and repository.users.filter { it.id == userId } when I select a user?

Answer: Because the users are just saved in memory, and not permanently to a disk. That means if I start a new instance of the application, the list would go back to its default value. But it doesn’t need to be like that. Couldn’t there be an annotation that says "save the value of this variable (or constant) to the disk"?

Example:

class Repository {

   @SaveToDisk("users")
   val users: MutableList<User> = mutableListOf()

}

It’s exactly like a normal list, except that it’s saved to disk rather than kept in memory.
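An annotation by itself does not change how a property is stored, so the closest thing available in plain Kotlin today is probably a delegated property. The sketch below is only an illustration of that idea: `SavedToDisk` and `Settings` are hypothetical names, it handles a single `String` rather than a list, and it only persists assignments (calling `add()` on a list held this way would not be noticed).

```kotlin
import java.io.File
import kotlin.properties.ReadWriteProperty
import kotlin.reflect.KProperty

// Hypothetical delegate: the value lives in a file and is read back on every access.
class SavedToDisk(
    private val file: File,
    private val default: String
) : ReadWriteProperty<Any?, String> {

    // Read from disk each time, falling back to the default if nothing was saved yet.
    override fun getValue(thisRef: Any?, property: KProperty<*>): String =
        if (file.exists()) file.readText() else default

    // Every assignment overwrites the file immediately.
    override fun setValue(thisRef: Any?, property: KProperty<*>, value: String) {
        file.writeText(value)
    }
}

// Hypothetical holder class, standing in for the Repository above.
class Settings(dir: File) {
    var lastUser: String by SavedToDisk(File(dir, "lastUser.txt"), default = "nobody")
}
```

A second `Settings` object pointed at the same directory sees the value written by the first, which is the behaviour the annotation is asking for, minus any protection against concurrent writers.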

The list could potentially have millions of rows, but that’s not a problem with today’s computation speed. This “query”, for example, took 1.5 seconds on 100 million rows:

    repository.users
        .filter { it.firstname == "bob" && it.age > 90 }
        .sortedBy { it.lastname }

    data class User(val firstname: String, val lastname: String, val age: Int)

What do you think? Shall we just get rid of databases and SQL injections?

Seems like a library request–meaning no changes to Kotlin (or the stdlib).

Feel free to make it as your own library, or look for someone who also wants to make it, or look for existing solutions that are similar.

I don’t think this can be done as a library, as you would need to instantly change the value on the disk whenever the object changes. When someone calls users.add() from any of the app instances, the object needs to be changed on the disk. The object’s value needs to live on the disk rather than in memory. I don’t see how this can be done with just a library.

Yes, you could have a list structure persisted to disk (saved when the program stops, and reloaded when it starts); it probably wouldn’t take much code at all (using serialisation).
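As a rough illustration of how little code that takes, here is a minimal sketch using plain Java serialisation and a JVM shutdown hook. The function names (`loadUsers`, `saveUsers`, `saveOnExit`) are made up for this example, and `User` is the data class from the first post with `Serializable` added.

```kotlin
import java.io.File
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

data class User(val firstname: String, val lastname: String, val age: Int) : Serializable

// Loads the list saved by a previous run, or starts empty if no file exists yet.
@Suppress("UNCHECKED_CAST")
fun loadUsers(file: File): MutableList<User> {
    if (!file.exists()) return mutableListOf()
    return ObjectInputStream(file.inputStream()).use { input ->
        input.readObject() as ArrayList<User>
    }
}

// Writes the whole list in one go, replacing whatever was in the file.
fun saveUsers(file: File, users: List<User>) {
    ObjectOutputStream(file.outputStream()).use { output ->
        output.writeObject(ArrayList(users))
    }
}

// Registers a shutdown hook so the list is written when the JVM exits normally.
// Nothing is saved if the process is killed or the power fails.
fun saveOnExit(file: File, users: List<User>) {
    Runtime.getRuntime().addShutdownHook(Thread { saveUsers(file, users) })
}
```

Typical use would be `val users = loadUsers(file)` at startup followed by `saveOnExit(file, users)`. Everything written between start and stop is lost on a crash, which is where the points below come in.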

But if that’s all you need, then you don’t really need a database! A proper, grown-up DB manager will do vastly more than that for you. Just for starters:

  • Large amounts of data. You can fit many MBs of data in memory, but most machines would have trouble with more than a few GBs. Some DBs are measured in TBs (or even PBs)…

  • Transactions. Good luck rolling back the last several updates with a simple in-memory structure. And good luck if multiple threads want to access the same data without affecting or even seeing each other’s changes until committed.

  • Resilience. If power or hardware dies suddenly, you need to be able to recover everything up to the last committed transaction, with no chance of corruption or inconsistency.

  • High concurrency. (Yes, there are thread-safe data structures, and they’re pretty vital when you’re doing anything multi-threaded, but the sort of fine-grained concurrency that DBs give, especially in the face of transactional integrity, would be prohibitive.)

  • Replication and fail-over. Having the same DB available on multiple servers, all kept in sync, and continuing seamlessly if a server dies.

And much, much more. DB managers are the result of decades of research into doing all that extremely efficiently and safely.

If you don’t need all that, then fine! But don’t dismiss it, because many situations do need that sort of power. So no, we’re not going to “just get rid of databases”…

You could still do it as a library. In fact, I wonder if using H2 (a file-based DB) would solve most issues for OP: an easy DB to work with on disk that is supported by ORMs.

Yeah, I see there are cases where you still might want to use a traditional database.

I don’t want to just read/write to the disk on start/stop of the program. That would be pretty unsafe, for example if an instance suddenly shuts down without being able to save. Also, it would not work across several instances running in parallel. The variable needs to have a specific reference on the disk, and then always be read from and modified at this location.
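To make "always read from and modified at this location" concrete, here is a minimal sketch for a single `Long` counter rather than a list. It assumes the standard `java.nio` file-locking API is enough to coordinate the instances; `incrementSharedCounter` is a made-up name, and a real list would need a far more involved file format.

```kotlin
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Reads the counter from its fixed place on disk, adds one, and writes it back,
// all while holding an exclusive lock on the file.
// Note: the lock is held per JVM process, so it coordinates separate instances,
// not threads inside one instance.
fun incrementSharedCounter(path: Path): Long =
    FileChannel.open(
        path,
        StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE
    ).use { channel ->
        val lock = channel.lock() // blocks until no other process holds the lock
        try {
            val buffer = ByteBuffer.allocate(Long.SIZE_BYTES)
            // An empty file means no value has been stored yet.
            val current = if (channel.size() >= Long.SIZE_BYTES) {
                while (buffer.hasRemaining() &&
                    channel.read(buffer, buffer.position().toLong()) > 0) { }
                buffer.getLong(0)
            } else 0L
            val next = current + 1
            buffer.clear()
            buffer.putLong(next)
            buffer.flip()
            while (buffer.hasRemaining()) {
                channel.write(buffer, buffer.position().toLong())
            }
            channel.force(true) // ask the OS to flush the change to the device
            next
        } finally {
            lock.release()
        }
    }
```

Even this tiny case needs locking, explicit flushing, and a defined byte layout, which gives a feel for how much a general disk-resident list would have to handle.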

If you target JVM, I know at least one library that provides disk-backed implementations of Java collections:

Not sure if it covers all your expectations though.

Thanks, @akurczak. This seems to be pretty close to what I’m looking for. I see that it is possible to implement this with a library. My concern was that a library would need specific types separate from the regular Java/Kotlin types, but thanks to interfaces we can make our own type that implements MutableList and then make that save to the disk on changes (I didn’t know that MutableList was an interface…). Concurrency would still be an issue if we run more than one instance at a time.
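A minimal sketch of that idea, assuming a list of plain strings and a one-entry-per-line text file. `DiskBackedStringList` is a hypothetical class written for this post, not the API of the library mentioned above, and it rewrites the whole file on every change, so it is only reasonable for small lists.

```kotlin
import java.io.File

// Hypothetical class: a MutableList<String> that rewrites its backing file after every change.
// Limitation of this simple format: entries must be non-empty and must not contain newlines.
class DiskBackedStringList(private val file: File) : AbstractMutableList<String>() {

    // Load whatever a previous instance left on disk.
    private val items: MutableList<String> =
        if (file.exists()) file.readLines().toMutableList() else mutableListOf()

    override val size: Int get() = items.size

    override fun get(index: Int): String = items[index]

    // The three mutating primitives; AbstractMutableList builds add(e), remove(e), etc. on top of them.
    override fun add(index: Int, element: String) {
        items.add(index, element)
        save()
    }

    override fun removeAt(index: Int): String = items.removeAt(index).also { save() }

    override fun set(index: Int, element: String): String = items.set(index, element).also { save() }

    private fun save() = file.writeText(items.joinToString("\n"))
}
```

Because the class is a `MutableList<String>`, it can be handed to any code that accepts `List` or `MutableList`. It still keeps a copy in memory and does nothing about two instances writing the same file.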

Why is this a concern?

val myList: MutableList<Int> = DiskBackedList<Int>()

myList.add(42) // We're using a plain old Kotlin `MutableList`

It’s a common mistake for new coders to code to the implementation instead of the interface. But using a different type shouldn’t impact how consumers see the object. For example (Java Code):

ArrayList<Integer> myList1 = new ArrayList<>(); // Incorrect in almost all cases
List<Integer> myList2 = new ArrayList<>(); // Fixed

I completely agree with that. I thought MutableList was an implementation until I looked at it more closely. Now I see this would work fine as a library. The concern would be if I couldn’t pass “store to disk” types to regular methods that take Java types.

Gotcha. Yeah, if it were an implementation instead of an interface, then it would not work for normal Java methods.

And to be fair, if you come across a Java method that requires a LinkedList or ArrayList, then it won’t work, which is reasonable since that method really, really cares about the concrete type for some reason and giving it something else would probably break it.

What you are proposing is orthogonal persistence. I think there have been many attempts to implement this. But as mentioned before: things get very hairy (transactions, concurrency, etc.) very fast.

Here is an example of the hairiness:

  • Let’s keep everything in memory, and save the change sets to disk so we can restore the state the next time the program starts.
  • But any change set you add must never fail.
  • But I cannot always predict if a change set will succeed or fail.
  • Okay, we’ll run 2 instances then. We apply the change set to the first, and if it succeeds, apply it to the second. If it fails on the first, we roll back the first instance (in a possibly expensive way).

The above description is how Prevayler was forced to work for real workloads. I am pretty sure they would have loved being able to run only 1 instance.
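The first bullet (keep everything in memory, log the change sets, replay them on startup) can be sketched roughly as follows. This is not Prevayler’s API; `JournaledNames` is a made-up class whose only change set is "add this name", and it sidesteps the hard part by validating before it mutates anything.

```kotlin
import java.io.File

// Hypothetical example: in-memory state plus an append-only journal of change sets.
class JournaledNames(private val journal: File) {
    private val names = mutableListOf<String>()

    init {
        // Replay every change set recorded by earlier runs to rebuild the in-memory state.
        if (journal.exists()) journal.forEachLine { applyChange(it) }
    }

    // Validation happens before any mutation, so a rejected change leaves the state untouched.
    // A real change set could fail half-way through, which is the problem described above.
    private fun applyChange(name: String) {
        require(name.isNotBlank() && '\n' !in name) { "name must be a non-blank single line" }
        names.add(name)
    }

    fun add(name: String) {
        applyChange(name)               // apply in memory first; on failure nothing is journaled
        journal.appendText(name + "\n") // then record it so the next start can replay it
    }

    fun all(): List<String> = names.toList()
}
```

As soon as a change set touches several objects and can throw in the middle, this simple ordering no longer guarantees a consistent state, and you end up needing rollback or the two-instance trick.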

If your requirements are not heavy (huge database, many users, many operations per second, etc.), you can certainly build something that is easier than having to interact with a database.
