Working on an open-source temporal NoSQL storage system (efficiently storing and querying snapshots) / looking for help


Hi, I’m working on a temporal storage system that can reconstruct a revision in O(n + e), where n is the number of nodes stored in the revision and e is the number of nodes that have been deleted relative to the previous revision.

I’m currently working on a RESTful API for storing and retrieving JSON data (in addition to XML). The core is written in Java, but I want to provide the best possible API for Kotlin, since in my opinion it’s simply a modern, better Java.

Key features are:

  • log-structured storage system with copy-on-write semantics, especially well suited for SSDs (reads are random; writes are batched and synced to disk when a commit is issued)

  • implements a novel versioning algorithm called sliding snapshot, which balances read and write performance and has other beneficial characteristics. We also implement full, incremental and differential versioning at the record level

  • user-defined, typed, versioned index structures as well as a path summary of each XDM or JSON resource/document (also versioned)

  • stores hashes of the page fragments in the parent pointers, as in ZFS; these pointers live in the indirect pages of our main structure, a hash-array-based trie. In the future the hashes can be used to validate the integrity of the whole resource

  • compression of each page, as well as encryption in the future to provide encryption at rest

  • for each XDM/XML-node or JSON-node in our on-disk structure we optionally store the descendant-count, the child-count as well as a hash of the content

  • a diff algorithm, which uses our stable node-identifiers and optionally hashes for comparisons of node-pairs in different revisions

  • a RESTful, asynchronous, temporal API written with Vert.x in Kotlin

  • several temporal XPath axis extensions, which could also be used for JSON in the future

  • several XQuery functions to open, diff, commit… a resource

  • opening revisions of resources either by ID or by a given timestamp. For a timestamp, the revision is located by binary search: either a revision with exactly that timestamp is found, or the revision closest to the given point in time is opened
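
To illustrate the timestamp-based lookup from the last bullet, here is a minimal sketch in Java. It is not the project's actual code: the class `RevisionLookup`, the method `closestRevision`, and the representation of revisions as an ascending array of commit timestamps (epoch milliseconds, where index i stands for revision i) are all assumptions made for the example. Resolving ties in favour of the earlier revision is also an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the real API: picks the revision to open for a point in time. */
final class RevisionLookup {

    /**
     * Returns the index of the revision whose commit timestamp is closest to pointInTime.
     *
     * Assumptions: commitTimestamps is sorted in strictly ascending order and
     * index i corresponds to revision i. On a tie, the earlier revision wins.
     */
    static int closestRevision(long[] commitTimestamps, long pointInTime) {
        if (commitTimestamps.length == 0) {
            throw new IllegalArgumentException("resource has no revisions");
        }

        int index = Arrays.binarySearch(commitTimestamps, pointInTime);
        if (index >= 0) {
            return index; // a revision was committed at exactly this timestamp
        }

        // Not found: binarySearch returns -(insertionPoint) - 1.
        int insertionPoint = -index - 1;
        if (insertionPoint == 0) {
            return 0; // requested time lies before the first revision
        }
        if (insertionPoint == commitTimestamps.length) {
            return commitTimestamps.length - 1; // requested time lies after the last revision
        }

        // Compare the distance to the neighbouring revisions on both sides.
        long distanceToEarlier = pointInTime - commitTimestamps[insertionPoint - 1];
        long distanceToLater = commitTimestamps[insertionPoint] - pointInTime;
        return distanceToEarlier <= distanceToLater ? insertionPoint - 1 : insertionPoint;
    }
}
```

With commit timestamps `{100, 200, 400}`, a request for 200 yields index 1 (exact match), 290 also yields 1, 310 yields 2, and anything before 100 or after 400 is clamped to the first or last revision. The lookup takes O(log r) time for r revisions.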

So, in general I’d love to get any input, and especially help with a Kotlin API, for instance to store Kotlin functions as stored procedures on the HTTP server :slight_smile:

My goal is to release version 0.9 with a rudimentary RESTful API for both XML and JSON, but without query capabilities for JSON for now. For 1.0 I’d like to fix bugs, add a lot more unit and integration tests, and provide a JSONiq binding so that XQuery can be used for querying JSON (as with the XML data). Plus, I want to add a really nice Kotlin API, maybe even a DSL :slight_smile:

Kind regards
Johannes