Does anyone know of a noSQL or SQL database, which allows ordinary clients to query an earlier state? i.e. have access to all (or a reasonable number of) prior revisions of the data within the schema; pretend that the query happened "10 minutes ago" or similar.
boosts welcome, please! :)
@dev datomic springs to mind, but pretty special case and nonfree
@dev I get why you want to give non IT customers this option. It also sounds like an performance, storage nightmare o.O
Would a snapshot based solution be ok, or do you aim for the capability to rollback every single Transaction?
@gom true, we're not planning to offer this to customers, as it would be a nightmare :) Snapshots would be ok, too.
@dev Ok if rolling back to "snapshots" it pretty much sounds like: https://www.postgresql.org/docs/13/continuous-archiving.html#BACKUP-PITR-RECOVERY
(at least this is on my "need to fiddle with" list for similar reasons)
@gom yep! I know that one, but it's more of an admin/backup tool. There is no way an ordinary client can a) trigger a snapshot or b) query inside of one. Unless we build a lot of fiddly tooling around that.
@dev You do not wan't to offer it to customers but clients? I'm a bit confused.
Which kind of access do you want to grant and how easy should it be (maybe describe a workflow?).
Anyway, Postgres omitted "time travel" years ago (storage and performance issues) and recommends using triggers where needed.
Out there are DBS' with time traveling capabilities (Oracle calls it "flashback"), but you likely know that.
If I remember correctly elasticsearch supports versioning
@dev Elasticsearch versioning?
@dev Not time based but prior revisions: CouchDB… (But be aware CouchDB 2 is kinda not the same as CouchDB 1…)
@dev if the data size isn't too large I'd seriously consider sqlite and file based snapshots.
if you find something better, please tell me, I'd love that as well
@tethre if that works for them, sure, please! :)
@dev I guess that could be able with CouchDB. It's been a while since I used that, but as it's an eventual consistent db design, having multiple states accessible is part of its synchronisation mechanism.
@dev mariadb has automatic data versioning https://mariadb.com/de/resources/blog/automatic-data-versioning-in-mariadb-server-10-3/
@dev If you were going to do this in a NoSQL database, you'd probably want to DIY.
In SQL databases, you have ACID by default, so you can reasonably rely on getting very timely updates. It makes sense that they standardized it into SQL:2011, because getting this to work perfectly under the ACID paradigm is fiendishly hard.
Most (but not all) NoSQL databases are BASE, as one of the main reasons to move away from SQL is to get around the intensely strict requirements of ACID.
@dev Under BASE the situation changes, there guarantees in place are extremely loose, so time travel becomes impossible to completely specify in a system independent way.
If you ask for the state of the database 10 minutes ago, it's totally possible that due to a communication/partition issue that not everything from 10 minutes ago is available! You might ask for the results at time T at 2 different real times and get inconsistent results.
So, you need to decide how to cope.
@dev Implementing DIY time travel isn't too hard, just treat your database as a ledger. You never delete anything, instead you simply add new entries to the ledger that "delete" your old, irrelevant data. There's a lot of other advantages to this approach too (ex. you don't have to implement deletion or deal with edge cases created by deletion). Here's a paper with more general details:
Then you can just create queries that get the state at a given time
@dev In fact, you'll need to write at least one time travel query: to get the current state. You probably don't want to get the whole ledger everytime you want to do anything.
As mentioned earlier though, implementing time travel properly under ACID is extremely difficult. If you are dealing with time sensitive information where ACID matters, you need to go research it more before you try to implement anything like this.
@dev On the other hand, if some minor inconsistency doesn't matter or your system is BASE anyway, then it's pretty easy, but you now have to deal with inconsistency.
Maybe it doesn't matter, or you can kick the can down the road (some user might care about inconsistent results, but you don't care, you just need to give the users the best answer available right now).
Also keep in mind that the reason BASE is popular is that you can't properly distribute a relational database.
@dev Well, ok, saying one can't distribute a relational database is a huge oversimplification. There's a lot of asterisks there, but there's no automatic easy way that could be built into the database once and only concern the developers of the database and admins.
The point is that if you are going for a BASE system, you are probably doing so because you need a distributed database and don't need ACID. So, under those conditions you can't provide strong guarantees anyway.
@dev Also, it should be noted that ACID makes DIY time travel/ledger querying relatively easy for a laid-back system. ACID does the hard work for you.
It's only when you need the same strong guarantees that it gets difficult again.
Are you searching for a temporal information system? SQL has its definitions for it, but almost noone implemented it. But if I am right, there was a postgreq plugin.