Compress Patch Log #44

Open
thkrebs opened this issue Dec 29, 2020 · 6 comments

Comments

@thkrebs

thkrebs commented Dec 29, 2020

Is there a way to "compress" or truncate a patch log in combination with a TDB2 backup? I imagine that after a long period of operation the patch log can grow rather large.

@afs
Owner

afs commented Jan 26, 2021

Hi @thkrebs, sorry for the excessive delay in replying.

There isn't a built-in way currently. For the file-backed storage, a compressed filesystem will work.

There seem to be several related use cases.

  1. Reduce the space needed to store patches over the long term, without loss of history, by storing them compressed. This keeps the replayable history.
  2. Truncate the log at some point in time and reset the system to have that as the start. Previous history is lost, but if the use case is a highly available system, then keeping the state changes forever is less useful when there is a full backup of the database.
  3. Archive the tail of the log: take patches from some point in time backwards and put them in a compressed archive which can be moved elsewhere, leaving only the necessary patches in the patch system and a redirection in case the full patch history is ever needed.
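
As a rough illustration of case 3 only, here is a minimal Python sketch that moves old patch files into a gzip-compressed tar archive. It is not part of RDF Delta. The on-disk layout (one file per patch, named `<version>.patch`) and the function name `archive_patches` are assumptions made up for this example, and the sketch does not write the redirection mentioned above.

```python
import tarfile
from pathlib import Path
from typing import List


def archive_patches(log_dir: Path, archive: Path, up_to: int) -> List[str]:
    """Move patch files with version <= up_to into a .tar.gz archive.

    Hypothetical layout: one file per patch, named "<version>.patch".
    Returns the names of the files that were archived, oldest first.
    """
    selected = sorted(
        (p for p in log_dir.glob("*.patch")
         if p.stem.isdigit() and int(p.stem) <= up_to),
        key=lambda p: int(p.stem),
    )
    if not selected:
        return []
    # Write and close the archive before touching the originals.
    with tarfile.open(archive, "w:gz") as tar:
        for p in selected:
            tar.add(p, arcname=p.name)
    # Only now remove the archived patch files from the log directory.
    for p in selected:
        p.unlink()
    return [p.name for p in selected]
```

The archive can then be moved to cheaper storage. Whether the patch server accepts a log whose earliest entries are missing is a separate question and is not addressed by this sketch.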

Does one of those cases cover it for you, or is there another case?

@thkrebs
Author

thkrebs commented Jan 26, 2021

Thanks for the comprehensive answer. I think use case 2 is the appropriate one for me. That was my line of thinking anyhow: creating a backup which can be used as a "starting point". Could you please elaborate on what exactly you mean by "reset the system"? How would I implement truncation?

@afs
Owner

afs commented Jan 31, 2021

Each client (e.g. instance of a replicated database) knows the version number where it got to.

As the code stands today, there is checking going on when the patch server(s) start up to verify the log. Some patch storages chase from latest to earliest (each patch log entry points to the previous one, so they form a singly linked list) and the earliest has no previous.
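
To make the "chase from latest to earliest" check concrete, here is a small Python sketch of walking such a backwards-linked list. It is an illustration under assumptions, not the actual RDF Delta code: the `Entry` class, its field names, and `chase_to_earliest` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical patch log entry: an id plus a pointer to the previous id."""
    patch_id: str
    prev_id: Optional[str]  # None marks the earliest entry


def chase_to_earliest(entries: Dict[str, Entry], latest_id: str) -> List[str]:
    """Walk from the latest entry back to the earliest one.

    Returns the ids in latest-to-earliest order.
    Raises ValueError if a referenced entry is missing or the chain loops.
    """
    seen = set()
    chain: List[str] = []
    current: Optional[str] = latest_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        if current not in entries:
            raise ValueError(f"missing log entry {current}")
        seen.add(current)
        chain.append(current)
        current = entries[current].prev_id
    return chain
```

With a check of this shape, simply deleting the oldest entries makes the walk fail on the first missing entry, which is why truncation needs more than removing files.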

Just truncating the log means changing the earliest entry, which is currently version 0. The starting version is not arbitrary. It could be, but a change (and testing!) is needed to support that. At the moment, a truncate is also going to need the client information updated.

A general facility is to have "loglets" - segments of log entries that act to organise the overall log. Then loglets can be offline (archived, deleted).

@tomkxy

tomkxy commented Sep 16, 2022

Hi @afs ,

after operating rdf-delta for a while now, I would like to be more specific about the use case that drove me to open this issue.

We have now operated rdf-delta/Fuseki for several months. The number of patches we have is now larger than 500,000. This is due to a large number of updates caused by harvesting data from other data portals.

Obviously, the ever-growing number of patches raises the question of how best to deal with it. Is there an approach you would consider best practice?

@virtual-machinist

@afs do I understand correctly that there's no way right now to truncate the patch log? I use the high availability setup with S3 remote storage. There's a large volume of patches generated each day, and at some point I run out of my space allowance on S3.

@afs
Owner

afs commented Aug 29, 2024

There is no provided way to do it.

An installation can do it itself, provided it is careful.

If all the servers are, say, up to date with patches from yesterday, all patches from yesterday and earlier can be deleted (or moved and compressed to some cheap place).
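
A minimal Python sketch of that rule, for illustration only: the function name `safe_to_remove` and the idea of a per-server "last applied version" map are assumptions for this example, not an RDF Delta API. How an installation actually learns each server's version is left open.

```python
from typing import Dict, List


def safe_to_remove(log_versions: List[int],
                   server_versions: Dict[str, int]) -> List[int]:
    """Return the patch versions that every server has already applied.

    log_versions:    versions currently held in the patch log.
    server_versions: hypothetical map of server name -> last applied version.
    """
    if not server_versions:
        # Without knowing where the servers are, nothing is safe to remove.
        return []
    # The slowest server sets the limit for what may be removed.
    low_water = min(server_versions.values())
    return sorted(v for v in log_versions if v <= low_water)
```

For example, with a log holding versions 1 to 5 and two servers at versions 3 and 4, only versions 1 to 3 are candidates. A cautious installation might keep a safety margin below that mark rather than deleting right up to it.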

What is lost is the ability to rebuild the system from its start. Taking a Fuseki backup would be a good idea.


4 participants