Allow efficient trimming of history #77

Open
wants to merge 5 commits into
base: main


asteppke

Currently, when using the SQL backend of the ystore, the complete document history is stored by default. If the document_ttl parameter is set, all history is squashed, but only under one specific condition: the time difference between the most recent update and the current update must be larger than document_ttl.
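To make the condition concrete, here is a minimal sketch of the check as described above. The function name `should_squash` and its signature are hypothetical and only illustrate the rule; this is not the actual ystore code.

```python
from typing import Optional


def should_squash(
    last_update_ts: Optional[float],
    now: float,
    document_ttl: Optional[float],
) -> bool:
    """Return True if the history would be squashed on this write.

    Hypothetical helper: mirrors the rule described above, where squashing
    happens only if the gap since the most recent update exceeds document_ttl.
    """
    if document_ttl is None or last_update_ts is None:
        # No TTL configured, or no previous update: nothing to squash.
        return False
    return (now - last_update_ts) > document_ttl


def count_squashes(update_times: list, document_ttl: float) -> int:
    """Count how many writes in a sequence of timestamps trigger a squash."""
    squashes = 0
    last = None
    for ts in update_times:
        if should_squash(last, ts, document_ttl):
            squashes += 1
        last = ts
    return squashes
```

With a document_ttl of 60 seconds, a document edited every 30 seconds never satisfies the condition, no matter how long the editing session lasts.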

At the moment, this squashing does not reduce the size of the database, and if the document changes regularly (more often than document_ttl), no squashing ever takes place.
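The size issue can be reproduced with plain SQLite, independent of the ystore. The sketch below uses a throwaway table (the table and column names are assumptions chosen for the demo) and measures the file size after filling it, after deleting all rows, and after running `VACUUM`. On a default SQLite build without auto-vacuum, deleted pages go to the freelist and the file only shrinks after `VACUUM`.

```python
import os
import sqlite3
import tempfile


def file_sizes_around_vacuum(n_rows: int = 2000, blob_size: int = 1024):
    """Return (size_when_full, size_after_delete, size_after_vacuum) in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        # isolation_level=None means autocommit, so VACUUM never runs
        # inside an open transaction.
        db = sqlite3.connect(db_path, isolation_level=None)
        try:
            # Demo schema only; not necessarily the schema used by the ystore.
            db.execute(
                "CREATE TABLE yupdates (path TEXT, yupdate BLOB, timestamp REAL)"
            )
            db.execute("BEGIN")
            db.executemany(
                "INSERT INTO yupdates VALUES (?, ?, ?)",
                (("doc", b"x" * blob_size, float(i)) for i in range(n_rows)),
            )
            db.execute("COMMIT")
            size_full = os.path.getsize(db_path)

            # Deleting rows frees pages inside the file but keeps the file size.
            db.execute("DELETE FROM yupdates")
            size_after_delete = os.path.getsize(db_path)

            # VACUUM rebuilds the file and returns the free pages to the OS.
            db.execute("VACUUM")
            size_after_vacuum = os.path.getsize(db_path)
        finally:
            db.close()
        return size_full, size_after_delete, size_after_vacuum
```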

This is a draft for discussion of how an effective trimming and limiting of the history size could be achieved. This would address the request to trim the database (#60) and might also influence the decision to disable the saving of the database (jupyterlab/jupyter-collaboration#244).

To achieve this, we introduce a new parameter, history_length, which limits the age of the oldest entries in the database. All older entries get squashed. Additionally, to trim the size of the database, deleted entries are vacuumed. This preserves the functionality of the database: for example, if a client is missing updates, we can provide them from the database up to the given age limit. At the same time, the total size does not grow without bound. This is especially important in contexts where the database counts towards user quotas.
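A rough sketch of the idea, not the code in this PR: entries older than history_length are replaced by a single squashed entry, and the file is then vacuumed. The function `trim_history`, the table layout, and the `merge` callable are assumptions for illustration. In particular, the default `merge` simply concatenates bytes as a placeholder; real Y updates have to be merged through the CRDT library.

```python
import sqlite3
from typing import Callable, Iterable


def trim_history(
    db: sqlite3.Connection,
    path: str,
    history_length: float,
    now: float,
    merge: Callable[[Iterable[bytes]], bytes] = b"".join,  # placeholder only
) -> int:
    """Squash all updates of `path` older than `history_length` seconds.

    Returns the number of rows that were squashed (0 if nothing to do).
    """
    cutoff = now - history_length
    rows = db.execute(
        "SELECT yupdate, timestamp FROM yupdates "
        "WHERE path = ? AND timestamp < ? ORDER BY timestamp",
        (path, cutoff),
    ).fetchall()
    if len(rows) < 2:
        # Zero or one old row: already as compact as it can be.
        return 0

    squashed = merge(row[0] for row in rows)
    newest_old_ts = rows[-1][1]
    with db:  # one transaction: delete old rows, insert the squashed row
        db.execute(
            "DELETE FROM yupdates WHERE path = ? AND timestamp < ?",
            (path, cutoff),
        )
        db.execute(
            "INSERT INTO yupdates (path, yupdate, timestamp) VALUES (?, ?, ?)",
            (path, squashed, newest_old_ts),
        )
    # Reclaim the freed pages; must run outside a transaction.
    db.execute("VACUUM")
    return len(rows)
```

Updates newer than the cutoff stay untouched, so a client that is missing recent updates can still be served from the database.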

@asteppke
Author

@davidbrochart @brichet
Any thoughts regarding this approach, or comments on how to deal with the growing database otherwise? Thanks!

@davidbrochart
Collaborator

Thanks for the PR @asteppke.
It could be nice to support multiple strategies for handling the database size, one of which could be the history length. Alternatively, we could have a strategy where we ensure the database size doesn't exceed a given size, since this is really what we care about in the end. What do you think?
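One possible shape for such pluggable strategies is sketched below. All names here (`StoreStats`, `TrimStrategy`, `MaxHistoryAge`, `MaxDatabaseSize`, `needs_trim`) are hypothetical and not part of the existing API; the sketch only shows how an age-based and a size-based rule could sit behind the same interface.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class StoreStats:
    """Snapshot of the store that strategies decide on (hypothetical)."""
    oldest_update_age: float  # seconds since the oldest stored update
    db_size_bytes: int        # current size of the database


class TrimStrategy(Protocol):
    def should_trim(self, stats: StoreStats) -> bool:
        ...


@dataclass
class MaxHistoryAge:
    """Trim when the oldest entry is older than history_length seconds."""
    history_length: float

    def should_trim(self, stats: StoreStats) -> bool:
        return stats.oldest_update_age > self.history_length


@dataclass
class MaxDatabaseSize:
    """Trim when the database exceeds max_bytes."""
    max_bytes: int

    def should_trim(self, stats: StoreStats) -> bool:
        return stats.db_size_bytes > self.max_bytes


def needs_trim(strategies: Iterable[TrimStrategy], stats: StoreStats) -> bool:
    """Trim as soon as any configured strategy asks for it."""
    return any(s.should_trim(stats) for s in strategies)
```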

@asteppke
Author

Thanks for looking into this approach and your suggestion. I agree that the size of the database is a good input for a trimming strategy.

At first glance, it seems that SQLite supports querying the database size directly, so that should be easy to add. I will look in the next few days at how to incorporate this and then update the merge request here.
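For reference, a minimal sketch of such a query using the standard `sqlite3` module: the size is the product of `PRAGMA page_count` and `PRAGMA page_size`, and `PRAGMA freelist_count` gives the number of unused pages that a `VACUUM` could reclaim. The helper names are made up for this example.

```python
import sqlite3


def database_size_bytes(db: sqlite3.Connection) -> int:
    """Total size of the main database in bytes (pages * page size)."""
    page_count = db.execute("PRAGMA page_count").fetchone()[0]
    page_size = db.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


def reclaimable_bytes(db: sqlite3.Connection) -> int:
    """Bytes held by free pages, i.e. what VACUUM could give back."""
    free_pages = db.execute("PRAGMA freelist_count").fetchone()[0]
    page_size = db.execute("PRAGMA page_size").fetchone()[0]
    return free_pages * page_size
```

A size-based strategy could compare `database_size_bytes(db)` against a configured limit and squash older entries until the size drops below it.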

2 participants