Allow efficient trimming of history #77

asteppke · 2024-10-10T19:33:44Z

Currently, when using the SQL backend of the ystore, the complete document history is stored by default. If the document_ttl parameter is set, the history is squashed, but only under the specific condition that the time between the most recent update and the current update is larger than document_ttl.

At the moment this does not reduce the size of the database, and if the document changes relatively regularly, no squashing ever takes place.

This is a draft for discussion of how an effective trimming and limiting of the history size could be achieved. This would address the request to trim the database (#60) and might also influence the decision to disable the saving of the database (jupyterlab/jupyter-collaboration#244).

To achieve this, we introduce a new parameter, history_length, which limits the age of the oldest entries in the database. All older entries get squashed. Additionally, to trim the size of the database, deleted entries are vacuumed. This preserves the functionality of the database, e.g. if a client is missing updates we can provide them from the database up to the given age limit. At the same time the total size does not grow without bound. This is especially important in contexts where the database is counted towards user quotas.
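
To make the idea concrete, here is a minimal sketch of the trimming step, not the code in this PR. It assumes a table `yupdates(path, yupdate, metadata, timestamp)` and a hypothetical `merge_updates` callable that combines several updates into one; in the real store that merge would have to go through the CRDT document rather than a plain function.

```python
import sqlite3
import time
from typing import Callable, Iterable, Optional


def trim_history(
    db: sqlite3.Connection,
    path: str,
    history_length: float,
    merge_updates: Callable[[Iterable[bytes]], bytes],  # hypothetical helper
    now: Optional[float] = None,
) -> int:
    """Squash all updates of `path` older than `history_length` seconds.

    Returns the number of rows that were replaced by one squashed row.
    """
    now = time.time() if now is None else now
    cutoff = now - history_length
    # Assumed schema: yupdates(path, yupdate, metadata, timestamp).
    rows = db.execute(
        "SELECT yupdate, timestamp FROM yupdates "
        "WHERE path = ? AND timestamp < ? ORDER BY timestamp",
        (path, cutoff),
    ).fetchall()
    if len(rows) < 2:
        return 0  # zero or one old row: nothing to squash

    squashed = merge_updates(row[0] for row in rows)
    newest_old_timestamp = rows[-1][1]
    with db:  # single transaction: drop the old rows, insert the squashed one
        db.execute(
            "DELETE FROM yupdates WHERE path = ? AND timestamp < ?",
            (path, cutoff),
        )
        db.execute(
            "INSERT INTO yupdates (path, yupdate, timestamp) VALUES (?, ?, ?)",
            (path, squashed, newest_old_timestamp),
        )
    # VACUUM must run outside a transaction; it rebuilds the file so the
    # pages freed by the DELETE are actually returned to the file system.
    db.execute("VACUUM")
    return len(rows)
```

Updates newer than the cutoff stay untouched, so a client that is missing recent updates can still be served from the database. Running VACUUM on every trim may be too expensive for large files, so a real implementation would probably do it less often.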


asteppke · 2024-10-28T12:51:58Z

@davidbrochart @brichet
Any thoughts regarding this approach, or comments on how to deal with the growing database otherwise? Thanks!

davidbrochart · 2024-10-29T15:03:25Z

Thanks for the PR @asteppke.
It could be nice to support multiple strategies for handling the database size, one of which could be the history length. Alternatively, we could have a strategy where we ensure the database size doesn't exceed a given size, since this is really what we care about in the end. What do you think?

asteppke · 2024-10-30T10:36:48Z

Thanks for looking into this approach and your suggestion. I agree that the size of the database is a good input for a trimming strategy.

At first glance it seems that SQLite supports querying the database size directly, so that should be easy to add. I will look into how to incorporate this over the next few days and then update the merge request here.
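
As a rough sketch of the size query, assuming the store keeps a plain `sqlite3` connection: SQLite exposes the page count, page size and free-list length through PRAGMAs, and their product gives the size in bytes. The function names and the `max_bytes` limit are hypothetical and only illustrate how a size-based strategy could decide when to trim.

```python
import sqlite3


def database_size_bytes(db: sqlite3.Connection) -> int:
    """Total size of the database in bytes (pages * page size)."""
    page_count = db.execute("PRAGMA page_count").fetchone()[0]
    page_size = db.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


def reclaimable_bytes(db: sqlite3.Connection) -> int:
    """Bytes held by free pages that a VACUUM could give back."""
    free_pages = db.execute("PRAGMA freelist_count").fetchone()[0]
    page_size = db.execute("PRAGMA page_size").fetchone()[0]
    return free_pages * page_size


def exceeds_size_limit(db: sqlite3.Connection, max_bytes: int) -> bool:
    """True if the database has grown past the (hypothetical) size limit."""
    return database_size_bytes(db) > max_bytes
```

A size-based strategy could call `exceeds_size_limit` after each write and squash the oldest entries until the check passes again.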

asteppke and others added 5 commits October 10, 2024 21:08

added history_length parameter and respective squashing

44253ac

changed default to never squash

4d11ff0

[pre-commit.ci] auto fixes from pre-commit.com hooks

5fc1bd2

for more information, see https://pre-commit.ci

formatting and prevent issues in case of None

9c56579

Merge branch 'trim_history' of https://github.com/asteppke/pycrdt-web…

7cf3015

…socket into trim_history
