Minor compactions and deletions #40

Open
Simojuv opened this issue May 27, 2024 · 3 comments

Comments

@Simojuv

Simojuv commented May 27, 2024

I have a use case where there is a specific amount of data I'd like to keep for each sensor, deleting the oldest records accordingly so that only the newest records are kept at all times.

For example:

10 sensors
1 snapshot per sensor every 5 seconds
cleanup every 6 hours
=
43200 old records deleted every 6 hours
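In steady state each cleanup has to remove as many records as were written since the previous one, so the 43200 figure follows directly from the numbers above:

```python
# Worked check of the cleanup volume in the example above.
SENSORS = 10
SNAPSHOT_INTERVAL_S = 5           # one snapshot per sensor every 5 seconds
CLEANUP_INTERVAL_S = 6 * 60 * 60  # cleanup every 6 hours = 21600 s

snapshots_per_sensor = CLEANUP_INTERVAL_S // SNAPSHOT_INTERVAL_S  # 21600 / 5 = 4320
records_per_cleanup = SENSORS * snapshots_per_sensor              # 10 * 4320

print(records_per_cleanup)  # 43200
```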

As I understand it (correct me if I am wrong), delete markers are only compacted by a major compaction. So if I wanted to delete this many records without the resulting tx files slowing the database down considerably, I'd need to do a major compaction where I'd normally do a minor one.
By contrast, if I wanted to add 43200 records, I could just keep minor compacting them together without any performance issues, and do a major compaction once enough tx files have accumulated in the db folder.
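A minimal sketch of the behaviour described above, using a hypothetical in-memory model (`ToyDb` and its methods are illustrative names, not the project's API or on-disk format): under a minor compaction, consecutive addition tx files collapse into one, while each delete-marker tx file survives until a major compaction.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDb:
    """Hypothetical model of a database folder's pending tx files.

    Each tx file is tagged "add" or "delete". This is not the project's real
    on-disk format; it only models the compaction behaviour described above.
    """

    tx_files: list = field(default_factory=list)

    def commit(self, kind: str) -> None:
        # Every transaction lands in its own tx file.
        self.tx_files.append(kind)

    def minor_compact(self) -> None:
        # Assumed behaviour: runs of consecutive "add" files collapse into one,
        # while every "delete" file is left in place as its own tx file.
        merged = []
        for kind in self.tx_files:
            if kind == "add" and merged and merged[-1] == "add":
                continue  # folded into the preceding merged "add" file
            merged.append(kind)
        self.tx_files = merged

    def major_compact(self) -> None:
        # Everything, delete markers included, is folded into the main file.
        self.tx_files.clear()


db = ToyDb()
for _ in range(100):
    db.commit("add")
db.minor_compact()
print(len(db.tx_files))  # 1

for _ in range(100):
    db.commit("delete")
db.minor_compact()
print(len(db.tx_files))  # 101: the deletes survive minor compaction

db.major_compact()
print(len(db.tx_files))  # 0
```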

Is there something I am misunderstanding about deletions, or is there no way to compact these aside from a major compaction?

@njaard
Owner

njaard commented May 27, 2024

If your database is only used to store data for those 10 sensors, and you only want to store new data for all of them, then a major compaction and a minor compaction will effectively be the same, so you might as well just do a major one.

If your database also stores other data that you are not deleting, and that is supposed to grow without bound, then I would recommend splitting the data that you delete into its own database and doing the major compaction on that.

Makes sense? Let me know!

@Simojuv
Author

Simojuv commented May 28, 2024

I guess my initial question was why a minor compaction doesn't also merge delete markers together, since it would make sense for additions and deletions to work the same way.

The whole issue arose when the device had reached the limit of sensor data to hold and needed to start deleting old records.
My compaction logic was to do a minor compaction every 100 transactions and a major one every 100 minor compactions, which works really well as long as I am only adding to the database.
Once the deletions start occurring, the delete markers pile up until a major compaction cleans them up, and by then both the database and compaction itself have slowed considerably, because the folder contains a ton of tx files (delete markers, which a minor compaction does not clear).
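The counter-based schedule described above can be sketched as follows. `CompactionScheduler` is a hypothetical helper that only decides *which* compaction to run after each transaction; actually running it is left to the caller.

```python
from collections import Counter


class CompactionScheduler:
    """Hypothetical counter-based policy: a minor compaction every
    `tx_per_minor` transactions, and every `minors_per_major`-th one
    is upgraded to a major compaction."""

    def __init__(self, tx_per_minor: int = 100, minors_per_major: int = 100):
        self.tx_per_minor = tx_per_minor
        self.minors_per_major = minors_per_major
        self.tx_since_compaction = 0
        self.minors_since_major = 0

    def on_commit(self):
        """Return None, "minor" or "major" after one committed transaction."""
        self.tx_since_compaction += 1
        if self.tx_since_compaction < self.tx_per_minor:
            return None
        self.tx_since_compaction = 0
        self.minors_since_major += 1
        if self.minors_since_major < self.minors_per_major:
            return "minor"
        self.minors_since_major = 0
        return "major"


sched = CompactionScheduler()
counts = Counter(sched.on_commit() for _ in range(10_000))
print(counts["minor"], counts["major"])  # 99 1
```

Note that this schedule counts transactions only; it has no notion of whether a transaction added or deleted data, which is why delete markers can accumulate between major compactions.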

Running major compactions every 100 or so transactions is really slow in comparison to minor compactions.

@njaard
Owner

njaard commented Jun 5, 2024

Yes, it's a good idea; you could merge delete transactions into one during a minor compaction. That's not presently implemented.
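One way the suggested merge could work, sketched on a hypothetical in-memory model (`AddTx`, `DeleteTx`, `minor_compact` and `replay` are illustrative names, not the project's API): runs of consecutive delete markers are coalesced into one marker holding the union of their keys, just as runs of additions are merged. Boundaries between an addition run and a deletion run are kept, so replaying the compacted files yields the same state.

```python
from dataclasses import dataclass


@dataclass
class AddTx:
    records: dict  # key -> value written by this transaction


@dataclass
class DeleteTx:
    keys: set  # keys this delete marker removes


def minor_compact(tx_files):
    """Coalesce consecutive tx files of the same kind (hypothetical sketch)."""
    merged = []
    for tx in tx_files:
        prev = merged[-1] if merged else None
        if isinstance(tx, AddTx) and isinstance(prev, AddTx):
            # Later writes win, as they would when replaying in order.
            merged[-1] = AddTx({**prev.records, **tx.records})
        elif isinstance(tx, DeleteTx) and isinstance(prev, DeleteTx):
            # Many small delete markers become a single marker.
            merged[-1] = DeleteTx(prev.keys | tx.keys)
        else:
            # Kind changed: keep the boundary so ordering is preserved.
            merged.append(tx)
    return merged


def replay(main, tx_files):
    """Apply tx files on top of the main data, in order."""
    state = dict(main)
    for tx in tx_files:
        if isinstance(tx, AddTx):
            state.update(tx.records)
        else:
            for key in tx.keys:
                state.pop(key, None)
    return state


main = {("sensor1", t): t for t in range(10)}
txs = [DeleteTx({("sensor1", t)}) for t in range(5)] + [AddTx({("sensor1", 10): 10})]
compacted = minor_compact(txs)

print(len(txs), len(compacted))  # 6 2
print(replay(main, txs) == replay(main, compacted))  # True
```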

However, based on my understanding of your use-case, it seems to me that you could just do a major compaction every time you do a deletion.
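A sketch of that policy, assuming (hypothetically) that inserts are committed as one transaction per 5-second snapshot round and that each 6-hour cleanup is a single deletion transaction. `DeletionAwareScheduler` is an illustrative name, not part of the project.

```python
from collections import Counter
from typing import Optional


class DeletionAwareScheduler:
    """Hypothetical policy: counter-based minor compactions for inserts,
    plus a major compaction right after every deletion transaction."""

    def __init__(self, tx_per_minor: int = 100):
        self.tx_per_minor = tx_per_minor
        self.tx_since_compaction = 0

    def on_commit(self, is_deletion: bool) -> Optional[str]:
        if is_deletion:
            # Fold the delete markers into the main file right away so they
            # never accumulate as separate tx files.
            self.tx_since_compaction = 0
            return "major"
        self.tx_since_compaction += 1
        if self.tx_since_compaction >= self.tx_per_minor:
            self.tx_since_compaction = 0
            return "minor"
        return None


sched = DeletionAwareScheduler()
counts = Counter()
for _cycle in range(4):  # four 6-hour cleanup cycles = one day
    for _ in range(6 * 60 * 60 // 5):  # assumed: one insert transaction per 5 s
        counts[sched.on_commit(is_deletion=False)] += 1
    counts[sched.on_commit(is_deletion=True)] += 1

print(counts["minor"], counts["major"])  # 172 4
```

Under these assumptions a day costs four major compactions, one per cleanup, instead of one every 100 transactions. Per the owner's first comment, if the database holds only the rolling sensor window, each of those major compactions is effectively the same as a minor one.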
