db: compute min-overlap ratio over 'compaction units' #1152

Closed
jbowens opened this issue May 17, 2021 · 1 comment

jbowens commented May 17, 2021

Currently, when picking a compaction, we compute a heuristic that prioritizes files that are large relative to the files they overlap in the output level. This heuristic helps minimize write amplification. Once a file to compact is picked, the set of input-level files is expanded to include any adjacent files that can be added without overlapping additional files in the output level. This expansion may significantly decrease the overlapping ratio of the compaction, so the resulting compaction can produce less write amplification than the per-file ratio indicates.

It should be possible to write an iterator that iterates over whole compaction units. The compaction picker can be adjusted to iterate over compaction units rather than files, and calculate the min-overlapping ratio heuristic on a compaction unit basis. This seems like it should help reduce write amplification.
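
To illustrate the idea, here is a rough, self-contained sketch that is not based on the actual picker code. The `file` type, `overlap`, `compactionUnits`, and `overlapRatio` are all hypothetical names, keys are plain strings, and units are formed by a simple left-to-right grouping rather than by expanding around a picked file, so treat it as an approximation of the proposal under those assumptions.

```go
package main

import "fmt"

// file is a hypothetical, simplified stand-in for sstable metadata.
type file struct {
	smallest, largest string // inclusive key bounds
	size              uint64 // size in bytes
}

// overlap returns how many output-level files intersect [lo, hi]
// and their combined size.
func overlap(output []file, lo, hi string) (count int, bytes uint64) {
	for _, f := range output {
		if f.smallest <= hi && f.largest >= lo {
			count++
			bytes += f.size
		}
	}
	return count, bytes
}

// compactionUnits groups adjacent input-level files (assumed sorted and
// non-overlapping) into runs. A run is extended while the next file can
// be added without overlapping any additional output-level file.
func compactionUnits(input, output []file) [][]file {
	var units [][]file
	for i := 0; i < len(input); {
		n, _ := overlap(output, input[i].smallest, input[i].largest)
		j := i + 1
		for j < len(input) {
			m, _ := overlap(output, input[i].smallest, input[j].largest)
			if m != n {
				break // adding input[j] would pull in more output files
			}
			j++
		}
		units = append(units, input[i:j])
		i = j
	}
	return units
}

// overlapRatio is overlapped output bytes divided by input bytes for a
// non-empty unit. Lower values suggest less write amplification.
func overlapRatio(unit, output []file) float64 {
	var in uint64
	for _, f := range unit {
		in += f.size
	}
	_, out := overlap(output, unit[0].smallest, unit[len(unit)-1].largest)
	return float64(out) / float64(in)
}

func main() {
	input := []file{{"a", "b", 10}, {"c", "d", 10}, {"e", "f", 10}, {"m", "n", 10}}
	output := []file{{"a", "f", 100}, {"g", "z", 100}}
	for _, u := range compactionUnits(input, output) {
		fmt.Printf("%d file(s), ratio %.2f\n", len(u), overlapRatio(u, output))
	}
}
```

In this made-up data, each of the first three input files taken alone overlaps the same 100-byte output file, so the per-file ratio is 100/10. Taken together as one unit, the ratio is 100/30, which is the gap between the per-file heuristic and the actual compaction that this issue describes.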

jbowens commented Jul 25, 2022

Obsolete with the imminent removal of atomic compaction units.

jbowens closed this as completed Jul 25, 2022