Properly set smallest key of subcompaction output #4723
Summary: It is possible to see a situation like the following when subcompactions are enabled:

1. A subcompaction boundary is set to `[b, e)`.
2. The first output file in a subcompaction has `c@20` as its smallest key.
3. The range tombstone `[a, d)@30` is encountered.
4. The tombstone is written to the range-del meta block and the new smallest key is set to `b@0` (since no keys in this subcompaction's output can be smaller than `b`).
5. A key `b@10` in a lower level will now reappear, since it is not covered by the truncated start key `b@0`.

In general, unless the smallest data key in a file has a seqnum of 0, it is not safe to truncate a tombstone at the start key to have a seqnum of 0, since it can expose keys with a seqnum greater than 0 but less than the tombstone's actual seqnum.

To fix this, when the lower bound of a file comes from the subcompaction boundaries, we now set the seqnum of an artificially extended smallest key to the tombstone's seqnum. This is safe because subcompactions operate over disjoint sets of keys, and the subcompactions that can experience this problem are not the first subcompaction (which is unbounded on the left).

Furthermore, there is now an assertion to detect the described anomalous case.

Test Plan: run the following command a few times:

```
make db_stress && TEST_TMPDIR=/dev/shm ./db_stress --max_background_compactions=8 --subcompactions=0 --memtablerep=skip_list --acquire_snapshot_one_in=10000 --delpercent=4 --delrangepercent=1 --snapshot_hold_ops=100000 --allow_concurrent_memtable_write=1 --compact_files_one_in=10000 --clear_column_family_one_in=0 --writepercent=35 --readpercent=25 --write_buffer_size=1048576 --max_bytes_for_level_base=4194304 --target_file_size_base=1048576 --column_families=1 --compact_range_one_in=10000 --open_files=-1 --max_key=10000000 --prefixpercent=25 --ops_per_thread=1000000
```
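The five-step scenario in the summary can be reproduced with a small toy model. This is an illustrative sketch only: `InternalKey`, `TruncatedTombstone`, `TruncateStart`, and `Covers` are hypothetical stand-ins, not RocksDB's actual types or functions. It assumes the usual internal-key order (ascending user key, then descending seqnum) and treats a key as covered when it sorts at or after the truncated start, has a user key below the tombstone's end, and is older than the tombstone.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical simplified internal key: a user key plus a sequence number.
struct InternalKey {
  std::string user_key;
  uint64_t seqnum;
};

// Internal key order: ascending user key, then descending seqnum,
// so b@30 sorts before b@10, which sorts before b@0.
bool InternalLess(const InternalKey& a, const InternalKey& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key;
  return a.seqnum > b.seqnum;
}

// A range tombstone [start, end)@seqnum whose start has been truncated
// to an internal key at the subcompaction's lower bound.
struct TruncatedTombstone {
  InternalKey start;  // inclusive internal-key start after truncation
  std::string end;    // exclusive user-key end
  uint64_t seqnum;    // the tombstone's own sequence number
};

// Hypothetical helper: truncate [lower_bound, end)@seqnum, giving the
// artificial start key either seqnum 0 (the buggy behavior) or the
// tombstone's seqnum (the fix).
TruncatedTombstone TruncateStart(const std::string& lower_bound,
                                 const std::string& end, uint64_t seqnum,
                                 bool use_tombstone_seqnum) {
  assert(lower_bound < end);  // the boundary must fall inside the tombstone
  return {{lower_bound, use_tombstone_seqnum ? seqnum : 0}, end, seqnum};
}

// True if the truncated tombstone hides key k.
bool Covers(const TruncatedTombstone& t, const InternalKey& k) {
  const bool at_or_after_start = !InternalLess(k, t.start);
  const bool before_end = k.user_key < t.end;
  const bool older = k.seqnum < t.seqnum;
  return at_or_after_start && before_end && older;
}
```

With lower bound `b` and the tombstone `[a, d)@30`, `Covers(TruncateStart("b", "d", 30, false), {"b", 10})` is false because `b@10` sorts before `b@0`, so the lower-level key reappears. Passing `true` instead makes the start `b@30`, and `b@10` is covered again.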
Note that this bug is pretty obscure, since it only affects users of both subcompactions and DeleteRange.
@abhimadan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thanks for the clear description; it was really helpful for refreshing my memory about the problem. It's interesting that subcompaction boundaries can be chosen such that a range tombstone spans multiple subcompactions. I think it requires the boundary to be chosen based on a file endpoint in the other level (i.e., the one that doesn't have the range tombstone).
db/compaction_job.cc
```cpp
// lower_bound. We also know that smaller subcompactions exist, because
// otherwise the subcompaction would be unbounded on the left. As a
// result, we know that no other files on the output level will contain
// keys at lower_bound. Therefore, it is safe to use the tombstone's
```
It is possible though that another output level file's end key is at lower_bound with kMaxSeqnum, right?
Yeah good point, that's definitely possible. I'll adjust this to say that "real" keys can't be at lower_bound in other output files.
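The point about another output file ending at `lower_bound` with `kMaxSeqnum` can be checked with a similar toy model. This sketch assumes that adjacent output files do not overlap when the left file's largest internal key sorts strictly before the right file's smallest one, and that sequence numbers occupy 56 bits (`kMaxSeqnum` below is a hypothetical stand-in for RocksDB's `kMaxSequenceNumber`). `IKey`, `CompareIKey`, and `FilesDisjoint` are illustrative names, not RocksDB APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical simplified internal key: a user key plus a sequence number.
struct IKey {
  std::string user_key;
  uint64_t seqnum;
};

// Assumed 56-bit sequence number space.
constexpr uint64_t kMaxSeqnum = (uint64_t{1} << 56) - 1;

// Three-way internal key comparison: ascending user key, then
// descending seqnum. Returns -1, 0, or 1.
int CompareIKey(const IKey& a, const IKey& b) {
  assert(a.seqnum <= kMaxSeqnum && b.seqnum <= kMaxSeqnum);
  if (a.user_key != b.user_key) return a.user_key < b.user_key ? -1 : 1;
  if (a.seqnum == b.seqnum) return 0;
  return a.seqnum > b.seqnum ? -1 : 1;
}

// Adjacent output files do not overlap if the left file's largest key
// sorts strictly before the right file's smallest key.
bool FilesDisjoint(const IKey& left_largest, const IKey& right_smallest) {
  return CompareIKey(left_largest, right_smallest) < 0;
}
```

A sentinel largest key `b@kMaxSeqnum` in the left file sorts before an artificial start `b@30` in the right file, so the two files still do not overlap. A real key such as `b@10` in the left file would sort after `b@30` and overlap, which is why the code comment is narrowed to "real" keys.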
db/compaction_job.cc
```cpp
#ifndef NDEBUG
SequenceNumber smallest_ikey_seqnum = kMaxSequenceNumber;
if (meta->smallest.size() > 0) {
  GetInternalKeySeqno(meta->smallest.Encode());
```
oops forgot to assign the result to smallest_ikey_seqnum
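The reviewed snippet calls `GetInternalKeySeqno` but never stores its result. Below is a self-contained sketch of a corrected debug check. It assumes RocksDB's internal-key layout (the user key followed by an 8-byte little-endian trailer packing `(seqnum << 8) | type`) and re-implements the decoding locally as `DecodeSeqno` instead of calling the real function. `EncodeInternalKey` and `CheckSmallestKey` are hypothetical helpers. The assertion illustrates the anomaly described in the summary (an artificial start with seqnum 0 while the smallest real key has a nonzero seqnum); it is not the exact assertion added by this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using SequenceNumber = uint64_t;
// Assumed 56-bit maximum, standing in for RocksDB's kMaxSequenceNumber.
constexpr SequenceNumber kMaxSequenceNumber = (uint64_t{1} << 56) - 1;

// Hypothetical helper: append an 8-byte little-endian trailer holding
// (seqnum << 8) | type to the user key.
std::string EncodeInternalKey(const std::string& user_key,
                              SequenceNumber seqnum, uint8_t type) {
  std::string out = user_key;
  const uint64_t packed = (seqnum << 8) | type;
  for (int i = 0; i < 8; ++i) {
    out.push_back(static_cast<char>((packed >> (8 * i)) & 0xff));
  }
  return out;
}

// Local stand-in for GetInternalKeySeqno: read the trailer, drop the type.
SequenceNumber DecodeSeqno(const std::string& encoded) {
  assert(encoded.size() >= 8);
  uint64_t packed = 0;
  for (int i = 7; i >= 0; --i) {
    packed = (packed << 8) |
             static_cast<unsigned char>(encoded[encoded.size() - 8 + i]);
  }
  return packed >> 8;
}

// Hypothetical debug check: an artificially extended smallest key may carry
// seqnum 0 only if the file's smallest real key also has seqnum 0.
void CheckSmallestKey(const std::string& smallest_real_key,
                      SequenceNumber artificial_start_seqnum) {
#ifndef NDEBUG
  SequenceNumber smallest_ikey_seqnum = kMaxSequenceNumber;
  if (!smallest_real_key.empty()) {
    // The fix: assign the decoded value instead of discarding it.
    smallest_ikey_seqnum = DecodeSeqno(smallest_real_key);
  }
  assert(artificial_start_seqnum != 0 || smallest_ikey_seqnum == 0);
#else
  (void)smallest_real_key;
  (void)artificial_start_seqnum;
#endif
}
```

For example, `CheckSmallestKey(EncodeInternalKey("c", 20, 1), 30)` passes, matching the fixed behavior where the artificial start takes the tombstone's seqnum. Passing `0` as the second argument would trip the assertion in a debug build, matching the `b@0` case.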
@abhimadan has updated the pull request.