sstable: convert v1 range-del blocks to v2 on-the-fly #68
Do you mind elaborating on what you mean by this? I recall that in RocksDB we distinguished the old and new formats by giving the range tombstone meta-block a different name. Or maybe we just talked about doing that; let me double-check.
It seems we just talked about it. I think our justification was that we can check whether the tombstones are already fragmented with a linear scan, which is only done when the range tombstone meta-block is loaded into the block cache, and that shouldn't be super frequent.
I did this in Pebble. See https://github.com/petermattis/pebble/blob/master/sstable/table.go#L159-L160. Perhaps that isn't necessary, but there it is.
I thought the range tombstones in RocksDB could be unsorted as well, which requires more than a linear scan to fragment.
In newer versions they are stored sorted and fragmented. So it first does a linear scan to check whether they're already sorted, and sorts only if they aren't. It looks like it fragments either way.
Ah, got it. I suppose that is an OK approach too. Regardless, we need to address the TODO in some fashion, either with the RocksDB approach or with the plan I had previously envisioned.
RocksDB generates range-del blocks that contain range tombstones that are not fragmented and sorted. Pebble generates range-del blocks containing tombstones that are fragmented and sorted. Fragmented and sorted range-del blocks are suitable for serving directly, while unfragmented/unsorted blocks are not. We now transform RocksDB-format range-del blocks into Pebble-format blocks on the fly. The transformation is done via a hook that passes a block through a function after it has been read from disk but before it is added to the cache. Fixes #68
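The hook pattern described above can be sketched roughly as follows. Everything here is hypothetical (`blockTransform`, `reader`, `readBlock`, and the map-based "disk" and cache) and is not Pebble's actual API; the `sortLines` transform is only a stand-in for a real v1-to-v2 range-del conversion. The point is that the transform runs once on a cache miss, so every cached copy is already in the servable format.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// blockTransform is a hypothetical hook run on a block after it is read from
// disk and before it is added to the cache.
type blockTransform func(raw []byte) ([]byte, error)

// reader is a hypothetical, simplified sstable reader. The disk map stands in
// for the file and cache for the block cache, both keyed by block offset.
type reader struct {
	disk  map[uint64][]byte
	cache map[uint64][]byte
	reads int // disk reads performed, showing the transform runs once per block
}

// readBlock returns the block at off, consulting the cache first. On a miss it
// reads the raw block, applies transform (if any), and caches the result, so
// cached blocks are always in the transformed format.
func (r *reader) readBlock(off uint64, transform blockTransform) ([]byte, error) {
	if b, ok := r.cache[off]; ok {
		return b, nil
	}
	raw, ok := r.disk[off]
	if !ok {
		return nil, errors.New("block not found")
	}
	r.reads++
	b := raw
	if transform != nil {
		var err error
		if b, err = transform(raw); err != nil {
			return nil, err
		}
	}
	r.cache[off] = b
	return b, nil
}

// sortLines is a stand-in transform: it sorts newline-separated entries,
// in place of a real v1-to-v2 range-del conversion.
func sortLines(raw []byte) ([]byte, error) {
	lines := strings.Split(string(raw), "\n")
	sort.Strings(lines)
	return []byte(strings.Join(lines, "\n")), nil
}

func main() {
	r := &reader{
		disk:  map[uint64][]byte{0: []byte("c\na\nb")},
		cache: map[uint64][]byte{},
	}
	for i := 0; i < 2; i++ {
		b, err := r.readBlock(0, sortLines)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", b)
	}
	fmt.Println("disk reads:", r.reads)
}
```

Both reads return `"a\nb\nc"`, and only one disk read occurs: the second read is served from the cache, so the conversion cost is paid once per block load rather than once per access.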
RocksDB-generated range-del blocks contain range tombstones that are not fragmented and sorted. Pebble always generates range-del blocks that are fragmented and sorted, making them suitable for serving directly. Pebble distinguishes between RocksDB-generated (v1) and Pebble-generated (v2) range-del blocks by storing the latter under a different property.
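A rough sketch of how a reader might use the distinct names to decide whether conversion is needed. The `blockHandle` type and `findRangeDelBlock` function are hypothetical, and the two name constants are assumptions for illustration; check the actual sstable code for the real identifiers. The v2 block is preferred when present; a v1-only table reports that its block must be converted before being served.

```go
package main

import "fmt"

// blockHandle is a hypothetical (offset, length) pointer to a block in the file.
type blockHandle struct{ Offset, Length uint64 }

// Assumed meta-index names for the two formats (illustrative only).
const (
	rangeDelV1Name = "rocksdb.range_del"
	rangeDelV2Name = "rocksdb.range_del2"
)

// findRangeDelBlock returns (handle, needsConversion, found). It prefers the
// v2 (fragmented, sorted) block; a v1 block must be converted before serving.
func findRangeDelBlock(metaindex map[string]blockHandle) (blockHandle, bool, bool) {
	if h, ok := metaindex[rangeDelV2Name]; ok {
		return h, false, true
	}
	if h, ok := metaindex[rangeDelV1Name]; ok {
		return h, true, true
	}
	return blockHandle{}, false, false
}

func main() {
	v1Only := map[string]blockHandle{rangeDelV1Name: {Offset: 100, Length: 20}}
	h, convert, ok := findRangeDelBlock(v1Only)
	fmt.Println(h, convert, ok)
}
```

For a table containing only the v1 block this prints `{100 20} true true`, signalling that the block should go through the conversion step before it is cached.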
The code still has a TODO to convert v1 to v2 blocks when encountered:
I'm not sure if that last sentence regarding tombstone truncation is still accurate, given the implicit truncation that happens in `mergingIter` and the explicit truncation for compactions being done in #67.