sstable: convert v1 range-del blocks to v2 on-the-fly #68

Closed
petermattis opened this issue Apr 9, 2019 · 5 comments · Fixed by #146
@petermattis
Collaborator

RocksDB-generated range-del blocks contain range tombstones that are not fragmented and sorted. Pebble always generates range-del blocks that are fragmented and sorted, making them suitable for serving directly. Pebble distinguishes between RocksDB-generated (v1) and Pebble-generated (v2) range-del blocks by storing the latter under a different property.

The code still has a TODO to convert v1 to v2 blocks when encountered:

// TODO(peter): if we have a v1 range-del block, convert it on the fly
// and cache the converted version. We just need to create a
// rangedel.Fragmenter and loop over the v1 block and add all of the
// contents. Note that the contents of the v1 block may not be sorted, so
// we'll have to sort them first. We also need to truncate the v1
// tombstones to the sstable boundaries.

I'm not sure if that last sentence regarding tombstone truncation is still accurate given the implicit truncation that happens in mergingIter and the explicit truncation for compactions being done in #67.
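For illustration, the sort-then-fragment step described in the TODO could look roughly like the sketch below. This is not Pebble's `rangedel.Fragmenter`; the `Tombstone` type and `Fragment` function are hypothetical, keys are compared as plain strings, and truncation to sstable boundaries is left out. It splits every tombstone at every start/end boundary so that any two output fragments either share identical bounds or do not overlap.

```go
package main

import (
	"fmt"
	"sort"
)

// Tombstone is a hypothetical stand-in for a range tombstone covering
// the keys in [Start, End) at sequence number Seq.
type Tombstone struct {
	Start, End string
	Seq        uint64
}

// Fragment accepts unsorted, overlapping tombstones and returns fragments
// sorted by start key, with newer sequence numbers first within a fragment.
// It is a simple quadratic sketch, not an optimized implementation.
func Fragment(in []Tombstone) []Tombstone {
	// Collect and sort every boundary key of the non-empty tombstones.
	var bounds []string
	for _, t := range in {
		if t.Start < t.End {
			bounds = append(bounds, t.Start, t.End)
		}
	}
	sort.Strings(bounds)

	// Drop duplicate boundaries.
	var uniq []string
	for _, b := range bounds {
		if len(uniq) == 0 || uniq[len(uniq)-1] != b {
			uniq = append(uniq, b)
		}
	}

	var out []Tombstone
	for i := 0; i+1 < len(uniq); i++ {
		lo, hi := uniq[i], uniq[i+1]
		first := len(out)
		// Emit one fragment per input tombstone that covers [lo, hi).
		for _, t := range in {
			if t.Start <= lo && hi <= t.End {
				out = append(out, Tombstone{Start: lo, End: hi, Seq: t.Seq})
			}
		}
		// Order the fragments of this interval from newest to oldest.
		seg := out[first:]
		sort.Slice(seg, func(a, b int) bool { return seg[a].Seq > seg[b].Seq })
	}
	return out
}

func main() {
	v1 := []Tombstone{
		{Start: "c", End: "g", Seq: 2},
		{Start: "a", End: "e", Seq: 1},
	}
	for _, t := range Fragment(v1) {
		fmt.Printf("[%s, %s)#%d\n", t.Start, t.End, t.Seq)
	}
}
```

A real conversion would use the table's comparer rather than string comparison and would re-encode the fragments into the v2 block format.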

@ajkr
Contributor

ajkr commented Apr 9, 2019

> storing the latter under a different property

Do you mind elaborating what you mean by this? I recall in RocksDB we distinguished the old and new formats by giving the range tombstone meta-block a different name. Or maybe we just talked about doing that; let me double-check.

@ajkr
Contributor

ajkr commented Apr 9, 2019

> Or maybe we just talked about doing that; let me double-check.

Seems just talked about it. I think our justification was we can check whether tombstones are already fragmented with a linear scan, which is only done when range tombstone meta-block gets loaded to block cache, and that shouldn't be super frequent.
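A linear-scan check of this kind might look like the following sketch. The `span` type and `isFragmented` function are hypothetical, not RocksDB or Pebble code; keys are compared as plain strings and sequence numbers are ignored. The check relies on the observation that a sorted, fragmented list only ever places a tombstone next to one with identical bounds or one that starts at or after its end.

```go
package main

import "fmt"

// span is a hypothetical range tombstone covering [start, end).
type span struct {
	start, end string
}

// isFragmented reports, in a single pass, whether ts is sorted by start key
// and every adjacent pair either shares identical bounds or does not overlap.
// As a simplification, empty or inverted tombstones count as not fragmented.
func isFragmented(ts []span) bool {
	for i, cur := range ts {
		if cur.start >= cur.end {
			return false
		}
		if i == 0 {
			continue
		}
		prev := ts[i-1]
		sameBounds := prev.start == cur.start && prev.end == cur.end
		disjoint := prev.end <= cur.start
		if !sameBounds && !disjoint {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isFragmented([]span{{"a", "c"}, {"c", "e"}, {"c", "e"}, {"e", "g"}}))
	fmt.Println(isFragmented([]span{{"a", "e"}, {"c", "g"}}))
}
```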

@petermattis
Collaborator Author

> Seems just talked about it.

I did this in Pebble. See https://github.com/petermattis/pebble/blob/master/sstable/table.go#L159-L160. Perhaps that isn't necessary, but there it is.

> I think our justification was we can check whether tombstones are already fragmented with a linear scan, which is only done when range tombstone meta-block gets loaded to block cache, and that shouldn't be super frequent.

I thought the range tombstones in RocksDB could be unsorted as well, which requires more than a linear scan to fragment.

@ajkr
Contributor

ajkr commented Apr 9, 2019

> I thought the range tombstones in RocksDB could be unsorted as well, which requires more than a linear scan to fragment.

In the newer versions they'll be stored sorted and fragmented. So it first does a linear scan checking if it's already sorted. Only if not it'll sort. It looks like it fragments either way.
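The check-then-sort step described here can be sketched with Go's standard `sort` package. This is an illustration only, not the RocksDB code: the `tomb` type, the `sortIfNeeded` function, and the ordering (start key ascending, then sequence number descending) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// tomb is a hypothetical range tombstone covering [start, end) at seq.
type tomb struct {
	start, end string
	seq        uint64
}

// sortIfNeeded does a linear scan to see whether ts is already ordered and
// sorts it in place only when it is not. It reports whether a sort ran.
func sortIfNeeded(ts []tomb) bool {
	less := func(i, j int) bool {
		if ts[i].start != ts[j].start {
			return ts[i].start < ts[j].start
		}
		return ts[i].seq > ts[j].seq
	}
	if sort.SliceIsSorted(ts, less) {
		return false // common case for newer files: nothing to do
	}
	sort.SliceStable(ts, less)
	return true
}

func main() {
	ts := []tomb{{"c", "g", 2}, {"a", "e", 1}}
	fmt.Println(sortIfNeeded(ts), ts)
	fmt.Println(sortIfNeeded(ts), ts)
}
```

The appeal of this approach is that already-sorted blocks pay only for the scan, while older unsorted blocks pay for the sort once per load into the block cache.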

@petermattis
Collaborator Author

> In the newer versions they'll be stored sorted and fragmented. So it first does a linear scan checking if it's already sorted. Only if not it'll sort. It looks like it fragments either way.

Ah, got it. I suppose that is an ok approach too. Regardless, we need to address the TODO in some fashion, either with the RocksDB approach or the plan I had previously envisioned.

petermattis changed the title from "convert v1 range-del blocks to v2 on-the-fly" to "sstable: convert v1 range-del blocks to v2 on-the-fly" on Apr 21, 2019
petermattis added a commit that referenced this issue May 24, 2019
RocksDB generates range-del blocks that contain range tombstones that
are not fragmented and sorted. Pebble generates range-del blocks
containing tombstones that are fragmented and sorted. The fragmented and
sorted range-del blocks are suitable for serving from directly, while
the unfragmented/unsorted blocks are not. We now transform RocksDB
format range-del blocks into Pebble format blocks on the fly.

The transformation is done via a hook that allows passing a block
through a function after it has been read from disk but before it is
added to the cache.

Fixes #68
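
The hook described in this commit message can be pictured with the sketch below. It is not the code from the commit: `blockTransform`, `cachingReader`, `newCachingReader`, and `readBlock` are hypothetical names, and the cache is a plain map. The point it illustrates is that the transform runs once per cache miss, so later readers are served the already-converted block.

```go
package main

import "fmt"

// blockTransform is a hypothetical hook that rewrites a raw block.
type blockTransform func(raw []byte) ([]byte, error)

// cachingReader is a toy reader with an unbounded in-memory block cache.
type cachingReader struct {
	cache        map[uint64][]byte
	readFromDisk func(offset uint64) ([]byte, error)
}

func newCachingReader(read func(offset uint64) ([]byte, error)) *cachingReader {
	return &cachingReader{cache: make(map[uint64][]byte), readFromDisk: read}
}

// readBlock returns the block at offset. On a cache miss it reads the block,
// passes it through transform (if non-nil), and caches the transformed result.
func (r *cachingReader) readBlock(offset uint64, transform blockTransform) ([]byte, error) {
	if b, ok := r.cache[offset]; ok {
		return b, nil
	}
	b, err := r.readFromDisk(offset)
	if err != nil {
		return nil, err
	}
	if transform != nil {
		if b, err = transform(b); err != nil {
			return nil, err
		}
	}
	r.cache[offset] = b
	return b, nil
}

func main() {
	r := newCachingReader(func(uint64) ([]byte, error) {
		return []byte("v1 block"), nil
	})
	// Stand-in for the v1-to-v2 conversion.
	convert := func(raw []byte) ([]byte, error) {
		return append([]byte("converted: "), raw...), nil
	}
	b, err := r.readBlock(0, convert)
	fmt.Println(string(b), err)
}
```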
petermattis added a commit that referenced this issue May 26, 2019