Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix range deletion tombstone ingestion with global seqno #6429

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Feb 19, 2020

Original author: @jeffrey-xiao

If we are writing a global seqno for an ingested file, the range
tombstone metablock gets accessed and put into the cache during
ingestion preparation. At the time, the global seqno of the ingested
file has not yet been determined, so the cached block will not have a
global seqno. When the file is ingested and we read its range tombstone
metablock, it will be returned from the cache with no global seqno. In
that case, we use the actual seqnos stored in the range tombstones,
which are all zero, so the tombstones cover nothing.

This commit removes global_seqno_ variable from Block. When iterating
over a block, the global seqno for the block is determined by the
iterator instead of storing this mutable attribute in Block.
Additionally, this commit adds a regression test to check that keys are
deleted when ingesting a file with a global seqno and range deletion
tombstones.

Copy link
Contributor Author

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied to @ltamasi's comments on the original PR (#5719).

ASSERT_OK(sst_file_writer.Open(file));
ASSERT_OK(sst_file_writer.DeleteRange(Key(0), Key(30)));
ExternalSstFileInfo file_info;
ASSERT_OK(sst_file_writer.Finish(&file_info));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just say ASSERT_OK(s); here, or even combine it with the previous line (ASSERT_OK(sst_file_writer.Finish(&file_info));).

Done.


for (int i = 5; i < 25; i++) {
std::string res;
ASSERT_TRUE(db_->Get(ReadOptions(), Key(i), &res).IsNotFound());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASSERT_TRUE(Get(Key(i)).IsNotFound()) would be a bit cleaner here.

Done.

@@ -150,8 +150,7 @@ class BlockReadAmpBitmap {
class Block {
public:
// Initialize the block with the specified contents.
explicit Block(BlockContents&& contents, SequenceNumber _global_seqno,
size_t read_amp_bytes_per_bit = 0,
explicit Block(BlockContents&& contents, size_t read_amp_bytes_per_bit = 0,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here (not with the PR per se but with the pre-existing default parameter values) is that removing _global_seqno changes the meaning of code like this:

Block reader1(std::move(contents), kDisableGlobalSequenceNumber); // kDisableGlobalSequenceNumber is now read_amp_bytes_per_bit !!!

We should review all places where Blocks are constructed and fix any problems like this.

Good catch. I went through all occurrences of Block and found a bunch. You can see the fixes in the "address comments" commit (35be04d).

HISTORY.md Outdated
@@ -12,6 +12,7 @@
* BlobDB now ignores trivially moved files when updating the mapping between blob files and SSTs. This should mitigate issue #6338 where out of order flush/compaction notifications could trigger an assertion with the earlier code.
* Batched MultiGet() ignores IO errors while reading data blocks, causing it to potentially continue looking for a key and returning stale results.
* `WriteBatchWithIndex::DeleteRange` returns `Status::NotSupported`. Previously it returned success even though reads on the batch did not account for range tombstones. The corresponding language bindings now cannot be used. In C, that includes `rocksdb_writebatch_wi_delete_range`, `rocksdb_writebatch_wi_delete_range_cf`, `rocksdb_writebatch_wi_delete_rangev`, and `rocksdb_writebatch_wi_delete_rangev_cf`. In Java, that includes `WriteBatchWithIndex::deleteRange`.
* Fix a bug where range tombstone blocks in ingested files were cached incorrectly during ingestion. If range tombstones were read from those incorrectly cached blocks, the keys they covered would be exposed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could you mention the fix in HISTORY.md?

Done.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ajkr ajkr requested a review from ltamasi February 21, 2020 23:01
Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment. Thanks!

table/block_based/partitioned_filter_block.cc Outdated Show resolved Hide resolved
jeffrey-xiao and others added 4 commits February 25, 2020 13:29
If we are writing a global seqno for an ingested file, the range
tombstone metablock gets accessed and put into the cache during
ingestion preparation. At the time, the global seqno of the ingested
file has not yet been determined, so the cached block will not have a
global seqno. When the file is ingested and we read its range tombstone
metablock, it will be returned from the cache with no global seqno. In
that case, we use the actual seqnos stored in the range tombstones,
which are all zero, so the tombstones cover nothing.

This commit removes global_seqno_ variable from Block. When iterating
over a block, the global seqno for the block is determined by the
iterator instead of storing this mutable attribute in Block.
Additionally, this commit adds a regression test to check that keys are
deleted when ingesting a file with a global seqno and range deletion
tombstones.
@ajkr ajkr force-pushed the fix-range-deletion-tombstone-ingestion-with-global-seqno branch from 35be04d to 380eade Compare February 25, 2020 21:33
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. Re-import the pull request

@ajkr
Copy link
Contributor Author

ajkr commented Feb 25, 2020

Thanks for the review!

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr merged this pull request in 69679e7.

facebook-github-bot pushed a commit that referenced this pull request Mar 13, 2023
…11252)

Summary:
CreateColumnFamilyWithImport() did not support range tombstones for two reasons:
1. it uses point keys of a input file to determine its boundary (smallest and largest internal key), which means range tombstones outside of the point key range will be effectively dropped.
2. it does not handle files with no point keys.

Also included a fix in external_sst_file_ingestion_job.cc where the blocks read in `GetIngestedFileInfo()` can be added to block cache now (issue fixed in #6429).

This PR adds support for exporting and importing column family with range tombstones. The main change is to add smallest internal key and largest internal key to `SstFileMetaData` that will be part of the output of `ExportColumnFamily()`. Then during `CreateColumnFamilyWithImport(...,const ExportImportFilesMetaData& metadata,...)`, file boundaries can be set from `metadata` directly. This is needed since when file boundaries are extended by range tombstones, sometimes they cannot be deduced from a file's content alone.

Pull Request resolved: #11252

Test Plan:
- added unit tests that fails before this change

Closes #11245

Reviewed By: ajkr

Differential Revision: D43577443

Pulled By: cbi42

fbshipit-source-id: 6bff78e583cc50c44854994dea0a8dd519398f2f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants