Fix range tombstones written to more files than necessary #4592

Closed
wants to merge 4 commits into from

Contributor

@ajkr ajkr commented Oct 26, 2018

When there's a gap between files, we do not need to output tombstones starting at the next output file's begin key to the current output file.

Test Plan:

  • make check -j64
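
The rule in the description above can be sketched with a toy model. This is an illustration under stated assumptions, not RocksDB's compaction code: `Tombstone`, `WriteToCurrentFile`, and its parameters are hypothetical names, keys are plain strings compared bytewise, and the handling of the no-gap case is an assumption of this sketch rather than something the description states.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a range tombstone covering [start, end).
struct Tombstone {
  std::string start;
  std::string end;
};

// Returns true if `t` should be written to the current output file.
// `current_largest` is the largest point key in the current output file and
// `next_begin` is the first key of the next output file (both hypothetical).
bool WriteToCurrentFile(const Tombstone& t, const std::string& current_largest,
                        const std::string& next_begin) {
  assert(current_largest <= next_begin);  // output files are key-ordered
  const bool gap = current_largest < next_begin;
  if (gap) {
    // Case from the description: a tombstone starting at (or after) the next
    // file's begin key covers nothing in the current file, so skip it here.
    return t.start < next_begin;
  }
  // Assumption of this sketch: when both files share an endpoint key, a
  // tombstone starting exactly at that key is still kept in the current file.
  return t.start <= next_begin;
}
```

With a gap (current file ends at `"a"`, next file begins at `"c1"`), a tombstone `["c1", "d")` is left out of the current file, while a tombstone `["b", "d")` is still written to it.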

Contributor Author

ajkr commented Oct 26, 2018

I'll try to come up with a DB-level test later.

Contributor

@facebook-github-bot facebook-github-bot left a comment

ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@abhimadan abhimadan left a comment

LGTM. I wonder if there are other lurking bugs related to potentially overlapping SSTs on the same level.

@petermattis
Contributor

LGTM

The CockroachDB test which caught this is https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/engine/rocksdb_test.go#L1267-L1398. It doesn't use anything super complicated, but it is written in Go. The slightly non-obvious part is that our wrapper around DeleteRange contains a hack: it also adds point tombstones at the start and end of the range tombstone, which tricks the compaction output size heuristics into splitting sstables at good boundaries.

The test starts by creating 3 sstables at L6 (using ingestion):

6: "a000000000" - "a000009999"
6: "b000000000" - "b000009999"
6: "c000000000" - "c000009999"

It then writes the key a000000000, deletes the keys c000000000 and c000009999, and deletes the range [c000000000, c000001000). Lastly, it compacts the range [c000000000, c000001000).

@facebook-github-bot
Contributor

@ajkr has updated the pull request. Re-import the pull request

Contributor

@facebook-github-bot facebook-github-bot left a comment

ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@abhimadan
Contributor

New DB test LGTM

Contributor Author

ajkr commented Oct 29, 2018

appveyor error:
c:\projects\rocksdb\db\db_range_del_test.cc(1448): warning C4244: '+=': conversion from 'const uint64_t' to 'int', possible loss of data [C:\projects\rocksdb\build\rocksdb_db_range_del_test.vcxproj]
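
The warning points at a `+=` that accumulates a `uint64_t` into an `int`. A minimal sketch of one common remedy is to give the accumulator the same unsigned 64-bit type. `TableProps`, `TablePropsMap`, and `TotalRangeDeletions` are hypothetical stand-ins for illustration; the conversation does not show how the PR actually resolved the warning.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a per-table properties object.
struct TableProps {
  std::uint64_t num_range_deletions = 0;
};

using TablePropsMap = std::map<std::string, std::shared_ptr<const TableProps>>;

// Sums range deletions across all tables. The accumulator is uint64_t, so the
// `+=` below involves no narrowing conversion and should not trigger C4244.
std::uint64_t TotalRangeDeletions(const TablePropsMap& all_table_props) {
  std::uint64_t total = 0;
  for (const auto& name_and_table_props : all_table_props) {
    assert(name_and_table_props.second != nullptr);
    total += name_and_table_props.second->num_range_deletions;
  }
  return total;
}
```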

@facebook-github-bot
Contributor

@ajkr has updated the pull request. Re-import the pull request

Contributor

@petermattis petermattis left a comment

LGTM

Thanks for addressing this so quickly.

for (const auto& name_and_table_props : all_table_props) {
  num_range_deletions += name_and_table_props.second->num_range_deletions;
}
ASSERT_EQ(1, num_range_deletions);
Contributor

It would be nice if you could assert the range deletion was present in the second L1 sstable, though I'd only do that if it is relatively easy.

Contributor Author

Sure, will try.

ASSERT_OK(Put("a", "val"));
ASSERT_OK(db_->DeleteRange(WriteOptions(), db_->DefaultColumnFamily(),
                           "c" + Key(1), "d"));
ASSERT_OK(Put("c" + Key(1), "value"));
Contributor

Might want to add a comment that putting the key c gives the compaction output heuristics a stopping point. When #3977 is addressed this shouldn't be necessary.

Contributor Author

Yeah, makes sense.

});
ASSERT_EQ("a", l1_metadata[0].smallestkey);
ASSERT_EQ("a", l1_metadata[0].largestkey);
ASSERT_EQ("c" + Key(1), l1_metadata[1].smallestkey);
Contributor Author

FWIW, this and the below line indirectly verify the range tombstone is in the second L1 SST.

Contributor

Well, this could be the point mutation, but you're correct that the line below implies that the tombstone is in this sstable.
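
The inference in this thread can be sketched with a toy model. It assumes (the conversation does not spell this out) that a file's reported smallest and largest keys span both its point keys and the start and end keys of any range tombstones it holds. `FileContents`, `Bounds`, and `ComputeBounds` are hypothetical names, and the keys used in the note below are placeholders rather than the test's actual `"c" + Key(1)` values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of what one sstable holds.
struct FileContents {
  std::vector<std::string> point_keys;
  std::vector<std::pair<std::string, std::string>> tombstones;  // [start, end)
};

struct Bounds {
  std::string smallest;
  std::string largest;
};

// Computes the file's key bounds as the min and max over all point keys and
// all tombstone endpoints (an assumption of this sketch).
Bounds ComputeBounds(const FileContents& f) {
  assert(!f.point_keys.empty() || !f.tombstones.empty());
  std::vector<std::string> candidates = f.point_keys;
  for (const auto& t : f.tombstones) {
    candidates.push_back(t.first);
    candidates.push_back(t.second);
  }
  const auto mm = std::minmax_element(candidates.begin(), candidates.end());
  return {*mm.first, *mm.second};
}
```

Under this model, a file holding only the point key `"c1"` has both bounds equal to `"c1"`. If it also holds the tombstone `["c1", "d")`, the smallest key is unchanged but the largest becomes `"d"`, which no point key explains. That is why the smallest key alone is ambiguous while the largest key implies the tombstone is present.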

@facebook-github-bot
Contributor

@ajkr has updated the pull request. Re-import the pull request

Contributor

@facebook-github-bot facebook-github-bot left a comment

ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

petermattis pushed a commit to petermattis/rocksdb that referenced this pull request Oct 30, 2018
)

Summary:
When there's a gap between files, we do not need to output tombstones starting at the next output file's begin key to the current output file.
Pull Request resolved: facebook#4592

Differential Revision: D12808627

Pulled By: ajkr

fbshipit-source-id: 77c8b2e7523a95b1cd6611194144092c06acb505
petermattis added a commit to cockroachdb/rocksdb that referenced this pull request Oct 30, 2018
…elete-range

Fix range tombstones written to more files than necessary (facebook#4592)
petermattis pushed a commit to cockroachdb/rocksdb that referenced this pull request Dec 5, 2018
@ajkr ajkr mentioned this pull request Jun 26, 2019
32 tasks
4 participants