-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
relax L0 seqno invariant for ingested files #59
relax L0 seqno invariant for ingested files #59
Conversation
8ecfd04
to
25342b1
Compare
This would also allow us to set |
db/version_builder.cc
Outdated
" with fileNumber " + | ||
NumberToString(f1->fd.GetNumber())); | ||
bool is_f1_ingested = f1->fd.smallest_seqno == f1->fd.largest_seqno; | ||
if (external_file_seqno != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh I didn't understand why this check existed in the original code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't facebook#5539 change ingestion so that each sstable would get its own sequence number? I'd prefer to backport that PR instead and leave the invariant check in place. I suppose that change is riskier, though. Is your thinking to try to get force_consistency_checks=true
into the 19.2 release? Feels a little bit late in the cycle for that.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @petermattis and @sumeerbhola)
Andy said (a while ago) he prefers relaxing the assertion. I don't really care so will let you two decide. I don't want to turn on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Andy said (a while ago) he prefers relaxing the assertion. I don't really care so will let you two decide.
Was Andy aware of the alternative? If the choice was between relaxing the invariant and not enabling assertions in tests, I'd be for relaxing the assertion as well. My preference is to keep the invariant as-is, otherwise we'll have to relax the invariant in Pebble as well.
Hrmm, if this doesn't get into 19.2 we might have to relax the invariant in Pebble regardless. Cc @sumeerbhola so he is aware.
I don't want to turn on force_consistency_checks=true right now (although AFAIK this is the missing piece to allow it). False positives are too scary.
I agree.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @petermattis and @sumeerbhola)
I think yes, though don't want to speak too much for him. cc @andy-kimball This change is less risky because it only affects code built in debug mode, whereas the other PR affects all builds. I don't think it requires changing anything in Pebble. RocksDB has been generating L0 files with same sequence numbers for a while regardless of this PR. |
I misunderstood what you were saying. But yes, whether Pebble change is needed or not is unaffected by this PR, IMO. |
Right, but we only started ingesting multiple sstables in a single |
If there's a way to add invariant checks that might prevent another corruption issue, and it's simple and safe, then I think we should put it into 19.2. I don't know enough about the specifics to have a firm position on various alternatives. When @ajkr and I spoke, he thought that including the "right" fix for the spurious assertion in 19.2 was somewhat risky, and so I suggested we just remove the assertion for now so that we could safely enable the rest of the (correct) assertions. If that is in itself risky, or if there are other alternatives that are better, then I'd reconsider my position. |
You have a good memory :). Yes, I still think the proper fix is somewhat risky, and this assertion change is not risky (even if I totally messed it up it'd only affect debug builds). |
Seems like we should do this then. My understanding is that we run debug builds of RocksDB in our tests, and so this would give us greater likelihood of hitting an assert during the next month of 19.2 testing, with no risk of causing regressions. |
I don't think that is true. We could possibly make it true, but right now you have to explicitly ask for RocksDB assertions to be enabled. Possible I'm completely mistaken about this. |
It will be somewhat true if we're able to land this (or an alternative) and then revert cockroachdb/cockroach#39562. Reverting it would set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @ajkr and @sumeerbhola)
db/version_builder.cc, line 172 at r1 (raw file):
if (f2->fd.smallest_seqno == f2->fd.largest_seqno) { // This is an external file that we ingested SequenceNumber external_file_seqno = f2->fd.smallest_seqno;
Can we rename this f2_external_file_seqno
? I massively confused myself about this PR for a moment think this seqno came from f1
.
db/version_builder.cc, line 174 at r1 (raw file):
Previously, ajkr (Andrew Kryczka) wrote…
tbh I didn't understand why this check existed in the original code.
The check did exist: !(... || external_file_seqno == 0)
-> !... && external_file_seqno != 0
. Am I misunderstanding your statement?
db/version_builder.cc, line 177 at r1 (raw file):
// This file's seqnos were not zeroed so we can use its seqnos to // determine it is ordered properly. if (f1->fd.largest_seqno < external_file_seqno ||
Shouldn't this be f1->fd.largest_seqno >= external_file_seqno
? For some reason I'm having a hard to understanding this invariant logic the way it is written.
Ah-ha! Thanks for the explanation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)
db/version_builder.cc, line 172 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Can we rename this
f2_external_file_seqno
? I massively confused myself about this PR for a moment think this seqno came fromf1
.
Sure.
db/version_builder.cc, line 174 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
The check did exist:
!(... || external_file_seqno == 0)
->!... && external_file_seqno != 0
. Am I misunderstanding your statement?
Yes, I know it existed, I just don't know why. f2 is the older file so I'm not sure why it needs to be treated specially for having seqnum zero.
db/version_builder.cc, line 177 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Shouldn't this be
f1->fd.largest_seqno >= external_file_seqno
? For some reason I'm having a hard to understanding this invariant logic the way it is written.
It's probably confusing because the old conditional was written like if (!(...
and the new one does not negate the whole thing. Let me clarify that f1 is the newer file and f2 is the older file; I think that should help (assuming I'm right...)
This is necessary to revert cockroachdb/cockroach#39562 to support RocksDB built without `-DNDEBUG`. Consecutive files ingested to L0 may be assigned the same seqnum, and this PR updates the LSM validation to allow that. There is a longer-term fix in facebook#5539. We should take that later, maybe after the 19.2 release. Test Plan: ran some tests that previously failed with corruption error like `ENABLE_ROCKSDB_ASSERTIONS=1 make PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl testrace`; verified they succeed now.
25342b1
to
3d78ed1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)
db/version_builder.cc, line 172 at r1 (raw file):
Previously, ajkr (Andrew Kryczka) wrote…
Sure.
Done.
db/version_builder.cc, line 177 at r1 (raw file):
Previously, ajkr (Andrew Kryczka) wrote…
It's probably confusing because the old conditional was written like
if (!(...
and the new one does not negate the whole thing. Let me clarify that f1 is the newer file and f2 is the older file; I think that should help (assuming I'm right...)
I went back to the original style of if (!(...)) {
. It is simpler so we can write the invariant we expect to be true as the ...
. Having the conditional contain a complicated invariant we expect to be false was too confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @ajkr, @petermattis, and @sumeerbhola)
db/version_builder.cc, line 174 at r1 (raw file):
Previously, ajkr (Andrew Kryczka) wrote…
Yes, I know it existed, I just don't know why. f2 is the older file so I'm not sure why it needs to be treated specially for having seqnum zero.
Oops, I misread your comment.
When can an L0 file have a zero seqnum? My memory is hazy here.
db/version_builder.cc, line 177 at r1 (raw file):
Previously, ajkr (Andrew Kryczka) wrote…
I went back to the original style of
if (!(...)) {
. It is simpler so we can write the invariant we expect to be true as the...
. Having the conditional contain a complicated invariant we expect to be false was too confusing.
Ack. This is definitely clearer. The change LGTM modulo the question about L0 files with zero seqnum. We can certainly merge without understanding as that is a holdover from the previous code, but it would be good to understand that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)
db/version_builder.cc, line 174 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Oops, I misread your comment.
When can an L0 file have a zero seqnum? My memory is hazy here.
It looks possible if an ingested file's range overlaps with files at all levels, while the range also falls between consecutive keys in all levels. Then the file has to be ingested to L0 as we cannot have overlapping files within any level. At the same time, we know it doesn't cover anything since its range fell in a gap between keys in all lower levels, so we can assign the ingested file seqnum zero. Crazy edge case.
Flush looks like it cannot zero seqnums. L0->L0 compaction can, so we could have a file whose seqnums are all zero produced by L0->L0.
All that aside, it still feels like this would be a much bigger problem if f1
were the file with zero seqnum and f2
had nonzero seqnums. That would really prevent us from telling that f1
is ordered properly. But the code right now is handling f2
with zero seqnum only.
db/version_builder.cc, line 177 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Ack. This is definitely clearer. The change LGTM modulo the question about L0 files with zero seqnum. We can certainly merge without understanding as that is a holdover from the previous code, but it would be good to understand that.
Done.
This reverts commit 6d6b5c4. In the same PR I upgraded RocksDB to pull in cockroachdb/rocksdb#59, which relaxes the assertion whose failure led to the original revert. Release justification: run `make testrace` in teamcity with rocksdb assertions enabled to increase our bug-finding chances Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @ajkr, @petermattis, and @sumeerbhola)
db/version_builder.cc, line 174 at r1 (raw file):
cockroachdb/cockroach#39562
so ingestion seeks into up-to-one sstable at each level > 0 to see if the file range does not overlap with the actual keys? does pebble does the same seqnum 0 assignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @ajkr and @petermattis)
db/version_builder.cc, line 174 at r1 (raw file):
so ingestion seeks into up-to-one sstable at each level > 0 to see if the file range does not overlap with the actual keys? does pebble does the same seqnum 0 assignment?
Pebble currently always assigns non-zero seqnums during ingestion. I think RocksDB is the same, at least in the way we use RocksDB (force_global_seqno
).
Pebble does try to ingest sstables into the lowest level for which they don't overlap existing sstables. But it only uses sstable boundaries for this check, while RocksDB actually looks at the keys on a level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)
db/version_builder.cc, line 174 at r1 (raw file):
I think RocksDB is the same, at least in the way we use RocksDB (force_global_seqno).
According to my reading (not testing) the edge case described above can lead to seqnum zero ingested files. The force_global_seqno
can prevent it but only if the DB snapshot list is non-empty at the time of ingestion.
This reverts commit 6d6b5c4. In the same PR I upgraded RocksDB to pull in cockroachdb/rocksdb#59, which relaxes the assertion whose failure led to the original revert. Release justification: run `make testrace` in teamcity with rocksdb assertions enabled to increase our bug-finding chances Release note: None
41153: Revert "Revert "c-deps: fix assertion-enabled builds"" r=ajkr a=ajkr This reverts commit 6d6b5c4. In this same PR I upgraded RocksDB to pull in cockroachdb/rocksdb#59, which relaxes the assertion whose failure led to the original revert. Release justification: run `make testrace` in teamcity with rocksdb assertions enabled to increase our bug-finding chances Release note: None Co-authored-by: Andrew Kryczka <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to do a similar change in pebble, both in Version.CheckOrdering and BulkVersionEdit.Apply?
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to do a similar change in pebble, both in Version.CheckOrdering and BulkVersionEdit.Apply?
We need to handle L0 sstables with a global seqnum of 0. See cockroachdb/pebble#316.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to handle L0 sstables with a global seqnum of 0. See cockroachdb/pebble#316.
To be clear, these L0 sstables are not generated via ingestion, but due to a flush where the new sstable does not overlap with anything in L0 or anything else below. We could certainly avoid using zero seqnums for such sstables in Pebble, but we'd still need to handle their existence for compatibility with RocksDB.
Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)
This is necessary to revert
cockroachdb/cockroach#39562 to support RocksDB
built without
-DNDEBUG
. Consecutive files ingested to L0 may beassigned the same seqnum, and this PR updates the LSM validation to
allow that.
There is a longer-term fix in
facebook#5539. We should take that
later, maybe after the 19.2 release.
Test Plan: ran some tests that previously failed with corruption error
like
ENABLE_ROCKSDB_ASSERTIONS=1 make PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl testrace
;verified they succeed now.
This change is