Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relax L0 seqno invariant for ingested files #59

Merged

Conversation

ajkr
Copy link

@ajkr ajkr commented Sep 26, 2019

This is necessary to revert
cockroachdb/cockroach#39562 to support RocksDB
built without -DNDEBUG. Consecutive files ingested to L0 may be
assigned the same seqnum, and this PR updates the LSM validation to
allow that.

There is a longer-term fix in
facebook#5539. We should take that
later, maybe after the 19.2 release.

Test Plan: ran some tests that previously failed with corruption error
like ENABLE_ROCKSDB_ASSERTIONS=1 make PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl testrace;
verified they succeed now.


This change is Reviewable

@ajkr ajkr force-pushed the 6.2.1-fix-ingest-seqno-assert branch from 8ecfd04 to 25342b1 Compare September 26, 2019 22:43
@ajkr
Copy link
Author

ajkr commented Sep 26, 2019

This would also allow us to set force_consistency_checks=true assuming no other errors show up.

" with fileNumber " +
NumberToString(f1->fd.GetNumber()));
bool is_f1_ingested = f1->fd.smallest_seqno == f1->fd.largest_seqno;
if (external_file_seqno != 0) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I didn't understand why this check existed in the original code.

Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't facebook#5539 change ingestion so that each sstable would get its own sequence number? I'd prefer to backport that PR instead and leave the invariant check in place. I suppose that change is riskier, though. Is your thinking to try to get force_consistency_checks=true into the 19.2 release? Feels a little bit late in the cycle for that.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @petermattis and @sumeerbhola)

@ajkr
Copy link
Author

ajkr commented Sep 26, 2019

Andy said (a while ago) he prefers relaxing the assertion. I don't really care so will let you two decide.

I don't want to turn on force_consistency_checks=true right now (although AFAIK this is the missing piece to allow it). False positives are too scary. But we should at least be able to build/run tests with assertions enabled.

Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Andy said (a while ago) he prefers relaxing the assertion. I don't really care so will let you two decide.

Was Andy aware of the alternative? If the choice was between relaxing the invariant and not enabling assertions in tests, I'd be for relaxing the assertion as well. My preference is to keep the invariant as-is, otherwise we'll have to relax the invariant in Pebble as well.

Hrmm, if this doesn't get into 19.2 we might have to relax the invariant in Pebble regardless. Cc @sumeerbhola so he is aware.

I don't want to turn on force_consistency_checks=true right now (although AFAIK this is the missing piece to allow it). False positives are too scary.

I agree.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @petermattis and @sumeerbhola)

@ajkr
Copy link
Author

ajkr commented Sep 26, 2019

Was Andy aware of the alternative?

I think yes, though don't want to speak too much for him. cc @andy-kimball

This change is less risky because it only affects code built in debug mode, whereas the other PR affects all builds.

I don't think it requires changing anything in Pebble. RocksDB has been generating L0 files with same sequence numbers for a while regardless of this PR.

@ajkr
Copy link
Author

ajkr commented Sep 26, 2019

I don't think it requires changing anything in Pebble. RocksDB has been generating L0 files with same sequence numbers for a while regardless of this PR.

I misunderstood what you were saying. But yes, whether Pebble change is needed or not is unaffected by this PR, IMO.

@petermattis
Copy link

RocksDB has been generating L0 files with same sequence numbers for a while regardless of this PR.

Right, but we only started ingesting multiple sstables in a single Ingest call in 19.2.

@andy-kimball
Copy link

andy-kimball commented Sep 26, 2019

If there's a way to add invariant checks that might prevent another corruption issue, and it's simple and safe, then I think we should put it into 19.2. I don't know enough about the specifics to have a firm position on various alternatives. When @ajkr and I spoke, he thought that including the "right" fix for the spurious assertion in 19.2 was somewhat risky, and so I suggested we just remove the assertion for now so that we could safely enable the rest of the (correct) assertions. If that is in itself risky, or if there are other alternatives that are better, then I'd reconsider my position.

@ajkr
Copy link
Author

ajkr commented Sep 26, 2019

You have a good memory :). Yes, I still think the proper fix is somewhat risky, and this assertion change is not risky (even if I totally messed it up it'd only affect debug builds).

@andy-kimball
Copy link

Seems like we should do this then. My understanding is that we run debug builds of RocksDB in our tests, and so this would give us greater likelihood of hitting an assert during the next month of 19.2 testing, with no risk of causing regressions.

@petermattis
Copy link

My understanding is that we run debug builds of RocksDB in our tests, and so this would give us greater likelihood of hitting an assert during the next month of 19.2 testing, with no risk of causing regressions.

I don't think that is true. We could possibly make it true, but right now you have to explicitly ask for RocksDB assertions to be enabled. Possible I'm completely mistaken about this.

@ajkr
Copy link
Author

ajkr commented Sep 27, 2019

It will be somewhat true if we're able to land this (or an alternative) and then revert cockroachdb/cockroach#39562. Reverting it would set ENABLE_ROCKSDB_ASSERTIONS=1 in the teamcity make testrace command.

Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @ajkr and @sumeerbhola)


db/version_builder.cc, line 172 at r1 (raw file):

          if (f2->fd.smallest_seqno == f2->fd.largest_seqno) {
            // This is an external file that we ingested
            SequenceNumber external_file_seqno = f2->fd.smallest_seqno;

Can we rename this f2_external_file_seqno? I massively confused myself about this PR for a moment think this seqno came from f1.


db/version_builder.cc, line 174 at r1 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

tbh I didn't understand why this check existed in the original code.

The check did exist: !(... || external_file_seqno == 0) -> !... && external_file_seqno != 0. Am I misunderstanding your statement?


db/version_builder.cc, line 177 at r1 (raw file):

              // This file's seqnos were not zeroed so we can use its seqnos to
              // determine it is ordered properly.
              if (f1->fd.largest_seqno < external_file_seqno ||

Shouldn't this be f1->fd.largest_seqno >= external_file_seqno? For some reason I'm having a hard to understanding this invariant logic the way it is written.

@petermattis
Copy link

Reverting it would set ENABLE_ROCKSDB_ASSERTIONS=1 in the teamcity make testrace command.

Ah-ha! Thanks for the explanation.

Copy link
Author

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)


db/version_builder.cc, line 172 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can we rename this f2_external_file_seqno? I massively confused myself about this PR for a moment think this seqno came from f1.

Sure.


db/version_builder.cc, line 174 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The check did exist: !(... || external_file_seqno == 0) -> !... && external_file_seqno != 0. Am I misunderstanding your statement?

Yes, I know it existed, I just don't know why. f2 is the older file so I'm not sure why it needs to be treated specially for having seqnum zero.


db/version_builder.cc, line 177 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Shouldn't this be f1->fd.largest_seqno >= external_file_seqno? For some reason I'm having a hard to understanding this invariant logic the way it is written.

It's probably confusing because the old conditional was written like if (!(... and the new one does not negate the whole thing. Let me clarify that f1 is the newer file and f2 is the older file; I think that should help (assuming I'm right...)

This is necessary to revert
cockroachdb/cockroach#39562 to support RocksDB
built without `-DNDEBUG`. Consecutive files ingested to L0 may be
assigned the same seqnum, and this PR updates the LSM validation to
allow that.

There is a longer-term fix in
facebook#5539. We should take that
later, maybe after the 19.2 release.

Test Plan: ran some tests that previously failed with corruption error
like `ENABLE_ROCKSDB_ASSERTIONS=1 make
PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl testrace`;
verified they succeed now.
@ajkr ajkr force-pushed the 6.2.1-fix-ingest-seqno-assert branch from 25342b1 to 3d78ed1 Compare September 27, 2019 01:06
Copy link
Author

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)


db/version_builder.cc, line 172 at r1 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

Sure.

Done.


db/version_builder.cc, line 177 at r1 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

It's probably confusing because the old conditional was written like if (!(... and the new one does not negate the whole thing. Let me clarify that f1 is the newer file and f2 is the older file; I think that should help (assuming I'm right...)

I went back to the original style of if (!(...)) {. It is simpler so we can write the invariant we expect to be true as the .... Having the conditional contain a complicated invariant we expect to be false was too confusing.

Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @ajkr, @petermattis, and @sumeerbhola)


db/version_builder.cc, line 174 at r1 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

Yes, I know it existed, I just don't know why. f2 is the older file so I'm not sure why it needs to be treated specially for having seqnum zero.

Oops, I misread your comment.

When can an L0 file have a zero seqnum? My memory is hazy here.


db/version_builder.cc, line 177 at r1 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

I went back to the original style of if (!(...)) {. It is simpler so we can write the invariant we expect to be true as the .... Having the conditional contain a complicated invariant we expect to be false was too confusing.

Ack. This is definitely clearer. The change LGTM modulo the question about L0 files with zero seqnum. We can certainly merge without understanding as that is a holdover from the previous code, but it would be good to understand that.

Copy link
Author

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)


db/version_builder.cc, line 174 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Oops, I misread your comment.

When can an L0 file have a zero seqnum? My memory is hazy here.

It looks possible if an ingested file's range overlaps with files at all levels, while the range also falls between consecutive keys in all levels. Then the file has to be ingested to L0 as we cannot have overlapping files within any level. At the same time, we know it doesn't cover anything since its range fell in a gap between keys in all lower levels, so we can assign the ingested file seqnum zero. Crazy edge case.

Flush looks like it cannot zero seqnums. L0->L0 compaction can, so we could have a file whose seqnums are all zero produced by L0->L0.

All that aside, it still feels like this would be a much bigger problem if f1 were the file with zero seqnum and f2 had nonzero seqnums. That would really prevent us from telling that f1 is ordered properly. But the code right now is handling f2 with zero seqnum only.


db/version_builder.cc, line 177 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ack. This is definitely clearer. The change LGTM modulo the question about L0 files with zero seqnum. We can certainly merge without understanding as that is a holdover from the previous code, but it would be good to understand that.

Done.

@ajkr ajkr merged commit e88209a into cockroachdb:crl-release-6.2.1 Sep 27, 2019
ajkr added a commit to ajkr/cockroach that referenced this pull request Sep 27, 2019
This reverts commit 6d6b5c4.

In the same PR I upgraded RocksDB to pull in
cockroachdb/rocksdb#59, which relaxes the
assertion whose failure led to the original revert.

Release justification: run `make testrace` in teamcity with rocksdb
assertions enabled to increase our bug-finding chances

Release note: None
Copy link

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @ajkr, @petermattis, and @sumeerbhola)


db/version_builder.cc, line 174 at r1 (raw file):

cockroachdb/cockroach#39562
so ingestion seeks into up-to-one sstable at each level > 0 to see if the file range does not overlap with the actual keys? does pebble does the same seqnum 0 assignment?

Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @ajkr and @petermattis)


db/version_builder.cc, line 174 at r1 (raw file):

so ingestion seeks into up-to-one sstable at each level > 0 to see if the file range does not overlap with the actual keys? does pebble does the same seqnum 0 assignment?

Pebble currently always assigns non-zero seqnums during ingestion. I think RocksDB is the same, at least in the way we use RocksDB (force_global_seqno).

Pebble does try to ingest sstables into the lowest level for which they don't overlap existing sstables. But it only uses sstable boundaries for this check, while RocksDB actually looks at the keys on a level.

Copy link
Author

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)


db/version_builder.cc, line 174 at r1 (raw file):

I think RocksDB is the same, at least in the way we use RocksDB (force_global_seqno).

According to my reading (not testing) the edge case described above can lead to seqnum zero ingested files. The force_global_seqno can prevent it but only if the DB snapshot list is non-empty at the time of ingestion.

ajkr added a commit to ajkr/cockroach that referenced this pull request Sep 28, 2019
This reverts commit 6d6b5c4.

In the same PR I upgraded RocksDB to pull in
cockroachdb/rocksdb#59, which relaxes the
assertion whose failure led to the original revert.

Release justification: run `make testrace` in teamcity with rocksdb
assertions enabled to increase our bug-finding chances

Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Sep 28, 2019
41153: Revert "Revert "c-deps: fix assertion-enabled builds"" r=ajkr a=ajkr

This reverts commit 6d6b5c4.

In this same PR I upgraded RocksDB to pull in
cockroachdb/rocksdb#59, which relaxes the
assertion whose failure led to the original revert.

Release justification: run `make testrace` in teamcity with rocksdb
assertions enabled to increase our bug-finding chances

Release note: None

Co-authored-by: Andrew Kryczka <[email protected]>
Copy link

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to do a similar change in pebble, both in Version.CheckOrdering and BulkVersionEdit.Apply?

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)

Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to do a similar change in pebble, both in Version.CheckOrdering and BulkVersionEdit.Apply?

We need to handle L0 sstables with a global seqnum of 0. See cockroachdb/pebble#316.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)

Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle L0 sstables with a global seqnum of 0. See cockroachdb/pebble#316.

To be clear, these L0 sstables are not generated via ingestion, but due to a flush where the new sstable does not overlap with anything in L0 or anything else below. We could certainly avoid using zero seqnums for such sstables in Pebble, but we'd still need to handle their existence for compatibility with RocksDB.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @petermattis and @sumeerbhola)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants