Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sequence number bump logic in multi-CF SST ingestion #9005

Closed
wants to merge 1 commit into from

Conversation

ot
Copy link
Contributor

@ot ot commented Oct 9, 2021

Summary: The code in IngestExternalFiles() that bumps the DB's sequence number
depending on what seqnos were assigned to the files has 3 bugs:

  1. There is an assertion that the sequence number is increased in all the
    affected column families, but this is unnecessary, it is fine if some files can
    stick to a lower sequence number. It is very easy to hit the assertion: it is
    sufficient to insert 2 files in 2 CFs, one which overlaps the CF and one that
    doesn't (for example the CF is empty). The line added in the
    IngestFilesIntoMultipleColumnFamilies_Success test makes the assertion fail.

  2. SetLastSequence() is called with the sum of all the bumps across CFs, but we
    should take the maximum instead, as all CFs start with the current seqno and bump
    it independently.

  3. The code above is accidentally under a #ifndef NDEBUG, so it doesn't run in
    optimized builds, so some files may be assigned seqnos from the future.

Test Plan:
Added line in IngestFilesIntoMultipleColumnFamilies_Success that
triggers the assertion, verified that the test (and all the others) pass after
the fix.

@riversand963
Copy link
Contributor

I believe this is related to #5539, which we should have got to, but didn't...

@ot
Copy link
Contributor Author

ot commented Oct 12, 2021

@riversand963 you mean the comments towards the end? Yes I think it's the same problem.

@ajkr
Copy link
Contributor

ajkr commented Oct 12, 2021

It seems like a big problem that we were assigning seqnos from the future. When that happens, for example, point lookup and iterator can see different results (point lookup is correct due to short circuit upon finding the key in first component, while iterator is wrong due to looking at all components and picking key version with highest seqno to return). Do you mind updating HISTORY.md "Bug Fixes" section?

@riversand963
Copy link
Contributor

@riversand963 you mean the comments towards the end? Yes I think it's the same problem.

Right.

@ajkr
Copy link
Contributor

ajkr commented Oct 12, 2021

It seems like a big problem that we were assigning seqnos from the future. When that happens, for example, point lookup and iterator can see different results (point lookup is correct due to short circuit upon finding the key in first component, while iterator is wrong due to looking at all components and picking key version with highest seqno to return). Do you mind updating HISTORY.md "Bug Fixes" section?

After testing I realized the initial symptom is different. Keys whose sequence numbers are in the future at first will not be visible to any readers when the ingestion completes. They only become visible once the sequence number increases enough due to other writes. Once they become visible the value seen by reads without short-circuiting (i.e., iterators) can be incorrect.

@ot
Copy link
Contributor Author

ot commented Oct 12, 2021

Do you mind updating HISTORY.md "Bug Fixes" section?

Sure, but I don't think I have a full grasp of the implications, could you suggest a blurb? :)
Note that the bug is different in optimized and debug builds, in the former (which I assume is what most people would care about) we actually undercount the sequence numbers. I assume this could create a situation where later insertions are shadowed by the prior ingestion.

I would guess the reason this hasn't been noticed is that in most applications sets of non-overlapping files are ingested, and either they all overlap their respective CFs, or they don't because the DB is being bulk-loaded with ordered ingestion.

Besides the changelog, can anyone confirm that the assert() was in fact unnecessary? I can't find anything that relies on that condition; #5539 just did a literal translation of the previous assertion

assert(should_increment_last_seqno == ingestion_jobs[i].ShouldIncrementLastSequence());

but there's no explanation in the original PR #4895 for why this was needed (@riversand963 do you have any thoughts?).
If this is actually needed, the fix will be more complicated, we'll have to assign the seqnos in two phases (first check if any CF overlaps, then actually assign them).

@ajkr
Copy link
Contributor

ajkr commented Oct 13, 2021

Do you mind updating HISTORY.md "Bug Fixes" section?

Sure, but I don't think I have a full grasp of the implications, could you suggest a blurb? :) Note that the bug is different in optimized and debug builds, in the former (which I assume is what most people would care about) we actually undercount the sequence numbers. I assume this could create a situation where later insertions are shadowed by the prior ingestion.

Yes that is possible in optimized builds. The other part of the bug in optimized builds is ingested file keys may not be visible as soon as IngestExternalFiles() returns because the sequence number RocksDB uses for reads is still lower than ingested file keys. How about the following?

"Fixed bug in calls to IngestExternalFiles() with files for multiple column families. The bug could have introduced a delay in ingested file keys becoming visible after IngestExternalFiles() returned. Furthermore, mutations to ingested file keys while they were invisible could have been dropped (not necessarily immediately)."

Summary: The code in `IngestExternalFiles()` that bumps the DB's sequence number
depending on what seqnos were assigned to the files has 3 bugs:

1) There is an assertion that the sequence number is increased in all the
affected column families, but this is unnecessary, it is fine if some files can
stick to a lower sequence number. It is very easy to hit the assertion: it is
sufficient to insert 2 files in 2 CFs, one which overlaps the CF and one that
doesn't (for example the CF is empty). The line added in the
`IngestFilesIntoMultipleColumnFamilies_Success` test makes the assertion fail.

2) SetLastSequence() is called with the sum of all the bumps across CFs, but we
should take the maximum instead, as all CFs start with the current seqno and bump
it independently.

3) The code above is accidentally under a `#ifndef NDEBUG`, so it doesn't run in
optimized builds, so some files may be assigned seqnos from the future.

Test Plan: Added line in `IngestFilesIntoMultipleColumnFamilies_Success` that
triggers the assertion, verified that the test (and all the others) pass after
the fix.

Reviewers: riversand963 siying

Subscribers:

Tasks:

Tags:
@ot
Copy link
Contributor Author

ot commented Oct 13, 2021

@ajkr Thanks! Added.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I doubt the assertion was needed (and it should have been a runtime check if so).

@facebook-github-bot
Copy link
Contributor

@ot has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr added a commit that referenced this pull request Oct 14, 2021
Summary:
The code in `IngestExternalFiles()` that bumps the DB's sequence number
depending on what seqnos were assigned to the files has 3 bugs:

1) There is an assertion that the sequence number is increased in all the
affected column families, but this is unnecessary, it is fine if some files can
stick to a lower sequence number. It is very easy to hit the assertion: it is
sufficient to insert 2 files in 2 CFs, one which overlaps the CF and one that
doesn't (for example the CF is empty). The line added in the
`IngestFilesIntoMultipleColumnFamilies_Success` test makes the assertion fail.

2) SetLastSequence() is called with the sum of all the bumps across CFs, but we
should take the maximum instead, as all CFs start with the current seqno and bump
it independently.

3) The code above is accidentally under a `#ifndef NDEBUG`, so it doesn't run in
optimized builds, so some files may be assigned seqnos from the future.

Pull Request resolved: #9005

Test Plan:
Added line in `IngestFilesIntoMultipleColumnFamilies_Success` that
triggers the assertion, verified that the test (and all the others) pass after
the fix.

Reviewed By: ajkr

Differential Revision: D31597892

Pulled By: ot

fbshipit-source-id: c2d3237f90290df1178736ace8653a9623f5a770
yoori pushed a commit to yoori/rocksdb that referenced this pull request Nov 26, 2023
Summary:
The code in `IngestExternalFiles()` that bumps the DB's sequence number
depending on what seqnos were assigned to the files has 3 bugs:

1) There is an assertion that the sequence number is increased in all the
affected column families, but this is unnecessary, it is fine if some files can
stick to a lower sequence number. It is very easy to hit the assertion: it is
sufficient to insert 2 files in 2 CFs, one which overlaps the CF and one that
doesn't (for example the CF is empty). The line added in the
`IngestFilesIntoMultipleColumnFamilies_Success` test makes the assertion fail.

2) SetLastSequence() is called with the sum of all the bumps across CFs, but we
should take the maximum instead, as all CFs start with the current seqno and bump
it independently.

3) The code above is accidentally under a `#ifndef NDEBUG`, so it doesn't run in
optimized builds, so some files may be assigned seqnos from the future.

Pull Request resolved: facebook/rocksdb#9005

Test Plan:
Added line in `IngestFilesIntoMultipleColumnFamilies_Success` that
triggers the assertion, verified that the test (and all the others) pass after
the fix.

Reviewed By: ajkr

Differential Revision: D31597892

Pulled By: ot

fbshipit-source-id: c2d3237f90290df1178736ace8653a9623f5a770
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants