Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db_impl_open: Error out if WAL replay seqnums aren't increasing #78

Merged
merged 1 commit into from
Mar 19, 2020

Conversation

itsbilal
Copy link
Member

@itsbilal itsbilal commented Mar 18, 2020

This should help trip early on issues such as the one found in:
cockroachdb/pebble#567


This change is Reviewable

@itsbilal itsbilal self-assigned this Mar 18, 2020
Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)


db/db_impl_open.cc, line 767 at r1 (raw file):

      // That's why we set ignore missing column families to true
      bool has_valid_writes = false;
      if (last_observed_seq > sequence) {

I think next_sequence holds the next expected sequence number. Can this conditional be something like:

if (*next_sequence != kMaxSequenceNumber && sequence < *next_sequence) {
  ...
}

The benefit to this (besides getting rid of last_observed_seq) is it looks like next_sequence is kept up to date across log files. It also accounts for the sequence number increase from the entries in a batch.


db/db_impl_open.cc, line 769 at r1 (raw file):

      if (last_observed_seq > sequence) {
        status = Status::Corruption(
          "Sequence numbers in WAL not in increasing order");

I think you can pass in arguments here. It would be nice to include the batch sequence number and the previous sequence number.

This should help trip early on issues such as the one found in:
cockroachdb/pebble#567
@itsbilal itsbilal force-pushed the bilal/rocksdb-wal-replay-assert branch from 6c8c391 to ea2d729 Compare March 19, 2020 16:02
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @petermattis and @sumeerbhola)


db/db_impl_open.cc, line 767 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think next_sequence holds the next expected sequence number. Can this conditional be something like:

if (*next_sequence != kMaxSequenceNumber && sequence < *next_sequence) {
  ...
}

The benefit to this (besides getting rid of last_observed_seq) is it looks like next_sequence is kept up to date across log files. It also accounts for the sequence number increase from the entries in a batch.

Good point. Made the change.


db/db_impl_open.cc, line 769 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think you can pass in arguments here. It would be nice to include the batch sequence number and the previous sequence number.

Corruption only takes 2 args at most, including the first message, so I just did the same approach as elsewhere and converted the two sequence numbers into strings and concatenated them.

Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

I feel we should make a similar change to this in Pebble. The protection seems worthwhile.

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @sumeerbhola)

@itsbilal
Copy link
Member Author

TFTR! Non-java tests are passing

@itsbilal itsbilal merged commit 8b578c1 into crl-release-6.2.1 Mar 19, 2020
@itsbilal itsbilal deleted the bilal/rocksdb-wal-replay-assert branch March 19, 2020 19:20
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Mar 19, 2020
Picks up cockroachdb/rocksdb#78 ,
a change to ensure sequence numbers are in increasing order
during WAL replay.

Release justification: Adds a safety check.
Release note (bug fix): Adds a check that detects invalid sequence numbers
in the RocksDB write-ahead log and returns an error during start
instead of applying the invalid log entries.
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Mar 19, 2020
46206: sql: add active user txn information to crdb_internal r=rohany a=rohany

Fixes #46055.

Release justification: low risk change to existing functionality

Release note (sql change): This PR:
* Adds new internal tables `crdb_internal.node_transactions`
and `crdb_internal.cluster_transactions.`
* These tables contain some metadata about active user transactions.
* Adds the column `txn_id` to the `crdb_internal.node_queries` and
  `crdb_internal.cluster_queries` tables. These fields represent
  the transaction ID of each query in each row.


46234: kv: immediately push on WriteIntentError when lock-table disabled r=nvanbenschoten a=nvanbenschoten

Fixes #46148.

This commit fixes a bug where follower reads that hit intents could get stuck in an indefinite loop of running into the intent during evaluation, not adding the intent to the lock-table because the lock table was disabled, sequencing in the concurrency manager without issue, and repeating. The new TestClosedTimestampCanServeWithConflictingIntent test hits exactly this issue before this commit.

The fix implemented here is to immediately push the transaction responsible for an intent when serving a follower read (i.e. when a replica's lock-table is disabled). This ensures that the intent gets cleaned up if it was abandoned and avoids the busy loop we see today. If/when lockTables are maintained on follower replicas by propagating lockTable state transitions through the Raft log in the ReplicatedEvalResult instead of through the (leaseholder-only) LocalResult, we should be able to remove the lockTable "disabled" state and, in turn, remove this special-case.

The alternative approach floated to address this was to simply pass a NotLeaseHolderError back to the client when an intent is hit on a follower. This would have worked to avoid the infinite loop, but it seems like a short-term patch that doesn't get to the root of the issue. As we push further on follower reads (or even consistent read replicas), we want non-leaseholders to be able to perform conflict resolution. Falling back to the leaseholder works counter to this goal. The approach implemented by this commit works towards this goal, simply falling back to the previous sub-optimal approach of pushing immediately during conflicts.

Release note (bug fix): Follower reads that hit intents no longer have a chance of entering an infinite loop. This bug was present in earlier versions of the v20.1 release.

Release justification: fixes a high-priority bug where follower reads could get stuck indefinitely if they hit an abandoned intent.

46328: c-deps/rocksdb: Bump to pick up WAL sequence check r=itsbilal a=itsbilal

Picks up cockroachdb/rocksdb#78 ,
a change to ensure sequence numbers are in increasing order
during WAL replay.

Release justification: Adds a safety check
Release note (bug fix): Adds a check that detects invalid sequence numbers
in the RocksDB write-ahead log and returns an error during start
instead of applying the invalid log entries.

Co-authored-by: Rohan Yadav <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants