db_impl_open: Error out if WAL replay seqnums aren't increasing #78

itsbilal · 2020-03-18T22:06:52Z

This should help trip early on issues such as the one found in:
cockroachdb/pebble#567

This change is

petermattis

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)

db/db_impl_open.cc, line 767 at r1 (raw file):

      // That's why we set ignore missing column families to true
      bool has_valid_writes = false;
      if (last_observed_seq > sequence) {

I think next_sequence holds the next expected sequence number. Can this conditional be something like:

if (*next_sequence != kMaxSequenceNumber && sequence < *next_sequence) {
  ...
}

The benefit to this (besides getting rid of last_observed_seq) is it looks like next_sequence is kept up to date across log files. It also accounts for the sequence number increase from the entries in a batch.

db/db_impl_open.cc, line 769 at r1 (raw file):

      if (last_observed_seq > sequence) {
        status = Status::Corruption(
          "Sequence numbers in WAL not in increasing order");

I think you can pass in arguments here. It would be nice to include the batch sequence number and the previous sequence number.

This should help trip early on issues such as the one found in: cockroachdb/pebble#567

itsbilal

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @petermattis and @sumeerbhola)

db/db_impl_open.cc, line 767 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think next_sequence holds the next expected sequence number. Can this conditional be something like:
if (*next_sequence != kMaxSequenceNumber && sequence < *next_sequence) {
  ...
}
The benefit to this (besides getting rid of last_observed_seq) is it looks like next_sequence is kept up to date across log files. It also accounts for the sequence number increase from the entries in a batch.

Good point. Made the change.

db/db_impl_open.cc, line 769 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think you can pass in arguments here. It would be nice to include the batch sequence number and the previous sequence number.

Corruption only takes 2 args at most, including the first message, so I just did the same approach as elsewhere and converted the two sequence numbers into strings and concatenated them.

petermattis

I feel we should make a similar change to this in Pebble. The protection seems worthwhile.

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @sumeerbhola)

itsbilal · 2020-03-19T19:20:22Z

TFTR! Non-java tests are passing

Picks up cockroachdb/rocksdb#78 , a change to ensure sequence numbers are in increasing order during WAL replay. Release justification: Adds a safety check. Release note (bug fix): Adds a check that detects invalid sequence numbers in the RocksDB write-ahead log and returns an error during start instead of applying the invalid log entries.

46206: sql: add active user txn information to crdb_internal r=rohany a=rohany Fixes #46055. Release justification: low risk change to existing functionality Release note (sql change): This PR: * Adds new internal tables `crdb_internal.node_transactions` and `crdb_internal.cluster_transactions.` * These tables contain some metadata about active user transactions. * Adds the column `txn_id` to the `crdb_internal.node_queries` and `crdb_internal.cluster_queries` tables. These fields represent the transaction ID of each query in each row. 46234: kv: immediately push on WriteIntentError when lock-table disabled r=nvanbenschoten a=nvanbenschoten Fixes #46148. This commit fixes a bug where follower reads that hit intents could get stuck in an indefinite loop of running into the intent during evaluation, not adding the intent to the lock-table because the lock table was disabled, sequencing in the concurrency manager without issue, and repeating. The new TestClosedTimestampCanServeWithConflictingIntent test hits exactly this issue before this commit. The fix implemented here is to immediately push the transaction responsible for an intent when serving a follower read (i.e. when a replica's lock-table is disabled). This ensures that the intent gets cleaned up if it was abandoned and avoids the busy loop we see today. If/when lockTables are maintained on follower replicas by propagating lockTable state transitions through the Raft log in the ReplicatedEvalResult instead of through the (leaseholder-only) LocalResult, we should be able to remove the lockTable "disabled" state and, in turn, remove this special-case. The alternative approach floated to address this was to simply pass a NotLeaseHolderError back to the client when an intent is hit on a follower. This would have worked to avoid the infinite loop, but it seems like a short-term patch that doesn't get to the root of the issue. As we push further on follower reads (or even consistent read replicas), we want non-leaseholders to be able to perform conflict resolution. Falling back to the leaseholder works counter to this goal. The approach implemented by this commit works towards this goal, simply falling back to the previous sub-optimal approach of pushing immediately during conflicts. Release note (bug fix): Follower reads that hit intents no longer have a chance of entering an infinite loop. This bug was present in earlier versions of the v20.1 release. Release justification: fixes a high-priority bug where follower reads could get stuck indefinitely if they hit an abandoned intent. 46328: c-deps/rocksdb: Bump to pick up WAL sequence check r=itsbilal a=itsbilal Picks up cockroachdb/rocksdb#78 , a change to ensure sequence numbers are in increasing order during WAL replay. Release justification: Adds a safety check Release note (bug fix): Adds a check that detects invalid sequence numbers in the RocksDB write-ahead log and returns an error during start instead of applying the invalid log entries. Co-authored-by: Rohan Yadav <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Bilal Akhtar <[email protected]>

itsbilal requested review from petermattis and sumeerbhola March 18, 2020 22:06

itsbilal self-assigned this Mar 18, 2020

petermattis reviewed Mar 18, 2020

View reviewed changes

db_impl_open: Error out if WAL replay seqnums aren't increasing

ea2d729

This should help trip early on issues such as the one found in: cockroachdb/pebble#567

itsbilal force-pushed the bilal/rocksdb-wal-replay-assert branch from 6c8c391 to ea2d729 Compare March 19, 2020 16:02

itsbilal commented Mar 19, 2020

View reviewed changes

petermattis approved these changes Mar 19, 2020

View reviewed changes

itsbilal merged commit 8b578c1 into crl-release-6.2.1 Mar 19, 2020

itsbilal deleted the bilal/rocksdb-wal-replay-assert branch March 19, 2020 19:20

itsbilal mentioned this pull request Mar 19, 2020

c-deps/rocksdb: Bump to pick up WAL sequence check cockroachdb/cockroach#46328

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

db_impl_open: Error out if WAL replay seqnums aren't increasing #78

db_impl_open: Error out if WAL replay seqnums aren't increasing #78

itsbilal commented Mar 18, 2020 •

edited by petermattis

Loading

petermattis left a comment

itsbilal left a comment

petermattis left a comment

itsbilal commented Mar 19, 2020

db_impl_open: Error out if WAL replay seqnums aren't increasing #78

db_impl_open: Error out if WAL replay seqnums aren't increasing #78

Conversation

itsbilal commented Mar 18, 2020 • edited by petermattis Loading

petermattis left a comment

Choose a reason for hiding this comment

itsbilal left a comment

Choose a reason for hiding this comment

petermattis left a comment

Choose a reason for hiding this comment

itsbilal commented Mar 19, 2020

itsbilal commented Mar 18, 2020 •

edited by petermattis

Loading