-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db_impl_open: Error out if WAL replay seqnums aren't increasing #78
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)
db/db_impl_open.cc, line 767 at r1 (raw file):
// That's why we set ignore missing column families to true bool has_valid_writes = false; if (last_observed_seq > sequence) {
I think next_sequence
holds the next expected sequence number. Can this conditional be something like:
if (*next_sequence != kMaxSequenceNumber && sequence < *next_sequence) {
...
}
The benefit to this (besides getting rid of last_observed_seq
) is it looks like next_sequence
is kept up to date across log files. It also accounts for the sequence number increase from the entries in a batch.
db/db_impl_open.cc, line 769 at r1 (raw file):
if (last_observed_seq > sequence) { status = Status::Corruption( "Sequence numbers in WAL not in increasing order");
I think you can pass in arguments here. It would be nice to include the batch sequence number and the previous sequence number.
This should help trip early on issues such as the one found in: cockroachdb/pebble#567
6c8c391
to
ea2d729
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @petermattis and @sumeerbhola)
db/db_impl_open.cc, line 767 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I think
next_sequence
holds the next expected sequence number. Can this conditional be something like:if (*next_sequence != kMaxSequenceNumber && sequence < *next_sequence) { ... }
The benefit to this (besides getting rid of
last_observed_seq
) is it looks likenext_sequence
is kept up to date across log files. It also accounts for the sequence number increase from the entries in a batch.
Good point. Made the change.
db/db_impl_open.cc, line 769 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I think you can pass in arguments here. It would be nice to include the batch sequence number and the previous sequence number.
Corruption
only takes 2 args at most, including the first message, so I just did the same approach as elsewhere and converted the two sequence numbers into strings and concatenated them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should make a similar change to this in Pebble. The protection seems worthwhile.
Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @sumeerbhola)
TFTR! Non-java tests are passing |
Picks up cockroachdb/rocksdb#78 , a change to ensure sequence numbers are in increasing order during WAL replay. Release justification: Adds a safety check. Release note (bug fix): Adds a check that detects invalid sequence numbers in the RocksDB write-ahead log and returns an error during start instead of applying the invalid log entries.
46206: sql: add active user txn information to crdb_internal r=rohany a=rohany Fixes #46055. Release justification: low risk change to existing functionality Release note (sql change): This PR: * Adds new internal tables `crdb_internal.node_transactions` and `crdb_internal.cluster_transactions.` * These tables contain some metadata about active user transactions. * Adds the column `txn_id` to the `crdb_internal.node_queries` and `crdb_internal.cluster_queries` tables. These fields represent the transaction ID of each query in each row. 46234: kv: immediately push on WriteIntentError when lock-table disabled r=nvanbenschoten a=nvanbenschoten Fixes #46148. This commit fixes a bug where follower reads that hit intents could get stuck in an indefinite loop of running into the intent during evaluation, not adding the intent to the lock-table because the lock table was disabled, sequencing in the concurrency manager without issue, and repeating. The new TestClosedTimestampCanServeWithConflictingIntent test hits exactly this issue before this commit. The fix implemented here is to immediately push the transaction responsible for an intent when serving a follower read (i.e. when a replica's lock-table is disabled). This ensures that the intent gets cleaned up if it was abandoned and avoids the busy loop we see today. If/when lockTables are maintained on follower replicas by propagating lockTable state transitions through the Raft log in the ReplicatedEvalResult instead of through the (leaseholder-only) LocalResult, we should be able to remove the lockTable "disabled" state and, in turn, remove this special-case. The alternative approach floated to address this was to simply pass a NotLeaseHolderError back to the client when an intent is hit on a follower. This would have worked to avoid the infinite loop, but it seems like a short-term patch that doesn't get to the root of the issue. As we push further on follower reads (or even consistent read replicas), we want non-leaseholders to be able to perform conflict resolution. Falling back to the leaseholder works counter to this goal. The approach implemented by this commit works towards this goal, simply falling back to the previous sub-optimal approach of pushing immediately during conflicts. Release note (bug fix): Follower reads that hit intents no longer have a chance of entering an infinite loop. This bug was present in earlier versions of the v20.1 release. Release justification: fixes a high-priority bug where follower reads could get stuck indefinitely if they hit an abandoned intent. 46328: c-deps/rocksdb: Bump to pick up WAL sequence check r=itsbilal a=itsbilal Picks up cockroachdb/rocksdb#78 , a change to ensure sequence numbers are in increasing order during WAL replay. Release justification: Adds a safety check Release note (bug fix): Adds a check that detects invalid sequence numbers in the RocksDB write-ahead log and returns an error during start instead of applying the invalid log entries. Co-authored-by: Rohan Yadav <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Bilal Akhtar <[email protected]>
This should help trip early on issues such as the one found in:
cockroachdb/pebble#567
This change is