kvserver: prevent finished proposal from being present in proposals map #94825
I'm not a huge fan of these fatal assertions, since they can be far worse than the problem they try to prevent when hit in production. Can we come up with a scheme for assertions that are compiled out in production builds? |
Fair point. We can
everywhere, it's a bit difficult to have generic helpers in Go because that usually costs you in allocations, etc. I filed #94979 to track this, we should do a proper pass for stability. |
28856d7 to c22b5af
Put the assertion behind the build tag in this PR. |
Alternatively, could do something like:
or:
|
I think a |
Also filed #94986 for the general issue of making it easier to write assertions. |
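A minimal sketch of the idea discussed above, an assertion that does nothing in production builds. All names here (`assertionsEnabled`, `proposal`, `checkNotFinished`, the `invariants` build tag mentioned in the comment) are hypothetical and are not the helpers used in this PR. The sketch keeps the switch as a plain constant so that it fits in one file.

```go
package main

import "fmt"

// assertionsEnabled is a hypothetical switch. In a real code base it would
// typically be declared twice, in two files guarded by opposite build
// constraints (for example `//go:build invariants` and
// `//go:build !invariants`), so that production builds see a constant
// false and the compiler can drop the guarded branches.
const assertionsEnabled = true

// proposal is a stand-in type for this sketch, not the kvserver type.
type proposal struct {
	id       string
	finished bool
}

// checkNotFinished reports a violated invariant as an error instead of
// crashing the process. It does no work when assertions are disabled.
func checkNotFinished(p *proposal) error {
	if !assertionsEnabled {
		return nil
	}
	if p.finished {
		return fmt.Errorf("finished proposal %s inserted into map", p.id)
	}
	return nil
}

func main() {
	fmt.Println(checkNotFinished(&proposal{id: "a"}))
	fmt.Println(checkNotFinished(&proposal{id: "b", finished: true}))
}
```

Returning an error rather than calling a fatal logger leaves the caller free to decide whether a violation should crash a test build or only be logged.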
Friendly ping. |
Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @pavelkalinnikov and @tbg)
pkg/kv/kvserver/apply/task.go
line 284 at r1 (raw file):
return rejectErr } return err
Don't we still want to return the error regardless of its type?
Also, this now "leaks" commands in the non-err removed case. Is the assumption that all other cases are fatal to the process, so we don't care about leaking commands in those cases?
pkg/kv/kvserver/replica_test.go
line 8134 at r1 (raw file):
proposalDoneCh := proposal.doneCh g, _, pErr := repl.concMgr.SequenceReq(ctx, nil /* guard */, concurrency.Request{
For the sake of readers, it seems more appropriate to sequence the request before the call to requestToProposal, because that's the real order of operations.
c22b5af to 15b1c6a
Updated, RFAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pavelkalinnikov)
pkg/kv/kvserver/apply/task.go
line 284 at r1 (raw file):
Good catch - the error should've been returned either way, glad you caught this.
Also, this now "leaks" commands in the non-err removed case. Is the assumption that all other cases are fatal to the process, so we don't care about leaking commands in those cases?
Yes, errors from the apply loop are fatal. We may handle them in the future by calling into some version of replica removal at the caller, and replica removal would be in charge of releasing all pending commands.
Reviewed 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @pavelkalinnikov)
bors r=nvanbenschoten |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed: |
Build succeeded: |
… in proposals map" Reverts cockroachdb#94825. This reverts commit ac23f46.
…posals map" Reverts cockroachdb#94825. This reverts commit 15b1c6a.
98481: kvserver: revert recent changes to reproposals r=pavelkalinnikov a=tbg
Reverts #97606, #97564, #94825, #94633.
- Revert "kvserver: disable assertion 'finished proposal inserted'"
- Revert "kvserver: narrow down 'finishing a proposal with outstanding reproposal'"
- Revert "kvserver: fill gaps in comment near tryReproposeWithNewLeaseIndex"
- Revert "kvserver: hoist early return out of tryReproposeWithNewLeaseIndex"
- Revert "fixup! kvserver: prevent finished proposal from being present in proposals map"
- Revert "kvserver: prevent finished proposal from being present in proposals map"
- Revert "kvserver: improve reproposal assertions and documentation"
Closes #97973.
Epic: CRDB-25287
Release Note: none
98537: sql: check row level ttl change before truncating a table r=chengxiong-ruan a=chengxiong-ruan
Fixes: #93443
Release note (sql change): This commit fixes a bug where crdb panicked when a user tried to truncate a table that has an ongoing row level ttl change. We still don't support table truncates in this scenario, but a gentler unimplemented error is returned instead of a panic.
98575: cdc: use int64 for emitted bytes telemetry r=miretskiy a=jayshrivastava
Previously, the stored `emitted_bytes` field was an int32, which can hold a maximum value of about 2.1GB. This is too small because the logging period is 24h and changefeeds can emit much more than 2.1GB in 24h. This change updates the field to be an int64, which solves the problem.
Epic: None
Release note: None
98582: ci: allow-list `BUILD_VCS_NUMBER` env var in cloud unit tests r=jlinder a=rickystewart
This job was filing issues linking to the wrong commit.
Epic: none
Release note: None
Co-authored-by: Tobias Grieger
Co-authored-by: Chengxiong Ruan
Co-authored-by: Jayant Shrivastava
Co-authored-by: Ricky Stewart
The conjecture in #86547 is that a finished proposal somehow makes its
way into the proposal map, most likely by never being removed prior to
being finished.
This commit adds an assertion that we're never outright inserting
a finished proposal, and better documents the places in which we're
running a risk of violating the invariant.
It also clarifies the handling of proposals in an apply batch when a
replication change that removes the replica is encountered. I suspected
that this could lead to a case in which proposals would be finished
despite remaining in the proposals map. Upon inspection this turned out
to be incorrect - the map (at least by code inspection) is empty at that
point, so the invariant holds trivially.
Unfortunately, that leaves me without an explanation for #86547, but the
newly added invariants may prove helpful.
Touches #86547.
Epic: None
Release note: None
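The invariant described in this commit message can be sketched as follows. This is a simplified model, not the kvserver implementation: `proposalMap`, `proposalData`, `insert`, `finish`, and `checkInvariant` are hypothetical names, and the real code tracks far more state. It shows the two rules the message relies on: a finished proposal is never inserted, and a proposal is removed from the map before it is finished.

```go
package main

import (
	"errors"
	"fmt"
)

type cmdID string

// proposalData is a stand-in for an in-flight proposal.
type proposalData struct {
	id       cmdID
	finished bool
}

// proposalMap models the map of pending proposals keyed by command ID.
type proposalMap struct {
	m map[cmdID]*proposalData
}

func newProposalMap() *proposalMap {
	return &proposalMap{m: map[cmdID]*proposalData{}}
}

var errFinishedInserted = errors.New("finished proposal inserted")

// insert refuses to track a proposal that has already been finished.
func (pm *proposalMap) insert(p *proposalData) error {
	if p.finished {
		return errFinishedInserted
	}
	pm.m[p.id] = p
	return nil
}

// finish removes the proposal from the map first and only then marks it
// finished, so a finished proposal is never observable in the map.
func (pm *proposalMap) finish(id cmdID) {
	p, ok := pm.m[id]
	if !ok {
		return
	}
	delete(pm.m, id)
	p.finished = true
}

// checkInvariant scans the map for finished proposals.
func (pm *proposalMap) checkInvariant() error {
	for id, p := range pm.m {
		if p.finished {
			return fmt.Errorf("finished proposal %s present in map", id)
		}
	}
	return nil
}

func main() {
	pm := newProposalMap()
	p := &proposalData{id: "a"}
	fmt.Println(pm.insert(p), pm.checkInvariant())
	pm.finish("a")
	// Re-inserting the finished proposal is rejected.
	fmt.Println(pm.insert(p), pm.checkInvariant())
}
```

On an empty map the invariant holds trivially, which matches the observation above about the replica-removal case.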