Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: prevent finished proposal from being present in proposals map #94825

Merged
merged 2 commits into from
Feb 9, 2023

Conversation

tbg
Copy link
Member

@tbg tbg commented Jan 6, 2023

The conjecture in #86547 is that a finished proposal somehow makes its
way into the proposal map, most likely by never being removed prior to
being finished.

This commit adds an assertion that we're never outright inserting
a finished proposals, and better documents the places in which we're
running a risk of violating the invariant.

It also clarifies the handling of proposals in an apply batch when a
replication change that removes the replica is encountered. I suspected
that this could lead to a case in which proposals would be finished
despite remaining in the proposals map. Upon inspection this turned out
to be incorrect - the map (at least by code inspection) is empty at that
point, so the invariant holds trivially.

Unfortunately, that leaves me without an explanation for #86547, but the
newly added invariants may prove helpful.

Touches #86547.

Epic: None
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg marked this pull request as ready for review January 9, 2023 10:30
@tbg tbg requested a review from a team January 9, 2023 10:30
@tbg tbg requested a review from a team as a code owner January 9, 2023 10:30
@tbg tbg requested review from nvanbenschoten and pav-kv January 9, 2023 10:31
@erikgrinaker
Copy link
Contributor

I'm not a huge fan of these fatal assertions, since they can be far worse than the problem they try to prevent when hit in production. Can we come up with a scheme for assertions that are compiled out in production builds?

@tbg
Copy link
Member Author

tbg commented Jan 10, 2023

Can we come up with a scheme for assertions that are compiled out in production builds?

Fair point. We can

if buildutil.CrdbTestBuild && (...) {

}

everywhere, it's a bit difficult to have generic helpers in Go because that usually costs you in allocations, etc.

I filed #94979 to track this, we should do a proper pass for stability.

@tbg tbg force-pushed the finish-application branch from 28856d7 to c22b5af Compare January 10, 2023 07:41
@tbg
Copy link
Member Author

tbg commented Jan 10, 2023

Put the assertion behind the build tag in this PR.

@pav-kv
Copy link
Collaborator

pav-kv commented Jan 10, 2023

if buildutil.CrdbTestBuild && (...) {

}

Alternatively, could do something like:

if (...) {
	buildutil.AlmostFatalf("something happened")
}

...

func AlmostFatalf(...) {
	if CrdbTestBuild {
		log.Fatalf(...)
	}
	log.Errorf(...)
}

or:

if buildutil.CrdbTestBuild {
	assert.Equal(a, b) // will these be optimized out if CrdbTestBuild == false?
	assert.True(c)
}

...

package assert // something akin to testify assert

@tbg
Copy link
Member Author

tbg commented Jan 10, 2023

I think a require-style API for runtime assertions would be useful (cc @cockroachdb/test-eng). Ideally, we wouldn't need to gate everything behind the build tag explicitly.
Anyway, this is a bit out of scope for this PR - we really need test-eng to be involved here. Note issue #86678 which indicates that the nightly roachtests don't even have the crdb_test build tag on, so by hiding it behind the tag we are basically trimming our chances of catching these problems before they reach production. We would really need a suite of roachtests that runs the debug build.
Let's please discuss the tangent on the related issue and focus here on the review, as most of the context here will be lost anyway.

@tbg
Copy link
Member Author

tbg commented Jan 10, 2023

Also filed #94986 for the general issue of making it easier to write assertions.

@tbg
Copy link
Member Author

tbg commented Jan 13, 2023

Friendly ping.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pavelkalinnikov and @tbg)


pkg/kv/kvserver/apply/task.go line 284 at r1 (raw file):

					return rejectErr
				}
				return err

Don't we still want to return the error regardless of its type?

Also, this now "leaks" commands in the non-err removed case. Is that assumption that all other cases are fatal to the process, so we don't care about leaking commands in those cases?


pkg/kv/kvserver/replica_test.go line 8134 at r1 (raw file):

	proposalDoneCh := proposal.doneCh

	g, _, pErr := repl.concMgr.SequenceReq(ctx, nil /* guard */, concurrency.Request{

For the sake of readers, it seems more appropriate to sequence the request before the call to requestToProposal, because that's the real order of operations.

The conjecture in cockroachdb#86547 is that a finished proposal somehow makes its
way into the proposal map, most likely by never being removed prior to
being finished.

This commit adds an assertion that we're never outright *inserting*
a finished proposals, and better documents the places in which we're
running a risk of violating the invariant.

It also clarifies the handling of proposals in an apply batch when a
replication change that removes the replica is encountered. I suspected
that this could lead to a case in which proposals would be finished
despite remaining in the proposals map. Upon inspection this turned out
to be incorrect - the map (at least by code inspection) is empty at that
point, so the invariant holds trivially.

Unfortunately, that leaves me without an explanation for cockroachdb#86547, but the
newly added invariants may prove helpful.

Touches cockroachdb#86547.
@tbg tbg force-pushed the finish-application branch from c22b5af to 15b1c6a Compare February 6, 2023 07:31
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pavelkalinnikov)


pkg/kv/kvserver/apply/task.go line 284 at r1 (raw file):
Good catch - the error should've been returned either way, glad you caught this.

Also, this now "leaks" commands in the non-err removed case. Is that assumption that all other cases are fatal to the process, so we don't care about leaking commands in those cases?

Yes, errors from the apply loop are fatal. We may handle them in the future by calling into some version of replica removal at the caller, and replica removal would be in charge of releasing all pending commands.

@tbg tbg requested a review from nvanbenschoten February 6, 2023 07:39
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pavelkalinnikov)

@tbg
Copy link
Member Author

tbg commented Feb 8, 2023

bors r=nvanbenschoten
TFTR!

@craig
Copy link
Contributor

craig bot commented Feb 8, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 8, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 9, 2023

Build failed:

@tbg
Copy link
Member Author

tbg commented Feb 9, 2023

bors r=nvanbenschoten
image

@craig
Copy link
Contributor

craig bot commented Feb 9, 2023

Build succeeded:

@craig craig bot merged commit 95fcd95 into cockroachdb:master Feb 9, 2023
@tbg tbg deleted the finish-application branch February 9, 2023 10:17
tbg added a commit to tbg/cockroach that referenced this pull request Mar 13, 2023
tbg added a commit to tbg/cockroach that referenced this pull request Mar 13, 2023
craig bot pushed a commit that referenced this pull request Mar 14, 2023
98481: kvserver: revert recent changes to reproposals r=pavelkalinnikov a=tbg

Reverts #97606, #97564, #94825, #94633.

- Revert "kvserver: disable assertion 'finished proposal inserted'"
- Revert "kvserver: narrow down 'finishing a proposal with outstanding reproposal'"
- Revert "kvserver: fill gaps in comment near tryReproposeWithNewLeaseIndex"
- Revert "kvserver: hoist early return out of tryReproposeWithNewLeaseIndex"
- Revert "fixup! kvserver: prevent finished proposal from being present in proposals map"
- Revert "kvserver: prevent finished proposal from being present in proposals map"
- Revert "kvserver: improve reproposal assertions and documentation"

Closes #97973.

Epic: CRDB-25287
Release Note: none


98537: sql: check row level ttl change before truncating a table r=chengxiong-ruan a=chengxiong-ruan

Fixes: #93443

Release note (sql change): This commit fixed a bug where crdb paniced wehn user tried to truncate a table which is has an ongoing row level ttl change. We still don't support table truncates in this scenario, but a more gentle unimplemented error is returned instead of panic.

98575: cdc: use int64 for emitted bytes telemetry r=miretskiy a=jayshrivastava

Previously, the stored `emitted_bytes` field was an int32, which can hold a maximum value of 2.1GB. This value is too small because the logging period is 24h and changefeeds can emit much more than 2.1GB in 24h. This change updates the field to be an int64, which solves this problem.

Epic: None
Release note: None

98582: ci: allow-list `BUILD_VCS_NUMBER` env var in cloud unit tests r=jlinder a=rickystewart

This job was filing issues linking to the wrong commit.

Epic: none
Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants