Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

leaktest: print non-closed stoppers #51413

Merged
merged 2 commits into from
Jul 21, 2020
Merged

Conversation

tbg
Copy link
Member

@tbg tbg commented Jul 14, 2020

In #51363 we saw
use-after-close of an engine, which I tracked down to LocalTestCluster
using more than one stopper (only one of which was actually stopped).

This is likely a larger class of problem, so I added a facility to
ensure that no stoppers are currently active, which is invoked from
leaktest. However, due to the large amount of work that is necessary
to sanitize the whole testing codebase, this is for now only advisory
(i.e. logs, does not error). Nevertheless, this will be useful in tests
in which a use-after-stop actually causes problems.

For the LocalTestCluster, I made sure that the warning is no longer
printed, which mainly entailed not handing a different stopper to the kv
DB than that used for the main server. This explains the failures we
have been seeing, which always involved a KV client request.

Closes #51363.

Release note: None

@tbg tbg requested a review from andreimatei July 14, 2020 13:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/testutils/localtestcluster/local_test_cluster.go, line 115 at r1 (raw file):

	ltc.tester = t
	if ltc.Stopper == nil {

can ltc.Stopper be non-nil? Does someone assign it manually? I don't see anyone doing it. How about passing it in to this Start()?


pkg/util/stop/stopper.go, line 66 at r1 (raw file):

type stopperWithStack struct {
	s     *Stopper
	stack []byte

comment this pls or give it a more suggestive name. Can it be a string?

@tbg tbg force-pushed the leaktest-stoppers branch from 1c2f019 to 37c3021 Compare July 17, 2020 16:19
@tbg
Copy link
Member Author

tbg commented Jul 17, 2020

PTAL @andreimatei

I did not add the task yet since I am running out of time (back only on Tuesday). Feel free to add the task separately in another PR. However I think this PR alone will also already help since we're not going to have one-off stoppers active longer than the test servers' any more.

@tbg tbg force-pushed the leaktest-stoppers branch from 37c3021 to 3f9b393 Compare July 17, 2020 16:21
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

In cockroachdb#51363 we saw
use-after-close of an engine, which I tracked down to `LocalTestCluster`
using more than one stopper (only one of which was actually stopped).

This is likely a larger class of problem, so I added a facility to
ensure that no stoppers are currently active, which is invoked from
leaktest. However, due to the large amount of work that is necessary
to sanitize the whole testing codebase, this is for now only advisory
(i.e. logs, does not error). Nevertheless, this will be useful in tests
in which a use-after-stop actually causes problems.

For the LocalTestCluster, I made sure that the warning is no longer
printed, which mainly entailed not handing a different stopper to the kv
DB than that used for the main server. This explains the failures we
have been seeing, which always involved a KV client request.

Closes cockroachdb#51363.

Release note: None
DefaultDBContext previously created an internal stopper, but nobody was
ever stopping it. Nothing interesting happens in this change.

Release note: None
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@craig
Copy link
Contributor

craig bot commented Jul 21, 2020

Build failed (retrying...)

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@craig
Copy link
Contributor

craig bot commented Jul 21, 2020

Build succeeded

@craig craig bot merged commit 4eac05a into cockroachdb:master Jul 21, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jul 22, 2020
This looks harmless because the first stopper was never actually
used, but it was caught by cockroachdb#51413.
craig bot pushed a commit that referenced this pull request Jul 23, 2020
51807: testcluster: don't create two stoppers in StartTestCluster r=nvanbenschoten a=nvanbenschoten

This looks harmless because the first stopper was never actually used, but it was caught by #51413.

51832: dump: fix dump-backwards-compatibility remote roachtest r=pbardea a=pbardea

This commit changes the execution of the cockroach binary from
`cockroach` to `./cockroach`. This follows how other CLI roachtests run
the binary. Before this commit, this test would fail when run on a GCE
roachprod cluster, it now passes.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Paul Bardea <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv/kvclient/kvcoord: TestTxnCoordSenderCondenseLockSpans failed
3 participants