Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: TestRejectedLeaseDoesntDictateClosedTimestamp deadlocks after Fatal #65109

Closed
RaduBerinde opened this issue May 13, 2021 · 2 comments · Fixed by #65158
Closed

kvserver: TestRejectedLeaseDoesntDictateClosedTimestamp deadlocks after Fatal #65109

RaduBerinde opened this issue May 13, 2021 · 2 comments · Fixed by #65158
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). skipped-test

Comments

@RaduBerinde
Copy link
Member

We have been seeing various instances of the kvserver tests timing out, with this test still running. I investigated the stack traces and found this:

goroutine 1109203 [chan receive, 16 minutes]:
github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestRejectedLeaseDoesntDictateClosedTimestamp.func7(0xc00b84b080, 0xc0043b67e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_closedts_test.go:573 +0x4f
runtime.Goexit()
	/usr/local/go/src/runtime/panic.go:636 +0x146
testing.(*common).FailNow(0xc00b84b080)
	/usr/local/go/src/testing/testing.go:732 +0x65
testing.(*common).Fatalf(0xc00b84b080, 0x64a71a3, 0x2c, 0xc002517530, 0x1, 0x1)
	/usr/local/go/src/testing/testing.go:806 +0x9e
github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestRejectedLeaseDoesntDictateClosedTimestamp(0xc00b84b080)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_closedts_test.go:595 +0x1d6c
testing.tRunner(0xc00b84b080, 0x6ac85d0)
	/usr/local/go/src/testing/testing.go:1123 +0x203
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1168 +0x5bc

So we Fatal here:

t.Fatalf("lease request unexpectedly finished. err: %v", err)

But when we unwind we get stuck in this deferred function:

defer func() {
require.NoError(t, <-transferErrCh)
}()

@RaduBerinde RaduBerinde added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 13, 2021
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue May 13, 2021
Refs: cockroachdb#65109

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None
@RaduBerinde RaduBerinde added C-test-failure Broken test (automatically or manually discovered). skipped-test and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels May 13, 2021
craig bot pushed a commit that referenced this issue May 13, 2021
65057: storage/cloud: replace WriteFile(Reader) with Writer r=dt a=dt

This changes the ExternalStorage API's writing method from WriteFile which takes an io.Reader
and writes its content to the requested destination or returns an error encountered while doing
so to instead have Writer() which returns an io.Writer pointing to the destination that can be
written to later and then closed to finish the upload (or CloseWithError'ed to cancel it).

All existing callers use the shim and are unaffected, but can later choose to change to a
push-based model and use the Writer directly. This is left to a follow-up change.

(note: first commit is just adding a shim for existing callers and switching them)

Release note: none.

65110: kv/kvserver: skip TestRejectedLeaseDoesntDictateClosedTimestamp r=RaduBerinde a=RaduBerinde

Refs: #65109

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None

Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
@andreimatei
Copy link
Contributor

Ugh I'll do something. Thanks for debugging.

@irfansharif irfansharif removed their assignment May 17, 2021
craig bot pushed a commit that referenced this issue May 18, 2021
65158: kvserver: fix TestRejectedLeaseDoesntDictateClosedTimestamp  r=andreimatei a=andreimatei

This test was flaky (and skipped) in the case where cluster startup is
taking sufficiently many seconds such that, by the time it finishes,
the original lease on r1 is either expired or at least almost expired
such that the test inadvertendly kicks off a preemptive lease refresh.
The test tries to control exactly what lease requests get proposed, so
it doesn't like these refreshes happening.
The fix is to disable the preemptive refreshes, and also to make lease
duration longer.

Fixes #65109

cc @cockroachdb/kv 

65380: kvserver: deflake TestRollbackSyncRangedIntentResolution r=andreimatei a=erikgrinaker

Release note: None

65390: roachtest: update version map and fixtures r=j-low a=j-low

This commit adds the recently released 20.2.10
to the version map in PredecessorVersion.

Release note: None (testing change)

65394: jobs: don't check the number of deleted rows r=postamar a=postamar

When attempting to delete expired job records it's entirely possible to
legitimately delete less records than expected. This can happen when
another node is concurrently attempting cleanup. Previously, we returned
an assertion failure in these cases.

Fixes #65048.

Release note: None

65417: authors add Stephen Mooney to authors r=rail a=stephenmooney1976

Release notes: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Joseph Lowinske <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: stephenmooney1976 <[email protected]>
@craig craig bot closed this as completed in 0a040d6 May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). skipped-test
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants