Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvcoord: TestNoDuplicateHeartbeatLoops is skipped #59373

Closed
tbg opened this issue Jan 25, 2021 · 0 comments · Fixed by #69814
Closed

kvcoord: TestNoDuplicateHeartbeatLoops is skipped #59373

tbg opened this issue Jan 25, 2021 · 0 comments · Fixed by #69814
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). skipped-test T-kv KV Team

Comments

@tbg
Copy link
Member

tbg commented Jan 25, 2021

This test uses a trace to determine the number of heartbeat goroutines, but the use of tracing was never reasonable and does not work any more, since the heartbeats are run in separate goroutines that do not use the test's trace span.

The test is skipped as a result.

Epic: CRDB-2557

@tbg tbg added the C-test-failure Broken test (automatically or manually discovered). label Jan 25, 2021
tbg added a commit to tbg/cockroach that referenced this issue Jan 25, 2021
Refs: cockroachdb#59373

Reason: Needs rewrite - uses tracing in illegal manner

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None
craig bot pushed a commit that referenced this issue Jan 25, 2021
59199: rowexec: minor joinreader cleanup r=yuzefovich a=yuzefovich

Release note: None

59277: build: add crdb_test_off tag temporarily to sqllite logic tests r=yuzefovich a=yuzefovich

SQLLite logic tests have been failing since we introduced the
randomizations of the batch sizes (with a timeout). It is not
immediately clear what exactly needs to be adjusted, so let's disable
the randomizations for now (so that we at least get some value out of
the tests).

Informs: #58089.

Release note: None

59374: kv/kvclient/kvcoord: skip TestNoDuplicateHeartbeatLoops r=nvanbenschoten a=tbg

Refs: #59373

Reason: Needs rewrite - uses tracing in illegal manner

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
tbg added a commit to tbg/cockroach that referenced this issue Jan 25, 2021
Refs: cockroachdb#59373

Reason: Needs rewrite - uses tracing in illegal manner

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None
@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 3, 2021
Fixes cockroachdb#59373.

This commit deflakes and unskips `TestNoDuplicateHeartbeatLoops`. It does so by
updating the trace matching that the test performs to properly match on the
trace message output on the request's goroutine when spawning a heartbeat loop.

Release justification: testing only
@nvanbenschoten nvanbenschoten self-assigned this Sep 3, 2021
craig bot pushed a commit that referenced this issue Sep 4, 2021
69812: kv: deflake and unskip TestPushTxnHeartbeatTimeout r=nvanbenschoten a=nvanbenschoten

Fixes #62860.

In #62860, we found that `TestPushTxnHeartbeatTimeout` became flaky after d77c3ed merged, but only when run with Bazel. That change having such an effect did not make a lot of sense. Furthermore, the fact that this only failed under Bazel and not under normal `go test` was even more fascinating. This was highly reproducible, and remained so until the merge of 4321fa8. After that change, the flakiness disappeared. Again, this was surprising, as the change was unrelated.

It turns out that this flakiness was due to the way that Bazel runs tests in the same package. Adding tests like those two commits did must have shuffled the order and grouping of tests into Bazel shards (https://docs.bazel.build/versions/main/test-encyclopedia.html#test-sharding), which revealed a bug when `TestStoreRangeMergeInFlightTxns` and `TestPushTxnHeartbeatTimeout` were run in the same shard and in that order. `TestStoreRangeMergeInFlightTxns` was unintentionally mutating the `txnwait.TxnLivenessThreshold` global variable, which was tripping up `TestPushTxnHeartbeatTimeout`.

This commit deflakes `TestPushTxnHeartbeatTimeout` by fixing a bug in `TestStoreRangeMergeInFlightTxns`, which is quite amusing. It then unskips the test.

Release justification: testing only

69814: kv: deflake and unskip TestNoDuplicateHeartbeatLoops r=nvanbenschoten a=nvanbenschoten

Fixes #59373.

This commit deflakes and unskips `TestNoDuplicateHeartbeatLoops`. It does so by updating the trace matching that the test performs to properly match on the trace message output on the request's goroutine when spawning a heartbeat loop.

I'm curious whether there's an objection to this approach. #59373 uses some pretty strong language about the use of tracing like this in tests. We do make decent use of tracing in tests these days (e.g. see uses of `OnlyFollowerReads`), but I can also see arguments against this and I've personally never been the biggest fan of this approach. Any way, I've assigned the two who will likely have the strongest opinions about this for review. Should be fun.

Release justification: testing only

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in 4a71d7a Sep 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). skipped-test T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants