sql/catalog/lease: TestRangefeedUpdatesHandledProperlyInTheFaceOfRaces failed #135777

Closed
cockroach-teamcity opened this issue Nov 20, 2024 · 3 comments · Fixed by #136293
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Nov 20, 2024

sql/catalog/lease.TestRangefeedUpdatesHandledProperlyInTheFaceOfRaces failed with artifacts on master @ 36fb1b122380b273f58ac9e13c83e5a6c498454d:

Fatal error:

panic: test timed out after 15m0s
running tests:
	TestRangefeedUpdatesHandledProperlyInTheFaceOfRaces (14m7s)

Stack:

goroutine 587984 [running]:
testing.(*M).startAlarm.func1()
	GOROOT/src/testing/testing.go:2366 +0x30c
created by time.goFunc
	GOROOT/src/time/sleep.go:177 +0x38
Log preceding fatal error

=== RUN   TestRangefeedUpdatesHandledProperlyInTheFaceOfRaces
    test_log_scope.go:165: test logs captured to: /artifacts/tmp/_tmp/1e31bde6494190748168afe7a747530a/logTestRangefeedUpdatesHandledProperlyInTheFaceOfRaces317038366
    test_log_scope.go:76: use -show-logs to present logs inline
    test_server_shim.go:152: automatically injected an external process virtual cluster under test; see comment at top of test_server_shim.go for details.

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-44703

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Nov 20, 2024
@spilchen
Contributor

We seem to hit this timeout every so often. See #131415 and #129097.

@spilchen spilchen removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Nov 26, 2024
@spilchen
Contributor

@spilchen spilchen self-assigned this Nov 26, 2024
@spilchen spilchen added the P-2 Issues/test failures with a fix SLA of 3 months label Nov 26, 2024
@spilchen
Contributor

The test runs a query and an ALTER operation concurrently against the same table. The query starts first and acquires the table descriptor lease, at which point the test suspends it. The ALTER operation then begins and creates a new version of the descriptor. It suspends at the end of its execution, waiting until only one version of the descriptor exists. Once the test detects the new descriptor version, it resumes the query, which allows the ALTER to complete.

In the failed run, the descriptor did change, as evidenced by the ALTER operation reaching the waitForOneVersion step. However, the test failed to recognize this, leaving everything suspended until it timed out. While I couldn’t pinpoint the root cause, we could enhance the debug logic to capture additional information if this issue recurs.

craig bot pushed a commit that referenced this issue Dec 2, 2024
135234: kvserver: improve TestReplicaLatchingOptimisticEvaluationKeyLimit r=arulajmani a=arulajmani

This test would fail opaquely previously, which wasn't helpful. This patch sets up tracing on the read path to help investigate future failures.

Informs #135197

Release note: None

136293: sql/catalog/lease: Add diagnostics to TestRangefeedUpdatesHandledProperlyInTheFaceOfRaces r=spilchen a=spilchen

This change enhances diagnostics for TestRangefeedUpdatesHandledProperlyInTheFaceOfRaces. The test involves a concurrent query and an ALTER operation on the same table. The query starts first, acquiring the table descriptor lease, and is then suspended by the test. Next, the ALTER operation begins, creating a new version of the descriptor. It pauses at the end of its execution, waiting for only one version of the descriptor to remain. When the new descriptor version is detected, the query resumes, allowing the ALTER to complete.

In the failure case, the test did not detect the new version of the descriptor, even though the ALTER operation had already updated it and was waiting at waitForOneVersion. This change adds extra logging to capture the descriptor changes observed during the test, helping diagnose the issue if it recurs.

Epic: none
Closes #135777
Release note: none
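As a rough idea of what "capturing the descriptor changes observed during the test" might look like, here is a minimal sketch. It is not the code from #136293; the `observedVersions` type, its methods, and the descriptor ID 104 are assumptions made up for illustration. The idea is simply to record every version the test hook sees, so the history can be printed when the test stalls.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// observedVersions records each descriptor version seen by a test hook.
// Hypothetical helper; the real diagnostics may look quite different.
type observedVersions struct {
	mu   sync.Mutex
	seen []string
}

// record appends one observation. Safe for concurrent use, since
// updates may be delivered from a different goroutine than the test body.
func (o *observedVersions) record(descID, version int) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.seen = append(o.seen, fmt.Sprintf("desc=%d@v%d", descID, version))
}

// String renders the history in arrival order for a failure message.
func (o *observedVersions) String() string {
	o.mu.Lock()
	defer o.mu.Unlock()
	if len(o.seen) == 0 {
		return "no descriptor updates observed"
	}
	return strings.Join(o.seen, ", ")
}

func main() {
	var obs observedVersions
	obs.record(104, 1) // descriptor ID 104 is an arbitrary example value
	obs.record(104, 2)
	fmt.Println("observed:", obs.String())
}
```

With a log like this, a recurrence would show whether the new version was never delivered to the test hook or was delivered but not matched.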

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Matt Spilchen <[email protected]>
@craig craig bot closed this as completed in 82ddf8e Dec 2, 2024
2 participants