Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/server/storage_api/storage_api_test: TestAdminDecommissionedOperations failed #107875

Closed
cockroach-teamcity opened this issue Jul 31, 2023 · 2 comments · Fixed by #108227
Closed
Assignees
Labels
A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-server-and-security DB Server & Security
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jul 31, 2023

pkg/server/storage_api/storage_api_test.TestAdminDecommissionedOperations failed with artifacts on master @ e57e9742146b7fac7f7530fa6224e11121ada92f:

=== RUN   TestAdminDecommissionedOperations
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/800529510c8a77f305555fe4848c57f0/logTestAdminDecommissionedOperations246959236
    test_log_scope.go:81: use -show-logs to present logs inline
=== CONT  TestAdminDecommissionedOperations
    decommission_test.go:978: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/800529510c8a77f305555fe4848c57f0/logTestAdminDecommissionedOperations246959236
--- FAIL: TestAdminDecommissionedOperations (21.23s)
=== RUN   TestAdminDecommissionedOperations/Drain
    decommission_test.go:973: 
        	Error Trace:	pkg/server/storage_api/storage_api_test_test/pkg/server/storage_api/decommission_test.go:973
        	            				github.com/cockroachdb/cockroach/pkg/testutils/soon.go:75
        	            				github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:213
        	            				github.com/cockroachdb/cockroach/pkg/testutils/soon.go:81
        	            				github.com/cockroachdb/cockroach/pkg/testutils/soon.go:60
        	            				pkg/server/storage_api/storage_api_test_test/pkg/server/storage_api/decommission_test.go:955
        	Error:      	Not equal: 
        	            	expected: 0x2
        	            	actual  : 0x4
        	Test:       	TestAdminDecommissionedOperations/Drain
        	Messages:   	rpc error: code = DeadlineExceeded desc = context deadline exceeded
    --- FAIL: TestAdminDecommissionedOperations/Drain (10.00s)

Parameters: TAGS=bazel,gss,deadlock , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/unowned

This test on roachdash | Improve this report!

Jira issue: CRDB-30237

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Jul 31, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Jul 31, 2023
@maryliag maryliag added the T-storage Storage Team label Aug 3, 2023
@blathers-crl blathers-crl bot added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Aug 3, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Aug 4, 2023
@jbowens jbowens added A-kv-server Relating to the KV-level RPC server T-server-and-security DB Server & Security and removed A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Aug 4, 2023
@nvanbenschoten
Copy link
Member

The test expected grpc/codes.Unknown, but got grpc/codes.DeadlineExceeded. We also see that in the failure message, which is context deadline exceeded.

@nvanbenschoten nvanbenschoten self-assigned this Aug 4, 2023
@nvanbenschoten
Copy link
Member

nvanbenschoten commented Aug 4, 2023

The problem here seems to be that we give the drain a 10s timeout, but also leave server.shutdown.query_wait at its default value of 10s. This means that if any queries run for long enough that they need to be canceled (e.g. a slow list-tenants query), the test will time out before the cancellation.

@nvanbenschoten nvanbenschoten added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure and removed A-kv-server Relating to the KV-level RPC server labels Aug 6, 2023
craig bot pushed a commit that referenced this issue Aug 9, 2023
108227: kv: deflake `TestAdminDecommissionedOperations` by setting `server.shutdown.query_wait` r=nvanbenschoten a=nvanbenschoten

Fixes #107875.

This commit deflakes `TestAdminDecommissionedOperations` by setting the `server.shutdown.query_wait` cluster setting to 0s. This ensures that SQL queries are immediately canceled during the drain, instead of being let run for up to 10s (the previous test timeout) before cancellation.

The commit also increases the timeouts in the test to the default `testutils` timeouts, to avoid flakes when CI is slow.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in 73f540d Aug 9, 2023
@exalate-issue-sync exalate-issue-sync bot removed the T-kv KV Team label Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-server-and-security DB Server & Security
Projects
None yet
4 participants