
sql/sem/builtins: TestSerialNormalizationWithUniqueUnorderedID failed [deadlock in handleRaftReady] #115541

Closed
cockroach-teamcity opened this issue Dec 4, 2023 · 3 comments · Fixed by #115685
Labels
A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-kv KV Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Dec 4, 2023

sql/sem/builtins.TestSerialNormalizationWithUniqueUnorderedID failed with artifacts on master @ b95ad1dc5d1c00ab1b99a13cfd5014c1093bef89:

created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx in goroutine 1297359
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x415
Other goroutines holding locks:
goroutine 1308985 lock 0xc001403078
github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/lock_table.go:4720 concurrency.(*lockTableImpl).verify ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/lock_table.go:4719 concurrency.(*lockTableImpl).verify ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/verifiable_lock_table.go:98 concurrency.verifyingLockTable.UpdateLocks ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:551 concurrency.(*managerImpl).OnLockUpdated ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_proposal.go:795 kvserver.(*Replica).handleReadWriteLocalEvalResult ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_state_machine.go:225 kvserver.(*replicaStateMachine).ApplySideEffects ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/cmd.go:214 apply.mapCheckedCmdIter ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:295 apply.(*Task).applyOneBatch ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:250 apply.(*Task).ApplyCommittedEntries ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_state_machine.go:429 kvserver.(*replicaStateMachine).moveStats ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:739 kvserver.(*Replica).handleRaftReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:688 kvserver.(*Store).processReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:420 kvserver.(*raftSchedulerShard).worker ???
GOROOT/src/sync/waitgroup.go:86 sync.(*WaitGroup).Done ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 stop.(*Stopper).RunAsyncTaskEx.func2 ???

goroutine 39 lock 0xc00002d750
github.com/cockroachdb/cockroach/pkg/util/admission/work_queue.go:861 admission.(*WorkQueue).hasWaitingRequests ??? <<<<<
github.com/cockroachdb/cockroach/pkg/util/admission/work_queue.go:860 admission.(*WorkQueue).hasWaitingRequests ???
github.com/cockroachdb/cockroach/pkg/util/admission/granter.go:255 admission.(*tokenGranter).requesterHasWaitingRequests ???
github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:924 admission.(*GrantCoordinator).tryGrantLocked ???
github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:767 admission.(*GrantCoordinator).CPULoad ???
github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:241 goschedstats.(*schedStatsTicker).getStatsOnTick ???
github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:171 goschedstats.init.0.func1 ???

goroutine 39 lock 0xc0018b5a60
github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:740 admission.(*GrantCoordinator).CPULoad ??? <<<<<
github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:739 admission.(*GrantCoordinator).CPULoad ???
github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:241 goschedstats.(*schedStatsTicker).getStatsOnTick ???
github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:171 goschedstats.init.0.func1 ???

goroutine 35848949 lock 0xc00001bf18
github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool/store_pool.go:1074 storepool.(*StorePool).GetStoreList ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool/store_pool.go:1073 storepool.(*StorePool).GetStoreList ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go:1674 allocatorimpl.Allocator.RebalanceTarget ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go:1883 allocatorimpl.Allocator.RebalanceVoter ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/plan/replicate.go:172 plan.ReplicaPlanner.ShouldPlanChange ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replicate_queue.go:637 kvserver.(*replicateQueue).shouldQueue ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:714 kvserver.(*baseQueue).maybeAdd ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:571 kvserver.baseQueueHelper.MaybeAdd ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:623 kvserver.(*replicateQueue).MaybeAddAsync.(*baseQueue).MaybeAddAsync.func1 ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:610 kvserver.(*baseQueue).Async.func1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 stop.(*Stopper).RunAsyncTaskEx.func2 ???



Parameters: TAGS=bazel,gss,deadlock, stress=true

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-foundations

Jira issue: CRDB-34090

@cockroach-teamcity added the branch-master, C-test-failure, O-robot, release-blocker, and T-sql-foundations labels Dec 4, 2023
@cockroach-teamcity added this to the 24.1 milestone Dec 4, 2023
@cockroach-teamcity
Member Author

sql/sem/builtins.TestSerialNormalizationWithUniqueUnorderedID failed with artifacts on master @ 8bcaae9db5fe148536a24c2b0b99ef6691687ebc:

POTENTIAL DEADLOCK:
Previous place where the lock was grabbed
goroutine 1308961 lock 0xc003d40658
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:738 kvserver.(*Replica).handleRaftReady ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:737 kvserver.(*Replica).handleRaftReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:688 kvserver.(*Store).processReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:420 kvserver.(*raftSchedulerShard).worker ???
GOROOT/src/sync/waitgroup.go:86 sync.(*WaitGroup).Done ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 stop.(*Stopper).RunAsyncTaskEx.func2 ???

Have been trying to lock it again for more than 5m0s
goroutine 2673850 lock 0xc003d40658
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:324 kvserver.(*Replica).evalAndPropose.func6 ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:323 kvserver.(*Replica).evalAndPropose.func6 ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_write.go:334 kvserver.(*Replica).executeWriteBatch ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:510 kvserver.(*Replica).executeBatchWithConcurrencyRetries ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:200 kvserver.(*Replica).SendWithWriteBytes ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_send.go:192 kvserver.(*Store).SendWithWriteBytes ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:202 kvserver.(*Stores).SendWithWriteBytes ???
github.com/cockroachdb/cockroach/pkg/server/node.go:1332 server.(*Node).batchInternal ???
github.com/cockroachdb/cockroach/pkg/server/node.go:1466 server.(*Node).Batch ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:704 rpc.makeInternalClientAdapter.func1 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:96 rpc.NewServerEx.ServerInterceptor.func12 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:814 rpc.makeInternalClientAdapter.chainUnaryServerInterceptors.bindUnaryServerInterceptorToHandler.func4 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:168 rpc.NewServerEx.func3 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:814 rpc.makeInternalClientAdapter.chainUnaryServerInterceptors.bindUnaryServerInterceptorToHandler.func4 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:104 rpc.kvAuth.unaryInterceptor ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:814 rpc.makeInternalClientAdapter.chainUnaryServerInterceptors.bindUnaryServerInterceptorToHandler.func4 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:135 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:335 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:137 rpc.NewServerEx.func1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:814 rpc.makeInternalClientAdapter.chainUnaryServerInterceptors.bindUnaryServerInterceptorToHandler.func4 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:714 rpc.makeInternalClientAdapter.func2 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:227 rpc.NewContext.ClientInterceptor.func8 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:898 rpc.getChainUnaryInvoker.func1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:784 rpc.makeInternalClientAdapter.func3 ???
<autogenerated>:0 rpc.(*internalClientAdapter).Batch ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:210 kvcoord.(*grpcTransport).sendBatch ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:188 kvcoord.(*grpcTransport).SendNext ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:2415 kvcoord.(*DistSender).sendToReplicas ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1916 kvcoord.(*DistSender).sendPartialBatch ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1484 kvcoord.(*DistSender).divideAndSendBatchToRanges ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1100 kvcoord.(*DistSender).Send ???
github.com/cockroachdb/cockroach/pkg/kv/db.go:223 kv.(*CrossRangeTxnWrapperSender).Send ???
github.com/cockroachdb/cockroach/pkg/internal/client/requestbatcher/batcher.go:333 requestbatcher.(*RequestBatcher).sendBatch.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/timeutil/timeout.go:29 timeutil.RunWithTimeout ???
github.com/cockroachdb/cockroach/pkg/internal/client/requestbatcher/batcher.go:351 requestbatcher.(*RequestBatcher).sendBatch.func1.2 ???
github.com/cockroachdb/cockroach/pkg/internal/client/requestbatcher/batcher.go:406 requestbatcher.(*RequestBatcher).sendBatch.func1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 stop.(*Stopper).RunAsyncTaskEx.func2 ???
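This second report comes from a timeout-based check: it flags goroutine 2673850 because its attempt to take lock 0xc003d40658 has been waiting "for more than 5m0s" while goroutine 1308961, inside `handleRaftReady`, still holds it. A check like this cannot tell a genuine lock cycle apart from a holder that is merely slow. The Go sketch below illustrates that, assuming the detector works by timing lock acquisition. `timedMutex` and `slowHolderReports` are hypothetical names, and a real detector may abort the process on a report instead of continuing to wait as this one does.

```go
package main

import (
	"sync"
	"time"
)

// timedMutex is a hypothetical stand-in for a timeout-based deadlock
// detector. It cannot prove that a lock cycle exists; it only notices that
// a Lock call has been waiting longer than timeout and reports that.
type timedMutex struct {
	mu      sync.Mutex
	timeout time.Duration
	onSlow  func() // invoked when a Lock call waits longer than timeout
}

// Lock acquires the mutex. If acquisition takes longer than m.timeout it
// calls m.onSlow once and then keeps waiting (a real detector might abort).
func (m *timedMutex) Lock() {
	acquired := make(chan struct{})
	go func() {
		// sync.Mutex is not tied to a goroutine, so locking here on the
		// caller's behalf and unlocking from the caller later is allowed.
		m.mu.Lock()
		close(acquired)
	}()
	timer := time.NewTimer(m.timeout)
	defer timer.Stop()
	select {
	case <-acquired:
		return
	case <-timer.C:
		if m.onSlow != nil {
			m.onSlow()
		}
	}
	<-acquired
}

// Unlock releases the mutex.
func (m *timedMutex) Unlock() { m.mu.Unlock() }

// slowHolderReports holds a timedMutex for hold while another goroutine
// tries to acquire it, and returns how many slow-acquisition reports fired.
// The holder always releases the lock, so there is no real deadlock.
func slowHolderReports(timeout, hold time.Duration) int {
	var (
		countMu sync.Mutex
		reports int
	)
	m := &timedMutex{timeout: timeout, onSlow: func() {
		countMu.Lock()
		reports++
		countMu.Unlock()
	}}

	m.Lock() // uncontended: acquired immediately
	done := make(chan struct{})
	go func() {
		m.Lock() // blocks until the slow holder below releases
		m.Unlock()
		close(done)
	}()
	time.Sleep(hold) // slow, but still making progress
	m.Unlock()
	<-done

	countMu.Lock()
	defer countMu.Unlock()
	return reports
}
```

For example, `slowHolderReports(time.Second, 10*time.Millisecond)` should return 0, while `slowHolderReports(20*time.Millisecond, 300*time.Millisecond)` reports the waiter even though the holder eventually releases the lock.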

Parameters: TAGS=bazel,gss,deadlock, stress=true

@rafiss changed the title from "sql/sem/builtins: TestSerialNormalizationWithUniqueUnorderedID failed" to "sql/sem/builtins: TestSerialNormalizationWithUniqueUnorderedID failed [deadlock in handleRaftReady]" Dec 5, 2023
@rafiss added the T-kv label and removed the T-sql-foundations label Dec 5, 2023
@rafiss
Collaborator

rafiss commented Dec 5, 2023

Might relate to the potential deadlock in #115542

@arulajmani
Collaborator

This isn't an actual deadlock -- the test is just slow. It takes ~30 seconds to run normally.

The addition of test-only verification in the lock table made this test susceptible to hitting timeouts when running under a deadlock build, which is why we're seeing these failures pop up. Given it's already skipped under race, I'm going to skip this under deadlock as well.
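The first report's stack shows `verifyingLockTable.UpdateLocks` calling `(*lockTableImpl).verify` during Raft command application. The Go sketch below shows why a test-only verifying wrapper can slow an already long test considerably. It assumes, and this is an assumption rather than a description of CockroachDB's actual check, that verification walks the whole structure after every mutation, so n updates cost 1+2+...+n = n(n+1)/2 entry visits instead of n. `lockStore`, `mapStore`, `verifyingStore`, and `verificationWork` are hypothetical names.

```go
package main

import "strconv"

// lockStore is a hypothetical, heavily simplified stand-in for a lock
// table: it maps keys to the ID of the transaction holding a lock.
type lockStore interface {
	Acquire(key string, txnID int)
	Release(key string)
}

type mapStore struct{ locks map[string]int }

func newMapStore() *mapStore { return &mapStore{locks: map[string]int{}} }

func (s *mapStore) Acquire(key string, txnID int) { s.locks[key] = txnID }
func (s *mapStore) Release(key string)             { delete(s.locks, key) }

// verifyingStore decorates a mapStore and re-checks an invariant over every
// entry after each mutation. checked counts entries visited by verification,
// i.e. the extra work the test-only check adds.
type verifyingStore struct {
	inner   *mapStore
	checked int
}

var _ lockStore = (*verifyingStore)(nil)

func (v *verifyingStore) verify() {
	for key, txn := range v.inner.locks {
		v.checked++
		if key == "" || txn <= 0 {
			panic("lock table invariant violated")
		}
	}
}

func (v *verifyingStore) Acquire(key string, txnID int) {
	v.inner.Acquire(key, txnID)
	v.verify()
}

func (v *verifyingStore) Release(key string) {
	v.inner.Release(key)
	v.verify()
}

// verificationWork acquires n distinct locks through a verifyingStore and
// returns how many entries verification visited: 1+2+...+n.
func verificationWork(n int) int {
	v := &verifyingStore{inner: newMapStore()}
	for i := 1; i <= n; i++ {
		v.Acquire("key-"+strconv.Itoa(i), i)
	}
	return v.checked
}
```

`verificationWork(100)` returns 5050 and `verificationWork(1000)` returns 500500: ten times as many updates cost a hundred times as much verification. Under this assumption, a check that is cheap per call can push a ~30s test past a timeout once the build is further slowed by deadlock instrumentation.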

@arulajmani removed the release-blocker label Dec 6, 2023
craig bot pushed a commit that referenced this issue Dec 6, 2023
115685: builtins: skip TestSerialNormalizationWithUniqueUnorderedID under deadlock r=erikgrinaker a=arulajmani

This test is extremely slow -- it takes ~30s to run normally. The addition of test-only verification pushed it over the edge, such that running it under the deadlock detector would cause spurious failures, so we skip it.

Closes #115541
Release note: None

Co-authored-by: Arul Ajmani
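The fix skips the test under the deadlock build, just as it is already skipped under race. Below is a self-contained Go sketch of such a guard. The `deadlock` tag matches the `TAGS=bazel,gss,deadlock` parameter in the reports, and the Go toolchain satisfies the `race` build constraint when building with `-race`. `raceEnabled`, `deadlockEnabled`, `instrumentationSkipReason`, and `skipUnderInstrumentation` are hypothetical names; CockroachDB's `pkg/testutils/skip` package has its own helpers, which the actual change presumably uses instead.

```go
package main

import "testing"

// Hypothetical build-configuration flags. In a real codebase these would
// usually be constants in files selected by build tags, e.g. one file with
// `//go:build deadlock` declaring deadlockEnabled = true and another with
// `//go:build !deadlock` declaring it false.
const (
	raceEnabled     = false
	deadlockEnabled = false
)

// instrumentationSkipReason returns why a slow test should be skipped under
// the given build configuration, or "" if it should run.
func instrumentationSkipReason(race, deadlock bool) string {
	switch {
	case race:
		return "too slow under the race detector"
	case deadlock:
		return "too slow under the deadlock detector"
	default:
		return ""
	}
}

// skipUnderInstrumentation skips t if the current build is instrumented in
// a way that makes this test slow enough to hit spurious timeouts.
func skipUnderInstrumentation(t testing.TB) {
	t.Helper()
	if reason := instrumentationSkipReason(raceEnabled, deadlockEnabled); reason != "" {
		t.Skip(reason)
	}
}
```

A slow test would call `skipUnderInstrumentation(t)` as its first statement. Keeping the decision in a pure function makes it testable without a `*testing.T`.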
@arulajmani added the C-bug, A-testing, and P-2 labels Dec 6, 2023
@craig bot closed this as completed in 4c5caa4 Dec 6, 2023
blathers-crl bot pushed a commit that referenced this issue Dec 6, 2023