Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: data race in isNoop #108696

Closed
pav-kv opened this issue Aug 14, 2023 · 2 comments · Fixed by #108775
Closed

tracing: data race in isNoop #108696

pav-kv opened this issue Aug 14, 2023 · 2 comments · Fixed by #108775
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@pav-kv
Copy link
Collaborator

pav-kv commented Aug 14, 2023

Describe the problem

WARNING: DATA RACE
Read at 0x00c039d8fc10 by goroutine 4729:
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*spanInner).isNoop()
      github.com/cockroachdb/cockroach/pkg/util/tracing/span_inner.go:60 +0x64
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).IsNoop()
      github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:155 +0xb5
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).detectUseAfterFinish()
      github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:171 +0xb7
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).RecordingType()
      github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:444 +0x2e
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).IsVerbose()
      github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:452 +0x4e
  github.com/cockroachdb/cockroach/pkg/util/log.getSpanOrEventLog()
      github.com/cockroachdb/cockroach/pkg/util/log/trace.go:92 +0x52
  github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal()
      github.com/cockroachdb/cockroach/pkg/util/log/channels.go:101 +0x60a
  github.com/cockroachdb/cockroach/pkg/util/log.logfDepth()
      github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39 +0x124
  github.com/cockroachdb/cockroach/pkg/util/log.vEventf()
      github.com/cockroachdb/cockroach/pkg/util/log/trace.go:210 +0xce
  github.com/cockroachdb/cockroach/pkg/util/log.VEvent()
      github.com/cockroachdb/cockroach/pkg/util/log/trace.go:231 +0x17cf
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*propBuf).FlushLockedWithRaftGroup()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_proposal_buf.go:601 +0x1785
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked.func2()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:782 +0xfc
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).withRaftGroupLocked()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:2005 +0xad
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:779 +0x39a
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:735 +0x21b
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:689 +0x1ef
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftSchedulerShard).worker()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:418 +0x2f5
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start.func2()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:321 +0x9a
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x1f6

Previous write at 0x00c039d8fc10 by goroutine 19833:
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).reset()
      github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:686 +0x1da
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Tracer).newSpan()
      github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:1003 +0x1ba
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Tracer).startSpanGeneric()
      github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:1208 +0xba5
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Tracer).StartSpanCtx()
      github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:1084 +0x104
  github.com/cockroachdb/cockroach/pkg/util/tracing.(*Tracer).StartSpan()
      github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:968 +0x378
  github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor.ClientInterceptor.func2()
      github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:235 +0x2e5
  google.golang.org/grpc.(*ClientConn).Invoke()
      google.golang.org/grpc/external/org_golang_google_grpc/call.go:35 +0x26d
  github.com/cockroachdb/cockroach/pkg/rpc.(*heartbeatClient).Ping()
      github.com/cockroachdb/cockroach/pkg/rpc/bazel-out/k8-fastbuild/bin/pkg/rpc/rpc_go_proto_/github.com/cockroachdb/cockroach/pkg/rpc/heartbeat.pb.go:408 +0xf4
  github.com/cockroachdb/cockroach/pkg/rpc.runSingleHeartbeat.func2()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:396 +0xca
  github.com/cockroachdb/cockroach/pkg/util/timeutil.RunWithTimeout()
      github.com/cockroachdb/cockroach/pkg/util/timeutil/timeout.go:29 +0x12d
  github.com/cockroachdb/cockroach/pkg/rpc.runSingleHeartbeat()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:401 +0x732
  github.com/cockroachdb/cockroach/pkg/rpc.(*peer).runHeartbeatUntilFailure()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:482 +0x5bb
  github.com/cockroachdb/cockroach/pkg/rpc.(*peer).runOnce()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:344 +0x589
  github.com/cockroachdb/cockroach/pkg/rpc.(*peer).run()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:281 +0x3da
  github.com/cockroachdb/cockroach/pkg/rpc.(*peer).launch.func1()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:241 +0x64
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x1f6

Goroutine 4729 (running) created at:
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x619
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:313 +0x406
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processRaft()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:757 +0xc4
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).Start()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:2107 +0x1364
  github.com/cockroachdb/cockroach/pkg/server.(*Node).start()
      github.com/cockroachdb/cockroach/pkg/server/node.go:658 +0x1699
  github.com/cockroachdb/cockroach/pkg/server.(*topLevelServer).PreStart()
      github.com/cockroachdb/cockroach/pkg/server/server.go:1822 +0x34ba
  github.com/cockroachdb/cockroach/pkg/server.(*testServer).Start()
      github.com/cockroachdb/cockroach/pkg/server/testserver.go:677 +0xfb
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).startServer()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:632 +0xf2
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Start()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:407 +0x550
  github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartNewTestCluster()
      github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_cluster_shim.go:286 +0x144
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.NewTestCluster()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:364 +0x15b0
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.NewTestCluster()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:364 +0x15b0
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.testClusterFactoryImpl.NewTestCluster()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1949 +0x8b
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*testClusterFactoryImpl).NewTestCluster()
      <autogenerated>:1 +0xaf
  github.com/cockroachdb/cockroach/pkg/testutils/serverutils.NewTestCluster()
      github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_cluster_shim.go:300 +0x118
  github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartNewTestCluster()
      github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_cluster_shim.go:285 +0x56
  github.com/cockroachdb/cockroach/pkg/sql_test.testCase.runTest()
      github.com/cockroachdb/cockroach/pkg/sql_test/pkg/sql/multitenant_admin_function_test.go:250 +0x1f9
  github.com/cockroachdb/cockroach/pkg/sql_test.TestRelocateNonVoters.func1()
      github.com/cockroachdb/cockroach/pkg/sql_test/pkg/sql/multitenant_admin_function_test.go:886 +0x1e4
  testing.tRunner()
      GOROOT/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      GOROOT/src/testing/testing.go:1493 +0x47

Goroutine 19833 (running) created at:
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x619
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0x386
  github.com/cockroachdb/cockroach/pkg/rpc.(*peer).launch()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:240 +0x46
  github.com/cockroachdb/cockroach/pkg/rpc.(*Context).newPeer.func2.1()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:197 +0x64
  runtime/pprof.Do()
      GOROOT/src/runtime/pprof/runtime.go:40 +0x122
  github.com/cockroachdb/cockroach/pkg/rpc.(*Context).newPeer.func2()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:196 +0x3bd
  github.com/cockroachdb/cockroach/pkg/util/circuit.(*Breaker).maybeTriggerProbe()
      github.com/cockroachdb/cockroach/pkg/util/circuit/circuitbreaker.go:257 +0x321
  github.com/cockroachdb/cockroach/pkg/util/circuit.(*Breaker).Probe()
      github.com/cockroachdb/cockroach/pkg/util/circuit/circuitbreaker.go:96 +0x545
  github.com/cockroachdb/cockroach/pkg/rpc.(*Context).grpcDialNodeInternal()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:2163 +0x51c
  github.com/cockroachdb/cockroach/pkg/rpc.(*Context).GRPCDialNode()
      github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:2114 +0x1f3
  github.com/cockroachdb/cockroach/pkg/rpc/nodedialer.(*Dialer).dial()
      github.com/cockroachdb/cockroach/pkg/rpc/nodedialer/nodedialer.go:170 +0xed
  github.com/cockroachdb/cockroach/pkg/rpc/nodedialer.(*Dialer).Dial()
      github.com/cockroachdb/cockroach/pkg/rpc/nodedialer/nodedialer.go:103 +0x1dd
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*RaftTransport).startProcessNewQueue.func2()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/raft_transport.go:979 +0x294
  runtime/pprof.Do()
      GOROOT/src/runtime/pprof/runtime.go:40 +0x122
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*RaftTransport).startProcessNewQueue.func3()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/raft_transport.go:1001 +0x28c
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x1f6

To Reproduce

Caught while stressing TestRelocateNonVoters in pkg/sql:sql_test, but the failure doesn't seem specific to this test.

Environment:

Jira issue: CRDB-30598

@pav-kv pav-kv added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-tracing Relating to tracing in CockroachDB. labels Aug 14, 2023
@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 14, 2023

Caught this while stressing the test to catch the "use of Span after Finish" bug in #108534.

Conjecture: Since the span is being used after Finish, the following can be happening:

  • The Span is Finish-ed and returned to the pool.
  • Another unrelated Span is being created by reusing the struct in the pool.
  • We try to use the old Span after Finish, and hence there is a race while reading the variables that are now being used by an unrelated Span.

Fixing #108534 should fix this issue too.

@pav-kv pav-kv added the A-kv-replication Relating to Raft, consensus, and coordination. label Aug 14, 2023
@blathers-crl
Copy link

blathers-crl bot commented Aug 14, 2023

cc @cockroachdb/replication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant