Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvcoord: data race in (*txnSpanRefresher).splitEndTxnAndRetrySend() #104791

Closed
jayshrivastava opened this issue Jun 13, 2023 · 1 comment · Fixed by #105480
Closed

kvcoord: data race in (*txnSpanRefresher).splitEndTxnAndRetrySend() #104791

jayshrivastava opened this issue Jun 13, 2023 · 1 comment · Fixed by #105480
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@jayshrivastava
Copy link
Contributor

jayshrivastava commented Jun 13, 2023

See failure here

TestTableNameCollision/kafka
11:36:57       No pass/skip/fail event found for test
11:36:57       === RUN   TestTableNameCollision/kafka
          helpers_test.go:826: making server as secondary tenant
          helpers_test.go:907: making kafka feed factory
      ==================
      WARNING: DATA RACE
      Write at 0x00c00b2619a0 by goroutine 1243887:
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).splitEndTxnAndRetrySend()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:411 +0x764
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).maybeRefreshAndRetrySend()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:337 +0x5f5
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).sendLockedWithRefreshAttempts()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:275 +0x7bd
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).SendLocked()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:148 +0x1e6
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnCommitter).SendLocked()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go:190 +0x3e1
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnPipeliner).SendLocked()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:292 +0x245
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSeqNumAllocator).SendLocked()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_seq_num_allocator.go:105 +0xd4
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnHeartbeater).SendLocked()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_heartbeater.go:246 +0x7cc
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*TxnCoordSender).Send()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:529 +0x9b2
        github.com/cockroachdb/cockroach/pkg/kv.(*DB).sendUsingSender()
            github.com/cockroachdb/cockroach/pkg/kv/db.go:1006 +0x190
        github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Send()
            github.com/cockroachdb/cockroach/pkg/kv/txn.go:1087 +0x334
        github.com/cockroachdb/cockroach/pkg/kv.(*Txn).commit()
            github.com/cockroachdb/cockroach/pkg/kv/txn.go:707 +0x204
        github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Commit()
            github.com/cockroachdb/cockroach/pkg/kv/txn.go:717 +0xaf
        github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).commitSQLTransactionInternal()
            github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1349 +0x9aa
        github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).commitTxn()
            github.com/cockroachdb/cockroach/pkg/sql/internal.go:1222 +0x848
        github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).newInternalExecutorWithTxn.func1()
            github.com/cockroachdb/cockroach/pkg/sql/internal.go:1634 +0xe9
        github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn.func4()
            github.com/cockroachdb/cockroach/pkg/sql/internal.go:1760 +0x62b
        github.com/cockroachdb/cockroach/pkg/kv.runTxn.func1()
            github.com/cockroachdb/cockroach/pkg/kv/db.go:970 +0x51
        github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec()
            github.com/cockroachdb/cockroach/pkg/kv/txn.go:945 +0xb0
        github.com/cockroachdb/cockroach/pkg/kv.runTxn()
            github.com/cockroachdb/cockroach/pkg/kv/db.go:969 +0x90
        github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl()
            github.com/cockroachdb/cockroach/pkg/kv/db.go:932 +0xf0
        github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn()
            github.com/cockroachdb/cockroach/pkg/kv/db.go:907 +0x51
        github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn-fm()
            <autogenerated>:1 +0x64
        github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn()
            github.com/cockroachdb/cockroach/pkg/sql/internal.go:1739 +0x5fa
        github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn()
            github.com/cockroachdb/cockroach/pkg/sql/internal.go:1674 +0xf0
        github.com/cockroachdb/cockroach/pkg/sql.(*internalDBWithOverrides).Txn()
            github.com/cockroachdb/cockroach/pkg/sql/internal.go:1815 +0x13d
        github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).claimJobs()
            github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:98 +0x129
        github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).Start.func5()
            github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1074 +0x15d
        github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).withSession()
            github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1009 +0x301
        github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).Start.func12()
            github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1025 +0x52
        github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).Start.func10()
            github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1218 +0x516
        github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
            github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x1f6
      
      Previous read at 0x00c00b2619a0 by goroutine 1240391:
        reflect.Value.IsNil()
            GOROOT/src/reflect/value.go:1546 +0x455
        encoding/json.isEmptyValue()
            GOROOT/src/encoding/json/encode.go:353 +0x411
        encoding/json.structEncoder.encode()
            GOROOT/src/encoding/json/encode.go:749 +0x169
        encoding/json.structEncoder.encode-fm()
            <autogenerated>:1 +0xdb
        encoding/json.ptrEncoder.encode()
            GOROOT/src/encoding/json/encode.go:944 +0x3c2
        encoding/json.ptrEncoder.encode-fm()
            <autogenerated>:1 +0x90
        encoding/json.(*encodeState).reflectValue()
            GOROOT/src/encoding/json/encode.go:359 +0x88
        encoding/json.interfaceEncoder()
            GOROOT/src/encoding/json/encode.go:715 +0x17b
        encoding/json.structEncoder.encode()
            GOROOT/src/encoding/json/encode.go:760 +0x2ba
        encoding/json.structEncoder.encode-fm()
            <autogenerated>:1 +0xdb
        encoding/json.arrayEncoder.encode()
            GOROOT/src/encoding/json/encode.go:915 +0x10e
        encoding/json.arrayEncoder.encode-fm()
            <autogenerated>:1 +0x90
        encoding/json.sliceEncoder.encode()
            GOROOT/src/encoding/json/encode.go:888 +0x4e2
        encoding/json.sliceEncoder.encode-fm()
            <autogenerated>:1 +0x90
        encoding/json.structEncoder.encode()
            GOROOT/src/encoding/json/encode.go:760 +0x2ba
        encoding/json.structEncoder.encode-fm()
            <autogenerated>:1 +0xdb
        encoding/json.ptrEncoder.encode()
            GOROOT/src/encoding/json/encode.go:944 +0x3c2
        encoding/json.ptrEncoder.encode-fm()
            <autogenerated>:1 +0x90
        encoding/json.(*encodeState).reflectValue()
            GOROOT/src/encoding/json/encode.go:359 +0x88
        encoding/json.(*encodeState).marshal()
            GOROOT/src/encoding/json/encode.go:331 +0x20b
        encoding/json.(*Encoder).Encode()
            GOROOT/src/encoding/json/stream.go:206 +0xcf
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.GRPCTransportFactory.func1()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport_race.go:120 +0x2c4
        github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
            github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x1f6
      
      Goroutine 1243887 (running) created at:
        github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx()
            github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x619
        github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
            github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0xe64
        github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).Start()
            github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1193 +0xc87
        github.com/cockroachdb/cockroach/pkg/server.(*SQLServer).preStart()
            github.com/cockroachdb/cockroach/pkg/server/server_sql.go:1566 +0x12f6
        github.com/cockroachdb/cockroach/pkg/server.(*SQLServerWrapper).PreStart()
            github.com/cockroachdb/cockroach/pkg/server/tenant.go:734 +0x33a4
        github.com/cockroachdb/cockroach/pkg/server.(*SQLServerWrapper).Start()
            github.com/cockroachdb/cockroach/pkg/server/tenant.go:897 +0x44
        github.com/cockroachdb/cockroach/pkg/server.(*TestServer).StartTenant()
            github.com/cockroachdb/cockroach/pkg/server/testserver.go:1186 +0x2bc6
        github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartTenant()
            github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:458 +0xc9
        github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.startTestTenant()
            github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/helpers_test.go:514 +0x677
        github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.makeTenantServerWithOptions()
            github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/helpers_test.go:798 +0x11b
        github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServer()
            github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:352 +0x1d1
        github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServer()
            github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:347 +0xac
        github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.startTestFullServer()
            github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/helpers_test.go:409 +0x795
        github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.makeTenantServerWithOptions()
            github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/helpers_test.go:797 +0xb2
        github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.makeServerWithOptions()
            github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/helpers_test.go:827 +0x10b
        github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.cdcTestNamedWithSystem.func1()
            github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/helpers_test.go:1090 +0x14b
        testing.tRunner()
            GOROOT/src/testing/testing.go:1446 +0x216
        testing.(*T).Run.func1()
            GOROOT/src/testing/testing.go:1493 +0x47
      
      Goroutine 1240391 (running) created at:
        github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx()
            github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x619
        github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
            github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0x1cb
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.GRPCTransportFactory()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport_race.go:98 +0x161
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendToReplicas()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:2095 +0xd0d
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendPartialBatch()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1703 +0xa44
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1274 +0x592
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*RangeIterator).Seek()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/range_iter.go:208 +0x73a
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1268 +0x2b7
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).Send()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:894 +0xa79
        github.com/cockroachdb/cockroach/pkg/kv.lookupRangeFwdScan()
            github.com/cockroachdb/cockroach/pkg/kv/range_lookup.go:331 +0x832
        github.com/cockroachdb/cockroach/pkg/kv.RangeLookup()
            github.com/cockroachdb/cockroach/pkg/kv/range_lookup.go:206 +0x315
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).RangeLookup()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:594 +0x128
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache.(*RangeCache).performRangeLookup()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache/range_cache.go:1041 +0x3fe
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache.tryLookupImpl.func1()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache/range_cache.go:929 +0xc5
        github.com/cockroachdb/cockroach/pkg/util/timeutil.RunWithTimeout()
            github.com/cockroachdb/cockroach/pkg/util/timeutil/timeout.go:29 +0x12d
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache.tryLookupImpl()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache/range_cache.go:926 +0x1a8
        github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache.(*RangeCache).tryLookup.func3()
            github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache/range_cache.go:824 +0xd9
        github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall.func1()
            github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:387 +0x51
        github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask()
            github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:319 +0x147
        github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall()
            github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:386 +0x2a4
        github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).DoChan.func1()
            github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:356 +0xd0
      ==================

Jira issue: CRDB-28726

@jayshrivastava jayshrivastava added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv Anything in KV that doesn't belong in a more specific category. T-kv KV Team labels Jun 13, 2023
@nvanbenschoten
Copy link
Member

Duplicate of #103687.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 24, 2023
Fixes cockroachdb#103687.
Fixes cockroachdb#103247.
Fixes cockroachdb#104791.

This commit avoids a data race between `splitEndTxnAndRetrySend` and
`raceTransport` by avoiding a mutation of a shared `RequestUnion_EndTxn`
object within an unshared `RequestUnion` object. The `raceTransport`
makes an effort to copy the `BatchRequest`'s `RequestUnion` slice, but
it does not copy the inner interface, so we can't play tricks to avoid a
reallocation of the `RequestUnion_EndTxn`.

The commit also addresses a similar problem in
`retryTxnCommitAfterFailedParallelCommit`.

We may be able to fix this in the `raceTransport`, but doing so would
require some reflection magic and this is currently failing CI, so we
make the easier change.

Release note: None
craig bot pushed a commit that referenced this issue Jun 26, 2023
105391: colinfo: add version gate for storing pg_lsn types r=rafiss a=otan

We don't want mixed version clusters storing pg_lsn types, in case they need to rollback / older versions do not understand the type.

Informs #105130

Release note: None

105444: systemschema: stop running PostDeserializationChanges when building tables r=Xiang-Gu a=rafiss

### systemschema: add JobInfo to MakeSystemTables helper

This table was missing from the list, and the expected count of tables
was off since we weren't accounting for non-system tenant tables.

The function is only used for testing, so there was no impact.

---

### systemschema: stop running PostDeserializationChanges when building tables

We would like to remove old PostDeserializationChanges that are no
longer needed. In order to do so, we need to stop relying on them to
build system tables.

Instead, now we adjust the hard-coded system table descriptors and the
related helpers so that they create valid descriptors. This needed two
changes:
- Update the ConstraintID for check constraints.
- Update the primary index encoding so that it includes stored columns.

---

Epic: None
Release note: None

105476: kv: fix data race when updating pending txn in txnStatusCache r=arulajmani a=nvanbenschoten

Fixes #105244.

This commit avoids a data race by treating *roachpb.Transaction objects as immutable, and simply choosing the right transaction to keep in the cache when there is a choice to be made.

The behavior of this logic is tested by `TestTxnCacheUpdatesTxn`.

Release note: None

105480: kv: fix data race during retry of EndTxn after refresh r=arulajmani a=nvanbenschoten

Fixes #103687.
Fixes #103247.
Fixes #104791.

This commit avoids a data race between `splitEndTxnAndRetrySend` and `raceTransport` by avoiding a mutation of a shared `RequestUnion_EndTxn` object within an unshared `RequestUnion` object. The `raceTransport` makes an effort to copy the `BatchRequest`'s `RequestUnion` slice, but it does not copy the inner interface, so we can't play tricks to avoid a reallocation of the `RequestUnion_EndTxn`.

The commit also addresses a similar problem in
`retryTxnCommitAfterFailedParallelCommit`.

We may be able to fix this in the `raceTransport`, but doing so would require some reflection magic and this is currently failing CI, so we make the easier change.

Release note: None

105515: pgwire: fix race in TestConn r=knz a=rafiss

fixes #105410

A recent refactor introduced this race, since the context is used by two testing goroutines.

Release note: None

Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants