datarace in LeaseManager exposed by TestReplicateQueueDownReplicate #28222

Closed
andreimatei opened this issue Aug 2, 2018 · 1 comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered).

Comments

@andreimatei

Check out this test failure: #27783 (comment)
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=806649&tab=buildLog

[05:53:03] :	 [Step 2/2] WARNING: DATA RACE
[05:53:03] :	 [Step 2/2] Write at 0x00c422182738 by goroutine 514:
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/internal/client.(*LeaseManager).ExtendLease()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/internal/client/lease.go:154 +0x3d7
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations.func2()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:394 +0x192
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:324 +0xf3
[05:53:03] :	 [Step 2/2] 
[05:53:03] :	 [Step 2/2] Previous read at 0x00c422182738 by goroutine 1347:
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/internal/client.(*LeaseManager).ReleaseLease()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/internal/client/lease.go:161 +0x54
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations.func1()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:384 +0xd1
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:462 +0x1429
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/server.(*Server).Start()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1591 +0x4f9d
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/server.(*TestServer).Start()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:344 +0x244
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServerRaw()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:206 +0x140
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServer()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:174 +0x73
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).doAddServer()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:251 +0x161
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/testutils/testcluster.StartTestCluster()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:157 +0x6ff
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/storage_test.TestReplicateQueueDownReplicate()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/storage/replicate_queue_test.go:180 +0xce
[05:53:03] :	 [Step 2/2]   testing.tRunner()
[05:53:03] :	 [Step 2/2]       /usr/local/go/src/testing/testing.go:777 +0x16d
[05:53:03] :	 [Step 2/2] 
[05:53:03] :	 [Step 2/2] Goroutine 514 (running) created at:
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:319 +0x14c
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:388 +0x9cd
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/server.(*Server).Start()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1591 +0x4f9d
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/server.(*TestServer).Start()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:344 +0x244
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServerRaw()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:206 +0x140
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServer()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:174 +0x73
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).doAddServer()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:251 +0x161
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/testutils/testcluster.StartTestCluster()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:157 +0x6ff
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/storage_test.TestReplicateQueueDownReplicate()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/storage/replicate_queue_test.go:180 +0xce
[05:53:03] :	 [Step 2/2]   testing.tRunner()
[05:53:03] :	 [Step 2/2]       /usr/local/go/src/testing/testing.go:777 +0x16d
[05:53:03] :	 [Step 2/2] 
[05:53:03] :	 [Step 2/2] Goroutine 1347 (running) created at:
[05:53:03] :	 [Step 2/2]   testing.(*T).Run()
[05:53:03] :	 [Step 2/2]       /usr/local/go/src/testing/testing.go:824 +0x564
[05:53:03] :	 [Step 2/2]   testing.runTests.func1()
[05:53:03] :	 [Step 2/2]       /usr/local/go/src/testing/testing.go:1063 +0xa4
[05:53:03] :	 [Step 2/2]   testing.tRunner()
[05:53:03] :	 [Step 2/2]       /usr/local/go/src/testing/testing.go:777 +0x16d
[05:53:03] :	 [Step 2/2]   testing.runTests()
[05:53:03] :	 [Step 2/2]       /usr/local/go/src/testing/testing.go:1061 +0x4e1
[05:53:03] :	 [Step 2/2]   testing.(*M).Run()
[05:53:03] :	 [Step 2/2]       /usr/local/go/src/testing/testing.go:978 +0x2cd
[05:53:03] :	 [Step 2/2]   github.com/cockroachdb/cockroach/pkg/storage_test.TestMain()
[05:53:03] :	 [Step 2/2]       /go/src/github.com/cockroachdb/cockroach/pkg/storage/main_test.go:57 +0x2a8
[05:53:03] :	 [Step 2/2]   main.main()
[05:53:03] :	 [Step 2/2]       _testmain.go:1088 +0x22a
@andreimatei andreimatei added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). labels Aug 2, 2018
@a-robinson

Huh, this looks like a legit race that's been there basically since this code was first implemented. It's impressive it's made it nearly two years without being noticed.

a-robinson added a commit to a-robinson/cockroach that referenced this issue Aug 2, 2018
This race was exceptionally rare due to how the lease manager is
typically used by the sqlmigrations package, but it is indeed a race.

Holding a mutex while making a remote RPC is usually a terrible idea,
but in the context of how it's used it's actually more dangerous to let
ExtendLease and ReleaseLease interleave, since if ReleaseLease's CPut
fails then the sqlmigrations package will log.Fatal, and the only
potential for lock contention is between one goroutine using ExtendLease
and one running ReleaseLease. Perhaps this is tuning the package too
tightly to the needs of its client, but as of now it's its only client.

Fixes cockroachdb#28222

Release note: None
a-robinson added a commit to a-robinson/cockroach that referenced this issue Aug 3, 2018
This race was exceptionally rare due to how the lease manager is
typically used by the sqlmigrations package, but it is indeed a race.

Holding a semaphore while making a remote RPC is usually a terrible
idea, but in the context of how it's used it's actually more dangerous
to let ExtendLease and ReleaseLease interleave, since if ReleaseLease's
CPut fails then the sqlmigrations package will log.Fatal, and the only
potential for lock contention is between one goroutine using ExtendLease
and one running ReleaseLease. Perhaps this is tuning the package too
tightly to the needs of its client, but as of now it's its only client.

Fixes cockroachdb#28222

Release note: None
craig bot pushed a commit that referenced this issue Aug 3, 2018
28174: changefeedccl: test that the initial scan only emits the latest value r=nvanbenschoten a=danhhz

Release note: None

28223: internal/client: Make the lease manager thread-safe r=a-robinson a=a-robinson

This race was exceptionally rare due to how the lease manager is
typically used by the sqlmigrations package, but it is indeed a race.

Holding a mutex while making a remote RPC is usually a terrible idea,
but in the context of how it's used it's actually more dangerous to let
ExtendLease and ReleaseLease interleave, since if ReleaseLease's CPut
fails then the sqlmigrations package will log.Fatal, and the only
potential for lock contention is between one goroutine using ExtendLease
and one running ReleaseLease. Perhaps this is tuning the package too
tightly to the needs of its client, but as of now it's its only client.

Fixes #28222

Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Alex Robinson <[email protected]>
@craig craig bot closed this as completed in #28223 Aug 3, 2018