Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: TestReplicateQueueDownReplicate failed under stress #31535

Closed
cockroach-teamcity opened this issue Oct 17, 2018 · 18 comments
Closed

storage: TestReplicateQueueDownReplicate failed under stress #31535

cockroach-teamcity opened this issue Oct 17, 2018 · 18 comments
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/2cbfb514fed209e9e4192bd07af6baa8dd073bab

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueDownReplicate PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=971557&tab=buildLog

1      [async] transport racer
W181017 07:56:00.886909 117992 storage/raft_transport.go:583  [n3] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
I181017 07:56:00.892252 119984 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n4: context canceled
I181017 07:56:00.894779 119986 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n1: context canceled
W181017 07:56:00.895569 118546 storage/raft_transport.go:583  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W181017 07:56:00.897266 118568 storage/raft_transport.go:583  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:
W181017 07:56:00.904696 118212 storage/raft_transport.go:583  [n2] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
W181017 07:56:00.906965 117981 storage/raft_transport.go:583  [n2] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = context canceled:
W181017 07:56:00.918946 117953 storage/raft_transport.go:583  [n4] while processing outgoing Raft queue to node 3: EOF:
I181017 07:56:00.919612 120026 util/stop/stopper.go:537  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I181017 07:56:00.921530 120024 util/stop/stopper.go:537  quiescing; tasks left:
2      [async] closedts-subscription
W181017 07:56:00.936350 118405 storage/raft_transport.go:583  [n4] while processing outgoing Raft queue to node 5: EOF:
I181017 07:56:00.941820 120069 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n1: context canceled
I181017 07:56:00.944618 120025 util/stop/stopper.go:537  quiescing; tasks left:
3      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I181017 07:56:00.945149 120025 util/stop/stopper.go:537  quiescing; tasks left:
3      [async] closedts-subscription
I181017 07:56:00.945586 120025 util/stop/stopper.go:537  quiescing; tasks left:
2      [async] closedts-subscription
I181017 07:56:00.948990 120083 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n4: context canceled
I181017 07:56:00.950825 119906 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n3: context canceled
I181017 07:56:00.952485 120084 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n5: context canceled
I181017 07:56:00.953458 119985 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n5: context canceled
I181017 07:56:00.954696 120068 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n5: context canceled
I181017 07:56:00.957513 120023 util/stop/stopper.go:537  quiescing; tasks left:
3      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I181017 07:56:00.957776 120085 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n1: context canceled
W181017 07:56:00.959279 118159 storage/raft_transport.go:583  [n4] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I181017 07:56:00.959331 120067 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n3: context canceled
I181017 07:56:00.959536 120023 util/stop/stopper.go:537  quiescing; tasks left:
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I181017 07:56:00.960265 120023 util/stop/stopper.go:537  quiescing; tasks left:
1      [async] closedts-rangefeed-subscriber
I181017 07:56:01.073252 119920 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n2: context canceled
I181017 07:56:01.085230 115555 kv/transport_race.go:113  transport race promotion: ran 54 iterations on up to 3113 requests
I181017 07:56:01.086536 120103 rpc/nodedialer/nodedialer.go:91  [ct-client] unable to connect to n2: context canceled
--- FAIL: TestReplicateQueueDownReplicate (32.09s)
	testing.go:730: race detected during execution of test

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Oct 17, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 17, 2018
@tbg
Copy link
Member

tbg commented Oct 17, 2018

==================
WARNING: DATA RACE
Write at 0x00c423b38770 by goroutine 886:
  github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2953 +0x1f6
  github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3139 +0x79a
  github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:185 +0x133
  github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:988 +0x212
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:303 +0xfa
  github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:975 +0x2b2
  github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:1016 +0xba
  github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:6633 +0xa1
  github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingServerInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go:48 +0xc23
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:197 +0x10c
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func3()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:227 +0xe9
  github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:6635 +0x1e3
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1011 +0xde7
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1249 +0x1364
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:680 +0xac

Previous write at 0x00c423b38770 by goroutine 413:
  github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2953 +0x1f6
  github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3139 +0x79a
  github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:185 +0x133
  github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:988 +0x212
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:303 +0xfa
  github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:975 +0x2b2
  github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:1016 +0xba
  github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:6633 +0xa1
  github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingServerInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go:48 +0xc23
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:197 +0x10c
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func3()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:227 +0xe9
  github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:6635 +0x1e3
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1011 +0xde7
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1249 +0x1364
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:680 +0xac

looks similar to the race in #30483 (comment)

@tbg tbg closed this as completed Oct 17, 2018
@petermattis
Copy link
Collaborator

How is there possibly a race here? Is there some code path where we're using a statically allocated error?

@petermattis
Copy link
Collaborator

Hmm, possibly: txnwait/txnqueue.go:var ErrDeadlock = roachpb.NewErrorf("deadlock detected")

@tbg
Copy link
Member

tbg commented Oct 17, 2018

That seems like a good candidate. And likely the only one (the DistSender one is high enough in the stack to be OK though I don't trust that one neither)

$ rg 'var.*roachpb.NewError'
pkg/kv/dist_sender.go
578:var errNo1PCTxn = roachpb.NewErrorf("cannot send 1PC txn to multiple ranges")

pkg/storage/txnwait/txnqueue.go
354:var ErrDeadlock = roachpb.NewErrorf("deadlock detected")

@petermattis
Copy link
Collaborator

Lovely, we have code in store.go that looks for that singleton:

		if pErr == txnwait.ErrDeadlock {
			// We've experienced a deadlock; set Force=true on push request,
			// and set the push type to ABORT.
			pushReqCopy.Force = true
			pushReqCopy.PushType = roachpb.PUSH_ABORT
...

@petermattis
Copy link
Collaborator

@tschottdorf Are you able to reproduce a failure with either TestReplicateQueueDownReplicate or TestSystemZoneConfigs?

@tbg
Copy link
Member

tbg commented Oct 17, 2018

Heh, someone must've been too lazy to introduce a new ErrorDetail.

Are you able to reproduce a failure with either TestReplicateQueueDownReplicate or TestSystemZoneConfigs?

If I ever was then it has been lost to the sands of time. But I usually post when I manage to repro. A case for make roachprod-stressrace?

@petermattis
Copy link
Collaborator

I'm actually not seeing a code path where ErrDeadlock is returned. That error is only returned from MaybeWaitForPush which is only called from Store.maybeWaitForPushee. Unless I'm misreading the code, it looks like ErrDeadlock gets translated into a nil error upon return from that method.

@petermattis
Copy link
Collaborator

Perhaps this is the errNo1PCTxn singleton. Could that be returned out of DistSender?

These roachpb.Error singletons should be removed. They are on rare paths and the cognitive burden isn't worth it.

@petermattis
Copy link
Collaborator

I managed to reproduce with make roachprod-stressrace PKG=./storage/ STRESSFLAGS="-p 2" CLUSTER=peter-stress on a 10-node cluster. Note that stressing the individual test didn't seem to do it. The test which experienced the race was TestSystemZoneConfigs. I'm just keeping the discussion here because the race is identical.

@petermattis
Copy link
Collaborator

Also, I added instrumentation to detect if pErr == txnwait.ErrDeadlock. That didn't fire, so I think that singleton is not the problem.

@petermattis
Copy link
Collaborator

Hmm, strangely none of my instrumentation to detect the roachpb.Error singletons has fired even though I've reproduced the race. Adding more general instrumentation to see what is going on.

@tbg
Copy link
Member

tbg commented Oct 18, 2018

Wouldn't it be nice if you could write code like if runtime.RaceDetected() { fmt.Printf("this is the error: %T %+v", pErr, pErr); os.Exit(1) }

@petermattis
Copy link
Collaborator

Yes. I want that. I'll get this figured out, its just irritating to reproduced. I didn't see a reproduction with the instrumentation I added last night. Probably because it produced copious logs.

@petermattis
Copy link
Collaborator

Made a bit more progress on this:

I181019 14:46:47.465070 7959 storage/store.go:2945  [n7,s7] Store.Send: 0xc426d49440: *roachpb.Error result is ambiguous (removing replica): 493 Merge
I181019 14:46:47.466725 7933 storage/store.go:2945  [n7,s7] Store.Send: 0xc426d49440: *roachpb.Error result is ambiguous (removing replica): 493 Merge
==================
WARNING: DATA RACE
Write at 0x00c426d49440 by goroutine 2053:

[It's odd that the race detector goroutine IDs do not match up with the log messages. That is repeatable and worth understanding.]

The log messages indicate this is definitely not occurring for the two singleton roachpb.Errors. This ambiguous result error is generated by https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/replica.go#L871-L881:

func (r *Replica) cancelPendingCommandsLocked() {
	r.mu.AssertHeld()
	pr := proposalResult{
		Err:           roachpb.NewError(roachpb.NewAmbiguousResultError("removing replica")),
		ProposalRetry: proposalRangeNoLongerExists,
	}
	for _, p := range r.mu.proposals {
		r.cleanupFailedProposalLocked(p)
		p.finishApplication(pr)
	}
}

Seems pretty clear what is going on. We're sharing an error across multiple proposals. Easy to fix.

@tbg
Copy link
Member

tbg commented Oct 19, 2018

[It's odd that the race detector goroutine IDs do not match up with the log messages. That is repeatable and worth understanding.]

Do you have the full message for this failure?

@petermattis
Copy link
Collaborator

Do you have the full message for this failure?

Here you go:

==================
WARNING: DATA RACE
Write at 0x00c426d49440 by goroutine 2053:
  github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2946 +0x476
  github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3144 +0x8ce
  github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:185 +0x133
  github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:988 +0x212
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:303 +0xfa
  github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:975 +0x2b2
  github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:1016 +0xba
  github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:6633 +0xa1
  github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingServerInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go:48 +0xc23
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:197 +0x10c
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func3()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:227 +0xe9
  github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:6635 +0x1e3
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1011 +0xde7
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1249 +0x1364
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:680 +0xac

Previous write at 0x00c426d49440 by goroutine 1316:
  github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2946 +0x476
  github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3144 +0x8ce
  github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:185 +0x133
  github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:988 +0x212
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:303 +0xfa
  github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:975 +0x2b2
  github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:1016 +0xba
  github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:6633 +0xa1
  github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingServerInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go:48 +0xc23
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:197 +0x10c
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func3()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:227 +0xe9
  github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:6635 +0x1e3
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1011 +0xde7
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1249 +0x1364
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:680 +0xac

Goroutine 2053 (running) created at:
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:678 +0xb8
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport.(*http2Server).operateHeaders()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport/http2_server.go:417 +0x1446
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport/http2_server.go:457 +0x87a
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:676 +0x16e
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleRawConn.func2()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:631 +0x53
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleRawConn.func3()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:639 +0x34

Goroutine 1316 (finished) created at:
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:678 +0xb8
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport.(*http2Server).operateHeaders()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport/http2_server.go:417 +0x1446
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport/http2_server.go:457 +0x87a
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:676 +0x16e
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleRawConn.func2()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:631 +0x53
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleRawConn.func3()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:639 +0x34
==================

petermattis added a commit to petermattis/cockroach that referenced this issue Oct 19, 2018
Replica.cancelPendingCommandsLocked was sharing an error across multiple
proposals. This is problematic because Store.Send mutates Error.Now.

Fixes cockroachdb#31535
Fixes cockroachdb#30483

Release note: None
craig bot pushed a commit that referenced this issue Oct 19, 2018
31631: storage: do not share an error across proposals r=benesch a=petermattis

Replica.cancelPendingCommandsLocked was sharing an error across multiple
proposals. This is problematic because Store.Send mutates Error.Now.

Fixes #31535
Fixes #30483

Release note: None

Co-authored-by: Peter Mattis <[email protected]>
@craig craig bot closed this as completed in #31631 Oct 19, 2018
petermattis added a commit to petermattis/cockroach that referenced this issue Oct 19, 2018
Replica.cancelPendingCommandsLocked was sharing an error across multiple
proposals. This is problematic because Store.Send mutates Error.Now.

Fixes cockroachdb#31535
Fixes cockroachdb#30483

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Oct 19, 2018
Spawned from
cockroachdb#31535 (comment).

```
[tschottdorf@toblerone cockroach (fix/race-goroutine-id)]$ make testrace PKG=./pkg/testutils
Running make with -j8
GOPATH set to /Users/tschottdorf/go
go test -race -tags ' make x86_64_apple_darwin18.0.0' -ldflags '-X github.com/cockroachdb/cockroach/pkg/build.typ=development -extldflags "" -X "github.com/cockroachdb/cockroach/pkg/build.tag=v2.2.0-alpha.00000000-1804-g91b9c04cfe-dirty" -X "github.com/cockroachdb/cockroach/pkg/build.rev=91b9c04cfe72a501e084cd21eaf44569b19e5d56" -X "github.com/cockroachdb/cockroach/pkg/build.cgoTargetTriple=x86_64-apple-darwin18.0.0"  ' -run "."  -timeout 25m ./pkg/testutils
I181019 16:29:57.839687 26 testutils/error_test.go:43  my id is 26
I181019 16:29:57.839720 27 testutils/error_test.go:43  my id is 27
goroutine 26 [running]:
runtime/debug.Stack(0x4bcef99, 0xc000070e70, 0x404fe5f)
	/usr/local/Cellar/go/1.11.1/libexec/src/runtime/debug/stack.go:24 +0xb5
runtime/debug.PrintStack()
	/usr/local/Cellar/go/1.11.1/libexec/src/runtime/debug/stack.go:16 +0x34
github.com/cockroachdb/cockroach/pkg/testutils.TestGoroutine.func1()
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/testutils/error_test.go:45 +0x129
github.com/cockroachdb/cockroach/pkg/testutils.TestGoroutine(0xc000190300)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/testutils/error_test.go:51 +0x127
testing.tRunner(0xc000190300, 0x4e66458)
	/usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:827 +0x163
created by testing.(*T).Run
	/usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:878 +0x651
goroutine 27 [running]:
runtime/debug.Stack(0x4bcef99, 0xc000531f38, 0x404fe5f)
	/usr/local/Cellar/go/1.11.1/libexec/src/runtime/debug/stack.go:24 +0xb5
runtime/debug.PrintStack()
	/usr/local/Cellar/go/1.11.1/libexec/src/runtime/debug/stack.go:16 +0x34
github.com/cockroachdb/cockroach/pkg/testutils.TestGoroutine.func1()
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/testutils/error_test.go:45 +0x129
created by github.com/cockroachdb/cockroach/pkg/testutils.TestGoroutine
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/testutils/error_test.go:50 +0x11d
==================
WARNING: DATA RACE
Write at 0x00c0000ba1a8 by goroutine 24:
  github.com/cockroachdb/cockroach/pkg/testutils.TestGoroutine.func1()
      /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/testutils/error_test.go:47 +0x138

Previous write at 0x00c0000ba1a8 by goroutine 23:
  github.com/cockroachdb/cockroach/pkg/testutils.TestGoroutine.func1()
      /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/testutils/error_test.go:47 +0x138
  github.com/cockroachdb/cockroach/pkg/testutils.TestGoroutine()
      /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/testutils/error_test.go:51 +0x126
  testing.tRunner()
      /usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:827 +0x162

Goroutine 24 (running) created at:
  github.com/cockroachdb/cockroach/pkg/testutils.TestGoroutine()
      /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/testutils/error_test.go:50 +0x11c
  testing.tRunner()
      /usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:827 +0x162

Goroutine 23 (running) created at:
  testing.(*T).Run()
      /usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:878 +0x650
  testing.runTests.func1()
      /usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:1119 +0xa8
  testing.tRunner()
      /usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:827 +0x162
  testing.runTests()
      /usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:1117 +0x4ee
  testing.(*M).Run()
      /usr/local/Cellar/go/1.11.1/libexec/src/testing/testing.go:1034 +0x2ee
  main.main()
      _testmain.go:56 +0x221
==================
FAIL	github.com/cockroachdb/cockroach/pkg/testutils	1.107s
make: *** [testrace] Error 1
```

Release note: None
@tbg
Copy link
Member

tbg commented Oct 19, 2018

#31639 for the goroutine IDs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

No branches or pull requests

4 participants