
teamcity: failed test: TestStoreRangeMergeWatcher #31167

Closed
cockroach-teamcity opened this issue Oct 9, 2018 · 0 comments
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

@cockroach-teamcity

The following tests appear to have failed on release-2.1 (test): TestStoreRangeMergeWatcher, TestStoreRangeMergeWatcher/inject-failures=false, TestStoreRangeMergeWatcher/inject-failures=true

You may want to check for open issues.

#954952:

TestStoreRangeMergeWatcher
--- FAIL: test/TestStoreRangeMergeWatcher (2.930s)
Test ended in panic.




TestStoreRangeMergeWatcher/inject-failures=true
...roachdb/cockroach/vendor/google.golang.org/grpc/transport/http2_server.go:273 +0xda9

goroutine 47985 [select]:
github.com/cockroachdb/cockroach/pkg/storage.(*NodeLiveness).StartHeartbeat.func1(0x2c076c0, 0xc4214963c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/node_liveness.go:502 +0x3fc
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4214e3fa0, 0xc421c6e360, 0xc4208fcc80)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad

goroutine 48004 [semacquire]:
sync.runtime_notifyListWait(0xc420aa0290, 0x5bf3)
	/usr/local/go/src/runtime/sema.go:510 +0x10b
sync.(*Cond).Wait(0xc420aa0280)
	/usr/local/go/src/sync/cond.go:56 +0x80
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc420561380, 0x2c076c0, 0xc420d1f680)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:196 +0x7c
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x2c076c0, 0xc420d1f680)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4203e7f10, 0xc421c6e360, 0xc4203e7f00)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad

goroutine 48005 [semacquire]:
sync.runtime_notifyListWait(0xc420aa0290, 0x5bf1)
	/usr/local/go/src/runtime/sema.go:510 +0x10b
sync.(*Cond).Wait(0xc420aa0280)
	/usr/local/go/src/sync/cond.go:56 +0x80
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc420561380, 0x2c076c0, 0xc420d1f6b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:196 +0x7c
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x2c076c0, 0xc420d1f6b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4203e7f30, 0xc421c6e360, 0xc4203e7f20)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad

goroutine 48207 [select, 7 minutes]:
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processLoop.func1(0x2c076c0, 0xc420e7d8f0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:596 +0x194
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4215dade0, 0xc421c6f0e0, 0xc4203e8920)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad

goroutine 47995 [semacquire]:
sync.runtime_notifyListWait(0xc420aa0290, 0x5bea)
	/usr/local/go/src/runtime/sema.go:510 +0x10b
sync.(*Cond).Wait(0xc420aa0280)
	/usr/local/go/src/sync/cond.go:56 +0x80
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc420561380, 0x2c076c0, 0xc420d1f4d0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:196 +0x7c
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x2c076c0, 0xc420d1f4d0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4203e7d40, 0xc421c6e360, 0xc4203e7d30)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad



TestStoreRangeMergeWatcher
--- FAIL: test/TestStoreRangeMergeWatcher (0.000s)
Test ended in panic.




TestStoreRangeMergeWatcher/inject-failures=false
... minor_val:0 patch:0 unstable:0 > build_tag:"" started_at:0 
I181009 22:06:46.734503 47713 gossip/client.go:129  [n2] started gossip client to 127.0.0.1:35967
I181009 22:06:46.785720 47491 storage/store.go:1562  [s3] [n3,s3]: failed initial metrics computation: [n3,s3]: system config not yet available
W181009 22:06:46.787440 47491 gossip/gossip.go:1499  [n3] no incoming or outgoing connections
I181009 22:06:46.787531 47491 gossip/gossip.go:395  [n3] NodeDescriptor set to node_id:3 address:<network_field:"tcp" address_field:"127.0.0.1:33943" > attrs:<> locality:<> ServerVersion:<major_val:0 minor_val:0 patch:0 unstable:0 > build_tag:"" started_at:0 
I181009 22:06:46.788426 47828 gossip/client.go:129  [n3] started gossip client to 127.0.0.1:35967
I181009 22:06:46.836664 47491 rpc/nodedialer/nodedialer.go:92  [s1,r1/1:/M{in-ax}] connection to n2 established
I181009 22:06:46.837233 47491 storage/store_snapshot.go:615  [s1,r1/1:/M{in-ax}] sending preemptive snapshot 3c883234 at applied index 16
I181009 22:06:46.837487 47491 storage/store_snapshot.go:657  [s1,r1/1:/M{in-ax}] streamed snapshot to (n2,s2):?: kv pairs: 49, log entries: 6, rate-limit: 2.0 MiB/sec, 1ms
I181009 22:06:46.837997 46266 storage/replica_raftstorage.go:803  [s2,r1/?:{-}] applying preemptive snapshot at index 16 (id=3c883234, encoded size=8300, 1 rocksdb batches, 6 log entries)
I181009 22:06:46.838727 46266 storage/replica_raftstorage.go:809  [s2,r1/?:/M{in-ax}] applied preemptive snapshot in 1ms [clear=0ms batch=0ms entries=0ms commit=0ms]
I181009 22:06:46.839216 47491 storage/replica_command.go:812  [s1,r1/1:/M{in-ax}] change replicas (ADD_REPLICA (n2,s2):2): read existing descriptor r1:/M{in-ax} [(n1,s1):1, next=2, gen=0]
I181009 22:06:46.841147 47491 storage/replica.go:3836  [s1,r1/1:/M{in-ax},txn=b3a8ce7d] proposing ADD_REPLICA((n2,s2):2): updated=[(n1,s1):1 (n2,s2):2] next=3
I181009 22:06:46.842035 47491 rpc/nodedialer/nodedialer.go:92  [s1,r1/1:/M{in-ax}] connection to n3 established
I181009 22:06:46.842911 47491 storage/store_snapshot.go:615  [s1,r1/1:/M{in-ax}] sending preemptive snapshot 6ef885fc at applied index 18
I181009 22:06:46.843237 47491 storage/store_snapshot.go:657  [s1,r1/1:/M{in-ax}] streamed snapshot to (n3,s3):?: kv pairs: 52, log entries: 8, rate-limit: 2.0 MiB/sec, 1ms
I181009 22:06:46.843550 46267 storage/replica_raftstorage.go:803  [s3,r1/?:{-}] applying preemptive snapshot at index 18 (id=6ef885fc, encoded size=9242, 1 rocksdb batches, 8 log entries)
I181009 22:06:46.844303 46267 storage/replica_raftstorage.go:809  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 1ms [clear=0ms batch=0ms entries=0ms commit=0ms]
I181009 22:06:46.844788 47491 storage/replica_command.go:812  [s1,r1/1:/M{in-ax}] change replicas (ADD_REPLICA (n3,s3):3): read existing descriptor r1:/M{in-ax} [(n1,s1):1, (n2,s2):2, next=3, gen=0]
I181009 22:06:46.846454 47663 rpc/nodedialer/nodedialer.go:92  connection to n1 established
I181009 22:06:46.848351 47491 storage/replica.go:3836  [s1,r1/1:/M{in-ax},txn=73e1ba5d] proposing ADD_REPLICA((n3,s3):3): updated=[(n1,s1):1 (n2,s2):2 (n3,s3):3] next=4
I181009 22:06:46.988405 47491 storage/replica_command.go:298  [s1,r1/1:/M{in-ax}] initiating a split of this range at key "b" [r2]
I181009 22:06:47.007046 47751 storage/replica_proposal.go:211  [s3,r2/3:{b-/Max}] new range lease repl=(n3,s3):3 seq=2 start=0.000000123,279 epo=1 pro=0.000000123,280 following repl=(n1,s1):1 seq=1 start=0.000000000,0 exp=0.900000123,6 pro=0.000000123,7
I181009 22:06:47.007791 47491 storage/replica_command.go:430  [s1,r1/1:{/Min-b}] initiating a merge of r2:{b-/Max} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=0] into this range
I181009 22:06:47.017490 47544 storage/store.go:2740  [s1,r1/1:{/Min-b},txn=b813d14a] removing replica r2/1
I181009 22:06:47.019879 47648 storage/store.go:2740  [s2,r1/2:{/Min-b}] removing replica r2/2
I181009 22:06:47.021164 47756 storage/store.go:2740  [s3,r1/3:{/Min-b}] removing replica r2/3




Please assign, take a look, and update the issue accordingly.

@cockroach-teamcity cockroach-teamcity added this to the 2.1 milestone Oct 9, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 9, 2018
benesch added a commit to benesch/cockroach that referenced this issue Oct 10, 2018
This test could deadlock if the LHS replica on store2 was shut down
before it processed the split at "b". Teach the test to wait for the LHS
replica on store2 to process the split before blocking Raft traffic to
it.

Fixes cockroachdb#31096.
Fixes cockroachdb#31149.
Fixes cockroachdb#31160.
Fixes cockroachdb#31167.

Release note: None
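
For illustration, a minimal Go sketch of the wait-before-blocking pattern the commit describes, not the actual test change; `waitForSplit` and `lhsProcessedSplit` are hypothetical names, and the real test checks the replica's state through CockroachDB's own test utilities:

```go
// Illustrative only: wait until the LHS replica has processed the split
// before blocking Raft traffic to it, in the spirit of the fix.
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForSplit polls check until it returns nil or the timeout expires.
// The real test performs an analogous wait before it starts dropping
// Raft traffic to the LHS replica on store2.
func waitForSplit(timeout time.Duration, check func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := check()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met before deadline: %v", err)
		}
		time.Sleep(10 * time.Millisecond)
	}
}

func main() {
	// Hypothetical stand-in for "has the LHS replica on store2 applied the
	// split at key b yet?". The real test inspects the replica descriptor
	// rather than a counter.
	applied := 0
	lhsProcessedSplit := func() error {
		applied++
		if applied < 5 {
			return errors.New(`split at "b" not yet applied on store2`)
		}
		return nil
	}

	if err := waitForSplit(5*time.Second, lhsProcessedSplit); err != nil {
		panic(err)
	}
	fmt.Println("LHS replica has processed the split; safe to block Raft traffic")
}
```
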
@tbg tbg closed this as completed Oct 10, 2018
craig bot pushed a commit that referenced this issue Oct 10, 2018
31013: kv: try next replica on RangeNotFoundError r=nvanbenschoten,bdarnell a=tschottdorf

Previously, if a Batch RPC came back with a RangeNotFoundError, we would
immediately stop trying to send to more replicas, evict the range
descriptor, and start a new attempt after a back-off.

This new attempt could end up using the same replica, so if the
RangeNotFoundError persisted for some time, the unsuccessful retries against
that replica would persist as well, since DistSender doesn't aggressively
shuffle the replicas.
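
As an aside, a minimal sketch of the intended behavior change over a simplified transport; `sendToReplicas`, `send`, and `errRangeNotFound` are hypothetical stand-ins, not the actual DistSender code:

```go
// Illustrative only: on RangeNotFoundError, move on to the next replica
// instead of giving up, evicting the descriptor, and backing off.
package main

import (
	"errors"
	"fmt"
)

var errRangeNotFound = errors.New("range not found on this replica")

// sendToReplicas walks the replica slice in order. Previously the caller
// would return on the first RangeNotFoundError and retry after a backoff,
// possibly hitting the same replica again; with the fix, that error just
// advances the loop to the next replica.
func sendToReplicas(replicas []string, send func(replica string) error) error {
	var lastErr error
	for _, r := range replicas {
		err := send(r)
		if err == nil {
			return nil
		}
		if errors.Is(err, errRangeNotFound) {
			lastErr = err
			continue // try the next replica instead of bailing out
		}
		return err // other errors still propagate immediately
	}
	return fmt.Errorf("all replicas failed, last error: %v", lastErr)
}

func main() {
	replicas := []string{"n1", "n2", "n3"}
	// Hypothetical scenario: n1 hasn't woken up its replica yet, n2 has it.
	send := func(r string) error {
		if r == "n1" {
			return errRangeNotFound
		}
		fmt.Println("request served by", r)
		return nil
	}
	if err := sendToReplicas(replicas, send); err != nil {
		panic(err)
	}
}
```
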

It turns out that there are such situations, and the election-after-restart
roachtest spuriously hit one of them:

1. new replica receives a preemptive snapshot and the ConfChange
2. cluster restarts
3. now the new replica is in this state until the range wakes up, which may not happen for some time
4. the first request to the range runs into the above problem

@nvanbenschoten: I think there is an issue to be filed about the tendency
of DistSender to get stuck in unfortunate configurations.

Fixes #30613.

Release note (bug fix): Avoid repeatedly trying a replica that was found to
be in the process of being added.

31187: roachtest: add synctest r=bdarnell a=tschottdorf

This new roachtest sets up a charybdefs filesystem on a single (Ubuntu) node
and runs the `synctest` CLI command against a nemesis that injects random I/O
errors.

The synctest command is new. It simulates a Raft log and can be directed at a
filesystem that is being hit with random failures.

The workload essentially writes ascending keys (flushing each one to disk
synchronously) until an I/O error occurs, at which point it re-opens the
instance to verify that all persisted writes are still there. If the
RocksDB instance was permanently corrupted, it switches to a new, pristine
directory.
This is used in the roachtest, but it is also useful to run manually in user
deployments where we suspect data is failing to persist to disk.
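
For illustration, a minimal sketch of that write-sync-reopen-verify loop, using plain files in a directory instead of a RocksDB instance; all names are hypothetical, and the loop is simply bounded here rather than being interrupted by a charybdefs nemesis:

```go
// Illustrative only: write ascending keys, syncing each one before moving
// on, then "reopen" and verify that every acknowledged key survived.
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// writeKeys writes ascending keys, syncing each one to disk, and stops at
// the first I/O error (or after maxKeys, so this sketch terminates).
func writeKeys(dir string, maxKeys int) (lastKey int, err error) {
	lastKey = -1
	for key := 0; key < maxKeys; key++ {
		path := filepath.Join(dir, fmt.Sprintf("key-%08d", key))
		f, err := os.Create(path)
		if err != nil {
			return lastKey, err
		}
		if _, err := f.WriteString("value"); err != nil {
			f.Close()
			return lastKey, err
		}
		if err := f.Sync(); err != nil { // persist before acknowledging
			f.Close()
			return lastKey, err
		}
		if err := f.Close(); err != nil {
			return lastKey, err
		}
		lastKey = key
	}
	return lastKey, nil
}

// verify checks after reopening that every acknowledged key is still
// present; in the real tool a permanently corrupted instance would be
// abandoned for a fresh directory at this point.
func verify(dir string, lastKey int) error {
	for key := 0; key <= lastKey; key++ {
		path := filepath.Join(dir, fmt.Sprintf("key-%08d", key))
		if _, err := os.Stat(path); err != nil {
			return fmt.Errorf("acknowledged key %d missing after reopen: %v", key, err)
		}
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "synctest-sketch")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	lastKey, writeErr := writeKeys(dir, 100)
	if writeErr != nil {
		log.Printf("write stopped by I/O error: %v", writeErr)
	}
	if err := verify(dir, lastKey); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("all %d acknowledged keys survived\n", lastKey+1)
}
```
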

This hasn't found anything, but it's fun to watch and also shows us a
number of errors that we know and love from Sentry.

Release note: None

31215: storage: deflake TestStoreRangeMergeWatcher r=tschottdorf a=benesch

This test could deadlock if the LHS replica on store2 was shut down
before it processed the split at "b". Teach the test to wait for the LHS
replica on store2 to process the split before blocking Raft traffic to
it.

Fixes #31096.
Fixes #31149.
Fixes #31160.
Fixes #31167.

Release note: None

31217: importccl: add explicit default to mysql testdata timestamp r=dt a=dt

This makes the testdata work on MySQL 8.0.2+, where the timestamp type no longer has the implicit defaults.

Release note: none.

31221: cluster: Create final cluster version for 2.1 r=bdarnell a=bdarnell

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Nikhil Benesch <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Ben Darnell <[email protected]>
tbg pushed a commit to tbg/cockroach that referenced this issue Oct 11, 2018
This test could deadlock if the LHS replica on store2 was shut down
before it processed the split at "b". Teach the test to wait for the LHS
replica on store2 to process the split before blocking Raft traffic to
it.

Fixes cockroachdb#31096.
Fixes cockroachdb#31149.
Fixes cockroachdb#31160.
Fixes cockroachdb#31167.

Release note: None