
storage: TestReplicateQueueDownReplicate failed under stress #27783

Closed
cockroach-teamcity opened this issue Jul 20, 2018 · 9 comments

Labels: A-kv-replication Relating to Raft, consensus, and coordination. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

cockroach-teamcity (Member) commented:

SHA: https://github.com/cockroachdb/cockroach/commits/f38e4c7a2bb3f0e697b25d21bdd8ea67b163c32f

Parameters:

TAGS=
GOFLAGS=-race

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=788264&tab=buildLog
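Given `GOFLAGS=-race`, the failure was hit under stress with the race detector enabled. A sketch of how one might reproduce locally (the `stressrace` target and its `PKG`/`TESTS` variables are assumed from the cockroach repository's Makefile tooling of this era, not confirmed by this report):

```shell
# Hypothetical reproduction sketch: run the single test repeatedly under
# the race detector until it fails, mirroring the TeamCity parameters above.
make stressrace PKG=./pkg/storage TESTS=TestReplicateQueueDownReplicate
```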

I180720 07:13:01.965860 136721 storage/replica_raftstorage.go:737  [n5,s5,r23/?:{-}] applying preemptive snapshot at index 18 (id=15c12150, encoded size=2250, 1 rocksdb batches, 8 log entries)
I180720 07:13:02.076341 136750 storage/replica_command.go:743  [replicate,n5,s5,r17/2:/Table/2{0-1}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r17:/Table/2{0-1} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n4,s4):4, next=5]
I180720 07:13:01.975992 136673 storage/replica_command.go:743  [replicate,n1,s1,r10/1:/Table/1{3-4}] change replicas (ADD_REPLICA (n3,s3):4): read existing descriptor r10:/Table/1{3-4} [(n1,s1):1, (n4,s4):2, (n5,s5):3, next=4]
I180720 07:13:01.976089 134273 storage/store.go:3585  [n1,s1,r14/1:/Table/1{7-8}] added to replica GC queue (peer suggestion)
W180720 07:13:02.032755 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 1.3s
I180720 07:13:02.140752 136721 storage/replica_raftstorage.go:743  [n5,s5,r23/?:{m-/Table/Syste…}] applied preemptive snapshot in 66ms [clear=0ms batch=0ms entries=10ms commit=51ms]
I180720 07:13:02.414915 136747 storage/store.go:2513  [replicaGC,n1,s1,r14/1:/Table/1{7-8}] removing replica
I180720 07:13:02.531431 136747 storage/replica.go:836  [replicaGC,n1,s1,r14/1:/Table/1{7-8}] removed 8 (0+8) keys in 115ms [clear=0ms commit=115ms]
I180720 07:13:02.547763 133385 storage/compactor/compactor.go:329  [n3,s3,compactor] purging suggested compaction for range /Table/11 - /Table/12 that contains live data
I180720 07:13:02.659944 132120 storage/replica_command.go:743  [n1,s1,r23/1:{m-/Table/Syste…}] change replicas (ADD_REPLICA (n5,s5):5): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, next=5]
W180720 07:13:03.098078 132292 storage/store.go:3710  [n1,s1] handle raft ready: 0.5s [processed=0]
I180720 07:13:03.493186 133571 server/status/runtime.go:255  [n4] runtime stats: 1.5 GiB RSS, 1135 goroutines, 82 MiB/22 MiB/132 MiB GO alloc/idle/total, 163 MiB/206 MiB CGO alloc/total, 1263.56cgo/sec, 1.07/0.05 %(u/s)time, 0.03 %gc (5x)
I180720 07:13:03.904263 136825 storage/store.go:2513  [replicaGC,n5,s5,r23/?:{m-/Table/Syste…}] removing replica
I180720 07:13:03.980834 136825 storage/replica.go:836  [replicaGC,n5,s5,r23/?:{m-/Table/Syste…}] removed 7 (0+7) keys in 24ms [clear=0ms commit=24ms]
I180720 07:13:04.237768 136673 storage/replica.go:3613  [n1,s1,r10/1:/Table/1{3-4}] proposing ADD_REPLICA((n3,s3):4): updated=[(n1,s1):1 (n4,s4):2 (n5,s5):3 (n3,s3):4] next=5
I180720 07:13:04.933176 136899 storage/replica_range_lease.go:554  [replicate,n1,s1,r23/1:{m-/Table/Syste…}] transferring lease to s3
I180720 07:13:05.448184 133896 server/status/runtime.go:255  [n5] runtime stats: 1.5 GiB RSS, 1136 goroutines, 78 MiB/23 MiB/132 MiB GO alloc/idle/total, 163 MiB/206 MiB CGO alloc/total, 1195.28cgo/sec, 1.04/0.05 %(u/s)time, 0.03 %gc (4x)
W180720 07:13:05.816937 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 1.1s
I180720 07:13:06.036087 136899 storage/replica_range_lease.go:617  [replicate,n1,s1,r23/1:{m-/Table/Syste…}] done transferring lease to s3: <nil>
I180720 07:13:06.174385 133289 storage/replica_proposal.go:222  [n3,s3,r23/4:{m-/Table/Syste…}] new range lease repl=(n3,s3):4 seq=4 start=1532070784.933585061,0 epo=1 pro=1532070784.933648526,0 following repl=(n1,s1):1 seq=3 start=1532070719.397101945,0 epo=1 pro=1532070727.988842687,0
I180720 07:13:06.203451 136896 storage/replica_range_lease.go:554  [replicate,n1,s1,r10/1:/Table/1{3-4}] transferring lease to s5
E180720 07:13:06.421222 136935 storage/queue.go:778  [replicate,n3,s3,r23/4:{m-/Table/Syste…}] no removable replicas from range that needs a removal: [raft progress unknown]
I180720 07:13:06.497130 133289 rpc/nodedialer/nodedialer.go:90  [n3] connection to n2 established
I180720 07:13:06.498288 133289 rpc/nodedialer/nodedialer.go:90  [n3] connection to n4 established
W180720 07:13:06.529255 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 1.2s
I180720 07:13:06.740574 133655 rpc/nodedialer/nodedialer.go:90  [n4] connection to n3 established
E180720 07:13:06.847208 136952 storage/queue.go:778  [replicate,n3,s3,r23/4:{m-/Table/Syste…}] no removable replicas from range that needs a removal: [raft progress unknown]
I180720 07:13:07.052951 134045 storage/replica_proposal.go:222  [n5,s5,r10/3:/Table/1{3-4}] new range lease repl=(n5,s5):3 seq=4 start=1532070786.203659772,0 epo=1 pro=1532070786.203677847,0 following repl=(n1,s1):1 seq=3 start=1532070719.397101945,0 epo=1 pro=1532070727.991825687,0
I180720 07:13:07.053611 136896 storage/replica_range_lease.go:617  [replicate,n1,s1,r10/1:/Table/1{3-4}] done transferring lease to s5: <nil>
I180720 07:13:07.270823 136953 storage/replica_raftstorage.go:520  [replicate,n1,s1,r3/1:/System/NodeLiveness{-Max}] generated preemptive snapshot 7599b4c2 at index 120
I180720 07:13:07.524854 136953 storage/store_snapshot.go:605  [replicate,n1,s1,r3/1:/System/NodeLiveness{-Max}] streamed snapshot to (n2,s2):?: kv pairs: 77, log entries: 10, rate-limit: 2.0 MiB/sec, 209ms
I180720 07:13:07.572889 137046 storage/replica_raftstorage.go:737  [n2,s2,r3/?:{-}] applying preemptive snapshot at index 120 (id=7599b4c2, encoded size=5074, 1 rocksdb batches, 10 log entries)
I180720 07:13:07.775217 136750 storage/replica.go:3613  [n5,s5,r17/2:/Table/2{0-1}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n4,s4):4 (n5,s5):2 (n3,s3):3] next=5
I180720 07:13:07.847211 137046 storage/replica_raftstorage.go:743  [n2,s2,r3/?:/System/NodeLiveness{-Max}] applied preemptive snapshot in 274ms [clear=0ms batch=0ms entries=8ms commit=109ms]
I180720 07:13:08.143240 136953 storage/replica_command.go:743  [replicate,n1,s1,r3/1:/System/NodeLiveness{-Max}] change replicas (ADD_REPLICA (n2,s2):4): read existing descriptor r3:/System/NodeLiveness{-Max} [(n1,s1):1, (n5,s5):2, (n3,s3):3, next=4]
I180720 07:13:08.234336 137017 storage/store.go:2513  [replicaGC,n2,s2,r3/?:/System/NodeLiveness{-Max}] removing replica
I180720 07:13:08.261165 137017 storage/replica.go:836  [replicaGC,n2,s2,r3/?:/System/NodeLiveness{-Max}] removed 13 (5+8) keys in 2ms [clear=0ms commit=1ms]
W180720 07:13:08.447191 133898 server/node.go:892  [n5,summaries] health alerts detected: {Alerts:[{StoreID:5 Category:METRICS Description:queue.replicate.process.failure Value:5}]}
W180720 07:13:08.462869 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 1.3s
I180720 07:13:08.887468 134273 storage/store.go:3585  [n1,s1,r17/1:/Table/2{0-1}] added to replica GC queue (peer suggestion)
I180720 07:13:09.168949 132993 rpc/nodedialer/nodedialer.go:90  [n2] connection to n3 established
W180720 07:13:09.180735 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 1.7s
W180720 07:13:09.254219 136919 storage/intent_resolver.go:642  [n1,s1] failed to push during intent resolution: failed to push "change-replica" id=d3859cdb key=/Local/Range"m"/RangeDescriptor rw=true pri=0.00554840 iso=SERIALIZABLE stat=PENDING epo=0 ts=1532070783.097318309,1 orig=1532070782.472610423,0 max=1532070782.472610423,0 wto=false rop=false seq=2
I180720 07:13:09.267071 134273 storage/store.go:3585  [n1,s1,r17/1:/Table/2{0-1}] added to replica GC queue (peer suggestion)
I180720 07:13:09.398274 136914 storage/replica_command.go:743  [replicate,n5,s5,r10/3:/Table/1{3-4}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r10:/Table/1{3-4} [(n1,s1):1, (n4,s4):2, (n5,s5):3, (n3,s3):4, next=5]
I180720 07:13:09.573360 132374 server/status/runtime.go:255  [n1] runtime stats: 1.5 GiB RSS, 1175 goroutines, 85 MiB/18 MiB/132 MiB GO alloc/idle/total, 172 MiB/211 MiB CGO alloc/total, 904.44cgo/sec, 0.95/0.05 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:09.839991 137061 storage/store.go:2513  [replicaGC,n1,s1,r17/1:/Table/2{0-1}] removing replica
I180720 07:13:09.912777 137061 storage/replica.go:836  [replicaGC,n1,s1,r17/1:/Table/2{0-1}] removed 8 (0+8) keys in 68ms [clear=0ms commit=68ms]
I180720 07:13:09.947363 133027 gossip/gossip.go:518  [n2] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  1: 127.0.0.1:39259 (1m1s: infos 50/1026 sent/received, bytes 17036B/168027B sent/received)
gossip server (0/3 cur/max conns, infos 50/1026 sent/received, bytes 17036B/168027B sent/received)
I180720 07:13:10.021054 133030 server/status/runtime.go:255  [n2] runtime stats: 1.5 GiB RSS, 1175 goroutines, 86 MiB/18 MiB/132 MiB GO alloc/idle/total, 172 MiB/211 MiB CGO alloc/total, 865.72cgo/sec, 0.94/0.05 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:11.351234 133268 gossip/gossip.go:518  [n3] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  1: 127.0.0.1:39259 (1m1s: infos 77/457 sent/received, bytes 26002B/86091B sent/received)
gossip server (1/3 cur/max conns, infos 832/533 sent/received, bytes 153495B/113332B sent/received)
  5: 127.0.0.1:42893 (56s)
I180720 07:13:11.533280 133271 server/status/runtime.go:255  [n3] runtime stats: 1.5 GiB RSS, 1172 goroutines, 94 MiB/11 MiB/132 MiB GO alloc/idle/total, 172 MiB/211 MiB CGO alloc/total, 759.23cgo/sec, 0.91/0.05 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:11.534917 132903 rpc/nodedialer/nodedialer.go:90  [intExec=adopt-job] connection to n1 established
I180720 07:13:11.685316 137148 rpc/nodedialer/nodedialer.go:90  [n1] connection to n2 established
W180720 07:13:11.717088 132335 storage/store.go:3710  [n1,s1] handle raft ready: 0.5s [processed=0]
W180720 07:13:12.038202 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 2.8s
I180720 07:13:12.269629 136987 storage/replica_command.go:743  [replicate,n3,s3,r23/4:{m-/Table/Syste…}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, next=5]
W180720 07:13:12.913692 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 4.4s
W180720 07:13:13.068786 133273 server/node.go:892  [n3,summaries] health alerts detected: {Alerts:[{StoreID:3 Category:METRICS Description:queue.replicate.process.failure Value:2}]}
I180720 07:13:13.399949 133408 gossip/gossip.go:518  [n4] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  1: 127.0.0.1:39259 (1m1s: infos 34/1137 sent/received, bytes 10771B/190258B sent/received)
gossip server (0/3 cur/max conns, infos 34/1137 sent/received, bytes 10771B/190258B sent/received)
I180720 07:13:13.432756 133130 rpc/nodedialer/nodedialer.go:90  [intExec=adopt-job] connection to n1 established
I180720 07:13:13.435987 133571 server/status/runtime.go:255  [n4] runtime stats: 1.5 GiB RSS, 1172 goroutines, 80 MiB/24 MiB/132 MiB GO alloc/idle/total, 172 MiB/211 MiB CGO alloc/total, 814.48cgo/sec, 0.90/0.05 %(u/s)time, 0.02 %gc (3x)
I180720 07:13:13.557927 137271 rpc/nodedialer/nodedialer.go:90  [n1] connection to n3 established
I180720 07:13:13.718688 132120 storage/replica_command.go:743  [n1,s1,r23/1:{m-/Table/Syste…}] change replicas (ADD_REPLICA (n5,s5):5): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, next=5]
W180720 07:13:13.948740 132293 storage/store.go:3710  [n1,s1] handle raft ready: 1.6s [processed=0]
I180720 07:13:14.293111 133280 storage/node_liveness.go:810  [n3,hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (context done during DistSender.Send: context deadline exceeded)
W180720 07:13:14.295175 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 4.5s
W180720 07:13:14.295262 133280 storage/node_liveness.go:487  [n3,hb] failed node liveness heartbeat: context deadline exceeded
I180720 07:13:14.511019 136914 storage/replica.go:3613  [n5,s5,r10/3:/Table/1{3-4}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n3,s3):4 (n4,s4):2 (n5,s5):3] next=5
I180720 07:13:15.086280 133284 storage/replica_proposal.go:222  [n3,s3,r23/4:{m-/Table/Syste…}] new range lease repl=(n3,s3):4 seq=4 start=1532070784.933585061,0 epo=1 pro=1532070794.685048512,0 following repl=(n3,s3):4 seq=4 start=1532070784.933585061,0 epo=1 pro=1532070784.933648526,0
I180720 07:13:15.176361 133475 rpc/nodedialer/nodedialer.go:90  [intExec=adopt-job] connection to n1 established
I180720 07:13:15.221589 133945 gossip/gossip.go:518  [n5] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  3: 127.0.0.1:46295 (1m0s: infos 76/810 sent/received, bytes 27241B/136764B sent/received)
gossip server (0/3 cur/max conns, infos 77/835 sent/received, bytes 27492B/147586B sent/received)
I180720 07:13:15.295349 133896 server/status/runtime.go:255  [n5] runtime stats: 1.5 GiB RSS, 1186 goroutines, 62 MiB/37 MiB/132 MiB GO alloc/idle/total, 173 MiB/212 MiB CGO alloc/total, 889.68cgo/sec, 0.90/0.05 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:15.303550 137327 rpc/nodedialer/nodedialer.go:90  [n1] connection to n4 established
I180720 07:13:15.668038 134273 storage/store.go:3585  [n1,s1,r10/1:/Table/1{3-4}] added to replica GC queue (peer suggestion)
I180720 07:13:15.668485 134273 storage/store.go:3585  [n1,s1,r10/1:/Table/1{3-4}] added to replica GC queue (peer suggestion)
I180720 07:13:15.678764 137215 storage/store.go:2513  [replicaGC,n1,s1,r10/1:/Table/1{3-4}] removing replica
W180720 07:13:15.733590 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 3.8s
I180720 07:13:15.828237 137215 storage/replica.go:836  [replicaGC,n1,s1,r10/1:/Table/1{3-4}] removed 415 (407+8) keys in 148ms [clear=0ms commit=148ms]
W180720 07:13:15.868019 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 4.3s
I180720 07:13:16.165772 137274 internal/client/txn.go:703  [n3] async rollback failed: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND): "unnamed" id=3d2c9b81 key=/System/NodeLiveness/3 rw=true pri=0.02117441 iso=SERIALIZABLE stat=PENDING epo=0 ts=1532070789.763637353,0 orig=1532070789.763637353,0 max=1532070790.263637353,0 wto=false rop=false seq=4
W180720 07:13:16.199558 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 3.2s
W180720 07:13:16.315980 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 2.0s
W180720 07:13:16.965561 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 3.2s
W180720 07:13:17.265416 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 1.2s
I180720 07:13:17.284315 133744 rpc/nodedialer/nodedialer.go:90  [intExec=adopt-job] connection to n1 established
I180720 07:13:17.327326 137400 rpc/nodedialer/nodedialer.go:90  [n1] connection to n5 established
I180720 07:13:17.361057 136953 storage/replica.go:3613  [n1,s1,r3/1:/System/NodeLiveness{-Max}] proposing ADD_REPLICA((n2,s2):4): updated=[(n1,s1):1 (n5,s5):2 (n3,s3):3 (n2,s2):4] next=5
I180720 07:13:17.662173 137433 storage/replica_range_lease.go:554  [replicate,n1,s1,r3/1:/System/NodeLiveness{-Max}] transferring lease to s3
I180720 07:13:17.671290 136987 storage/replica_command.go:743  [replicate,n3,s3,r23/4:{m-/Table/Syste…}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, next=5]
W180720 07:13:18.287209 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 1.9s
I180720 07:13:18.424599 137494 storage/replica_raftstorage.go:520  [raftsnapshot,n1,s1,r3/1:/System/NodeLiveness{-Max}] generated Raft snapshot 9e093f87 at index 140
I180720 07:13:18.496901 137494 storage/store_snapshot.go:605  [raftsnapshot,n1,s1,r3/1:/System/NodeLiveness{-Max}] streamed snapshot to (n2,s2):4: kv pairs: 90, log entries: 30, rate-limit: 8.0 MiB/sec, 71ms
I180720 07:13:18.536640 137377 storage/replica_raftstorage.go:737  [n2,s2,r3/4:{-}] applying Raft snapshot at index 140 (id=9e093f87, encoded size=10194, 1 rocksdb batches, 30 log entries)
I180720 07:13:18.750527 137433 storage/replica_range_lease.go:617  [replicate,n1,s1,r3/1:/System/NodeLiveness{-Max}] done transferring lease to s3: <nil>
I180720 07:13:18.816778 137556 storage/replica_raftstorage.go:520  [replicate,n1,s1,r15/1:/Table/1{8-9}] generated preemptive snapshot 7b6e40ae at index 24
I180720 07:13:19.047046 137556 storage/store_snapshot.go:605  [replicate,n1,s1,r15/1:/Table/1{8-9}] streamed snapshot to (n2,s2):?: kv pairs: 11, log entries: 14, rate-limit: 2.0 MiB/sec, 177ms
I180720 07:13:19.191009 137504 storage/replica_raftstorage.go:737  [n2,s2,r15/?:{-}] applying preemptive snapshot at index 24 (id=7b6e40ae, encoded size=5015, 1 rocksdb batches, 14 log entries)
I180720 07:13:19.345643 137504 storage/replica_raftstorage.go:743  [n2,s2,r15/?:/Table/1{8-9}] applied preemptive snapshot in 154ms [clear=0ms batch=0ms entries=152ms commit=1ms]
I180720 07:13:19.506386 137377 storage/replica_raftstorage.go:743  [n2,s2,r3/4:/System/NodeLiveness{-Max}] applied Raft snapshot in 969ms [clear=0ms batch=0ms entries=966ms commit=2ms]
I180720 07:13:19.587895 133304 storage/replica_proposal.go:222  [n3,s3,r23/4:{m-/Table/Syste…}] new range lease repl=(n3,s3):4 seq=4 start=1532070784.933585061,0 epo=1 pro=1532070799.256224670,0 following repl=(n3,s3):4 seq=4 start=1532070784.933585061,0 epo=1 pro=1532070794.685048512,0
I180720 07:13:19.607420 137556 storage/replica_command.go:743  [replicate,n1,s1,r15/1:/Table/1{8-9}] change replicas (ADD_REPLICA (n2,s2):4): read existing descriptor r15:/Table/1{8-9} [(n1,s1):1, (n5,s5):2, (n4,s4):3, next=4]
I180720 07:13:19.747963 132374 server/status/runtime.go:255  [n1] runtime stats: 1.5 GiB RSS, 1186 goroutines, 75 MiB/30 MiB/132 MiB GO alloc/idle/total, 173 MiB/212 MiB CGO alloc/total, 1102.47cgo/sec, 0.91/0.06 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:19.949230 133030 server/status/runtime.go:255  [n2] runtime stats: 1.5 GiB RSS, 1179 goroutines, 75 MiB/31 MiB/132 MiB GO alloc/idle/total, 173 MiB/212 MiB CGO alloc/total, 1111.97cgo/sec, 0.91/0.06 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:20.118395 137623 storage/store.go:2513  [replicaGC,n2,s2,r15/?:/Table/1{8-9}] removing replica
I180720 07:13:20.143669 137623 storage/replica.go:836  [replicaGC,n2,s2,r15/?:/Table/1{8-9}] removed 7 (0+7) keys in 1ms [clear=0ms commit=1ms]
W180720 07:13:20.796173 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 2.5s
I180720 07:13:21.072990 137591 storage/node_liveness.go:749  [n2,s2,r23/2:{m-/Table/Syste…}] incremented n3 liveness epoch to 2
I180720 07:13:21.146982 133280 storage/node_liveness.go:810  [n3,hb] retrying liveness update after storage.errRetryLiveness: TransactionStatusError: could not commit in one phase as requested (REASON_UNKNOWN)
W180720 07:13:21.147441 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 2.4s
I180720 07:13:21.147547 133280 storage/node_liveness.go:484  [n3,hb] heartbeat failed on epoch increment; retrying
W180720 07:13:21.147963 137651 storage/node_liveness.go:551  [n3,s3,r23/4:{m-/Table/Syste…}] slow heartbeat took 1.5s
E180720 07:13:21.148355 137651 storage/replica_range_lease.go:282  [n3,s3,r23/4:{m-/Table/Syste…}] heartbeat failed on epoch increment
I180720 07:13:21.226472 133271 server/status/runtime.go:255  [n3] runtime stats: 1.5 GiB RSS, 1173 goroutines, 78 MiB/23 MiB/132 MiB GO alloc/idle/total, 172 MiB/212 MiB CGO alloc/total, 1146.58cgo/sec, 0.92/0.06 %(u/s)time, 0.02 %gc (3x)
I180720 07:13:21.558087 137556 storage/replica.go:3613  [n1,s1,r15/1:/Table/1{8-9}] proposing ADD_REPLICA((n2,s2):4): updated=[(n1,s1):1 (n5,s5):2 (n4,s4):3 (n2,s2):4] next=5
I180720 07:13:22.065218 133039 storage/node_liveness.go:810  [n2,hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (context done during DistSender.Send: context deadline exceeded)
W180720 07:13:22.065597 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 4.5s
W180720 07:13:22.065683 133039 storage/node_liveness.go:487  [n2,hb] failed node liveness heartbeat: context deadline exceeded
I180720 07:13:22.102955 133088 storage/replica_proposal.go:222  [n2,s2,r23/2:{m-/Table/Syste…}] new range lease repl=(n2,s2):2 seq=5 start=1532070799.323860564,0 epo=1 pro=1532070799.324024703,0 following repl=(n3,s3):4 seq=4 start=1532070784.933585061,0 epo=1 pro=1532070799.256224670,0
W180720 07:13:22.274176 132376 server/node.go:892  [n1,summaries] health alerts detected: {Alerts:[{StoreID:1 Category:METRICS Description:ranges.underreplicated Value:9}]}
I180720 07:13:22.708801 137734 storage/replica_range_lease.go:554  [replicate,n1,s1,r15/1:/Table/1{8-9}] transferring lease to s5
W180720 07:13:22.726774 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 2.2s
W180720 07:13:22.763036 137683 storage/node_liveness.go:551  [n3,s3,r23/4:{m-/Table/Syste…}] slow heartbeat took 1.6s
W180720 07:13:22.767597 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 1.8s
W180720 07:13:22.816585 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 1.6s
E180720 07:13:23.080184 133326 storage/replica_proposal.go:351  [n3,s3,r3/3:/System/NodeLiveness{-Max}] couldn't scan node liveness records in span /System/NodeLiveness{-Max}: conflicting intents on /System/NodeLiveness/2
I180720 07:13:23.153197 137665 storage/replica_raftstorage.go:520  [raftsnapshot,n1,s1,r15/1:/Table/1{8-9}] generated Raft snapshot b9823c22 at index 27
I180720 07:13:23.286648 133106 storage/compactor/compactor.go:329  [n2,s2,compactor] purging suggested compaction for range /System/NodeLiveness - /System/NodeLivenessMax that contains live data
I180720 07:13:23.516222 133571 server/status/runtime.go:255  [n4] runtime stats: 1.5 GiB RSS, 1183 goroutines, 88 MiB/16 MiB/132 MiB GO alloc/idle/total, 173 MiB/212 MiB CGO alloc/total, 1061.25cgo/sec, 0.90/0.06 %(u/s)time, 0.03 %gc (4x)
I180720 07:13:23.547040 137665 storage/store_snapshot.go:605  [raftsnapshot,n1,s1,r15/1:/Table/1{8-9}] streamed snapshot to (n2,s2):4: kv pairs: 13, log entries: 17, rate-limit: 8.0 MiB/sec, 248ms
I180720 07:13:23.569984 137779 storage/replica_raftstorage.go:737  [n2,s2,r15/4:{-}] applying Raft snapshot at index 27 (id=b9823c22, encoded size=6724, 1 rocksdb batches, 17 log entries)
I180720 07:13:23.570068 137734 storage/replica_range_lease.go:617  [replicate,n1,s1,r15/1:/Table/1{8-9}] done transferring lease to s5: <nil>
I180720 07:13:23.612534 137779 storage/replica_raftstorage.go:743  [n2,s2,r15/4:/Table/1{8-9}] applied Raft snapshot in 42ms [clear=0ms batch=0ms entries=17ms commit=24ms]
I180720 07:13:23.723461 134022 storage/replica_proposal.go:222  [n5,s5,r15/2:/Table/1{8-9}] new range lease repl=(n5,s5):2 seq=4 start=1532070802.709038741,0 epo=1 pro=1532070802.709078160,0 following repl=(n1,s1):1 seq=3 start=1532070719.397101945,0 epo=1 pro=1532070728.392821434,0
I180720 07:13:23.748673 137707 storage/replica_raftstorage.go:520  [replicate,n1,s1,r8/1:/Table/1{1-2}] generated preemptive snapshot 0add5feb at index 28
I180720 07:13:23.772849 133064 storage/replica_proposal.go:222  [n2,s2,r23/2:{m-/Table/Syste…}] new range lease repl=(n2,s2):2 seq=5 start=1532070799.323860564,0 epo=1 pro=1532070802.217082058,0 following repl=(n2,s2):2 seq=5 start=1532070799.323860564,0 epo=1 pro=1532070799.324024703,0
I180720 07:13:23.900280 137707 storage/store_snapshot.go:605  [replicate,n1,s1,r8/1:/Table/1{1-2}] streamed snapshot to (n5,s5):?: kv pairs: 12, log entries: 18, rate-limit: 2.0 MiB/sec, 150ms
E180720 07:13:23.928047 137811 storage/queue.go:778  [replicate,n5,s5,r15/2:/Table/1{8-9}] no removable replicas from range that needs a removal: [raft progress unknown]
I180720 07:13:23.986050 137798 storage/replica_raftstorage.go:737  [n5,s5,r8/?:{-}] applying preemptive snapshot at index 28 (id=0add5feb, encoded size=6825, 1 rocksdb batches, 18 log entries)
I180720 07:13:24.055082 137798 storage/replica_raftstorage.go:743  [n5,s5,r8/?:/Table/1{1-2}] applied preemptive snapshot in 61ms [clear=0ms batch=0ms entries=59ms commit=0ms]
W180720 07:13:24.193673 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 1.4s
W180720 07:13:24.287662 133273 server/node.go:892  [n3,summaries] health alerts detected: {Alerts:[{StoreID:0 Category:METRICS Description:liveness.heartbeatfailures Value:3}]}
W180720 07:13:24.385144 132306 storage/store.go:3710  [n1,s1] handle raft ready: 1.1s [processed=0]
I180720 07:13:24.524095 137804 storage/store.go:2513  [replicaGC,n5,s5,r8/?:/Table/1{1-2}] removing replica
I180720 07:13:24.532128 137804 storage/replica.go:836  [replicaGC,n5,s5,r8/?:/Table/1{1-2}] removed 8 (1+7) keys in 7ms [clear=0ms commit=7ms]
I180720 07:13:24.553323 137786 storage/replica_command.go:743  [replicate,n5,s5,r15/2:/Table/1{8-9}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r15:/Table/1{8-9} [(n1,s1):1, (n5,s5):2, (n4,s4):3, (n2,s2):4, next=5]
I180720 07:13:24.598019 137707 storage/replica_command.go:743  [replicate,n1,s1,r8/1:/Table/1{1-2}] change replicas (ADD_REPLICA (n5,s5):4): read existing descriptor r8:/Table/1{1-2} [(n1,s1):1, (n4,s4):2, (n3,s3):3, next=4]
W180720 07:13:24.986171 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 1.7s
I180720 07:13:25.296091 133896 server/status/runtime.go:255  [n5] runtime stats: 1.5 GiB RSS, 1183 goroutines, 60 MiB/39 MiB/132 MiB GO alloc/idle/total, 174 MiB/216 MiB CGO alloc/total, 992.89cgo/sec, 0.92/0.07 %(u/s)time, 0.03 %gc (4x)
W180720 07:13:25.382179 133573 server/node.go:892  [n4,summaries] health alerts detected: {Alerts:[{StoreID:4 Category:METRICS Description:ranges.underreplicated Value:1}]}
W180720 07:13:25.788450 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 3.7s
W180720 07:13:25.789426 137745 storage/node_liveness.go:551  [n2,s2,r23/2:{m-/Table/Syste…}] slow heartbeat took 1.9s
E180720 07:13:25.841852 137741 storage/replica_range_lease.go:294  [n3,s3,r23/4:{m-/Table/Syste…}] mismatch incrementing epoch for {NodeID:2 Epoch:1 Expiration:1532070802.460660123,0 Draining:false Decommissioning:false}; actual is {NodeID:2 Epoch:1 Expiration:1532070811.565963513,0 Draining:false Decommissioning:false}
W180720 07:13:25.931598 132328 storage/store.go:3710  [n1,s1] handle raft ready: 0.6s [processed=0]
W180720 07:13:26.492486 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 1.1s
E180720 07:13:26.506184 137693 storage/replica_range_lease.go:294  [n4,s4,r23/3:{m-/Table/Syste…}] mismatch incrementing epoch for {NodeID:2 Epoch:1 Expiration:1532070802.460660123,0 Draining:false Decommissioning:false}; actual is {NodeID:2 Epoch:1 Expiration:1532070811.565963513,0 Draining:false Decommissioning:false}
I180720 07:13:27.199266 132993 storage/replica_proposal.go:222  [n2,s2,r23/2:{m-/Table/Syste…}] new range lease repl=(n2,s2):2 seq=5 start=1532070799.323860564,0 epo=1 pro=1532070803.846490469,0 following repl=(n2,s2):2 seq=5 start=1532070799.323860564,0 epo=1 pro=1532070802.217082058,0
I180720 07:13:27.898413 137707 storage/replica.go:3613  [n1,s1,r8/1:/Table/1{1-2}] proposing ADD_REPLICA((n5,s5):4): updated=[(n1,s1):1 (n4,s4):2 (n3,s3):3 (n5,s5):4] next=5
W180720 07:13:28.092934 132309 storage/store.go:3710  [n1,s1] handle raft ready: 0.7s [processed=0]
W180720 07:13:28.764747 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 2.0s
W180720 07:13:29.001730 133898 server/node.go:892  [n5,summaries] health alerts detected: {Alerts:[{StoreID:5 Category:METRICS Description:queue.replicate.process.failure Value:1}]}
I180720 07:13:29.009625 137900 storage/replica_range_lease.go:554  [replicate,n1,s1,r8/1:/Table/1{1-2}] transferring lease to s3
W180720 07:13:29.345394 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 2.1s
I180720 07:13:29.513541 132374 server/status/runtime.go:255  [n1] runtime stats: 1.5 GiB RSS, 1171 goroutines, 77 MiB/27 MiB/132 MiB GO alloc/idle/total, 174 MiB/216 MiB CGO alloc/total, 843.57cgo/sec, 0.94/0.05 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:29.674923 138003 storage/replica_raftstorage.go:520  [raftsnapshot,n1,s1,r8/1:/Table/1{1-2}] generated Raft snapshot d815bada at index 33
I180720 07:13:29.734368 138003 storage/store_snapshot.go:605  [raftsnapshot,n1,s1,r8/1:/Table/1{1-2}] streamed snapshot to (n5,s5):4: kv pairs: 14, log entries: 23, rate-limit: 8.0 MiB/sec, 31ms
I180720 07:13:29.749235 137921 storage/replica_raftstorage.go:737  [n5,s5,r8/4:{-}] applying Raft snapshot at index 33 (id=d815bada, encoded size=8791, 1 rocksdb batches, 23 log entries)
I180720 07:13:29.975054 133030 server/status/runtime.go:255  [n2] runtime stats: 1.5 GiB RSS, 1176 goroutines, 58 MiB/41 MiB/132 MiB GO alloc/idle/total, 174 MiB/216 MiB CGO alloc/total, 856.93cgo/sec, 0.94/0.05 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:30.032557 133323 storage/replica_proposal.go:222  [n3,s3,r8/3:/Table/1{1-2}] new range lease repl=(n3,s3):3 seq=4 start=1532070809.009938080,0 epo=2 pro=1532070809.009978494,0 following repl=(n1,s1):1 seq=3 start=1532070719.397101945,0 epo=1 pro=1532070728.466044520,0
I180720 07:13:30.085354 137900 storage/replica_range_lease.go:617  [replicate,n1,s1,r8/1:/Table/1{1-2}] done transferring lease to s3: <nil>
I180720 07:13:30.091525 137957 storage/replica_raftstorage.go:520  [replicate,n1,s1,r11/1:/Table/1{4-5}] generated preemptive snapshot 1cd660b7 at index 24
W180720 07:13:30.128023 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 2.4s
I180720 07:13:30.200104 137957 storage/store_snapshot.go:605  [replicate,n1,s1,r11/1:/Table/1{4-5}] streamed snapshot to (n5,s5):?: kv pairs: 11, log entries: 14, rate-limit: 2.0 MiB/sec, 107ms
I180720 07:13:30.205190 137960 storage/replica_raftstorage.go:737  [n5,s5,r11/?:{-}] applying preemptive snapshot at index 24 (id=1cd660b7, encoded size=4738, 1 rocksdb batches, 14 log entries)
I180720 07:13:31.212212 133271 server/status/runtime.go:255  [n3] runtime stats: 1.5 GiB RSS, 1180 goroutines, 77 MiB/25 MiB/132 MiB GO alloc/idle/total, 174 MiB/216 MiB CGO alloc/total, 916.30cgo/sec, 0.95/0.05 %(u/s)time, 0.02 %gc (4x)
W180720 07:13:31.683189 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 2.2s
W180720 07:13:31.794581 133032 server/node.go:892  [n2,summaries] health alerts detected: {Alerts:[{StoreID:0 Category:METRICS Description:liveness.heartbeatfailures Value:1}]}
W180720 07:13:31.927129 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 2.0s
I180720 07:13:32.695559 132120 storage/replica_command.go:743  [n1,s1,r23/1:{m-/Table/Syste…}] change replicas (ADD_REPLICA (n5,s5):5): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, next=5]
I180720 07:13:32.813460 137786 storage/replica.go:3613  [n5,s5,r15/2:/Table/1{8-9}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n2,s2):4 (n5,s5):2 (n4,s4):3] next=5
I180720 07:13:33.370786 137960 storage/replica_raftstorage.go:743  [n5,s5,r11/?:/Table/1{4-5}] applied preemptive snapshot in 3159ms [clear=0ms batch=0ms entries=3005ms commit=144ms]
I180720 07:13:33.452365 133571 server/status/runtime.go:255  [n4] runtime stats: 1.5 GiB RSS, 1181 goroutines, 60 MiB/41 MiB/132 MiB GO alloc/idle/total, 174 MiB/216 MiB CGO alloc/total, 1016.19cgo/sec, 0.96/0.05 %(u/s)time, 0.01 %gc (4x)
I180720 07:13:33.534360 137957 storage/replica_command.go:743  [replicate,n1,s1,r11/1:/Table/1{4-5}] change replicas (ADD_REPLICA (n5,s5):4): read existing descriptor r11:/Table/1{4-5} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4]
I180720 07:13:33.636468 138056 storage/store.go:2513  [replicaGC,n5,s5,r11/?:/Table/1{4-5}] removing replica
I180720 07:13:33.736596 138058 storage/replica_raftstorage.go:520  [replicate,n4,s4,r2/2:/System/{-NodeLive…}] generated preemptive snapshot b4fd59eb at index 56
I180720 07:13:33.838566 138058 storage/store_snapshot.go:605  [replicate,n4,s4,r2/2:/System/{-NodeLive…}] streamed snapshot to (n3,s3):?: kv pairs: 13, log entries: 6, rate-limit: 2.0 MiB/sec, 58ms
I180720 07:13:33.846735 138061 storage/replica_raftstorage.go:737  [n3,s3,r2/?:{-}] applying preemptive snapshot at index 56 (id=b4fd59eb, encoded size=2061, 1 rocksdb batches, 6 log entries)
W180720 07:13:33.905536 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 2.9s
I180720 07:13:33.979686 138056 storage/replica.go:836  [replicaGC,n5,s5,r11/?:/Table/1{4-5}] removed 7 (0+7) keys in 342ms [clear=12ms commit=330ms]
I180720 07:13:34.119478 137590 storage/replica_command.go:743  [replicate,n2,s2,r23/2:{m-/Table/Syste…}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, next=5]
I180720 07:13:34.127976 138149 storage/replica_raftstorage.go:520  [replicate,n5,s5,r14/2:/Table/1{7-8}] generated preemptive snapshot cf7aa894 at index 36
I180720 07:13:34.205223 138149 storage/store_snapshot.go:605  [replicate,n5,s5,r14/2:/Table/1{7-8}] streamed snapshot to (n4,s4):?: kv pairs: 13, log entries: 26, rate-limit: 2.0 MiB/sec, 60ms
I180720 07:13:34.340194 134273 storage/store.go:3585  [n1,s1,r15/1:/Table/1{8-9}] added to replica GC queue (peer suggestion)
I180720 07:13:34.362288 138163 storage/replica_raftstorage.go:737  [n4,s4,r14/?:{-}] applying preemptive snapshot at index 36 (id=cf7aa894, encoded size=8317, 1 rocksdb batches, 26 log entries)
I180720 07:13:34.364988 134273 storage/store.go:3585  [n1,s1,r15/1:/Table/1{8-9}] added to replica GC queue (peer suggestion)
I180720 07:13:34.459972 137921 storage/replica_raftstorage.go:743  [n5,s5,r8/4:/Table/1{1-2}] applied Raft snapshot in 4710ms [clear=0ms batch=0ms entries=4543ms commit=166ms]
W180720 07:13:34.464095 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 2.7s
W180720 07:13:34.930417 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 2.6s
I180720 07:13:35.164692 138102 storage/store.go:2513  [replicaGC,n1,s1,r15/1:/Table/1{8-9}] removing replica
W180720 07:13:35.245512 132293 storage/store.go:3710  [n1,s1] handle raft ready: 0.5s [processed=0]
I180720 07:13:35.311015 133896 server/status/runtime.go:255  [n5] runtime stats: 1.5 GiB RSS, 1179 goroutines, 50 MiB/47 MiB/132 MiB GO alloc/idle/total, 175 MiB/216 MiB CGO alloc/total, 1122.22cgo/sec, 0.95/0.05 %(u/s)time, 0.02 %gc (4x)
I180720 07:13:35.344469 138102 storage/replica.go:836  [replicaGC,n1,s1,r15/1:/Table/1{8-9}] removed 8 (0+8) keys in 177ms [clear=0ms commit=177ms]
I180720 07:13:35.356474 138061 storage/replica_raftstorage.go:743  [n3,s3,r2/?:/System/{-NodeLive…}] applied preemptive snapshot in 1509ms [clear=0ms batch=0ms entries=1330ms commit=176ms]
I180720 07:13:35.412146 138058 storage/replica_command.go:743  [replicate,n4,s4,r2/2:/System/{-NodeLive…}] change replicas (ADD_REPLICA (n3,s3):5): read existing descriptor r2:/System/{-NodeLiveness} [(n2,s2):4, (n4,s4):2, (n5,s5):3, next=5]
I180720 07:13:35.477860 138087 storage/store.go:2513  [replicaGC,n3,s3,r2/?:/System/{-NodeLive…}] removing replica
I180720 07:13:35.767604 138087 storage/replica.go:836  [replicaGC,n3,s3,r2/?:/System/{-NodeLive…}] removed 7 (0+7) keys in 248ms [clear=0ms commit=248ms]
W180720 07:13:35.965116 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 1.9s
W180720 07:13:36.747833 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 2.3s
I180720 07:13:36.771907 136987 storage/replica_command.go:743  [replicate,n3,s3,r23/4:{m-/Table/Syste…}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, next=5]
I180720 07:13:38.221182 137957 storage/replica.go:3613  [n1,s1,r11/1:/Table/1{4-5}] proposing ADD_REPLICA((n5,s5):4): updated=[(n1,s1):1 (n3,s3):2 (n2,s2):3 (n5,s5):4] next=5
W180720 07:13:38.332266 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 2.9s
W180720 07:13:38.522835 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 2.3s
I180720 07:13:38.737719 138262 storage/replica_range_lease.go:554  [replicate,n1,s1,r11/1:/Table/1{4-5}] transferring lease to s2
W180720 07:13:38.903233 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 2.1s
I180720 07:13:38.911166 138163 storage/replica_raftstorage.go:743  [n4,s4,r14/?:/Table/1{7-8}] applied preemptive snapshot in 4548ms [clear=0ms batch=0ms entries=4438ms commit=109ms]
I180720 07:13:39.019017 138149 storage/replica_command.go:743  [replicate,n5,s5,r14/2:/Table/1{7-8}] change replicas (ADD_REPLICA (n4,s4):5): read existing descriptor r14:/Table/1{7-8} [(n3,s3):4, (n5,s5):2, (n2,s2):3, next=5]
I180720 07:13:39.289160 138263 storage/store.go:2513  [replicaGC,n4,s4,r14/?:/Table/1{7-8}] removing replica
I180720 07:13:39.512738 138263 storage/replica.go:836  [replicaGC,n4,s4,r14/?:/Table/1{7-8}] removed 7 (0+7) keys in 190ms [clear=0ms commit=190ms]
I180720 07:13:39.642351 138262 storage/replica_range_lease.go:617  [replicate,n1,s1,r11/1:/Table/1{4-5}] done transferring lease to s2: <nil>
I180720 07:13:39.648231 132374 server/status/runtime.go:255  [n1] runtime stats: 1.5 GiB RSS, 1177 goroutines, 83 MiB/19 MiB/132 MiB GO alloc/idle/total, 175 MiB/216 MiB CGO alloc/total, 1320.23cgo/sec, 0.93/0.05 %(u/s)time, 0.02 %gc (3x)
I180720 07:13:39.668650 138301 storage/replica_raftstorage.go:520  [replicate,n1,s1,r4/1:/System/{NodeLive…-tsd}] generated preemptive snapshot b41ffabd at index 119
I180720 07:13:39.661142 134065 storage/compactor/compactor.go:329  [n5,s5,compactor] purging suggested compaction for range /Table/11 - /Table/12 that contains live data
I180720 07:13:39.708873 138309 storage/replica_raftstorage.go:520  [raftsnapshot,n1,s1,r11/1:/Table/1{4-5}] generated Raft snapshot 054f4692 at index 30
I180720 07:13:39.721723 138301 storage/store_snapshot.go:605  [replicate,n1,s1,r4/1:/System/{NodeLive…-tsd}] streamed snapshot to (n3,s3):?: kv pairs: 47, log entries: 4, rate-limit: 2.0 MiB/sec, 43ms
I180720 07:13:39.769690 133061 storage/replica_proposal.go:222  [n2,s2,r11/3:/Table/1{4-5}] new range lease repl=(n2,s2):3 seq=4 start=1532070818.739099309,0 epo=1 pro=1532070818.739142951,0 following repl=(n1,s1):1 seq=3 start=1532070719.397101945,0 epo=1 pro=1532070728.228591848,0
I180720 07:13:39.848790 138268 storage/replica_raftstorage.go:737  [n3,s3,r4/?:{-}] applying preemptive snapshot at index 119 (id=b41ffabd, encoded size=138142, 1 rocksdb batches, 4 log entries)
I180720 07:13:39.900280 138309 storage/store_snapshot.go:605  [raftsnapshot,n1,s1,r11/1:/Table/1{4-5}] streamed snapshot to (n5,s5):4: kv pairs: 13, log entries: 20, rate-limit: 8.0 MiB/sec, 188ms
I180720 07:13:39.925736 138312 storage/replica_raftstorage.go:737  [n5,s5,r11/4:{-}] applying Raft snapshot at index 30 (id=054f4692, encoded size=7116, 1 rocksdb batches, 20 log entries)
I180720 07:13:40.281424 133030 server/status/runtime.go:255  [n2] runtime stats: 1.5 GiB RSS, 1186 goroutines, 93 MiB/12 MiB/132 MiB GO alloc/idle/total, 176 MiB/216 MiB CGO alloc/total, 1293.56cgo/sec, 0.93/0.05 %(u/s)time, 0.02 %gc (4x)
W180720 07:13:40.327121 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 1.8s
I180720 07:13:40.555520 138312 storage/replica_raftstorage.go:743  [n5,s5,r11/4:/Table/1{4-5}] applied Raft snapshot in 629ms [clear=0ms batch=0ms entries=628ms commit=1ms]
I180720 07:13:40.713751 138268 storage/replica_raftstorage.go:743  [n3,s3,r4/?:/System/{NodeLive…-tsd}] applied preemptive snapshot in 865ms [clear=0ms batch=9ms entries=479ms commit=303ms]
I180720 07:13:40.832940 138301 storage/replica_command.go:743  [replicate,n1,s1,r4/1:/System/{NodeLive…-tsd}] change replicas (ADD_REPLICA (n3,s3):4): read existing descriptor r4:/System/{NodeLivenessMax-tsd} [(n1,s1):1, (n4,s4):2, (n5,s5):3, next=4]
I180720 07:13:41.119315 138058 storage/replica.go:3613  [n4,s4,r2/2:/System/{-NodeLive…}] proposing ADD_REPLICA((n3,s3):5): updated=[(n2,s2):4 (n4,s4):2 (n5,s5):3 (n3,s3):5] next=6
I180720 07:13:41.279850 133271 server/status/runtime.go:255  [n3] runtime stats: 1.5 GiB RSS, 1182 goroutines, 67 MiB/31 MiB/132 MiB GO alloc/idle/total, 175 MiB/216 MiB CGO alloc/total, 1245.30cgo/sec, 0.93/0.05 %(u/s)time, 0.02 %gc (4x)
W180720 07:13:41.859722 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 3.0s
I180720 07:13:42.589014 132905 rpc/nodedialer/nodedialer.go:90  [intExec=adopt-job] connection to n1 established
W180720 07:13:42.709797 133051 storage/store.go:3710  [n2,s2] handle raft ready: 0.5s [processed=0]
I180720 07:13:42.714449 138376 storage/store.go:2513  [replicaGC,n3,s3,r4/?:/System/{NodeLive…-tsd}] removing replica
I180720 07:13:42.783577 138424 rpc/nodedialer/nodedialer.go:90  [n1] connection to n2 established
W180720 07:13:43.010180 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 3.0s
I180720 07:13:43.086395 138376 storage/replica.go:836  [replicaGC,n3,s3,r4/?:/System/{NodeLive…-tsd}] removed 30 (22+8) keys in 371ms [clear=0ms commit=371ms]
I180720 07:13:43.410983 133571 server/status/runtime.go:255  [n4] runtime stats: 1.5 GiB RSS, 1177 goroutines, 59 MiB/40 MiB/132 MiB GO alloc/idle/total, 175 MiB/216 MiB CGO alloc/total, 1209.62cgo/sec, 0.94/0.05 %(u/s)time, 0.03 %gc (4x)
W180720 07:13:43.548738 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 2.7s
W180720 07:13:43.669944 132317 storage/store.go:3710  [n1,s1] handle raft ready: 0.7s [processed=0]
I180720 07:13:43.869827 138534 storage/replica_command.go:743  [replicate,n4,s4,r2/2:/System/{-NodeLive…}] change replicas (REMOVE_REPLICA (n2,s2):4): read existing descriptor r2:/System/{-NodeLiveness} [(n2,s2):4, (n4,s4):2, (n5,s5):3, (n3,s3):5, next=6]
W180720 07:13:44.149530 133322 storage/store.go:3710  [n3,s3] handle raft ready: 0.5s [processed=0]
I180720 07:13:44.174920 138476 storage/replica_raftstorage.go:520  [raftsnapshot,n4,s4,r2/2:/System/{-NodeLive…}] generated Raft snapshot 98f9de4d at index 67
I180720 07:13:44.339695 138476 storage/store_snapshot.go:605  [raftsnapshot,n4,s4,r2/2:/System/{-NodeLive…}] streamed snapshot to (n3,s3):5: kv pairs: 15, log entries: 17, rate-limit: 8.0 MiB/sec, 162ms
I180720 07:13:44.354398 138479 storage/replica_raftstorage.go:737  [n3,s3,r2/5:{-}] applying Raft snapshot at index 67 (id=98f9de4d, encoded size=5322, 1 rocksdb batches, 17 log entries)
I180720 07:13:44.841108 138479 storage/replica_raftstorage.go:743  [n3,s3,r2/5:/System/{-NodeLive…}] applied Raft snapshot in 486ms [clear=0ms batch=0ms entries=471ms commit=14ms]
I180720 07:13:44.983372 133133 rpc/nodedialer/nodedialer.go:90  [intExec=adopt-job] connection to n1 established
I180720 07:13:45.018750 138567 rpc/nodedialer/nodedialer.go:90  [n1] connection to n3 established
I180720 07:13:45.296024 133896 server/status/runtime.go:255  [n5] runtime stats: 1.5 GiB RSS, 1183 goroutines, 86 MiB/16 MiB/132 MiB GO alloc/idle/total, 175 MiB/216 MiB CGO alloc/total, 1208.81cgo/sec, 0.92/0.05 %(u/s)time, 0.02 %gc (3x)
W180720 07:13:46.248942 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 5.0s
I180720 07:13:46.263955 133377 rpc/nodedialer/nodedialer.go:90  [intExec=adopt-job] connection to n1 established
W180720 07:13:46.285814 132307 storage/store.go:3710  [n1,s1] handle raft ready: 0.6s [processed=0]
I180720 07:13:46.482760 138545 rpc/nodedialer/nodedialer.go:90  [n1] connection to n4 established
W180720 07:13:46.529505 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 3.1s
W180720 07:13:46.640106 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 3.6s
I180720 07:13:46.958216 138149 storage/replica.go:3613  [n5,s5,r14/2:/Table/1{7-8}] proposing ADD_REPLICA((n4,s4):5): updated=[(n3,s3):4 (n5,s5):2 (n2,s2):3 (n4,s4):5] next=6
I180720 07:13:47.109777 137590 storage/replica.go:3613  [n2,s2,r23/2:{m-/Table/Syste…}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n3,s3):4 (n2,s2):2 (n4,s4):3] next=5
I180720 07:13:47.433556 138534 storage/replica.go:3613  [n4,s4,r2/2:/System/{-NodeLive…}] proposing REMOVE_REPLICA((n2,s2):4): updated=[(n3,s3):5 (n4,s4):2 (n5,s5):3] next=6
W180720 07:13:47.518052 133039 storage/node_liveness.go:551  [n2,hb] slow heartbeat took 3.0s
W180720 07:13:47.662002 133280 storage/node_liveness.go:551  [n3,hb] slow heartbeat took 1.4s
W180720 07:13:47.750123 133905 storage/node_liveness.go:551  [n5,hb] slow heartbeat took 2.5s
I180720 07:13:47.843919 138301 storage/replica.go:3613  [n1,s1,r4/1:/System/{NodeLive…-tsd}] proposing ADD_REPLICA((n3,s3):4): updated=[(n1,s1):1 (n4,s4):2 (n5,s5):3 (n3,s3):4] next=5
I180720 07:13:48.051060 138667 storage/replica_command.go:743  [replicate,n5,s5,r14/2:/Table/1{7-8}] change replicas (REMOVE_REPLICA (n3,s3):4): read existing descriptor r14:/Table/1{7-8} [(n3,s3):4, (n5,s5):2, (n2,s2):3, (n4,s4):5, next=6]
I180720 07:13:48.442847 133646 storage/store.go:3585  [n1,s1,r23/1:{m-/Table/Syste…}] added to replica GC queue (peer suggestion)
I180720 07:13:48.444143 132120 storage/replica_command.go:743  [n1,s1,r23/1:{m-/Table/Syste…}] change replicas (ADD_REPLICA (n5,s5):5): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n3,s3):4, (n2,s2):2, (n4,s4):3, next=5]
I180720 07:13:48.509093 138670 storage/replica_command.go:743  [replicate,n2,s2,r11/3:/Table/1{4-5}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r11:/Table/1{4-5} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n5,s5):4, next=5]
I180720 07:13:48.659405 133646 storage/store.go:3585  [n1,s1,r23/1:{m-/Table/Syste…}] added to replica GC queue (peer suggestion)
I180720 07:13:48.776968 136987 storage/replica_command.go:743  [replicate,n3,s3,r23/4:{m-/Table/Syste…}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n3,s3):4, (n2,s2):2, (n4,s4):3, next=5]
I180720 07:13:48.853987 138658 storage/replica_raftstorage.go:520  [raftsnapshot,n5,s5,r14/2:/Table/1{7-8}] generated Raft snapshot e1ecb6c7 at index 44
I180720 07:13:48.947060 135509 storage/store.go:3585  [n2,s2,r2/4:/System/{-NodeLive…}] added to replica GC queue (peer suggestion)
I180720 07:13:48.948475 138728 util/stop/stopper.go:473  quiescing; tasks left:
3      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      node.Node: writing summary
1      [async] storage.replicate: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
I180720 07:13:48.971151 138724 util/stop/stopper.go:473  quiescing; tasks left:
8      node.Node: batch
2      [async] kv.TxnCoordSender: heartbeat loop
1      ts.poller: poll
1      [async] transport racer
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
I180720 07:13:49.041336 138728 util/stop/stopper.go:473  quiescing; tasks left:
3      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      node.Node: writing summary
1      [async] storage.replicate: processing replica
1      [async] storage.raftsnapshot: processing replica
I180720 07:13:49.059761 138725 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      kv.DistSender: sending partial batch
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
W180720 07:13:49.063366 134271 storage/raft_transport.go:575  [n1] while processing outgoing Raft queue to node 5: EOF:
W180720 07:13:49.067251 135382 storage/raft_transport.go:575  [n4] while processing outgoing Raft queue to node 5: EOF:
W180720 07:13:49.067314 136554 storage/raft_transport.go:575  [n2] while processing outgoing Raft queue to node 5: EOF:
W180720 07:13:48.999221 138667 internal/client/txn.go:595  [replicate,n5,s5,r14/2:/Table/1{7-8}] failure aborting transaction: node unavailable; try another peer; abort caused by: log-range-event: node unavailable; try another peer
I180720 07:13:49.000439 138727 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      kv.DistSender: sending partial batch
1      [async] storage.replicaGC: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
1      [async] kv.TxnCoordSender: heartbeat loop
W180720 07:13:49.102305 133786 storage/raft_transport.go:575  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
I180720 07:13:49.115624 132135 kv/transport_race.go:67  transport race promotion: ran 24 iterations on up to 3318 requests
I180720 07:13:48.986193 138726 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
2      [async] kv.TxnCoordSender: heartbeat loop
1      ts.poller: poll
1      [async] storage.replicate: processing replica
E180720 07:13:49.130973 138667 storage/queue.go:778  [replicate,n5,s5,r14/2:/Table/1{7-8}] change replicas of r14 failed: log-range-event: node unavailable; try another peer
I180720 07:13:49.136542 138728 util/stop/stopper.go:473  quiescing; tasks left:
3      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      node.Node: writing summary
1      [async] storage.raftsnapshot: processing replica
W180720 07:13:49.152038 136976 storage/raft_transport.go:575  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180720 07:13:49.183279 132937 gossip/gossip.go:1325  [n2] no incoming or outgoing connections
W180720 07:13:49.185371 136929 storage/raft_transport.go:575  [n4] while processing outgoing Raft queue to node 3: EOF:
W180720 07:13:49.192112 135414 storage/raft_transport.go:575  [n5] while processing outgoing Raft queue to node 4: rpc error: code = Unavailable desc = transport is closing:
I180720 07:13:49.185563 138724 util/stop/stopper.go:473  quiescing; tasks left:
8      node.Node: batch
2      [async] kv.TxnCoordSender: heartbeat loop
1      ts.poller: poll
1      [async] transport racer
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
I180720 07:13:49.186170 138727 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      kv.DistSender: sending partial batch
1      [async] storage.replicaGC: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
W180720 07:13:49.192987 133426 storage/raft_transport.go:575  [n1] while processing outgoing Raft queue to node 3: EOF:
W180720 07:13:49.201082 133887 gossip/gossip.go:1325  [n5] no incoming or outgoing connections
W180720 07:13:49.201274 136473 storage/raft_transport.go:575  [n5] while processing outgoing Raft queue to node 3: EOF:
I180720 07:13:49.214073 138725 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
W180720 07:13:49.222707 138669 storage/intent_resolver.go:710  [n2,s2] failed to cleanup transaction intents: failed to resolve intents: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180720 07:13:49.231333 136470 storage/raft_transport.go:575  [n5] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180720 07:13:49.234525 138670 internal/client/txn.go:595  [replicate,n2,s2,r11/3:/Table/1{4-5}] failure aborting transaction: node unavailable; try another peer; abort caused by: log-range-event: failed to send RPC: sending to all 3 replicas failed; last error: <nil> rpc error: code = Unavailable desc = transport is closing
I180720 07:13:49.240059 138726 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
W180720 07:13:49.248637 134339 storage/raft_transport.go:575  [n5] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = context canceled:
I180720 07:13:49.249127 138727 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      kv.DistSender: sending partial batch
1      [async] storage.replicaGC: processing replica
I180720 07:13:49.251464 138724 util/stop/stopper.go:473  quiescing; tasks left:
6      node.Node: batch
2      [async] kv.TxnCoordSender: heartbeat loop
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
W180720 07:13:49.265964 133851 storage/raft_transport.go:575  [n4] while processing outgoing Raft queue to node 1: EOF:
W180720 07:13:49.267631 133925 storage/raft_transport.go:575  [n1] while processing outgoing Raft queue to node 4: EOF:
W180720 07:13:49.307277 133580 internal/client/txn.go:595  [n4,hb] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (error=context canceled [exhausted])
W180720 07:13:49.327812 136630 storage/raft_transport.go:575  [n3] while processing outgoing Raft queue to node 5: EOF:
W180720 07:13:49.336205 138615 storage/intent_resolver.go:710  [n5,s5] failed to cleanup transaction intents: could not GC completed transaction anchored at /Local/Range/Table/17/RangeDescriptor: node unavailable; try another peer
I180720 07:13:49.336737 138728 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      node.Node: writing summary
1      [async] storage.raftsnapshot: processing replica
I180720 07:13:49.341298 138725 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
W180720 07:13:49.311767 133341 gossip/gossip.go:1325  [n4] no incoming or outgoing connections
E180720 07:13:49.361630 138670 storage/queue.go:778  [replicate,n2,s2,r11/3:/Table/1{4-5}] change replicas of r11 failed: log-range-event: failed to send RPC: sending to all 3 replicas failed; last error: <nil> rpc error: code = Unavailable desc = transport is closing
W180720 07:13:49.361979 133644 storage/raft_transport.go:575  [n1] while processing outgoing Raft queue to node 2: EOF:
W180720 07:13:49.375059 133524 storage/raft_transport.go:575  [n3] while processing outgoing Raft queue to node 1: EOF:
W180720 07:13:49.383601 133897 ts/db.go:171  [n5,ts-poll] error writing time series data: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
E180720 07:13:49.443110 134003 sql/jobs/registry.go:285  error while adopting jobs: adopt-job: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180720 07:13:49.445911 133212 gossip/gossip.go:1325  [n3] no incoming or outgoing connections
I180720 07:13:49.449898 138726 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: batch
1      [async] storage.replicate: processing replica
E180720 07:13:49.456521 138715 storage/queue.go:778  [replicaGC,n1,s1,r23/1:{m-/Table/Syste…}] failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180720 07:13:49.461249 136881 storage/raft_transport.go:575  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I180720 07:13:49.473491 133580 storage/node_liveness.go:810  [n4,hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (error=context canceled [exhausted])
E180720 07:13:49.476926 138635 storage/queue.go:778  [replicaGC,n4,s4,r17/4:/Table/2{0-1}] failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180720 07:13:49.478174 133898 server/node.go:863  [n5,summaries] error recording status summaries: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180720 07:13:49.482974 135407 storage/raft_transport.go:575  [n4] while processing outgoing Raft queue to node 2: EOF:
W180720 07:13:49.492946 132383 internal/client/txn.go:595  [n1,hb] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [exhausted])
W180720 07:13:49.499801 133051 storage/store.go:3710  [n2,s2] handle raft ready: 0.5s [processed=0]
W180720 07:13:49.500450 138301 storage/replica.go:3097  [n1,s1,r4/1:/System/{NodeLive…-tsd}] during async intent resolution: node unavailable; try another peer
I180720 07:13:49.500964 138724 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
I180720 07:13:49.522223 138727 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.replicaGC: processing replica
I180720 07:13:49.528368 132374 server/status/runtime.go:255  [n1] runtime stats: 1.5 GiB RSS, 795 goroutines, 64 MiB/36 MiB/132 MiB GO alloc/idle/total, 145 MiB/203 MiB CGO alloc/total, 1092.81cgo/sec, 0.91/0.05 %(u/s)time, 0.02 %gc (4x)
W180720 07:13:49.528770 136882 storage/raft_transport.go:575  [n3] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = context canceled:
I180720 07:13:49.529187 138725 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
W180720 07:13:49.531315 138690 storage/intent_resolver.go:710  [n4,s4] failed to cleanup transaction intents: failed to resolve intents: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180720 07:13:49.531718 133572 ts/db.go:171  [n4,ts-poll] error writing time series data: failed to send RPC: sending to all 2 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W180720 07:13:49.531940 135433 storage/raft_transport.go:575  [n2] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
I180720 07:13:49.538978 132383 storage/node_liveness.go:810  [n1,hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [exhausted])
I180720 07:13:49.541674 138727 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
W180720 07:13:49.549398 133572 ts/db.go:174  [n4,ts-poll] node unavailable; try another peer
I180720 07:13:49.555233 138728 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.raftsnapshot: processing replica
W180720 07:13:49.555572 133897 ts/db.go:174  [n5,ts-poll] node unavailable; try another peer
I180720 07:13:49.555880 138728 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
1      storage.intentResolver: processing intents
1      [async] storage.raftsnapshot: processing replica
I180720 07:13:49.557517 138728 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
1      [async] storage.raftsnapshot: processing replica
I180720 07:13:49.558335 138728 util/stop/stopper.go:473  quiescing; tasks left:
1      node.Node: batch
1      [async] storage.raftsnapshot: processing replica
I180720 07:13:49.559103 138724 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
I180720 07:13:49.565310 138724 util/stop/stopper.go:473  quiescing; tasks left:
2      node.Node: batch
1      ts.poller: poll
1      [async] storage.replicate: processing replica
W180720 07:13:49.589808 133031 ts/db.go:171  [n2,ts-poll] error writing time series data: node unavailable; try another peer
W180720 07:13:49.590153 133031 ts/db.go:174  [n2,ts-poll] node unavailable; try another peer
I180720 07:13:49.607158 138728 util/stop/stopper.go:473  quiescing; tasks left:
1      [async] storage.raftsnapshot: processing replica
I180720 07:13:49.608889 138724 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: batch
W180720 07:13:49.609681 136987 internal/client/txn.go:595  [replicate,n3,s3,r23/4:{m-/Table/Syste…}] failure aborting transaction: node unavailable; try another peer; abort caused by: node unavailable; try another peer
E180720 07:13:49.610658 136987 storage/queue.go:778  [replicate,n3,s3,r23/4:{m-/Table/Syste…}] change replicas of r23 failed: node unavailable; try another peer
I180720 07:13:49.618879 138726 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: batch
W180720 07:13:49.626044 133039 internal/client/txn.go:595  [n2,hb] failure aborting transaction: kv/txn_interceptor_intent_collector.go:102: cannot commit a read-only transaction; abort caused by: node unavailable; try another peer
W180720 07:13:49.626441 133039 storage/node_liveness.go:487  [n2,hb] failed node liveness heartbeat: node unavailable; try another peer
E180720 07:13:49.626656 133073 storage/replica_proposal.go:758  [n2,s2,r2/4:/System/{-NodeLive…}] unable to add to replica GC queue: queue stopped
W180720 07:13:49.627471 133292 gossip/infostore.go:303  [n3] node unavailable; try another peer
I180720 07:13:49.636918 138726 util/stop/stopper.go:473  quiescing; tasks left:
1      ts.poller: poll
W180720 07:13:49.645268 133580 internal/client/txn.go:595  [n4,hb] failure aborting transaction: kv/txn_interceptor_intent_collector.go:102: cannot commit a read-only transaction; abort caused by: node unavailable; try another peer
W180720 07:13:49.645710 133580 storage/node_liveness.go:551  [n4,hb] slow heartbeat took 1.7s
W180720 07:13:49.645788 133580 storage/node_liveness.go:487  [n4,hb] failed node liveness heartbeat: node unavailable; try another peer
W180720 07:13:49.646780 132383 internal/client/txn.go:595  [n1,hb] failure aborting transaction: kv/txn_interceptor_intent_collector.go:102: cannot commit a read-only transaction; abort caused by: node unavailable; try another peer
W180720 07:13:49.647152 132383 storage/node_liveness.go:551  [n1,hb] slow heartbeat took 2.1s
W180720 07:13:49.647252 132383 storage/node_liveness.go:487  [n1,hb] failed node liveness heartbeat: node unavailable; try another peer
E180720 07:13:49.659030 138658 storage/queue.go:778  [raftsnapshot,n5,s5,r14/2:/Table/1{7-8}] snapshot failed: rpc error: code = Unknown desc = node unavailable; try another peer
W180720 07:13:49.705077 132375 ts/db.go:171  [n1,ts-poll] error writing time series data: node unavailable; try another peer
W180720 07:13:49.709766 132375 ts/db.go:174  [n1,ts-poll] node unavailable; try another peer
I180720 07:13:49.710006 138724 util/stop/stopper.go:473  quiescing; tasks left:
1      node.Node: batch
W180720 07:13:49.749153 133292 gossip/infostore.go:303  [n3] node unavailable; try another peer
W180720 07:13:49.891264 134132 rpc/nodedialer/nodedialer.go:87  [n5] unable to connect to n3: unable to dial n3: breaker open
W180720 07:13:49.936072 133272 ts/db.go:171  [n3,ts-poll] error writing time series data: node unavailable; try another peer
W180720 07:13:49.936400 133272 ts/db.go:174  [n3,ts-poll] node unavailable; try another peer
I180720 07:13:50.026326 133030 server/status/runtime.go:255  [n2] runtime stats: 1.4 GiB RSS, 722 goroutines, 67 MiB/35 MiB/132 MiB GO alloc/idle/total, 140 MiB/198 MiB CGO alloc/total, 1085.57cgo/sec, 0.90/0.06 %(u/s)time, 0.02 %gc (3x)
--- FAIL: TestReplicateQueueDownReplicate (111.77s)
	soon.go:37: condition failed to evaluate within 45s: AdminChangeReplicas error: change replicas of r23 failed: descriptor changed: [expected] r23:{m-/Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, next=5] != [actual] r23:{m-/Table/SystemConfigSpan/Start} [(n3,s3):4, (n2,s2):2, (n4,s4):3, next=5]: unexpected value: raw_bytes:"\301D\017\305\003\010\027\022\001m\032\001\210\"\006\010\003\020\003\030\004\"\006\010\002\020\002\030\002\"\006\010\004\020\004\030\003(\005" timestamp:<wall_time:1532070808296102263 logical:1 > 
		goroutine 132120 [running]:
		runtime/debug.Stack(0xa7a358200, 0xc422993f20, 0x38c9b20)
			/usr/local/go/src/runtime/debug/stack.go:24 +0xb5
		github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x3921ba0, 0xc4218be3c0, 0xc425331f20)
			/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:38 +0x93
		github.com/cockroachdb/cockroach/pkg/storage_test.TestReplicateQueueDownReplicate(0xc4218be3c0)
			/go/src/github.com/cockroachdb/cockroach/pkg/storage/replicate_queue_test.go:218 +0x4f7
		testing.tRunner(0xc4218be3c0, 0x32249e8)
			/usr/local/go/src/testing/testing.go:777 +0x16e
		created by testing.(*T).Run
			/usr/local/go/src/testing/testing.go:824 +0x565

@cockroach-teamcity cockroach-teamcity added this to the 2.1 milestone Jul 20, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Jul 20, 2018
@tbg tbg added the A-kv-replication Relating to Raft, consensus, and coordination. label Jul 22, 2018
@vivekmenezes
Contributor

same as #27736

@vivekmenezes vivekmenezes removed their assignment Jul 23, 2018
@tbg
Member

tbg commented Jul 23, 2018

No, different failure.

@tbg tbg reopened this Jul 23, 2018
@benesch benesch changed the title github.com/cockroachdb/cockroach/pkg/storage: TestReplicateQueueDownReplicate failed under stress storage: TestReplicateQueueDownReplicate failed under stress Jul 23, 2018
@tbg
Member

tbg commented Jul 23, 2018

--- FAIL: TestReplicateQueueDownReplicate (22.02s)
	test_server_shim.go:176: had 21 ranges at startup, expected 22

latest iteration is again this.

@tbg tbg self-assigned this Jul 23, 2018
@tbg
Member

tbg commented Jul 23, 2018

no luck on teamcity agent, at least not with the default 100 runs of the package.

100 runs completed, 0 failures, over 22m13s
SUCCESS
ok  	github.com/cockroachdb/cockroach/pkg/storage	1332.534s

@tbg tbg assigned andreimatei and unassigned tbg Jul 24, 2018
@andreimatei
Contributor

last failure is a data race; opened #28222

@andreimatei
Contributor

Other failures haven't happened in 9 days; before that they seemed to happen more frequently. Can't repro on GCE. Closing and hoping.
