
storage: TestReplicateQueueRebalance failed under stress #28563

Closed

cockroach-teamcity opened this issue Aug 14, 2018 · 11 comments
Labels: A-kv-client Relating to the KV client and the KV interface. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

@cockroach-teamcity (Member)

SHA: https://github.com/cockroachdb/cockroach/commits/97d734c6f9acf9a673ce722ddcec2077f9fc30e6

Parameters:

TAGS=
GOFLAGS=-race

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=834674&tab=buildLog
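
For context on the repeated "not balanced: [...]" lines in the log below: the test appears to poll per-store replica counts and retry until they even out across the five stores. The Go sketch that follows only illustrates the shape of such a loop under that assumption; the helper names, the stand-in counts, and the balance threshold are hypothetical and are not the test's actual code.

```go
// Minimal sketch (not the actual test body) of a SucceedsSoon-style balance
// check that would produce log lines like "not balanced: [23 19 17 9 11]".
package main

import (
	"fmt"
	"log"
	"time"
)

// succeedsSoon retries fn until it returns nil or a deadline passes, logging
// each failed attempt (mirroring the "SucceedsSoon: ..." lines in the log).
func succeedsSoon(fn func() error) error {
	deadline := time.Now().Add(45 * time.Second)
	for {
		err := fn()
		if err == nil {
			return nil
		}
		log.Printf("SucceedsSoon: %v", err)
		if time.Now().After(deadline) {
			return err
		}
		time.Sleep(time.Second)
	}
}

func main() {
	// countReplicasPerStore is a hypothetical stand-in for querying each
	// store's replica count, e.g. [23 19 17 9 11] early in the log below.
	countReplicasPerStore := func() []int { return []int{23, 19, 17, 9, 11} }

	err := succeedsSoon(func() error {
		counts := countReplicasPerStore()
		min, max := counts[0], counts[0]
		for _, c := range counts[1:] {
			if c < min {
				min = c
			}
			if c > max {
				max = c
			}
		}
		// Illustrative threshold only; the real test's notion of "balanced"
		// may differ.
		if max-min > 5 {
			return fmt.Errorf("not balanced: %v", counts)
		}
		return nil
	})
	if err != nil {
		log.Fatalf("stores never rebalanced: %v", err)
	}
}
```

Test log follows: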

I180814 05:52:35.354784 136234 storage/replica.go:3711  [n1,s1,r3/1:/System/NodeLiveness{-Max}] proposing ADD_REPLICA((n2,s2):4): updated=[(n1,s1):1 (n3,s3):2 (n5,s5):3 (n2,s2):4] next=5
I180814 05:52:35.499077 136434 storage/replica_range_lease.go:554  [replicate,n1,s1,r3/1:/System/NodeLiveness{-Max}] transferring lease to s3
I180814 05:52:35.541435 133819 gossip/gossip.go:518  [n4] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  1: 127.0.0.1:35453 (1m2s: infos 20/757 sent/received, bytes 7062B/255621B sent/received)
gossip server (0/3 cur/max conns, infos 20/757 sent/received, bytes 7062B/255621B sent/received)
I180814 05:52:35.669529 133879 server/status/runtime.go:424  [n4] runtime stats: 1.4 GiB RSS, 1253 goroutines, 79 MiB/31 MiB/140 MiB GO alloc/idle/total, 175 MiB/219 MiB CGO alloc/total, 418.47cgo/sec, 0.92/0.06 %(u/s)time, 0.02 %gc (4x)
I180814 05:52:35.832296 136434 storage/replica_range_lease.go:617  [replicate,n1,s1,r3/1:/System/NodeLiveness{-Max}] done transferring lease to s3: <nil>
I180814 05:52:35.890642 136579 storage/replica_raftstorage.go:521  [replicate,n1,s1,r22/1:/Table/5{1-2}] generated preemptive snapshot 521c8dc1 at index 25
I180814 05:52:35.946755 136579 storage/store_snapshot.go:641  [replicate,n1,s1,r22/1:/Table/5{1-2}] streamed snapshot to (n4,s4):?: kv pairs: 10, log entries: 15, rate-limit: 2.0 MiB/sec, 55ms
I180814 05:52:35.976274 133881 server/status/recorder.go:652  [n4,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:36.134966 132407 storage/replicate_queue_test.go:95  not balanced: [23 19 17 9 11]
I180814 05:52:36.135419 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [23 19 17 9 11]
I180814 05:52:36.145329 136582 storage/replica_raftstorage.go:758  [n4,s4,r22/?:{-}] applying preemptive snapshot at index 25 (id=521c8dc1, encoded size=4702, 1 rocksdb batches, 15 log entries)
I180814 05:52:36.378206 136582 storage/replica_raftstorage.go:764  [n4,s4,r22/?:/Table/5{1-2}] applied preemptive snapshot in 232ms [clear=0ms batch=0ms entries=219ms commit=1ms]
I180814 05:52:36.461785 136579 storage/replica_command.go:750  [replicate,n1,s1,r22/1:/Table/5{1-2}] change replicas (ADD_REPLICA (n4,s4):4): read existing descriptor r22:/Table/5{1-2} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=842430730128]
I180814 05:52:37.230744 132407 storage/replicate_queue_test.go:95  not balanced: [23 19 17 9 11]
I180814 05:52:37.231182 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [23 19 17 9 11]
I180814 05:52:38.253965 132407 storage/replicate_queue_test.go:95  not balanced: [23 19 17 9 11]
I180814 05:52:38.254252 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [23 19 17 9 11]
I180814 05:52:38.453390 133674 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
I180814 05:52:38.597214 136562 rpc/nodedialer/nodedialer.go:92  [n1] connection to n4 established
W180814 05:52:38.745272 133307 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 1.8s
I180814 05:52:39.085726 134438 server/status/runtime.go:424  [n5] runtime stats: 1.5 GiB RSS, 1263 goroutines, 86 MiB/25 MiB/140 MiB GO alloc/idle/total, 176 MiB/219 MiB CGO alloc/total, 368.99cgo/sec, 0.93/0.04 %(u/s)time, 0.03 %gc (3x)
I180814 05:52:39.123470 132247 server/status/runtime.go:424  [n1] runtime stats: 1.5 GiB RSS, 1263 goroutines, 87 MiB/24 MiB/140 MiB GO alloc/idle/total, 176 MiB/219 MiB CGO alloc/total, 367.96cgo/sec, 0.93/0.04 %(u/s)time, 0.03 %gc (3x)
I180814 05:52:39.189092 134435 gossip/gossip.go:518  [n5] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  3: 127.0.0.1:33753 (1m0s: infos 63/852 sent/received, bytes 25345B/309393B sent/received)
gossip server (0/3 cur/max conns, infos 64/877 sent/received, bytes 25645B/320638B sent/received)
I180814 05:52:39.349198 132407 storage/replicate_queue_test.go:95  not balanced: [23 19 17 9 11]
I180814 05:52:39.349568 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [23 19 17 9 11]
I180814 05:52:39.395525 136572 storage/replica_command.go:750  [replicate,n3,s3,r6/3:/{System/tse-Table/System…}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r6:/{System/tse-Table/SystemConfigSpan/Start} [(n1,s1):1, (n2,s2):2, (n3,s3):3, (n4,s4):4, next=5, gen=842398966824]
I180814 05:52:39.470257 136579 storage/replica.go:3711  [n1,s1,r22/1:/Table/5{1-2}] proposing ADD_REPLICA((n4,s4):4): updated=[(n1,s1):1 (n2,s2):2 (n3,s3):3 (n4,s4):4] next=5
I180814 05:52:39.480661 133266 server/status/runtime.go:424  [n2] runtime stats: 1.5 GiB RSS, 1263 goroutines, 95 MiB/16 MiB/140 MiB GO alloc/idle/total, 176 MiB/219 MiB CGO alloc/total, 349.85cgo/sec, 0.93/0.04 %(u/s)time, 0.03 %gc (3x)
W180814 05:52:40.114342 133983 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.5s
I180814 05:52:40.152927 132249 server/status/recorder.go:652  [n1,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:40.197691 134440 server/status/recorder.go:652  [n5,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:40.311508 133300 server/status/recorder.go:652  [n2,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:40.322736 136622 storage/replica_range_lease.go:554  [replicate,n1,s1,r22/1:/Table/5{1-2}] transferring lease to s2
I180814 05:52:40.364157 132407 storage/replicate_queue_test.go:95  not balanced: [23 19 17 9 11]
I180814 05:52:40.364412 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [23 19 17 9 11]
I180814 05:52:41.088202 133193 storage/replica_proposal.go:214  [n2,s2,r22/2:/Table/5{1-2}] new range lease repl=(n2,s2):2 seq=4 start=1534225960.364111112,0 epo=1 pro=1534225960.364196001,0 following repl=(n1,s1):1 seq=3 start=1534225868.888551998,0 epo=1 pro=1534225877.645342106,0
I180814 05:52:41.347581 136572 storage/replica.go:3711  [n3,s3,r6/3:/{System/tse-Table/System…}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n4,s4):4 (n2,s2):2 (n3,s3):3] next=5
I180814 05:52:41.374285 132407 storage/replicate_queue_test.go:95  not balanced: [23 19 17 9 11]
I180814 05:52:41.374568 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [23 19 17 9 11]
W180814 05:52:41.625399 133594 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 1.8s
I180814 05:52:41.641051 136622 storage/replica_range_lease.go:617  [replicate,n1,s1,r22/1:/Table/5{1-2}] done transferring lease to s2: <nil>
I180814 05:52:41.695911 136773 storage/replica_raftstorage.go:521  [replicate,n1,s1,r23/1:/Table/5{2-3}] generated preemptive snapshot 65ea74da at index 23
I180814 05:52:42.148567 136773 storage/store_snapshot.go:641  [replicate,n1,s1,r23/1:/Table/5{2-3}] streamed snapshot to (n4,s4):?: kv pairs: 8, log entries: 13, rate-limit: 2.0 MiB/sec, 311ms
I180814 05:52:42.155648 136717 storage/replica_raftstorage.go:521  [replicate,n3,s3,r6/3:/{System/tse-Table/System…}] generated preemptive snapshot 7075307d at index 41
I180814 05:52:42.157290 133799 storage/store.go:3618  [n1,s1,r6/1:/{System/tse-Table/System…}] added to replica GC queue (peer suggestion)
I180814 05:52:42.158087 133799 storage/store.go:3618  [n1,s1,r6/1:/{System/tse-Table/System…}] added to replica GC queue (peer suggestion)
I180814 05:52:42.183932 136777 storage/replica_raftstorage.go:758  [n4,s4,r23/?:{-}] applying preemptive snapshot at index 23 (id=65ea74da, encoded size=3144, 1 rocksdb batches, 13 log entries)
I180814 05:52:42.388376 136777 storage/replica_raftstorage.go:764  [n4,s4,r23/?:/Table/5{2-3}] applied preemptive snapshot in 204ms [clear=0ms batch=0ms entries=149ms commit=53ms]
I180814 05:52:42.462911 132407 storage/replicate_queue_test.go:95  not balanced: [23 19 17 10 11]
I180814 05:52:42.463220 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [23 19 17 10 11]
I180814 05:52:42.498384 134113 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
W180814 05:52:42.699603 132588 storage/store.go:3743  [n1,s1] handle raft ready: 0.6s [processed=0]
I180814 05:52:42.757538 133799 storage/store.go:3618  [n1,s1,r6/1:/{System/tse-Table/System…}] added to replica GC queue (peer suggestion)
I180814 05:52:42.773040 136717 storage/store_snapshot.go:641  [replicate,n3,s3,r6/3:/{System/tse-Table/System…}] streamed snapshot to (n5,s5):?: kv pairs: 14, log entries: 31, rate-limit: 2.0 MiB/sec, 616ms
I180814 05:52:42.826152 133585 server/status/runtime.go:424  [n3] runtime stats: 1.5 GiB RSS, 1275 goroutines, 100 MiB/13 MiB/140 MiB GO alloc/idle/total, 177 MiB/220 MiB CGO alloc/total, 332.55cgo/sec, 0.93/0.04 %(u/s)time, 0.03 %gc (4x)
I180814 05:52:42.924964 136773 storage/replica_command.go:750  [replicate,n1,s1,r23/1:/Table/5{2-3}] change replicas (ADD_REPLICA (n4,s4):4): read existing descriptor r23:/Table/5{2-3} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=842350800576]
I180814 05:52:42.946100 136781 rpc/nodedialer/nodedialer.go:92  [n1] connection to n5 established
I180814 05:52:42.996743 136810 storage/replica_raftstorage.go:758  [n5,s5,r6/?:{-}] applying preemptive snapshot at index 41 (id=7075307d, encoded size=11010, 1 rocksdb batches, 31 log entries)
I180814 05:52:43.055059 133587 server/status/recorder.go:652  [n3,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:43.470678 132407 storage/replicate_queue_test.go:95  not balanced: [23 19 17 10 12]
I180814 05:52:43.471038 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [23 19 17 10 12]
W180814 05:52:43.520355 133307 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 2.0s
W180814 05:52:43.524943 132544 storage/store.go:3743  [n1,s1] handle raft ready: 0.6s [processed=1]
I180814 05:52:43.607170 136763 storage/store.go:2555  [replicaGC,n1,s1,r6/1:/{System/tse-Table/System…}] removing replica
I180814 05:52:43.770643 136763 storage/replica.go:858  [replicaGC,n1,s1,r6/1:/{System/tse-Table/System…}] removed 8 (0+8) keys in 145ms [clear=0ms commit=144ms]
W180814 05:52:43.907534 134330 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 1.8s
W180814 05:52:44.360836 133983 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.2s
I180814 05:52:44.478434 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 10 12]
I180814 05:52:44.478790 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 10 12]
I180814 05:52:44.788503 136810 storage/replica_raftstorage.go:764  [n5,s5,r6/?:/{System/tse-Table/System…}] applied preemptive snapshot in 1789ms [clear=0ms batch=0ms entries=1783ms commit=5ms]
I180814 05:52:44.804333 136717 storage/replica_command.go:750  [replicate,n3,s3,r6/3:/{System/tse-Table/System…}] change replicas (ADD_REPLICA (n5,s5):5): read existing descriptor r6:/{System/tse-Table/SystemConfigSpan/Start} [(n4,s4):4, (n2,s2):2, (n3,s3):3, next=5, gen=842434270344]
I180814 05:52:44.863800 136773 storage/replica.go:3711  [n1,s1,r23/1:/Table/5{2-3}] proposing ADD_REPLICA((n4,s4):4): updated=[(n1,s1):1 (n2,s2):2 (n3,s3):3 (n4,s4):4] next=5
I180814 05:52:45.161934 136817 storage/replica_range_lease.go:554  [replicate,n1,s1,r23/1:/Table/5{2-3}] transferring lease to s2
I180814 05:52:45.484469 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 10 12]
I180814 05:52:45.484803 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 10 12]
I180814 05:52:45.588254 133879 server/status/runtime.go:424  [n4] runtime stats: 1.5 GiB RSS, 1265 goroutines, 77 MiB/36 MiB/140 MiB GO alloc/idle/total, 176 MiB/220 MiB CGO alloc/total, 348.73cgo/sec, 0.93/0.05 %(u/s)time, 0.03 %gc (4x)
I180814 05:52:45.701208 133214 storage/replica_proposal.go:214  [n2,s2,r23/2:/Table/5{2-3}] new range lease repl=(n2,s2):2 seq=4 start=1534225965.162225171,0 epo=1 pro=1534225965.162247460,0 following repl=(n1,s1):1 seq=3 start=1534225868.888551998,0 epo=1 pro=1534225877.645342106,0
I180814 05:52:45.765574 136817 storage/replica_range_lease.go:617  [replicate,n1,s1,r23/1:/Table/5{2-3}] done transferring lease to s2: <nil>
I180814 05:52:46.057402 136878 storage/replica_raftstorage.go:521  [replicate,n1,s1,r19/1:/Table/2{2-3}] generated preemptive snapshot 0933dd4e at index 24
I180814 05:52:46.078490 133881 server/status/recorder.go:652  [n4,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:46.241206 136878 storage/store_snapshot.go:641  [replicate,n1,s1,r19/1:/Table/2{2-3}] streamed snapshot to (n4,s4):?: kv pairs: 10, log entries: 14, rate-limit: 2.0 MiB/sec, 182ms
I180814 05:52:46.371849 136980 storage/replica_raftstorage.go:758  [n4,s4,r19/?:{-}] applying preemptive snapshot at index 24 (id=0933dd4e, encoded size=4311, 1 rocksdb batches, 14 log entries)
I180814 05:52:46.393172 136980 storage/replica_raftstorage.go:764  [n4,s4,r19/?:/Table/2{2-3}] applied preemptive snapshot in 21ms [clear=0ms batch=0ms entries=5ms commit=15ms]
I180814 05:52:46.456086 136878 storage/replica_command.go:750  [replicate,n1,s1,r19/1:/Table/2{2-3}] change replicas (ADD_REPLICA (n4,s4):4): read existing descriptor r19:/Table/2{2-3} [(n1,s1):1, (n5,s5):2, (n3,s3):3, next=4, gen=842375776464]
I180814 05:52:46.587546 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 11 12]
I180814 05:52:46.670800 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 11 12]
I180814 05:52:46.614889 136717 storage/replica.go:3711  [n3,s3,r6/3:/{System/tse-Table/System…}] proposing ADD_REPLICA((n5,s5):5): updated=[(n4,s4):4 (n2,s2):2 (n3,s3):3 (n5,s5):5] next=6
I180814 05:52:47.773648 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 11 12]
I180814 05:52:47.773940 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 11 12]
I180814 05:52:47.977613 136997 storage/replica_command.go:750  [replicate,n3,s3,r7/2:/Table/{SystemCon…-11}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r7:/Table/{SystemConfigSpan/Start-11} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n4,s4):4, next=5, gen=842354545784]
I180814 05:52:48.852012 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 11 12]
I180814 05:52:48.852487 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 11 12]
W180814 05:52:48.886694 133983 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.2s
W180814 05:52:48.912242 134330 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 2.2s
I180814 05:52:49.128890 134438 server/status/runtime.go:424  [n5] runtime stats: 1.5 GiB RSS, 1265 goroutines, 102 MiB/11 MiB/140 MiB GO alloc/idle/total, 176 MiB/220 MiB CGO alloc/total, 343.81cgo/sec, 0.93/0.04 %(u/s)time, 0.03 %gc (4x)
I180814 05:52:49.171842 132247 server/status/runtime.go:424  [n1] runtime stats: 1.5 GiB RSS, 1266 goroutines, 102 MiB/11 MiB/140 MiB GO alloc/idle/total, 176 MiB/220 MiB CGO alloc/total, 343.64cgo/sec, 0.93/0.04 %(u/s)time, 0.03 %gc (4x)
I180814 05:52:49.467597 133266 server/status/runtime.go:424  [n2] runtime stats: 1.5 GiB RSS, 1268 goroutines, 88 MiB/24 MiB/140 MiB GO alloc/idle/total, 176 MiB/220 MiB CGO alloc/total, 340.95cgo/sec, 0.92/0.04 %(u/s)time, 0.03 %gc (4x)
W180814 05:52:49.835526 132582 storage/store.go:3743  [n1,s1] handle raft ready: 0.6s [processed=0]
I180814 05:52:49.863387 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 11 12]
I180814 05:52:49.868105 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 11 12]
I180814 05:52:50.069398 132249 server/status/recorder.go:652  [n1,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180814 05:52:50.083929 132661 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 1.9s
I180814 05:52:50.190844 134440 server/status/recorder.go:652  [n5,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:50.306915 133300 server/status/recorder.go:652  [n2,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:50.922542 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 11 12]
I180814 05:52:50.922782 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 11 12]
I180814 05:52:50.963464 137059 storage/replica_command.go:750  [replicate,n2,s2,r22/2:/Table/5{1-2}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r22:/Table/5{1-2} [(n1,s1):1, (n2,s2):2, (n3,s3):3, (n4,s4):4, next=5, gen=842434146688]
W180814 05:52:51.148622 133594 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 2.4s
W180814 05:52:51.173870 132481 storage/store.go:3743  [n1,s1] handle raft ready: 0.8s [processed=0]
I180814 05:52:51.929567 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 11 12]
I180814 05:52:51.929890 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 11 12]
I180814 05:52:52.023978 136997 storage/replica.go:3711  [n3,s3,r7/2:/Table/{SystemCon…-11}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n4,s4):4 (n3,s3):2 (n2,s2):3] next=5
W180814 05:52:52.046494 133541 storage/store.go:3743  [n3,s3] handle raft ready: 0.7s [processed=0]
W180814 05:52:52.131917 133307 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 1.7s
I180814 05:52:52.205712 133585 server/status/runtime.go:424  [n3] runtime stats: 1.5 GiB RSS, 1268 goroutines, 83 MiB/29 MiB/140 MiB GO alloc/idle/total, 176 MiB/220 MiB CGO alloc/total, 310.83cgo/sec, 0.92/0.04 %(u/s)time, 0.03 %gc (3x)
I180814 05:52:52.939701 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 11 12]
I180814 05:52:52.944530 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 11 12]
W180814 05:52:52.989064 134330 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 1.9s
I180814 05:52:53.068761 133587 server/status/recorder.go:652  [n3,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:53.163360 137038 storage/replica_command.go:750  [replicate,n3,s3,r6/3:/{System/tse-Table/System…}] change replicas (REMOVE_REPLICA (n2,s2):2): read existing descriptor r6:/{System/tse-Table/SystemConfigSpan/Start} [(n4,s4):4, (n2,s2):2, (n3,s3):3, (n5,s5):5, next=6, gen=842354369056]
I180814 05:52:53.202038 136878 storage/replica.go:3711  [n1,s1,r19/1:/Table/2{2-3}] proposing ADD_REPLICA((n4,s4):4): updated=[(n1,s1):1 (n5,s5):2 (n3,s3):3 (n4,s4):4] next=5
W180814 05:52:53.792662 132579 storage/store.go:3743  [n1,s1] handle raft ready: 0.6s [processed=0]
W180814 05:52:53.844254 132592 storage/store.go:3743  [n1,s1] handle raft ready: 0.6s [processed=0]
W180814 05:52:53.930369 133551 storage/store.go:3743  [n3,s3] handle raft ready: 0.6s [processed=1]
I180814 05:52:53.957374 132407 storage/replicate_queue_test.go:95  not balanced: [22 19 17 11 12]
I180814 05:52:53.957739 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [22 19 17 11 12]
I180814 05:52:53.979729 133799 storage/store.go:3618  [n1,s1,r7/1:/Table/{SystemCon…-11}] added to replica GC queue (peer suggestion)
I180814 05:52:54.013975 133799 storage/store.go:3618  [n1,s1,r7/1:/Table/{SystemCon…-11}] added to replica GC queue (peer suggestion)
I180814 05:52:54.014687 133799 storage/store.go:3618  [n1,s1,r7/1:/Table/{SystemCon…-11}] added to replica GC queue (peer suggestion)
I180814 05:52:54.186071 137114 storage/store.go:2555  [replicaGC,n1,s1,r7/1:/Table/{SystemCon…-11}] removing replica
I180814 05:52:54.434019 137041 storage/replica_range_lease.go:554  [replicate,n1,s1,r19/1:/Table/2{2-3}] transferring lease to s5
I180814 05:52:54.566743 137114 storage/replica.go:858  [replicaGC,n1,s1,r7/1:/Table/{SystemCon…-11}] removed 55 (48+7) keys in 330ms [clear=1ms commit=329ms]
I180814 05:52:54.959387 132407 storage/replicate_queue_test.go:95  not balanced: [21 19 17 11 12]
I180814 05:52:54.959627 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [21 19 17 11 12]
W180814 05:52:55.137528 133983 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 3.1s
W180814 05:52:55.275568 132661 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 2.6s
I180814 05:52:55.277461 137041 storage/replica_range_lease.go:617  [replicate,n1,s1,r19/1:/Table/2{2-3}] done transferring lease to s5: <nil>
I180814 05:52:55.281618 137120 storage/replica_raftstorage.go:521  [replicate,n1,s1,r11/1:/Table/1{4-5}] generated preemptive snapshot 394fc959 at index 24
I180814 05:52:55.293476 137120 storage/store_snapshot.go:641  [replicate,n1,s1,r11/1:/Table/1{4-5}] streamed snapshot to (n4,s4):?: kv pairs: 10, log entries: 14, rate-limit: 2.0 MiB/sec, 10ms
I180814 05:52:55.298431 134364 storage/replica_proposal.go:214  [n5,s5,r19/2:/Table/2{2-3}] new range lease repl=(n5,s5):2 seq=4 start=1534225974.434640051,0 epo=1 pro=1534225974.435039185,0 following repl=(n1,s1):1 seq=3 start=1534225868.888551998,0 epo=1 pro=1534225877.645342106,0
I180814 05:52:55.335467 137171 storage/replica_raftstorage.go:758  [n4,s4,r11/?:{-}] applying preemptive snapshot at index 24 (id=394fc959, encoded size=4311, 1 rocksdb batches, 14 log entries)
I180814 05:52:55.674871 133879 server/status/runtime.go:424  [n4] runtime stats: 1.5 GiB RSS, 1274 goroutines, 84 MiB/26 MiB/140 MiB GO alloc/idle/total, 180 MiB/223 MiB CGO alloc/total, 314.67cgo/sec, 0.91/0.04 %(u/s)time, 0.03 %gc (3x)
W180814 05:52:55.857061 133594 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 2.7s
I180814 05:52:55.960404 132407 storage/replicate_queue_test.go:95  not balanced: [21 19 17 12 12]
I180814 05:52:55.960657 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [21 19 17 12 12]
I180814 05:52:56.017691 133881 server/status/recorder.go:652  [n4,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:52:56.997189 132407 storage/replicate_queue_test.go:95  not balanced: [21 19 17 12 12]
I180814 05:52:57.001954 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [21 19 17 12 12]
I180814 05:52:57.004825 137206 storage/replica_command.go:750  [replicate,n5,s5,r19/2:/Table/2{2-3}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r19:/Table/2{2-3} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n4,s4):4, next=5, gen=842434915488]
I180814 05:52:57.028950 137059 storage/replica.go:3711  [n2,s2,r22/2:/Table/5{1-2}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n4,s4):4 (n2,s2):2 (n3,s3):3] next=5
W180814 05:52:57.199350 133307 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 2.3s
I180814 05:52:57.553659 137038 storage/replica.go:3711  [n3,s3,r6/3:/{System/tse-Table/System…}] proposing REMOVE_REPLICA((n2,s2):2): updated=[(n4,s4):4 (n5,s5):5 (n3,s3):3] next=6
I180814 05:52:57.704172 137171 storage/replica_raftstorage.go:764  [n4,s4,r11/?:/Table/1{4-5}] applied preemptive snapshot in 2364ms [clear=0ms batch=0ms entries=2337ms commit=26ms]
I180814 05:52:57.852766 137120 storage/replica_command.go:750  [replicate,n1,s1,r11/1:/Table/1{4-5}] change replicas (ADD_REPLICA (n4,s4):4): read existing descriptor r11:/Table/1{4-5} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=842354993696]
W180814 05:52:57.964942 134330 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 2.4s
I180814 05:52:58.011582 132407 storage/replicate_queue_test.go:95  not balanced: [21 19 17 12 12]
I180814 05:52:58.011931 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [21 19 17 12 12]
I180814 05:52:58.120508 136057 storage/store.go:3618  [n2,s2,r6/2:/{System/tse-Table/System…}] added to replica GC queue (peer suggestion)
I180814 05:52:58.184542 137252 storage/replica_raftstorage.go:521  [replicate,n2,s2,r22/2:/Table/5{1-2}] generated preemptive snapshot 3bec24b8 at index 44
I180814 05:52:58.221987 137165 storage/replica_command.go:750  [replicate,n3,s3,r3/2:/System/NodeLiveness{-Max}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r3:/System/NodeLiveness{-Max} [(n1,s1):1, (n3,s3):2, (n5,s5):3, (n2,s2):4, next=5, gen=842367215856]
I180814 05:52:58.223407 133673 storage/store.go:3618  [n1,s1,r22/1:/Table/5{1-2}] added to replica GC queue (peer suggestion)
I180814 05:52:58.224012 133673 storage/store.go:3618  [n1,s1,r22/1:/Table/5{1-2}] added to replica GC queue (peer suggestion)
I180814 05:52:58.242526 133673 storage/store.go:3618  [n1,s1,r22/1:/Table/5{1-2}] added to replica GC queue (peer suggestion)
I180814 05:52:58.262228 137252 storage/store_snapshot.go:641  [replicate,n2,s2,r22/2:/Table/5{1-2}] streamed snapshot to (n5,s5):?: kv pairs: 13, log entries: 34, rate-limit: 2.0 MiB/sec, 76ms
I180814 05:52:58.266989 137283 storage/replica_raftstorage.go:758  [n5,s5,r22/?:{-}] applying preemptive snapshot at index 44 (id=3bec24b8, encoded size=8996, 1 rocksdb batches, 34 log entries)
W180814 05:52:58.321559 133983 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.7s
W180814 05:52:58.610312 132661 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 1.4s
I180814 05:52:58.614199 137150 storage/store.go:2555  [replicaGC,n1,s1,r22/1:/Table/5{1-2}] removing replica
I180814 05:52:58.822823 137150 storage/replica.go:858  [replicaGC,n1,s1,r22/1:/Table/5{1-2}] removed 7 (0+7) keys in 184ms [clear=0ms commit=183ms]
I180814 05:52:58.890999 137151 storage/store.go:2555  [replicaGC,n2,s2,r6/2:/{System/tse-Table/System…}] removing replica
W180814 05:52:58.892814 133594 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 1.2s
I180814 05:52:59.013036 132407 storage/replicate_queue_test.go:95  not balanced: [20 19 17 12 13]
I180814 05:52:59.013415 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [20 19 17 12 13]
I180814 05:52:59.112958 137151 storage/replica.go:858  [replicaGC,n2,s2,r6/2:/{System/tse-Table/System…}] removed 8 (0+8) keys in 213ms [clear=0ms commit=213ms]
I180814 05:52:59.369436 134438 server/status/runtime.go:424  [n5] runtime stats: 1.5 GiB RSS, 1272 goroutines, 92 MiB/21 MiB/140 MiB GO alloc/idle/total, 179 MiB/223 MiB CGO alloc/total, 352.62cgo/sec, 0.87/0.08 %(u/s)time, 0.03 %gc (4x)
I180814 05:52:59.432695 132247 server/status/runtime.go:424  [n1] runtime stats: 1.5 GiB RSS, 1273 goroutines, 78 MiB/35 MiB/140 MiB GO alloc/idle/total, 179 MiB/223 MiB CGO alloc/total, 355.14cgo/sec, 0.88/0.08 %(u/s)time, 0.03 %gc (4x)
I180814 05:52:59.441043 133266 server/status/runtime.go:424  [n2] runtime stats: 1.5 GiB RSS, 1269 goroutines, 78 MiB/35 MiB/140 MiB GO alloc/idle/total, 179 MiB/223 MiB CGO alloc/total, 367.78cgo/sec, 0.88/0.08 %(u/s)time, 0.03 %gc (4x)
I180814 05:53:00.028382 132407 storage/replicate_queue_test.go:95  not balanced: [20 18 17 12 13]
I180814 05:53:00.028650 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [20 18 17 12 13]
I180814 05:53:00.080412 132249 server/status/recorder.go:652  [n1,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:00.209006 134440 server/status/recorder.go:652  [n5,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:00.302714 133300 server/status/recorder.go:652  [n2,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:00.809742 137120 storage/replica.go:3711  [n1,s1,r11/1:/Table/1{4-5}] proposing ADD_REPLICA((n4,s4):4): updated=[(n1,s1):1 (n3,s3):2 (n2,s2):3 (n4,s4):4] next=5
I180814 05:53:00.844533 137206 storage/replica.go:3711  [n5,s5,r19/2:/Table/2{2-3}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n4,s4):4 (n5,s5):2 (n3,s3):3] next=5
I180814 05:53:01.053608 132407 storage/replicate_queue_test.go:95  not balanced: [20 18 17 12 13]
I180814 05:53:01.053971 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [20 18 17 12 13]
W180814 05:53:01.227644 134330 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 1.1s
I180814 05:53:01.346215 137296 storage/replica_range_lease.go:554  [replicate,n1,s1,r11/1:/Table/1{4-5}] transferring lease to s2
I180814 05:53:01.629390 137165 storage/replica.go:3711  [n3,s3,r3/2:/System/NodeLiveness{-Max}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n2,s2):4 (n3,s3):2 (n5,s5):3] next=5
I180814 05:53:02.065357 137283 storage/replica_raftstorage.go:764  [n5,s5,r22/?:/Table/5{1-2}] applied preemptive snapshot in 3798ms [clear=0ms batch=3ms entries=3793ms commit=1ms]
I180814 05:53:02.136376 132407 storage/replicate_queue_test.go:95  not balanced: [20 18 17 12 13]
I180814 05:53:02.136748 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [20 18 17 12 13]
I180814 05:53:02.148369 132994 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
I180814 05:53:02.213484 137252 storage/replica_command.go:750  [replicate,n2,s2,r22/2:/Table/5{1-2}] change replicas (ADD_REPLICA (n5,s5):5): read existing descriptor r22:/Table/5{1-2} [(n4,s4):4, (n2,s2):2, (n3,s3):3, next=5, gen=842413733352]
I180814 05:53:02.238984 133585 server/status/runtime.go:424  [n3] runtime stats: 1.5 GiB RSS, 1272 goroutines, 59 MiB/50 MiB/140 MiB GO alloc/idle/total, 179 MiB/224 MiB CGO alloc/total, 414.91cgo/sec, 0.87/0.09 %(u/s)time, 0.02 %gc (4x)
I180814 05:53:02.323726 137218 rpc/nodedialer/nodedialer.go:92  [n1] connection to n2 established
I180814 05:53:02.386357 137296 storage/replica_range_lease.go:617  [replicate,n1,s1,r11/1:/Table/1{4-5}] done transferring lease to s2: <nil>
I180814 05:53:02.527335 134689 storage/store.go:3618  [n1,s1,r19/1:/Table/2{2-3}] added to replica GC queue (peer suggestion)
I180814 05:53:02.528055 134689 storage/store.go:3618  [n1,s1,r19/1:/Table/2{2-3}] added to replica GC queue (peer suggestion)
I180814 05:53:02.531295 137324 storage/store.go:2555  [replicaGC,n1,s1,r19/1:/Table/2{2-3}] removing replica
I180814 05:53:02.537941 137324 storage/replica.go:858  [replicaGC,n1,s1,r19/1:/Table/2{2-3}] removed 7 (0+7) keys in 5ms [clear=0ms commit=4ms]
I180814 05:53:02.613640 133201 storage/replica_proposal.go:214  [n2,s2,r11/3:/Table/1{4-5}] new range lease repl=(n2,s2):3 seq=4 start=1534225981.346598311,0 epo=1 pro=1534225981.346647453,0 following repl=(n1,s1):1 seq=3 start=1534225868.888551998,0 epo=1 pro=1534225877.645342106,0
I180814 05:53:02.661015 137369 storage/replica_raftstorage.go:521  [replicate,n1,s1,r9/1:/Table/1{2-3}] generated preemptive snapshot 263114a0 at index 43
I180814 05:53:02.809685 137428 storage/replica_raftstorage.go:521  [replicate,n3,s3,r7/2:/Table/{SystemCon…-11}] generated preemptive snapshot 6af12962 at index 58
I180814 05:53:03.088715 133587 server/status/recorder.go:652  [n3,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:03.112016 137369 storage/store_snapshot.go:641  [replicate,n1,s1,r9/1:/Table/1{2-3}] streamed snapshot to (n5,s5):?: kv pairs: 65, log entries: 33, rate-limit: 2.0 MiB/sec, 364ms
I180814 05:53:03.090290 137428 storage/replicate_queue.go:241  [replicate,n3,s3,r7/2:/Table/{SystemCon…-11}] snapshot failed: (n5,s5):?: remote declined snapshot: store busy applying snapshots
I180814 05:53:03.175590 132407 storage/replicate_queue_test.go:95  not balanced: [19 18 17 12 13]
I180814 05:53:03.175875 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [19 18 17 12 13]
I180814 05:53:03.187712 137330 storage/replica_raftstorage.go:758  [n5,s5,r9/?:{-}] applying preemptive snapshot at index 43 (id=263114a0, encoded size=24915, 1 rocksdb batches, 33 log entries)
I180814 05:53:03.189991 133799 storage/store.go:3618  [n1,s1,r3/1:/System/NodeLiveness{-Max}] added to replica GC queue (peer suggestion)
I180814 05:53:03.255358 133799 storage/store.go:3618  [n1,s1,r3/1:/System/NodeLiveness{-Max}] added to replica GC queue (peer suggestion)
I180814 05:53:03.255871 133799 storage/store.go:3618  [n1,s1,r3/1:/System/NodeLiveness{-Max}] added to replica GC queue (peer suggestion)
W180814 05:53:03.373062 133594 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 1.2s
I180814 05:53:03.501426 137372 storage/store.go:2555  [replicaGC,n1,s1,r3/1:/System/NodeLiveness{-Max}] removing replica
I180814 05:53:03.601118 137372 storage/replica.go:858  [replicaGC,n1,s1,r3/1:/System/NodeLiveness{-Max}] removed 13 (5+8) keys in 87ms [clear=6ms commit=81ms]
W180814 05:53:03.817808 133983 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 2.7s
W180814 05:53:03.909835 132661 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 2.2s
I180814 05:53:04.189970 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 12 14]
I180814 05:53:04.190388 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 12 14]
I180814 05:53:04.448960 133336 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
I180814 05:53:04.458773 137468 rpc/nodedialer/nodedialer.go:92  [n1] connection to n3 established
I180814 05:53:04.628572 137252 storage/replica.go:3711  [n2,s2,r22/2:/Table/5{1-2}] proposing ADD_REPLICA((n5,s5):5): updated=[(n4,s4):4 (n2,s2):2 (n3,s3):3 (n5,s5):5] next=6
I180814 05:53:04.902668 137330 storage/replica_raftstorage.go:764  [n5,s5,r9/?:/Table/1{2-3}] applied preemptive snapshot in 1662ms [clear=0ms batch=0ms entries=1622ms commit=33ms]
I180814 05:53:04.906249 137474 storage/replica_range_lease.go:554  [replicate,n2,s2,r22/2:/Table/5{1-2}] transferring lease to s4
I180814 05:53:05.057972 137369 storage/replica_command.go:750  [replicate,n1,s1,r9/1:/Table/1{2-3}] change replicas (ADD_REPLICA (n5,s5):4): read existing descriptor r9:/Table/1{2-3} [(n1,s1):1, (n3,s3):2, (n4,s4):3, next=4, gen=842387789168]
I180814 05:53:05.195437 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 12 14]
I180814 05:53:05.195765 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 12 14]
I180814 05:53:05.271159 137474 storage/replica_range_lease.go:617  [replicate,n2,s2,r22/2:/Table/5{1-2}] done transferring lease to s4: <nil>
I180814 05:53:05.318457 133919 storage/replica_proposal.go:214  [n4,s4,r22/4:/Table/5{1-2}] new range lease repl=(n4,s4):4 seq=5 start=1534225984.906490116,0 epo=1 pro=1534225984.906742493,0 following repl=(n2,s2):2 seq=4 start=1534225960.364111112,0 epo=1 pro=1534225960.364196001,0
I180814 05:53:05.612755 133879 server/status/runtime.go:424  [n4] runtime stats: 1.5 GiB RSS, 1264 goroutines, 70 MiB/38 MiB/140 MiB GO alloc/idle/total, 179 MiB/224 MiB CGO alloc/total, 438.93cgo/sec, 0.89/0.09 %(u/s)time, 0.02 %gc (4x)
I180814 05:53:05.968756 133881 server/status/recorder.go:652  [n4,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:06.221587 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 12 14]
I180814 05:53:06.221947 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 12 14]
I180814 05:53:07.074262 137369 storage/replica.go:3711  [n1,s1,r9/1:/Table/1{2-3}] proposing ADD_REPLICA((n5,s5):4): updated=[(n1,s1):1 (n3,s3):2 (n4,s4):3 (n5,s5):4] next=5
I180814 05:53:07.240283 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 12 14]
I180814 05:53:07.240641 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 12 14]
I180814 05:53:07.428801 137551 storage/replica_range_lease.go:554  [replicate,n1,s1,r9/1:/Table/1{2-3}] transferring lease to s4
W180814 05:53:07.617725 132602 storage/store.go:3743  [n1,s1] handle raft ready: 0.7s [processed=0]
I180814 05:53:07.666701 137551 storage/replica_range_lease.go:617  [replicate,n1,s1,r9/1:/Table/1{2-3}] done transferring lease to s4: <nil>
I180814 05:53:07.763231 133941 storage/replica_proposal.go:214  [n4,s4,r9/3:/Table/1{2-3}] new range lease repl=(n4,s4):3 seq=4 start=1534225987.434793284,0 epo=1 pro=1534225987.434858804,0 following repl=(n1,s1):1 seq=3 start=1534225868.888551998,0 epo=1 pro=1534225877.645342106,0
I180814 05:53:07.910897 137538 storage/replica_raftstorage.go:521  [replicate,n1,s1,r20/1:/Table/{23-50}] generated preemptive snapshot a1e8c659 at index 24
I180814 05:53:08.185436 137538 storage/store_snapshot.go:641  [replicate,n1,s1,r20/1:/Table/{23-50}] streamed snapshot to (n4,s4):?: kv pairs: 14, log entries: 14, rate-limit: 2.0 MiB/sec, 175ms
I180814 05:53:08.208064 137594 storage/replica_raftstorage.go:758  [n4,s4,r20/?:{-}] applying preemptive snapshot at index 24 (id=a1e8c659, encoded size=4443, 1 rocksdb batches, 14 log entries)
I180814 05:53:08.247699 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:08.248067 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
I180814 05:53:08.273258 137667 storage/replica_command.go:750  [replicate,n4,s4,r9/3:/Table/1{2-3}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r9:/Table/1{2-3} [(n1,s1):1, (n3,s3):2, (n4,s4):3, (n5,s5):4, next=5, gen=842449523568]
I180814 05:53:08.358026 137594 storage/replica_raftstorage.go:764  [n4,s4,r20/?:/Table/{23-50}] applied preemptive snapshot in 143ms [clear=0ms batch=0ms entries=102ms commit=9ms]
I180814 05:53:08.626542 137538 storage/replica_command.go:750  [replicate,n1,s1,r20/1:/Table/{23-50}] change replicas (ADD_REPLICA (n4,s4):4): read existing descriptor r20:/Table/{23-50} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=842425100560]
I180814 05:53:09.224425 134438 server/status/runtime.go:424  [n5] runtime stats: 1.5 GiB RSS, 1270 goroutines, 80 MiB/29 MiB/140 MiB GO alloc/idle/total, 184 MiB/228 MiB CGO alloc/total, 398.53cgo/sec, 0.90/0.05 %(u/s)time, 0.02 %gc (3x)
I180814 05:53:09.234701 132247 server/status/runtime.go:424  [n1] runtime stats: 1.5 GiB RSS, 1268 goroutines, 80 MiB/28 MiB/140 MiB GO alloc/idle/total, 184 MiB/228 MiB CGO alloc/total, 396.86cgo/sec, 0.91/0.05 %(u/s)time, 0.02 %gc (3x)
I180814 05:53:09.244713 132399 gossip/gossip.go:518  [n1] gossip status (ok, 5 nodes)
gossip client (0/3 cur/max conns)
gossip server (3/3 cur/max conns, infos 4190/135 sent/received, bytes 797678B/53300B sent/received)
  2: 127.0.0.1:46529 (1m40s)
  3: 127.0.0.1:33753 (1m37s)
  4: 127.0.0.1:39341 (1m34s)
I180814 05:53:09.278597 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:09.278872 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
I180814 05:53:09.486198 137690 storage/replica_command.go:750  [replicate,n2,s2,r23/2:/Table/5{2-3}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r23:/Table/5{2-3} [(n1,s1):1, (n2,s2):2, (n3,s3):3, (n4,s4):4, next=5, gen=842430347432]
I180814 05:53:09.515441 133266 server/status/runtime.go:424  [n2] runtime stats: 1.5 GiB RSS, 1266 goroutines, 84 MiB/24 MiB/140 MiB GO alloc/idle/total, 184 MiB/228 MiB CGO alloc/total, 390.93cgo/sec, 0.90/0.05 %(u/s)time, 0.02 %gc (3x)
I180814 05:53:10.132943 132249 server/status/recorder.go:652  [n1,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:10.193504 134440 server/status/recorder.go:652  [n5,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:10.319456 133300 server/status/recorder.go:652  [n2,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:10.384448 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:10.384833 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
W180814 05:53:10.390223 134330 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 1.2s
I180814 05:53:11.406046 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:11.406446 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
W180814 05:53:11.938566 132603 storage/store.go:3743  [n1,s1] handle raft ready: 0.7s [processed=0]
I180814 05:53:12.198686 133585 server/status/runtime.go:424  [n3] runtime stats: 1.5 GiB RSS, 1273 goroutines, 82 MiB/27 MiB/140 MiB GO alloc/idle/total, 189 MiB/232 MiB CGO alloc/total, 329.64cgo/sec, 0.90/0.04 %(u/s)time, 0.02 %gc (3x)
I180814 05:53:12.219429 133677 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
I180814 05:53:12.243110 137780 rpc/nodedialer/nodedialer.go:92  [n1] connection to n4 established
I180814 05:53:12.416105 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:12.416435 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
W180814 05:53:12.781769 133246 storage/store.go:3743  [n2,s2] handle raft ready: 0.6s [processed=0]
W180814 05:53:12.885001 132599 storage/store.go:3743  [n1,s1] handle raft ready: 0.6s [processed=0]
W180814 05:53:12.936229 133983 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 2.7s
I180814 05:53:13.409197 133587 server/status/recorder.go:652  [n3,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:13.495538 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:13.495787 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
W180814 05:53:13.677736 133594 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 2.4s
W180814 05:53:13.830802 132661 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 3.1s
W180814 05:53:14.008971 132552 storage/store.go:3743  [n1,s1] handle raft ready: 0.6s [processed=0]
I180814 05:53:14.085782 137538 storage/replica.go:3711  [n1,s1,r20/1:/Table/{23-50}] proposing ADD_REPLICA((n4,s4):4): updated=[(n1,s1):1 (n3,s3):2 (n2,s2):3 (n4,s4):4] next=5
I180814 05:53:14.642439 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:14.642713 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
W180814 05:53:15.066906 133307 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 2.1s
I180814 05:53:15.169529 137740 storage/replica_range_lease.go:554  [replicate,n1,s1,r20/1:/Table/{23-50}] transferring lease to s3
I180814 05:53:15.359865 134114 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
I180814 05:53:15.368067 137821 rpc/nodedialer/nodedialer.go:92  [n1] connection to n5 established
I180814 05:53:15.646267 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:15.646572 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
I180814 05:53:15.813073 133879 server/status/runtime.go:424  [n4] runtime stats: 1.5 GiB RSS, 1270 goroutines, 100 MiB/13 MiB/140 MiB GO alloc/idle/total, 189 MiB/232 MiB CGO alloc/total, 273.74cgo/sec, 0.90/0.03 %(u/s)time, 0.03 %gc (4x)
W180814 05:53:15.923845 134330 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 2.3s
I180814 05:53:15.970660 133881 server/status/recorder.go:652  [n4,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180814 05:53:16.209535 137690 storage/replica.go:3711  [n2,s2,r23/2:/Table/5{2-3}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n4,s4):4 (n2,s2):2 (n3,s3):3] next=5
W180814 05:53:16.415762 133983 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.8s
I180814 05:53:16.563516 137740 storage/replica_range_lease.go:617  [replicate,n1,s1,r20/1:/Table/{23-50}] done transferring lease to s3: <nil>
I180814 05:53:16.567837 137742 storage/replica_raftstorage.go:521  [replicate,n1,s1,r17/1:/Table/2{0-1}] generated preemptive snapshot 6471ce2c at index 24
I180814 05:53:16.652596 137742 storage/store_snapshot.go:641  [replicate,n1,s1,r17/1:/Table/2{0-1}] streamed snapshot to (n4,s4):?: kv pairs: 10, log entries: 14, rate-limit: 2.0 MiB/sec, 83ms
I180814 05:53:16.667327 132407 storage/replicate_queue_test.go:95  not balanced: [18 18 17 13 14]
I180814 05:53:16.667736 132407 storage/replicate_queue_test.go:90  SucceedsSoon: not balanced: [18 18 17 13 14]
I180814 05:53:16.674474 137867 storage/replica_raftstorage.go:758  [n4,s4,r17/?:{-}] applying preemptive snapshot at index 24 (id=6471ce2c, encoded size=4478, 1 rocksdb batches, 14 log entries)
I180814 05:53:16.679993 137837 storage/replica_range_lease.go:554  [replicate,n2,s2,r11/3:/Table/1{4-5}] transferring lease to s4
I180814 05:53:16.806246 137667 storage/replica.go:3711  [n4,s4,r9/3:/Table/1{2-3}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n5,s5):4 (n3,s3):2 (n4,s4):3] next=5
I180814 05:53:16.832421 133517 storage/replica_proposal.go:214  [n3,s3,r20/2:/Table/{23-50}] new range lease repl=(n3,s3):2 seq=4 start=1534225995.169809389,0 epo=1 pro=1534225995.169870328,0 following repl=(n1,s1):1 seq=3 start=1534225868.888551998,0 epo=1 pro=1534225877.645342106,0
I180814 05:53:16.867066 133673 storage/store.go:3618  [n1,s1,r23/1:/Table/5{2-3}] added to replica GC queue (peer suggestion)
I180814 05:53:16.891426 137867 storage/replica_raftstorage.go:764  [n4,s4,r17/?:/Table/2{0-1}] applied preemptive snapshot in 217ms [clear=0ms batch=0ms entries=128ms commit=7ms]
I180814 05:53:16.934379 133950 storage/replica_proposal.go:214  [n4,s4,r11/4:/Table/1{4-5}] new range lease repl=(n4,s4):4 seq=5 start=1534225996.680368789,0 epo=1 pro=1534225996.680442069,0 following repl=(n2,s2):3 seq=4 start=1534225981.346598311,0 epo=1 pro=1534225981.346647453,0
I180814 05:53:16.954075 133673 storage/store.go:3618  [n1,s1,r23/1:/Table/5{2-3}] added to replica GC queue (peer suggestion)
I180814 05:53:17.030673 137742 storage/replica_command.go:750  [replicate,n1,s1,r17/1:/Table/2{0-1}] change replicas (ADD_REPLICA (n4,s4):4): read existing descriptor r17:/Table/2{0-1} [(n1,s1):1, (n5,s5):2, (n3,s3):3, next=4, gen=842424627976]
I180814 05:53:17.117564 137837 storage/replica_range_lease.go:617  [replicate,n2,s2,r11/3:/Table/1{4-5}] done transferring lease to s4: <nil>
I180814 05:53:17.311931 137883 storage/replica_raftstorage.go:521  [replicate,n2,s2,r23/2:/Table/5{2-3}] generated preemptive snapshot ec7e4cdb at index 38
W180814 05:53:17.510084 132661 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 2.3s
W180814 05:53:17.529832 133594 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 1.9s
I180814 05:53:17.619115 134916 storage/store.go:3618  [n1,s1,r9/1:/Table/1{2-3}] added to replica GC queue (peer suggestion)
I180814 05:53:17.619835 134916 storage/store.go:3618  [n1,s1,r9/1:/Table/1{2-3}] added to replica GC queue (peer suggestion)
I180814 05:53:17.636757 137839 storage/store.go:2555  [replicaGC,n1,s1,r23/1:/Table/5{2-3}] removing replica
I180814 05:53:17.718691 137883 storage/store_snapshot.go:641  [replicate,n2,s2,r23/2:/Table/5{2-3}] streamed snapshot to (n5,s5):?: kv pairs: 11, log entries: 28, rate-limit: 2.0 MiB/sec, 405ms
I180814 05:53:17.727324 137858 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: batch
I180814 05:53:17.734061 137857 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      kv.DistSender: sending partial batch
1      [async] storage.replicate: processing replica
1      [async] gossip.infoStore: callback
I180814 05:53:17.741996 137856 util/stop/stopper.go:537  quiescing; tasks left:
4      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] transport racer
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
W180814 05:53:17.765722 133090 gossip/infostore.go:303  [n2] node unavailable; try another peer
W180814 05:53:17.857011 133307 internal/client/txn.go:514  [n2,hb] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (error=context canceled [exhausted])
I180814 05:53:17.857432 133307 storage/node_liveness.go:817  [n2,hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (error=context canceled [exhausted])
W180814 05:53:17.858749 133307 internal/client/txn.go:514  [n2,hb] failure aborting transaction: node unavailable; try another peer; abort caused by: node unavailable; try another peer
W180814 05:53:17.859114 133307 storage/node_liveness.go:494  [n2,hb] failed node liveness heartbeat: node unavailable; try another peer
W180814 05:53:17.776955 133534 storage/raft_transport.go:574  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
I180814 05:53:17.811692 137988 util/stop/stopper.go:537  quiescing; tasks left:
2      [async] gossip.infoStore: callback
1      ts.poller: poll
W180814 05:53:17.860939 133154 gossip/gossip.go:1325  [n2] no incoming or outgoing connections
I180814 05:53:17.861719 137857 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      kv.DistSender: sending partial batch
1      [async] storage.replicate: processing replica
W180814 05:53:17.812716 135842 storage/raft_transport.go:574  [n2] while processing outgoing Raft queue to node 5: rpc error: code = Unavailable desc = transport is closing:
I180814 05:53:17.864916 137883 storage/replicate_queue.go:241  [replicate,n2,s2,r23/2:/Table/5{2-3}] snapshot failed: (n5,s5):?: remote failed to apply snapshot: rpc error: code = Unavailable desc = transport is closing
I180814 05:53:17.874945 137856 util/stop/stopper.go:537  quiescing; tasks left:
3      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] transport racer
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
W180814 05:53:17.893889 133533 storage/raft_transport.go:574  [n1] while processing outgoing Raft queue to node 2: EOF:
W180814 05:53:17.850312 137742 internal/client/txn.go:514  [replicate,n1,s1,r17/1:/Table/2{0-1}] failure aborting transaction: node unavailable; try another peer; abort caused by: log-range-event: node unavailable; try another peer
E180814 05:53:18.013964 137742 storage/queue.go:788  [replicate,n1,s1,r17/1:/Table/2{0-1}] change replicas of r17 failed: log-range-event: node unavailable; try another peer
W180814 05:53:17.851651 135744 storage/raft_transport.go:574  [n5] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I180814 05:53:18.015354 137856 util/stop/stopper.go:537  quiescing; tasks left:
3      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] transport racer
1      [async] storage.replicaGC: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
W180814 05:53:17.851830 135975 storage/raft_transport.go:574  [n5] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = context canceled:
W180814 05:53:17.894669 133797 storage/raft_transport.go:574  [n1] while processing outgoing Raft queue to node 3: EOF:
W180814 05:53:18.017230 135880 storage/raft_transport.go:574  [n4] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = context canceled:
E180814 05:53:18.020212 137888 storage/queue.go:788  [replicate,n4,s4,r22/4:/Table/5{1-2}] change replicas of r22 failed: r22 was not found
W180814 05:53:18.021352 135910 storage/raft_transport.go:574  [n4] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = context canceled:
I180814 05:53:17.841807 137987 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      node.Node: writing summary
1      kv.DistSender: sending partial batch
1      [async] storage.replicate: processing replica
I180814 05:53:17.914676 137988 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      [async] gossip.infoStore: callback
W180814 05:53:18.044612 133586 ts/db.go:195  [n3,ts-poll] error writing time series data: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180814 05:53:18.045124 133586 ts/db.go:198  [n3,ts-poll] node unavailable; try another peer
W180814 05:53:18.047938 137761 storage/intent_resolver.go:733  [n1,s1] failed to cleanup transaction intents: could not GC completed transaction anchored at /Local/Range/Table/23/RangeDescriptor: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
I180814 05:53:18.048645 137856 util/stop/stopper.go:537  quiescing; tasks left:
3      node.Node: batch
1      ts.poller: poll
1      [async] transport racer
1      [async] storage.replicaGC: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
I180814 05:53:18.057328 137987 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      node.Node: writing summary
1      kv.DistSender: sending partial batch
W180814 05:53:17.953391 134903 storage/raft_transport.go:574  [n4] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = context canceled:
W180814 05:53:17.954216 134155 gossip/gossip.go:1325  [n5] no incoming or outgoing connections
W180814 05:53:17.954759 136542 storage/raft_transport.go:574  [n5] while processing outgoing Raft queue to node 3: EOF:
I180814 05:53:18.079120 137988 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
I180814 05:53:17.954939 137857 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.replicate: processing replica
W180814 05:53:18.081285 137835 storage/intent_resolver.go:733  [n2,s2] failed to cleanup transaction intents: failed to resolve intents: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
I180814 05:53:18.081976 137857 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      [async] storage.replicate: processing replica
W180814 05:53:17.964837 134687 storage/raft_transport.go:574  [n1] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
I180814 05:53:17.990010 137883 storage/replica_raftstorage.go:521  [replicate,n2,s2,r23/2:/Table/5{2-3}] generated preemptive snapshot a1dbba22 at index 38
W180814 05:53:18.083521 137883 rpc/nodedialer/nodedialer.go:89  [replicate,n2,s2,r23/2:/Table/5{2-3}] unable to connect to n5: unable to dial n5: breaker open
W180814 05:53:18.084606 136072 storage/raft_transport.go:574  [n4] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180814 05:53:17.990814 133756 gossip/gossip.go:1325  [n4] no incoming or outgoing connections
W180814 05:53:17.991375 134842 storage/raft_transport.go:574  [n1] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180814 05:53:18.124996 135896 storage/raft_transport.go:574  [n2] while processing outgoing Raft queue to node 4: EOF:
W180814 05:53:18.132728 133881 server/node.go:856  [n4,summaries] error recording status summaries: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
I180814 05:53:18.133204 137987 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      kv.DistSender: sending partial batch
W180814 05:53:18.146355 134439 ts/db.go:195  [n5,ts-poll] error writing time series data: node unavailable; try another peer
W180814 05:53:18.146819 134439 ts/db.go:198  [n5,ts-poll] node unavailable; try another peer
W180814 05:53:18.162949 137778 storage/intent_resolver.go:733  [n4,s4] failed to cleanup transaction intents: failed to resolve intents: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180814 05:53:18.165634 136055 storage/raft_transport.go:574  [n2] while processing outgoing Raft queue to node 3: EOF:
I180814 05:53:18.179444 137987 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
I180814 05:53:18.255250 137987 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
I180814 05:53:18.194984 137883 storage/replicate_queue.go:241  [replicate,n2,s2,r23/2:/Table/5{2-3}] snapshot failed: unable to dial n5: breaker open
W180814 05:53:18.360237 137981 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n4: unable to dial n4: breaker open
I180814 05:53:18.364041 137883 storage/replica_raftstorage.go:521  [replicate,n2,s2,r23/2:/Table/5{2-3}] generated preemptive snapshot 1d56761a at index 38
I180814 05:53:18.364792 137883 storage/replicate_queue.go:241  [replicate,n2,s2,r23/2:/Table/5{2-3}] snapshot failed: unable to dial n5: breaker open
I180814 05:53:18.383386 137839 storage/replica.go:858  [replicaGC,n1,s1,r23/1:/Table/5{2-3}] removed 7 (0+7) keys in 510ms [clear=0ms commit=510ms]
W180814 05:53:18.384355 137839 storage/store.go:1706  [replicaGC,n1,s1,r23/1:/Table/5{2-3}] unable to gossip on capacity change: node unavailable; try another peer
W180814 05:53:18.386234 136041 storage/raft_transport.go:574  [n3] while processing outgoing Raft queue to node 4: EOF:
W180814 05:53:18.412185 132480 storage/store.go:3743  [n1,s1] handle raft ready: 0.5s [processed=0]
W180814 05:53:18.432087 138019 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n2: unable to dial n2: breaker open
W180814 05:53:18.483609 137936 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n3: unable to dial n3: breaker open
W180814 05:53:18.568738 133649 storage/raft_transport.go:574  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180814 05:53:18.569784 136040 storage/raft_transport.go:574  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I180814 05:53:18.581233 137883 storage/replica_raftstorage.go:521  [replicate,n2,s2,r23/2:/Table/5{2-3}] generated preemptive snapshot 4004024c at index 38
I180814 05:53:18.582083 137883 storage/replicate_queue.go:241  [replicate,n2,s2,r23/2:/Table/5{2-3}] snapshot failed: unable to dial n5: breaker open
W180814 05:53:18.584685 137984 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n5: unable to dial n5: breaker open
W180814 05:53:18.620732 134668 storage/raft_transport.go:574  [n5] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = context canceled:
W180814 05:53:18.621556 136521 storage/raft_transport.go:574  [n3] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180814 05:53:18.624045 138016 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n2: unable to dial n2: breaker open
W180814 05:53:18.638321 137917 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n5: unable to dial n5: breaker open
I180814 05:53:18.668938 137856 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      [async] transport racer
W180814 05:53:18.720411 133967 rpc/nodedialer/nodedialer.go:89  [n4] unable to connect to n1: unable to dial n1: breaker open
W180814 05:53:18.721934 133400 gossip/gossip.go:1325  [n3] no incoming or outgoing connections
W180814 05:53:18.777599 137991 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n2: unable to dial n2: breaker open
W180814 05:53:18.818091 137943 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n3: context canceled
W180814 05:53:18.847502 137993 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n1: context canceled
W180814 05:53:18.871353 137945 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n4: context canceled
W180814 05:53:18.872393 137944 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n2: context canceled
W180814 05:53:18.909025 133880 ts/db.go:195  [n4,ts-poll] error writing time series data: node unavailable; try another peer
W180814 05:53:18.909443 133880 ts/db.go:198  [n4,ts-poll] node unavailable; try another peer
I180814 05:53:18.941960 137883 storage/replica_raftstorage.go:521  [replicate,n2,s2,r23/2:/Table/5{2-3}] generated preemptive snapshot 155d1650 at index 38
I180814 05:53:18.942735 137883 storage/replicate_queue.go:241  [replicate,n2,s2,r23/2:/Table/5{2-3}] snapshot failed: unable to dial n5: breaker open
W180814 05:53:19.003666 137970 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n4: context canceled
W180814 05:53:19.007273 137969 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n3: unable to dial n3: breaker open
W180814 05:53:19.009461 137994 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n4: unable to dial n4: breaker open
W180814 05:53:19.051997 137946 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n3: context canceled
W180814 05:53:19.077556 133299 ts/db.go:195  [n2,ts-poll] error writing time series data: node unavailable; try another peer
W180814 05:53:19.078002 133299 ts/db.go:198  [n2,ts-poll] node unavailable; try another peer
I180814 05:53:19.078387 137857 util/stop/stopper.go:537  quiescing; tasks left:
1      [async] storage.replicate: processing replica
I180814 05:53:19.078772 134438 server/status/runtime.go:424  [n5] runtime stats: 1.4 GiB RSS, 727 goroutines, 57 MiB/57 MiB/140 MiB GO alloc/idle/total, 155 MiB/214 MiB CGO alloc/total, 310.29cgo/sec, 0.92/0.03 %(u/s)time, 0.03 %gc (4x)
I180814 05:53:19.145259 132247 server/status/runtime.go:424  [n1] runtime stats: 1.4 GiB RSS, 727 goroutines, 53 MiB/60 MiB/140 MiB GO alloc/idle/total, 155 MiB/214 MiB CGO alloc/total, 308.86cgo/sec, 0.91/0.03 %(u/s)time, 0.03 %gc (4x)
W180814 05:53:19.186450 138018 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n1: unable to dial n1: breaker open
W180814 05:53:19.263796 138051 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n1: unable to dial n1: breaker open
W180814 05:53:19.378988 138052 rpc/nodedialer/nodedialer.go:89  [ct-client] unable to connect to n5: unable to dial n5: breaker open
I180814 05:53:19.424553 133266 server/status/runtime.go:424  [n2] runtime stats: 1.4 GiB RSS, 727 goroutines, 47 MiB/66 MiB/140 MiB GO alloc/idle/total, 155 MiB/214 MiB CGO alloc/total, 299.58cgo/sec, 0.90/0.03 %(u/s)time, 0.03 %gc (4x)
W180814 05:53:19.575315 132248 ts/db.go:195  [n1,ts-poll] error writing time series data: node unavailable; try another peer
I180814 05:53:19.577169 137856 util/stop/stopper.go:537  quiescing; tasks left:
1      [async] transport racer
I180814 05:53:19.654361 137883 storage/replica_raftstorage.go:521  [replicate,n2,s2,r23/2:/Table/5{2-3}] generated preemptive snapshot 2d966660 at index 38
I180814 05:53:19.655521 137883 storage/replicate_queue.go:241  [replicate,n2,s2,r23/2:/Table/5{2-3}] snapshot failed: context canceled
E180814 05:53:19.655843 137883 storage/queue.go:788  [replicate,n2,s2,r23/2:/Table/5{2-3}] failed to replicate after 5 retries
W180814 05:53:20.068089 132249 server/node.go:856  [n1,summaries] error recording status summaries: node unavailable; try another peer
W180814 05:53:20.186850 134440 server/node.go:856  [n5,summaries] error recording status summaries: node unavailable; try another peer
W180814 05:53:20.291667 133300 server/node.go:856  [n2,summaries] error recording status summaries: node unavailable; try another peer
W180814 05:53:21.784414 132248 ts/db.go:198  [n1,ts-poll] node unavailable; try another peer
I180814 05:53:22.027670 132209 kv/transport_race.go:67  transport race promotion: ran 34 iterations on up to 2797 requests
--- FAIL: TestReplicateQueueRebalance (135.00s)
	soon.go:49: condition failed to evaluate within 45s: not balanced: [18 18 17 13 14]
		goroutine 132407 [running]:
		runtime/debug.Stack(0xa7a358200, 0xc420d662a0, 0x3a1c840)
			/usr/local/go/src/runtime/debug/stack.go:24 +0xb5
		github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x3a77ae0, 0xc42024d2c0, 0xc420f63800)
			/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:50 +0x172
		github.com/cockroachdb/cockroach/pkg/storage_test.TestReplicateQueueRebalance(0xc42024d2c0)
			/go/src/github.com/cockroachdb/cockroach/pkg/storage/replicate_queue_test.go:90 +0x652
		testing.tRunner(0xc42024d2c0, 0x3363eb0)
			/usr/local/go/src/testing/testing.go:777 +0x16e
		created by testing.(*T).Run
			/usr/local/go/src/testing/testing.go:824 +0x565

cockroach-teamcity added this to the 2.1 milestone Aug 14, 2018
cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Aug 14, 2018

tbg commented Aug 14, 2018

Are we just exhausting resources by running a five-node cluster under stress with the race transport and everything? Let's see if this recurs.

tbg added the A-kv-client Relating to the KV client and the KV interface. label Aug 14, 2018
@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/3cdd221e01048154f705e1efb130afb42338b462

Parameters:

TAGS=
GOFLAGS=

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueRebalance PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'
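
For the race-enabled variant with the concurrency cap those comments describe, an invocation might look like the following; the '-p 4' value is only an illustrative choice, not a setting taken from any build:

# Same invocation under the race detector, with stress concurrency capped.
# The '-p 4' value here is illustrative, not from any build configuration.
make stressrace TESTS=TestReplicateQueueRebalance PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-p 4 -stderr=false -maxtime 20m -timeout 10m'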

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=862451&tab=buildLog

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/3cdd221e01048154f705e1efb130afb42338b462

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueRebalance PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=862681&tab=buildLog

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/3cdd221e01048154f705e1efb130afb42338b462

Parameters:

TAGS=
GOFLAGS=

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueRebalance PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=862911&tab=buildLog

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/3cdd221e01048154f705e1efb130afb42338b462

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueRebalance PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=863141&tab=buildLog

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/dbf5d6ed987c2c240bcbf0bb2b514c8e2186f855

Parameters:

TAGS=
GOFLAGS=

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueRebalance PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=865483&tab=buildLog

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/dbf5d6ed987c2c240bcbf0bb2b514c8e2186f855

Parameters:

TAGS=
GOFLAGS=

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueRebalance PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=865943&tab=buildLog

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/dbf5d6ed987c2c240bcbf0bb2b514c8e2186f855

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueRebalance PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=866173&tab=buildLog

@petermattis

Reproduced in ~1m with make stress PKG=./storage/ TESTS=TestReplicateQueueRebalance.

@petermattis

No reproduction in 5m on ff70e8d (merged Aug 8). Time to bisect.
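
One way to script that bisection is sketched below; the time budgets are illustrative, and it assumes make stress exits non-zero once the test fails within the budget (which is how CI detects the failure), so git bisect run can use the exit code directly. Because this is a flake, a quiet run only suggests a good commit, so it's worth spot-checking the final answer.

git bisect start
git bisect bad master
git bisect good ff70e8d
# Budgets are illustrative: bad commits repro in roughly a minute, so 10m of
# quiet stress is treated as "good". A flake can still slip through.
git bisect run make stress PKG=./storage/ TESTS=TestReplicateQueueRebalance TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 10m -timeout 10m'
git bisect reset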

@petermattis

@benesch Bisecting the TestReplicateQueueRebalance failure points the finger at 98ca1d0. Disabling the merge queue by default fixes the flake on current master.
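
For reference, "disabling the merge queue" corresponds to the kv.range_merge.queue_enabled cluster setting (assuming a 2.1-era build where range merges exist); at the SQL level that looks like:

# Rough SQL-level equivalent of the fix; the test change does the
# disabling itself. Assumes the kv.range_merge.queue_enabled setting exists.
cockroach sql --insecure -e "SET CLUSTER SETTING kv.range_merge.queue_enabled = false;"

With the merge queue off, the splits the test performs manually are no longer merged away underneath the rebalancing check, which is the interference described in the fix below.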

petermattis added a commit to petermattis/cockroach that referenced this issue Aug 28, 2018
Temporarily disable the merge queue in this test.

Fixes cockroachdb#28563

Release note: None
petermattis added a commit to petermattis/cockroach that referenced this issue Aug 29, 2018
Disable the merge queue in this test as it interferes with the splits
that are performed manually.

Fixes cockroachdb#28563

Release note: None
craig bot pushed a commit that referenced this issue Aug 29, 2018
29221: storage: deflake TestReplicateQueueRebalance r=a-robinson a=petermattis

Temporarily disable the merge queue in this test.

Fixes #28563

Release note: None

Co-authored-by: Peter Mattis <[email protected]>
craig bot closed this as completed in #29221 Aug 29, 2018
benesch pushed a commit to benesch/cockroach that referenced this issue Aug 29, 2018
Disable the merge queue in this test as it interferes with the splits
that are performed manually.

Fixes cockroachdb#28563

Release note: None
craig bot pushed a commit that referenced this issue Aug 29, 2018
29237: backport-2.1: storage: deflake TestReplicateQueueRebalance r=tschottdorf a=benesch

Backport 1/1 commits from #29221.

/cc @cockroachdb/release

---

Temporarily disable the merge queue in this test.

Fixes #28563

Release note: None


Co-authored-by: Peter Mattis <[email protected]>