
storage: crazy back-and-forth between two nodes #18456

Closed
knz opened this issue Sep 12, 2017 · 3 comments


knz commented Sep 12, 2017

Steps to reproduce:

  1. start node 1.
  2. cockroach zone set .default --disable-replication
  3. start node 2, join node 1
  4. observe logs
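
For reference, the steps above as a single script. This is only a sketch: the ports, store paths, --insecure, and --background flags are assumptions for a local cluster; only the zone set command is verbatim from the steps.

```shell
#!/bin/sh
# Repro sketch. Ports, store paths, --insecure, and --background are
# assumptions for a local two-node cluster; only the "zone set" line is
# verbatim from the steps above. Requires a cockroach binary on $PATH,
# so the function is defined here but not invoked.
repro() {
  # 1. start node 1
  cockroach start --insecure --store=node1 --port=26257 --http-port=8080 --background
  # 2. disable replication for the default zone
  cockroach zone set .default --insecure --disable-replication
  # 3. start node 2, joining node 1
  cockroach start --insecure --store=node2 --port=26258 --http-port=8081 \
    --join=localhost:26257 --background
  # 4. observe the logs under node1/logs and node2/logs
}
```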

I get an incessant stream of the following:

On node 1:

ext=99]
I170912 20:49:10.518127 50759 storage/replica.go:3030  [n1,s1,r15/1:/Table/1{8-9}] proposing REMOVE_REPLICA((n2,s2):98): updated=[(n1,s1):1] next=99
I170912 20:49:10.522264 357 storage/allocator_scorer.go:612  [replicate,n1,s1,r14/1:/Table/1{7-8}] s1: should-rebalance(ranges-overfull): rangeCount=16, mean=8.50, overfull-threshold=9
I170912 20:49:10.522391 357 storage/replica_raftstorage.go:515  [replicate,n1,s1,r14/1:/Table/1{7-8}] generated preemptive snapshot 6946d0aa at index 514
I170912 20:49:10.522900 357 storage/store.go:3479  [replicate,n1,s1,r14/1:/Table/1{7-8}] streamed snapshot to (n2,s2):?: kv pairs: 169, log entries: 14, rate-limit: 2.0 MiB/sec, 0ms
I170912 20:49:10.527315 357 storage/replica_command.go:3646  [replicate,n1,s1,r14/1:/Table/1{7-8}] change replicas (ADD_REPLICA (n2,s2):81): read existing descriptor r14:/Table/1{7-8} [(n1,s1):1, next=81]
I170912 20:49:10.533490 50832 storage/replica.go:3030  [n1,s1,r14/1:/Table/1{7-8}] proposing ADD_REPLICA((n2,s2):81): updated=[(n1,s1):1 (n2,s2):81] next=82
I170912 20:49:10.535488 357 storage/replica_command.go:3646  [replicate,n1,s1,r14/1:/Table/1{7-8}] change replicas (REMOVE_REPLICA (n2,s2):81): read existing descriptor r14:/Table/1{7-8} [(n1,s1):1, (n2,s2):81, next=82]
I170912 20:49:10.549156 50836 storage/replica.go:3030  [n1,s1,r14/1:/Table/1{7-8}] proposing REMOVE_REPLICA((n2,s2):81): updated=[(n1,s1):1] next=82
I170912 20:49:10.552896 357 storage/allocator_scorer.go:612  [replicate,n1,s1,r4/1:/System/{NodeLive…-tsd}] s1: should-rebalance(ranges-overfull): rangeCount=16, mean=9.00, overfull-threshold=10
I170912 20:49:10.553043 357 storage/replica_raftstorage.go:515  [replicate,n1,s1,r4/1:/System/{NodeLive…-tsd}] generated preemptive snapshot 412b2d97 at index 356
I170912 20:49:10.553845 357 storage/store.go:3479  [replicate,n1,s1,r4/1:/System/{NodeLive…-tsd}] streamed snapshot to (n2,s2):?: kv pairs: 121, log entries: 4, rate-limit: 2.0 MiB/sec, 1ms
I170912 20:49:10.618841 357 storage/replica_command.go:3646  [replicate,n1,s1,r4/1:/System/{NodeLive…-tsd}] change replicas (ADD_REPLICA (n2,s2):46): read existing descriptor r4:/System/{NodeLivenessMax-tsd} [(n1,s1):1, next=46]
I170912 20:49:10.630656 50796 storage/replica.go:3030  [n1,s1,r4/1:/System/{NodeLive…-tsd}] proposing ADD_REPLICA((n2,s2):46): updated=[(n1,s1):1 (n2,s2):46] next=47
I170912 20:49:10.633891 357 storage/replica_command.go:3646  [replicate,n1,s1,r4/1:/System/{NodeLive…-tsd}] change replicas (REMOVE_REPLICA (n2,s2):46): read existing descriptor r4:/System/{NodeLivenessMax-tsd} [(n1,s1):1, (n2,s2):46, next=47]
I170912 20:49:10.648900 50801 storage/replica.go:3030  [n1,s1,r4/1:/System/{NodeLive…-tsd}] proposing REMOVE_REPLICA((n2,s2):46): updated=[(n1,s1):1] next=47
I170912 20:49:10.657168 357 storage/allocator_scorer.go:612  [replicate,n1,s1,r15/1:/Table/1{8-9}] s1: should-rebalance(ranges-overfull): rangeCount=16, mean=9.00, overfull-threshold=10
I170912 20:49:10.657309 357 storage/replica_raftstorage.go:515  [replicate,n1,s1,r15/1:/Table/1{8-9}] generated preemptive snapshot 8650a2d0 at index 625
I170912 20:49:10.657914 357 storage/store.go:3479  [replicate,n1,s1,r15/1:/Table/1{8-9}] streamed snapshot to (n2,s2):?: kv pairs: 205, log entries: 16, rate-limit: 2.0 MiB/sec, 1ms
I170912 20:49:10.663800 357 storage/replica_command.go:3646  [replicate,n1,s1,r15/1:/Table/1{8-9}] change replicas (ADD_REPLICA (n2,s2):99): read existing descriptor r15:/Table/1{8-9} [(n1,s1):1, next=99]
I170912 20:49:10.670069 50912 storage/replica.go:3030  [n1,s1,r15/1:/Table/1{8-9}] proposing ADD_REPLICA((n2,s2):99): updated=[(n1,s1):1 (n2,s2):99] next=100
I170912 20:49:10.671923 357 storage/replica_command.go:3646  [replicate,n1,s1,r15/1:/Table/1{8-9}] change replicas (REMOVE_REPLICA (n2,s2):99): read existing descriptor r15:/Table/1{8-9} [(n1,s1):1, (n2,s2):99, next=100]
I170912 20:49:10.704074 50880 storage/replica.go:3030  [n1,s1,r15/1:/Table/1{8-9}] proposing REMOVE_REPLICA((n2,s2):99): updated=[(n1,s1):1] next=100
I170912 20:49:10.707914 357 storage/allocator_scorer.go:612  [replicate,n1,s1,r14/1:/Table/1{7-8}] s1: should-rebalance(ranges-overfull): rangeCount=16, mean=9.00, overfull-threshold=10
I170912 20:49:10.708049 357 storage/replica_raftstorage.go:515  [replicate,n1,s1,r14/1:/Table/1{7-8}] generated preemptive snapshot 40db99ed at index 520
I170912 20:49:10.708627 357 storage/store.go:3479  [replicate,n1,s1,r14/1:/Table/1{7-8}] streamed snapshot to (n2,s2):?: kv pairs: 171, log entries: 20, rate-limit: 2.0 MiB/sec, 1ms
I170912 20:49:10.718660 357 storage/replica_command.go:3646  [replicate,n1,s1,r14/1:/Table/1{7-8}] change replicas (ADD_REPLICA (n2,s2):82): read existing descriptor r14:/Table/1{7-8} [(n1,s1):1, next=82]
I170912 20:49:10.726088 50937 storage/replica.go:3030  [n1,s1,r14/1:/Table/1{7-8}] proposing ADD_REPLICA((n2,s2):82): updated=[(n1,s1):1 (n2,s2):82] next=83
I170912 20:49:10.728278 357 storage/replica_command.go:3646  [replicate,n1,s1,r14/1:/Table/1{7-8}] change replicas (REMOVE_REPLICA (n2,s2):82): read existing descriptor r14:/Table/1{7-8} [(n1,s1):1, (n2,s2):82, next=83]
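
One way to confirm the loop from a saved log: every ADD_REPLICA proposal is immediately followed by a matching REMOVE_REPLICA, so the two counts stay equal. A small sketch; the heredoc holds two lines copied from the excerpt above, and in practice you would pipe in the real log file instead.

```shell
#!/bin/sh
# Tally replica-change proposals in a node 1 log excerpt. Equal ADD and
# REMOVE counts mean every up-replication was immediately undone.
log=$(cat <<'EOF'
I170912 20:49:10.533490 50832 storage/replica.go:3030  [n1,s1,r14/1:/Table/1{7-8}] proposing ADD_REPLICA((n2,s2):81): updated=[(n1,s1):1 (n2,s2):81] next=82
I170912 20:49:10.549156 50836 storage/replica.go:3030  [n1,s1,r14/1:/Table/1{7-8}] proposing REMOVE_REPLICA((n2,s2):81): updated=[(n1,s1):1] next=82
EOF
)
adds=$(printf '%s\n' "$log" | grep -c 'proposing ADD_REPLICA')
removes=$(printf '%s\n' "$log" | grep -c 'proposing REMOVE_REPLICA')
echo "ADD=$adds REMOVE=$removes"   # prints ADD=1 REMOVE=1
```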

On node 2:

I170912 20:47:17.160424 3552 storage/replica_raftstorage.go:722  [n2,s2,r5/?:{-}] applying preemptive snapshot at index 204 (id=ebd445f1, encoded size=220332, 1 rocksdb batches, 26 log entries)
I170912 20:47:17.168466 3552 storage/replica_raftstorage.go:728  [n2,s2,r5/?:/System/ts{d-e}] applied preemptive snapshot in 8ms [clear=3ms batch=0ms entries=0ms commit=5ms]
I170912 20:47:17.168552 232 storage/store.go:2159  [replicaGC,n2,s2,r4/12:/System/{NodeLive…-tsd}] removing replica
I170912 20:47:17.170563 232 storage/replica.go:760  [replicaGC,n2,s2,r4/12:/System/{NodeLive…-tsd}] removed 28 (16+12) keys in 2ms [clear=0ms commit=2ms]
I170912 20:47:17.598221 2799 storage/replica_raftstorage.go:722  [n2,s2,r10/28:/Table/1{3-4}] applying preemptive snapshot at index 2119 (id=28b4dda8, encoded size=1007322, 4 rocksdb batches, 117 log entries)
I170912 20:47:17.617188 2799 storage/replica_raftstorage.go:728  [n2,s2,r10/28:/Table/1{3-4}] applied preemptive snapshot in 19ms [clear=1ms batch=0ms entries=0ms commit=17ms]
I170912 20:47:17.617333 232 storage/store.go:2159  [replicaGC,n2,s2,r14/46:/Table/1{7-8}] removing replica
I170912 20:47:17.620664 232 storage/replica.go:760  [replicaGC,n2,s2,r14/46:/Table/1{7-8}] removed 11 (0+11) keys in 3ms [clear=2ms commit=2ms]
I170912 20:47:17.643738 959 storage/replica_raftstorage.go:722  [n2,s2,r1/?:{-}] applying preemptive snapshot at index 2177 (id=f4f4bf49, encoded size=54692, 1 rocksdb batches, 8 log entries)
I170912 20:47:17.650326 959 storage/replica_raftstorage.go:728  [n2,s2,r1/?:/{Min-System/}] applied preemptive snapshot in 7ms [clear=3ms batch=0ms entries=0ms commit=4ms]
I170912 20:47:17.682645 232 storage/store.go:2159  [replicaGC,n2,s2,r5/28:/System/ts{d-e}] removing replica
I170912 20:47:17.685073 3560 storage/replica_raftstorage.go:722  [n2,s2,r2/?:{-}] applying preemptive snapshot at index 400 (id=4908312c, encoded size=16006, 1 rocksdb batches, 14 log entries)
I170912 20:47:17.694603 232 storage/replica.go:760  [replicaGC,n2,s2,r5/28:/System/ts{d-e}] removed 419 (407+12) keys in 12ms [clear=1ms commit=10ms]
I170912 20:47:17.706806 3560 storage/replica_raftstorage.go:728  [n2,s2,r2/?:/System/{-NodeLive…}] applied preemptive snapshot in 22ms [clear=4ms batch=0ms entries=0ms commit=17ms]
I170912 20:47:17.736017 3563 storage/replica_raftstorage.go:722  [n2,s2,r8/?:{-}] applying preemptive snapshot at index 270 (id=b5115ee2, encoded size=46397, 1 rocksdb batches, 67 log entries)
I170912 20:47:17.739392 3563 storage/replica_raftstorage.go:728  [n2,s2,r8/?:/Table/1{1-2}] applied preemptive snapshot in 3ms [clear=1ms batch=0ms entries=0ms commit=2ms]
I170912 20:47:17.748795 232 storage/store.go:2159  [replicaGC,n2,s2,r10/29:/Table/1{3-4}] removing replica
I170912 20:47:17.752876 232 storage/replica.go:760  [replicaGC,n2,s2,r10/29:/Table/1{3-4}] removed 4971 (4960+11) keys in 4ms [clear=2ms commit=2ms]
I170912 20:47:17.783663 3619 storage/replica_raftstorage.go:722  [n2,s2,r5/?:{-}] applying preemptive snapshot at index 210 (id=8ae520af, encoded size=224428, 1 rocksdb batches, 32 log entries)
I170912 20:47:17.803480 3619 storage/replica_raftstorage.go:728  [n2,s2,r5/?:/System/ts{d-e}] applied preemptive snapshot in 20ms [clear=14ms batch=0ms entries=0ms commit=5ms]
I170912 20:47:17.807295 232 storage/store.go:2159  [replicaGC,n2,s2,r2/61:/System/{-NodeLive…}] removing replica
I170912 20:47:17.814419 232 storage/replica.go:760  [replicaGC,n2,s2,r2/61:/System/{-NodeLive…}] removed 10 (0+10) keys in 7ms [clear=2ms commit=5ms]
I170912 20:47:18.235015 2737 storage/replica_raftstorage.go:722  [n2,s2,r10/?:{-}] applying preemptive snapshot at index 2144 (id=7c99425d, encoded size=916513, 4 rocksdb batches, 26 log entries)
I170912 20:47:18.265172 2737 storage/replica_raftstorage.go:728  [n2,s2,r10/?:/Table/1{3-4}] applied preemptive snapshot in 30ms [clear=14ms batch=0ms entries=0ms commit=16ms]
I170912 20:47:18.265317 232 storage/store.go:2159  [replicaGC,n2,s2,r8/42:/Table/1{1-2}] removing replica
I170912 20:47:18.268229 232 storage/replica.go:760  [replicaGC,n2,s2,r8/42:/Table/1{1-2}] removed 11 (0+11) keys in 3ms [clear=1ms commit=2ms]
I170912 20:47:18.295662 3652 storage/replica_raftstorage.go:722  [n2,s2,r1/24:/{Min-System/}] applying preemptive snapshot at index 2199 (id=3c74bac3, encoded size=67379, 1 rocksdb batches, 30 log entries)
I170912 20:47:18.309525 3652 storage/replica_raftstorage.go:728  [n2,s2,r1/24:/{Min-System/}] applied preemptive snapshot in 14ms [clear=3ms batch=0ms entries=0ms commit=11ms]
I170912 20:47:18.328922 232 storage/store.go:2159  [replicaGC,n2,s2,r5/29:/System/ts{d-e}] removing replica
I170912 20:47:18.336520 232 storage/replica.go:760  [replicaGC,n2,s2,r5/29:/System/ts{d-e}] removed 419 (407+12) keys in 8ms [clear=1ms commit=6ms]
I170912 20:47:18.344505 3638 storage/replica_raftstorage.go:722  [n2,s2,r2/?:{-}] applying preemptive snapshot at index 406 (id=6a6b15c6, encoded size=20128, 1 rocksdb batches, 20 log entries)
I170912 20:47:18.351560 3638 storage/replica_raftstorage.go:728  [n2,s2,r2/?:/System/{-NodeLive…}] applied preemptive snapshot in 7ms [clear=4ms batch=0ms entries=0ms commit=3ms]
I170912 20:47:18.388955 232 storage/store.go:2159  [replicaGC,n2,s2,r10/30:/Table/1{3-4}] removing replica
I170912 20:47:18.396092 232 storage/replica.go:760  [replicaGC,n2,s2,r10/30:/Table/1{3-4}] removed 5021 (5010+11) keys in 7ms [clear=2ms commit=5ms]
I170912 20:47:18.401454 3641 storage/replica_raftstorage.go:722  [n2,s2,r8/?:{-}] applying preemptive snapshot at index 276 (id=3abf3184, encoded size=50355, 1 rocksdb batches, 73 log entries)
I170912 20:47:18.419497 3641 storage/replica_raftstorage.go:728  [n2,s2,r8/?:/Table/1{1-2}] applied preemptive snapshot in 18ms [clear=14ms batch=0ms entries=0ms commit=3ms]
I170912 20:47:18.448174 232 storage/store.go:2159  [replicaGC,n2,s2,r1/25:/{Min-System/}] removing replica
I170912 20:47:18.473763 232 storage/replica.go:760  [replicaGC,n2,s2,r1/25:/{Min-System/}] removed 27 (17+10) keys in 26ms [clear=17ms commit=9ms]
I170912 20:47:18.528193 232 storage/store.go:2159  [replicaGC,n2,s2,r8/43:/Table/1{1-2}] removing replica
I170912 20:47:18.545605 232 storage/replica.go:760  [replicaGC,n2,s2,r8/43:/Table/1{1-2}] removed 11 (0+11) keys in 17ms [clear=14ms commit=3ms]
I170912 20:47:18.599858 232 storage/store.go:2159  [replicaGC,n2,s2,r2/62:/System/{-NodeLive…}] removing replica
I170912 20:47:18.620865 232 storage/replica.go:760  [replicaGC,n2,s2,r2/62:/System/{-NodeLive…}] removed 10 (0+10) keys in 21ms [clear=18ms commit=3ms]
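
On the node 2 side, the same loop shows up as the replica ID for a given range climbing with each GC pass (e.g. r8/42 then r8/43 above). A sketch that extracts the IDs from the "removing replica" lines; the heredoc lines are copied from the excerpt, and the real log file would be substituted in practice.

```shell
#!/bin/sh
# Extract range/replica IDs from replicaGC "removing replica" lines.
# The same range reappearing with an incremented replica ID means it
# was re-added by node 1 and garbage-collected again on node 2.
log=$(cat <<'EOF'
I170912 20:47:18.265317 232 storage/store.go:2159  [replicaGC,n2,s2,r8/42:/Table/1{1-2}] removing replica
I170912 20:47:18.528193 232 storage/store.go:2159  [replicaGC,n2,s2,r8/43:/Table/1{1-2}] removing replica
EOF
)
ids=$(printf '%s\n' "$log" | grep 'removing replica' | grep -oE 'r[0-9]+/[0-9]+')
echo "$ids"
```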
knz commented Sep 12, 2017

@a-robinson @tschottdorf any suggestions?

knz commented Sep 12, 2017

This flurry stops on node 1 if I stop node 2, and then starts again if I restart node 2.

In the output of debug range ls, the ranges do not appear to migrate from node 1 to node 2.

a-robinson commented Sep 12, 2017

Heh, thanks @knz. When I run this locally, the thrashing is due to the same issue as #17879, so it's possible the fix for that isn't working right. I'll investigate.
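
For what it's worth, the should-rebalance(ranges-overfull) numbers in the node 1 log are self-consistent under a ~5% overfull margin rounded up. That margin and the ceiling are inferred from the logged values (mean=8.50 gives threshold 9, mean=9.00 gives threshold 10), not taken from the actual allocator_scorer.go logic:

```shell
#!/bin/sh
# Overfull-threshold arithmetic inferred from the log lines above.
# The 5% margin and the round-up are assumptions that happen to match
# the logged values; the real allocator code may differ.
threshold() {
  awk -v m="$1" 'BEGIN { t = m * 1.05; r = int(t); if (t > r) r++; print r }'
}
t1=$(threshold 8.50)   # 9
t2=$(threshold 9.00)   # 10
echo "$t1 $t2"         # prints 9 10
# With rangeCount=16 on s1, both thresholds are exceeded, so the
# allocator keeps trying to shed ranges to n2.
```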

@a-robinson a-robinson self-assigned this Sep 12, 2017
a-robinson added a commit to a-robinson/cockroach that referenced this issue Sep 12, 2017
Added by 54618ea. This is what I get
for testing out the initial version of the PR but not the final version.

Fixes cockroachdb#18456
@a-robinson a-robinson added this to the 1.1 milestone Sep 12, 2017
a-robinson added a commit to a-robinson/cockroach that referenced this issue Sep 13, 2017
Added by 54618ea. This is what I get
for testing out the initial version of the PR but not the final version.

Fixes cockroachdb#18456