stability: raft panic after process death #9037

Closed
mberhault opened this issue Sep 1, 2016 · 23 comments

@mberhault
Contributor

mberhault commented Sep 1, 2016

rho-1 (104.196.18.63) was killed uncleanly (I was trying to replace supervisord with a newer version and reset the machine to get around a wedged systemctl).

build sha: 1a34ca3 with race-detector enabled.

After the machine came back, cockroach now dies at startup with:

panic: store=1:1 range=1078 [/Table/55/1/2206910991923582422/"870569b6-bf23-49b0-b5a7-da8a4f8a1d2a"/1537136-/Table/55/1/2216032119155182690/"cf1a193d-401b-4c12-917a-41800f7d85b3"/122879): tocommit(20055) is out of range [lastIndex(20053)]. Was the raft log corrupted, truncated, or lost? [recovered]
        panic: store=1:1 range=1078 [/Table/55/1/2206910991923582422/"870569b6-bf23-49b0-b5a7-da8a4f8a1d2a"/1537136-/Table/55/1/2216032119155182690/"cf1a193d-401b-4c12-917a-41800f7d85b3"/122879): tocommit(20055) is out of range [lastIndex(20053)]. Was the raft log corrupted, truncated, or lost?

goroutine 308 [running]:
panic(0x1abeae0, 0xc420e09c80)
        /usr/local/go/src/runtime/panic.go:500 +0x1ae
github.com/cockroachdb/cockroach/cli.initBacktrace.func2(0x1abeae0, 0xc420e09c80)
        /go/src/github.com/cockroachdb/cockroach/cli/backtrace.go:98 +0xe8
github.com/cockroachdb/cockroach/util/stop.(*Stopper).Recover(0xc42045c510)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:173 +0xb1
panic(0x1abeae0, 0xc420e09c80)
        /usr/local/go/src/runtime/panic.go:458 +0x271
github.com/cockroachdb/cockroach/storage.(*raftLogger).Panicf(0xc420e09620, 0x1ca21e8, 0x5d, 0xc420a249c0, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/storage/raft.go:121 +0x225
github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc420d2cc40, 0x4e57)
        /go/src/github.com/coreos/etcd/raft/log.go:191 +0x208
github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc420dca960, 0x8, 0x1e, 0x1b, 0x4bb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/raft.go:916 +0x7a
github.com/coreos/etcd/raft.stepFollower(0xc420dca960, 0x8, 0x1e, 0x1b, 0x4bb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/raft.go:855 +0x1ed4
github.com/coreos/etcd/raft.(*raft).Step(0xc420dca960, 0x8, 0x1e, 0x1b, 0x4bb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/raft.go:633 +0xc0
github.com/coreos/etcd/raft.(*RawNode).Step(0xc4212d3400, 0x8, 0x1e, 0x1b, 0x4bb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/rawnode.go:180 +0x17d
github.com/cockroachdb/cockroach/storage.(*Store).HandleRaftRequest.func2(0xc4212d3400, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2388 +0xb3
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroupLocked(0xc420902900, 0xc42124b7f0, 0xc420902960, 0xc)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:379 +0x172
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroup(0xc420902900, 0xc42124b7f0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:388 +0x98
github.com/cockroachdb/cockroach/storage.(*Store).HandleRaftRequest(0xc42030cc80, 0x7f55bba4a280, 0xc4208eca50, 0xc420b86f00, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2389 +0x146c
github.com/cockroachdb/cockroach/storage.(*RaftTransport).handleRaftRequest(0xc420450800, 0x7f55bba4a280, 0xc4208eca50, 0xc420b86f00, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:236 +0x13d
github.com/cockroachdb/cockroach/storage.(*RaftTransport).RaftMessage.func1.1.1(0x2804e00, 0xc420f8a2c0, 0xc420450800, 0x671700, 0xc421072f88)
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:285 +0xdf
github.com/cockroachdb/cockroach/storage.(*RaftTransport).RaftMessage.func1.1()
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:292 +0x61
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc42045c510, 0xc4208ecab0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:187 +0x8b
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:188 +0x74

Here is the log, including the last few attempts to start it and the initial death around I160901 16:46:15:
cockroach.stderr.txt

The process is currently down, so feel free to poke around.

@tbg
Member

tbg commented Sep 1, 2016

Could be related to #9029.

bdarnell added a commit to bdarnell/cockroach that referenced this issue Sep 6, 2016
This reverts commit a5ce7dd.

Moving to 4.9 introduced some bugs: cockroachdb#9029 and possibly cockroachdb#9037. Reverting
pending investigation.
@tbg tbg assigned bdarnell and unassigned tbg Sep 6, 2016
@tbg
Member

tbg commented Sep 6, 2016

Assigning to @bdarnell as I believe he's looking into this at the moment.

@mberhault
Contributor Author

still poking around? There are races and panics I'd like to investigate, but I don't want to disrupt things.

@tbg
Member

tbg commented Sep 6, 2016

Discussed offline: we should retain a copy of the failing node's data, but there seems to be no good reason not to wipe it afterwards.

@bdarnell
Contributor

bdarnell commented Sep 7, 2016

This panic means that the leader believes this node has log entry 20055 (i.e. it has sent a MsgAppResp for this entry), but the node only has up to 20053 (and the data on disk appears to be consistent with this). It doesn't look like this node was recently added to the range, so snapshots shouldn't be involved.
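
For reference, the check that fires is etcd/raft's raftLog.commitTo (raft/log.go:191 in the stack above). Paraphrased from the vendored version in use at the time:

func (l *raftLog) commitTo(tocommit uint64) {
    // Never decrease the commit index.
    if l.committed < tocommit {
        if l.lastIndex() < tocommit {
            l.logger.Panicf("tocommit(%d) is out of range [lastIndex(%d)]. Was the raft log corrupted, truncated, or lost?",
                tocommit, l.lastIndex())
        }
        l.committed = tocommit
    }
}

In other words, the leader told this follower to advance its commit index to 20055, but the follower's local log ends at 20053.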

handleRaftReady commits the batch writing new log entries before sending any raft messages. The most straightforward explanation is that batch.Commit does not result in the data being immediately and synchronously persisted. (Note that in this instance it doesn't matter when the data makes it all the way to the physical disk; all we require is that the data be handed to the kernel, so that after our process is kill -9ed the kernel will still finish the write. Reading the original report again, though, it looks like this was a hardware reset.) This could be a bug in rocksdb 4.9 (given the suspicious timing of the upgrade and the possibly-related panic in #9029), or it could be an error in the way we configure rocksdb.
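
To make the ordering requirement concrete, here is a minimal sketch of a Ready-handling loop (illustrative only, not the actual handleRaftReady code; LogStorage and MsgSender are hypothetical stand-ins for the engine and the raft transport):

package raftready // illustrative sketch only

import (
    "github.com/coreos/etcd/raft"
    "github.com/coreos/etcd/raft/raftpb"
)

// LogStorage and MsgSender are hypothetical interfaces, not CockroachDB types.
type LogStorage interface {
    // AppendAndSync must not return until the entries and HardState have at
    // least been handed to the kernel (synced), per the discussion above.
    AppendAndSync(ents []raftpb.Entry, hs raftpb.HardState) error
}

type MsgSender interface {
    Send(raftpb.Message)
}

// handleReady sketches the invariant: persist first, acknowledge second.
func handleReady(rn *raft.RawNode, st LogStorage, tr MsgSender) error {
    rd := rn.Ready()
    // 1. Durably write the new log entries and HardState.
    if err := st.AppendAndSync(rd.Entries, rd.HardState); err != nil {
        return err
    }
    // 2. Only now is it safe to send messages (e.g. MsgAppResp) that
    // acknowledge those entries to the leader.
    for _, m := range rd.Messages {
        tr.Send(m)
    }
    rn.Advance(rd)
    return nil
}

If step 1 is not actually durable before step 2 runs, a crash between the two reproduces exactly the tocommit > lastIndex panic above.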

I don't think there's more we could get out of the data directory. The one thing that would be interesting is to look at the rocksdb WAL to see if the missing log entries are there, but that log has already been compacted away by the restart attempts and/or my use of debugging tools.

@bdarnell
Contributor

bdarnell commented Sep 7, 2016

Looking at the commit log between rocksdb 4.8 and 4.9:

facebook/rocksdb@0460e9dcc is immediately suspicious, since it changes the WriteBatch format, although it looks like it doesn't affect batches that don't use the new two-phase commit features. (Note that there is another version of this commit in the history which was more intrusive, but it was subsequently reverted.) And we have a test (storage/engine/batch_test.go) which is supposed to detect any relevant changes to the batch encoding.

@bdarnell
Contributor

bdarnell commented Sep 7, 2016

I don't think we can blame this on rocksdb 4.9 - we don't have enough history with unclean shutdowns to say that they were working fine before. We could put rocksdb in a stricter mode which would cause it to fail at startup instead of shortly after startup.

The root cause is either that we're not fsyncing when we're supposed to, or that fsync isn't going all the way to the disk (whatever that means in the GCE environment; FWIW, this machine is using the "local SSD" option with an ext4 filesystem). As far as I can tell, all the appropriate durability options are set by default and rocksdb is calling fdatasync on every write (we could optionally call fsync instead, which improves consistency on ext3; it's not clear to me whether it helps on ext4). There are various reports of unsynced data with ext4, some of which refer to near-future fixes (in posts that are years old), so I'm not sure whether we need to do anything at the filesystem level to ensure durability.
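
For concreteness, the fsync vs. fdatasync distinction at the OS level looks like this (a minimal Go illustration, not how RocksDB does it internally; the fdatasync path is Linux-specific):

package durability // illustrative sketch only

import (
    "os"

    "golang.org/x/sys/unix"
)

// writeDurably appends data and then syncs it. fsync flushes the data plus
// all file metadata; fdatasync may skip metadata (e.g. mtime) that is not
// needed to read the data back. Neither helps if the device itself lies
// about having flushed its cache.
func writeDurably(path string, data []byte, useFsync bool) error {
    f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0644)
    if err != nil {
        return err
    }
    defer f.Close()
    if _, err := f.Write(data); err != nil {
        return err
    }
    if useFsync {
        return f.Sync() // fsync(2)
    }
    return unix.Fdatasync(int(f.Fd())) // fdatasync(2)
}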

@tamird
Contributor

tamird commented Nov 10, 2016

I am occasionally able to reproduce this on GCE after applying #10583:

make stress STRESSFLAGS='-p 100 -stderr -maxfails 1' TESTTIMEOUT=15s PKG=./pkg/storage TESTS=TestRaftRemoveRace
go list -tags '' -f \
'go test -v  -tags '\'''\'' -ldflags '\'''\'' -i -c {{.ImportPath}} -o {{.Dir}}/stress.test && (cd {{.Dir}} && if [ -f stress.test ]; then stress -p 100 -stderr -maxfails 1 ./stress.test -test.run '\''TestRaftRemoveRace'\'' -test.timeout 15s ; fi)' ./pkg/storage | \
/bin/bash
github.com/cockroachdb/cockroach/pkg/storage
github.com/cockroachdb/cockroach/pkg/storage_test
testmain
112 runs so far, 0 failures, over 5s
343 runs so far, 0 failures, over 10s
572 runs so far, 0 failures, over 15s
787 runs so far, 0 failures, over 20s
1004 runs so far, 0 failures, over 25s
1222 runs so far, 0 failures, over 30s

I161110 09:49:47.885454 1 rand.go:76  Random seed: -3911018962390461145
I161110 09:49:47.921506 22 storage/engine/rocksdb.go:340  opening in memory rocksdb instance
I161110 09:49:47.923307 22 gossip/gossip.go:237  [n?] initial resolvers: []
W161110 09:49:47.923381 22 gossip/gossip.go:1055  [n?] no resolvers found; use --join to specify a connected node
I161110 09:49:47.923462 22 base/node_id.go:62  NodeID set to 1
I161110 09:49:47.929146 22 storage/store.go:1183  [n1,s1]: failed initial metrics computation: [n1,s1]: system config not yet available
I161110 09:49:47.929289 22 gossip/gossip.go:280  [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"127.0.0.1:37168" > attrs:<> locality:<>
I161110 09:49:47.931049 60 storage/replica_proposal.go:328  [s1,r1/1:/M{in-ax}] new range lease replica {1 1 1} 1970-01-01 00:00:00 +0000 UTC 900.000124ms following replica {0 0 0} 1970-01-01 00:00:00 +0000 UTC 0s [physicalTime=1970-01-01 00:00:00.000000123 +0000 UTC]
I161110 09:49:47.931775 22 storage/engine/rocksdb.go:340  opening in memory rocksdb instance
I161110 09:49:47.932592 22 gossip/gossip.go:237  [n?] initial resolvers: [127.0.0.1:37168]
W161110 09:49:47.932633 22 gossip/gossip.go:1057  [n?] no incoming or outgoing connections
I161110 09:49:47.932677 22 base/node_id.go:62  NodeID set to 2
I161110 09:49:47.941497 22 storage/store.go:1183  [n2,s2]: failed initial metrics computation: [n2,s2]: system config not yet available
I161110 09:49:47.941572 22 gossip/gossip.go:280  [n2] NodeDescriptor set to node_id:2 address:<network_field:"tcp" address_field:"127.0.0.1:45510" > attrs:<> locality:<>
I161110 09:49:47.941970 244 gossip/client.go:125  [n2] started gossip client to 127.0.0.1:37168
I161110 09:49:47.942391 22 storage/engine/rocksdb.go:340  opening in memory rocksdb instance
I161110 09:49:47.946449 22 gossip/gossip.go:237  [n?] initial resolvers: [127.0.0.1:37168]
W161110 09:49:47.946478 22 gossip/gossip.go:1057  [n?] no incoming or outgoing connections
I161110 09:49:47.946524 22 base/node_id.go:62  NodeID set to 3
I161110 09:49:48.071286 330 gossip/client.go:125  [n3] started gossip client to 127.0.0.1:37168
I161110 09:49:48.085492 22 storage/store.go:1183  [n3,s3]: failed initial metrics computation: [n3,s3]: system config not yet available
I161110 09:49:48.085585 22 gossip/gossip.go:280  [n3] NodeDescriptor set to node_id:3 address:<network_field:"tcp" address_field:"127.0.0.1:46514" > attrs:<> locality:<>
I161110 09:49:48.130383 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 16f1958a at index 16 in 53.722µs.
I161110 09:49:48.131175 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 35, log entries: 6
I161110 09:49:48.131723 169 storage/store.go:1908  [s2] ADD 1
I161110 09:49:48.132048 169 storage/replica_raftstorage.go:589  [s2,r1/?:{-}] applying preemptive snapshot at index 16 (id=16f1958a, encoded size=16, 1 rocksdb batches, 6 log entries)
I161110 09:49:48.149401 169 storage/replica_raftstorage.go:592  [s2,r1/?:/M{in-ax}] applied preemptive snapshot in 0.017s
I161110 09:49:48.149477 169 storage/store.go:2814  [s2,r1/?:/M{in-ax}] DEL 1
I161110 09:49:48.151161 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > next_replica_id:2
I161110 09:49:48.152218 419 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:2 StoreID:2 ReplicaID:2}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:48.153243 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 860e21e3 at index 19 in 43.601µs.
I161110 09:49:48.154960 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 39, log entries: 9
I161110 09:49:48.169646 405 storage/store.go:1908  [s3] ADD 1
I161110 09:49:48.169850 405 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 19 (id=860e21e3, encoded size=16, 1 rocksdb batches, 9 log entries)
I161110 09:49:48.170865 405 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.001s
I161110 09:49:48.170896 405 storage/store.go:2814  [s3,r1/?:/M{in-ax}] DEL 1
I161110 09:49:48.171604 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:3
I161110 09:49:48.172803 390 storage/raft_transport.go:432  raft transport stream to node 1 established
I161110 09:49:48.189841 412 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:3}]
I161110 09:49:48.199265 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:3 > next_replica_id:4
I161110 09:49:48.200800 415 storage/raft_transport.go:432  raft transport stream to node 1 established
I161110 09:49:48.399525 375 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
W161110 09:49:48.400132 257 storage/store.go:781  storeMu: mutex held by github.com/cockroachdb/cockroach/pkg/storage.(*Store).LookupReplica for 132.03433ms (>100ms):
goroutine 257 [running]:
runtime/debug.Stack(0x1c00738, 0x1c00778, 0x43)
    /home/tamird/goroot/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/pkg/util/syncutil.ThresholdLogger.func1(0x7deaf1a)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/timedmutex.go:65 +0xe1
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*TimedMutex).Unlock(0xc4201d82c8)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/timedmutex.go:92 +0x80
github.com/cockroachdb/cockroach/pkg/storage.(*Store).LookupReplica(0xc4201d8000, 0x272f5c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc4204dfc00)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:1476 +0x1f6
github.com/cockroachdb/cockroach/pkg/storage_test.(*multiTestContext).FirstRange.func1(0xc4201d8000, 0xc4201f05a0, 0xc420771558)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/client_test.go:550 +0x77
github.com/cockroachdb/cockroach/pkg/storage.(*Stores).VisitStores(0xc4200b6a20, 0xc420771628, 0xc4202946a0, 0x30)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:131 +0x219
github.com/cockroachdb/cockroach/pkg/storage_test.(*multiTestContext).FirstRange(0xc4202381a0, 0x0, 0x0, 0x0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/client_test.go:555 +0x276
github.com/cockroachdb/cockroach/pkg/kv.(*rangeDescriptorCache).performRangeLookup(0xc4206fc820, 0x7fae83002a38, 0xc4203852f0, 0xc420377080, 0xd, 0xd, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:428 +0x2db
github.com/cockroachdb/cockroach/pkg/kv.(*rangeDescriptorCache).lookupRangeDescriptorInternal(0xc4206fc820, 0x7fae83002a38, 0xc4203852f0, 0xc420377080, 0xd, 0xd, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:327 +0x518
github.com/cockroachdb/cockroach/pkg/kv.(*rangeDescriptorCache).LookupRangeDescriptor(0xc4206fc820, 0x7fae83002a38, 0xc4203852f0, 0xc420377080, 0xd, 0xd, 0x0, 0x0, 0x8, 0xc637d400df8278e9, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:266 +0x92
github.com/cockroachdb/cockroach/pkg/kv.(*rangeDescriptorCache).performRangeLookup(0xc4206fc820, 0x7fae83002a38, 0xc4203852f0, 0xc4203753e0, 0xc, 0x14, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:437 +0x55f
github.com/cockroachdb/cockroach/pkg/kv.(*rangeDescriptorCache).lookupRangeDescriptorInternal(0xc4206fc820, 0x7fae83002a38, 0xc4203852f0, 0xc4203753e0, 0xc, 0x14, 0x0, 0xc420377000, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:327 +0x518
github.com/cockroachdb/cockroach/pkg/kv.(*rangeDescriptorCache).LookupRangeDescriptor(0xc4206fc820, 0x7fae83002a38, 0xc4203852f0, 0xc4203753e0, 0xc, 0x14, 0x0, 0x0, 0x16eee80, 0xc420377030, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:266 +0x92
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).getDescriptor(0xc42067c370, 0x7fae83002a38, 0xc4203852f0, 0xc4203753e0, 0xc, 0x14, 0x0, 0xc42062b300, 0x2faf080, 0x12a05f200, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:401 +0x8a
github.com/cockroachdb/cockroach/pkg/kv.(*RangeIterator).Seek(0xc4207726e0, 0x7fae83002a38, 0xc4203852f0, 0xc4203753e0, 0xc, 0x14)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_iter.go:132 +0x2a1
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges(0xc42067c370, 0x7fae83002a38, 0xc4203852f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc420414300, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:670 +0x197
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).Send(0xc42067c370, 0x7fae83002a38, 0xc4203852f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc420414300, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:576 +0x2b3
github.com/cockroachdb/cockroach/pkg/kv.(*TxnCoordSender).Send(0xc4200b8960, 0x7fae83002a38, 0xc4203852f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc420414240, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:414 +0x7cc
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).sendInternal(0xc4205e2840, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc4205e2888, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:585 +0xfd
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).send(0xc4205e2840, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:709 +0x498
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).(github.com/cockroachdb/cockroach/pkg/internal/client.send)-fm(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:311 +0x58
github.com/cockroachdb/cockroach/pkg/internal/client.sendAndFill(0xc420773610, 0xc42031d200, 0x0, 0xc420773710)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:416 +0x168
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Run(0xc4205e2840, 0xc42031d200, 0x0, 0x0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:311 +0xc6
github.com/cockroachdb/cockroach/pkg/storage.(*NodeLiveness).updateLiveness.func1(0xc4205e2840, 0x0, 0x1)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/node_liveness.go:307 +0x3b2
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn.func1(0xc4205e2840, 0xc420376f70, 0x4000000000000000, 0x0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:466 +0x2f
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Exec(0xc4205e2840, 0xc420370101, 0x0, 0xc420376f60, 0x0, 0x0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:518 +0x215
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn(0xc420529a40, 0x7fae83002a38, 0xc4203852f0, 0xc4203753c0, 0xc420773ab8, 0x625940)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:467 +0x1fa
github.com/cockroachdb/cockroach/pkg/storage.(*NodeLiveness).updateLiveness(0xc42011a160, 0x7fae83002a38, 0xc4203852f0, 0x100000003, 0xc420375380, 0xc4203753a0, 0xc420773bd8, 0x0, 0x2faf080)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/node_liveness.go:308 +0xbc
github.com/cockroachdb/cockroach/pkg/storage.(*NodeLiveness).heartbeat(0xc42011a160, 0x7fae83002a38, 0xc4203852f0, 0x0, 0x0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/node_liveness.go:188 +0x2f0
github.com/cockroachdb/cockroach/pkg/storage.(*NodeLiveness).StartHeartbeat.func1()
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/node_liveness.go:140 +0x2e7
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4206a46c0, 0xc420716360)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0x7d
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x66
I161110 09:49:48.430064 101 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:48.450654 466 storage/raft_transport.go:432  raft transport stream to node 3 established
I161110 09:49:48.495765 103 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:48.730007 118 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:48.743782 120 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:48.744354 284 storage/raft_transport.go:432  raft transport stream to node 2 established
I161110 09:49:49.029959 61 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:49.329977 83 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:49.386687 87 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:3}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:49.391118 365 storage/store.go:2038  [s3,r1/3:/M{in-ax}] DEL 1
I161110 09:49:49.391245 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 4c747bfc at index 27 in 51.416µs.
W161110 09:49:49.392144 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 3}: raft group deleted
W161110 09:49:49.392175 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 3}: raft group deleted
W161110 09:49:49.392194 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 3}: raft group deleted
W161110 09:49:49.392210 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 3}: raft group deleted
W161110 09:49:49.392228 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 3}: raft group deleted
W161110 09:49:49.392245 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 3}: raft group deleted
W161110 09:49:49.392262 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 3}: raft group deleted
W161110 09:49:49.392445 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 3}: raft group deleted
I161110 09:49:49.396508 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 48, log entries: 17
I161110 09:49:49.396761 459 storage/store.go:1908  [s3] ADD 1
I161110 09:49:49.396930 459 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 27 (id=4c747bfc, encoded size=16, 1 rocksdb batches, 17 log entries)
I161110 09:49:49.397609 459 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.001s
I161110 09:49:49.397626 459 storage/store.go:2814  [s3,r1/?:/M{in-ax}] DEL 1
I161110 09:49:49.398198 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:4
I161110 09:49:49.845254 493 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:4}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:4}]
I161110 09:49:49.847292 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:4 > next_replica_id:5
I161110 09:49:49.849687 518 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:4}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:49.852936 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 04d108d5 at index 46 in 64.612µs.
I161110 09:49:49.854872 452 storage/store.go:2985  [s3,r1/?:/M{in-ax}] added to replica GC queue (peer suggestion)
I161110 09:49:49.873961 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 60, log entries: 36
I161110 09:49:49.883789 365 storage/store.go:2038  [s3,r1/4:/M{in-ax}] DEL 1
I161110 09:49:49.884146 529 storage/store.go:1908  [s3] ADD 1
I161110 09:49:49.884470 529 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 46 (id=04d108d5, encoded size=16, 1 rocksdb batches, 36 log entries)
I161110 09:49:49.886038 529 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.002s
I161110 09:49:49.886097 529 storage/store.go:2814  [s3,r1/?:/M{in-ax}] DEL 1
I161110 09:49:49.933301 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:5
I161110 09:49:49.936459 541 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:5}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:5}]
I161110 09:49:49.943731 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:5 > next_replica_id:6
I161110 09:49:49.946645 571 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:5}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:49.950270 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot acb7550a at index 53 in 50.793µs.
I161110 09:49:49.952061 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 69, log entries: 43
I161110 09:49:49.952543 452 storage/store.go:2985  [s3,r1/5:/M{in-ax}] added to replica GC queue (peer suggestion)
I161110 09:49:49.955697 365 storage/store.go:2038  [s3,r1/5:/M{in-ax}] DEL 1
I161110 09:49:49.956406 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:6
I161110 09:49:49.957889 591 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:6}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:6}]
I161110 09:49:50.075819 62 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 4156714b at index 56 in 290.865µs.
I161110 09:49:50.077208 470 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 73, log entries: 46
I161110 09:49:50.077359 612 storage/store.go:1908  [s3] ADD 1
I161110 09:49:50.077550 612 storage/replica_raftstorage.go:589  [s3,r1/6:{-}] applying Raft snapshot at index 56 (id=4156714b, encoded size=16, 1 rocksdb batches, 46 log entries)
I161110 09:49:50.078981 612 storage/replica_raftstorage.go:592  [s3,r1/6:/M{in-ax}] applied Raft snapshot in 0.001s
I161110 09:49:50.140084 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:6 > next_replica_id:7
I161110 09:49:50.144693 481 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:6}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:50.146453 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot deaaaeb5 at index 60 in 49.027µs.
I161110 09:49:50.149957 365 storage/store.go:2038  [s3,r1/6:/M{in-ax}] DEL 1
I161110 09:49:50.152066 634 storage/store.go:1908  [s3] ADD 1
I161110 09:49:50.152515 634 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 60 (id=deaaaeb5, encoded size=16, 1 rocksdb batches, 50 log entries)
I161110 09:49:50.154539 634 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.002s
I161110 09:49:50.154598 634 storage/store.go:2814  [s3,r1/?:/M{in-ax}] DEL 1
I161110 09:49:50.154843 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 78, log entries: 50
I161110 09:49:50.155718 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:7
I161110 09:49:50.158352 646 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:7}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:7}]
I161110 09:49:50.201988 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:7 > next_replica_id:8
I161110 09:49:50.240926 452 storage/store.go:2985  [s3,r1/?:/M{in-ax}] added to replica GC queue (peer suggestion)
I161110 09:49:50.263096 661 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:7}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:50.265466 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 528cf394 at index 65 in 49.991µs.
I161110 09:49:50.269364 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 85, log entries: 55
I161110 09:49:50.269577 452 storage/store.go:2985  [s3,r1/7:/M{in-ax}] added to replica GC queue (peer suggestion)
I161110 09:49:50.270561 365 storage/store.go:2038  [s3,r1/7:/M{in-ax}] DEL 1
I161110 09:49:50.270902 677 storage/store.go:1908  [s3] ADD 1
I161110 09:49:50.271215 677 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 65 (id=528cf394, encoded size=16, 1 rocksdb batches, 55 log entries)
I161110 09:49:50.272810 677 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.002s
I161110 09:49:50.272829 677 storage/store.go:2814  [s3,r1/?:/M{in-ax}] DEL 1
I161110 09:49:50.273400 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:8
I161110 09:49:50.274934 682 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:8}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:8}]
I161110 09:49:50.277827 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:8 > next_replica_id:9
I161110 09:49:50.279834 695 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:8}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
W161110 09:49:50.287298 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 7}: replicaID cannot move backwards from 8 to 7
I161110 09:49:50.337536 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 618f8104 at index 71 in 49.139µs.
I161110 09:49:50.340002 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 93, log entries: 61
I161110 09:49:50.341133 365 storage/store.go:2038  [s3,r1/8:/M{in-ax}] DEL 1
I161110 09:49:50.341381 709 storage/store.go:1908  [s3] ADD 1
I161110 09:49:50.341970 709 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 71 (id=618f8104, encoded size=16, 1 rocksdb batches, 61 log entries)
I161110 09:49:50.344353 709 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.002s
I161110 09:49:50.344379 709 storage/store.go:2814  [s3,r1/?:/M{in-ax}] DEL 1
I161110 09:49:50.345031 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:9
I161110 09:49:50.346017 452 storage/store.go:2985  [s3,r1/?:/M{in-ax}] added to replica GC queue (peer suggestion)
I161110 09:49:50.347119 365 storage/store.go:2038  [s3,r1/?:/M{in-ax}] DEL 1
W161110 09:49:50.383048 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 8}: raft group deleted
I161110 09:49:50.397671 713 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:9}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:9}]
I161110 09:49:50.429221 121 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 7c066faf at index 74 in 61.447µs.
I161110 09:49:50.430947 759 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 97, log entries: 64
I161110 09:49:50.431240 764 storage/store.go:1908  [s3] ADD 1
I161110 09:49:50.448100 764 storage/replica_raftstorage.go:589  [s3,r1/9:{-}] applying Raft snapshot at index 74 (id=7c066faf, encoded size=16, 1 rocksdb batches, 64 log entries)
I161110 09:49:50.514723 764 storage/replica_raftstorage.go:592  [s3,r1/9:/M{in-ax}] applied Raft snapshot in 0.067s
I161110 09:49:50.537683 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:9 > next_replica_id:10
I161110 09:49:50.540127 623 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:9}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:50.545336 365 storage/store.go:2038  [s3,r1/9:/M{in-ax}] DEL 1
I161110 09:49:50.546779 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 3b430471 at index 78 in 92.552µs.
I161110 09:49:50.549381 787 storage/store.go:1908  [s3] ADD 1
I161110 09:49:50.550048 787 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 78 (id=3b430471, encoded size=16, 1 rocksdb batches, 68 log entries)
I161110 09:49:50.552708 787 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.003s
I161110 09:49:50.552731 787 storage/store.go:2814  [s3,r1/?:/M{in-ax}] DEL 1
I161110 09:49:50.552895 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 102, log entries: 68
I161110 09:49:50.553345 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:10
I161110 09:49:50.554840 806 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:10}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:10}]
I161110 09:49:50.557223 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:10 > next_replica_id:11
I161110 09:49:50.578736 835 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:10}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:50.598879 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot e1396f58 at index 82 in 53.342µs.
I161110 09:49:50.601942 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 108, log entries: 72
W161110 09:49:50.621983 401 storage/store.go:2989  [s1] got error from range 1, replica {3 3 9}: replicaID cannot move backwards from 10 to 9
I161110 09:49:50.622162 837 storage/replica_raftstorage.go:589  [s3,r1/10:/M{in-ax}] applying Raft snapshot at index 82 (id=e1396f58, encoded size=16, 1 rocksdb batches, 72 log entries)
I161110 09:49:50.624909 837 storage/replica_raftstorage.go:592  [s3,r1/10:/M{in-ax}] applied Raft snapshot in 0.003s
I161110 09:49:50.625750 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > next_replica_id:11
I161110 09:49:50.627142 792 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing ADD_REPLICA {NodeID:3 StoreID:3 ReplicaID:11}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:3 StoreID:3 ReplicaID:11}]
I161110 09:49:50.627666 452 storage/store.go:2985  [s3,r1/10:/M{in-ax}] added to replica GC queue (peer suggestion)
I161110 09:49:50.629059 365 storage/store.go:2038  [s3,r1/10:/M{in-ax}] DEL 1
I161110 09:49:50.795948 101 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 9ade7105 at index 87 in 74.055µs.
I161110 09:49:50.798195 814 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 114, log entries: 77
I161110 09:49:50.801887 853 storage/store.go:1908  [s3] ADD 1
I161110 09:49:50.803192 853 storage/replica_raftstorage.go:589  [s3,r1/11:{-}] applying Raft snapshot at index 87 (id=9ade7105, encoded size=16, 1 rocksdb batches, 77 log entries)
I161110 09:49:50.896813 853 storage/replica_raftstorage.go:592  [s3,r1/11:/M{in-ax}] applied Raft snapshot in 0.004s
I161110 09:49:50.980453 22 storage/replica_command.go:3244  [s1,r1/1:/M{in-ax}] change replicas: read existing descriptor range_id:1 start_key:"" end_key:"\377\377" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:2 store_id:2 replica_id:2 > replicas:<node_id:3 store_id:3 replica_id:11 > next_replica_id:12
I161110 09:49:50.997213 729 storage/replica.go:2052  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:11}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2}]
I161110 09:49:51.047530 22 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot 8aa13fd9 at index 93 in 53.926µs.
I161110 09:49:51.055468 452 storage/store.go:2985  [s3,r1/11:/M{in-ax}] added to replica GC queue (peer suggestion)
I161110 09:49:51.057341 22 storage/store.go:3128  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 121, log entries: 83
I161110 09:49:51.059106 365 storage/store.go:2038  [s3,r1/11:/M{in-ax}] DEL 1
I161110 09:49:51.075593 734 storage/store.go:1908  [s3] ADD 1
I161110 09:49:51.079754 734 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 93 (id=8aa13fd9, encoded size=16, 1 rocksdb batches, 83 log entries)
I161110 09:49:51.084389 734 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.005s
I161110 09:49:51.084450 734 storage/store.go:2814  [s3,r1/?:/M{in-ax}] DEL 1
E161110 09:49:51.084719 326 raft/log.go:191  [s3,r1/?:/M{in-ax}] tocommit(95) is out of range [lastIndex(93)]. Was the raft log corrupted, truncated, or lost?
panic: tocommit(95) is out of range [lastIndex(93)]. Was the raft log corrupted, truncated, or lost? [recovered]
    panic: tocommit(95) is out of range [lastIndex(93)]. Was the raft log corrupted, truncated, or lost?

goroutine 326 [running]:
panic(0x16eee80, 0xc4204667e0)
    /home/tamird/goroot/src/runtime/panic.go:500 +0x1a1
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc4206a46c0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:185 +0x6e
panic(0x16eee80, 0xc4204667e0)
    /home/tamird/goroot/src/runtime/panic.go:458 +0x243
github.com/cockroachdb/cockroach/pkg/storage.(*raftLogger).Panicf(0xc420466680, 0x191e4e4, 0x5d, 0xc4203b2ac0, 0x2, 0x2)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft.go:111 +0x107
github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc420278a80, 0x5f)
    /home/tamird/go/src/github.com/coreos/etcd/raft/log.go:191 +0x166
github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc420860000, 0x8, 0xb, 0x1, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/coreos/etcd/raft/raft.go:1094 +0x54
github.com/coreos/etcd/raft.stepFollower(0xc420860000, 0x8, 0xb, 0x1, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/coreos/etcd/raft/raft.go:1040 +0x31f
github.com/coreos/etcd/raft.(*raft).Step(0xc420860000, 0x8, 0xb, 0x1, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/coreos/etcd/raft/raft.go:779 +0x11ef
github.com/coreos/etcd/raft.(*RawNode).Step(0xc4208f6240, 0x8, 0xb, 0x1, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/coreos/etcd/raft/rawnode.go:195 +0xc9
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequest.func4(0xc4208f6240, 0x0, 0x0, 0x0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2945 +0x89
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked(0xc420b8f400, 0x1aa0400, 0xc420dbd848, 0x68f48b, 0xc420b8f4d8)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:509 +0x139
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroup(0xc420b8f400, 0xc420dbd848, 0x0, 0x0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:526 +0x92
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequest(0xc42017ce00, 0x7fae83002a38, 0xc420ab4f60, 0xc420a3c900, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2946 +0xb94
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc42017ce00, 0x1)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3174 +0x180
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc4200b8c80, 0xc4206a46c0)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:241 +0x267
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2()
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:181 +0x33
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4206a46c0, 0xc4200f1600)
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0x7d
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
    /home/tamird/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x66


ERROR: exit status 2

1433 runs completed, 1 failures, over 35s
FAIL
Makefile:128: recipe for target 'stress' failed
make: *** [stress] Error 1

Note that the "ADD" and "DEL" logs are something I added in pursuit of #9017.

@petermattis
Collaborator

That's scary. Would be good to reproduce this with --vmodule=raft=4,raft_log_queue=1.

@tamird
Contributor

tamird commented Nov 11, 2016

OK, with #10634 applied I was able to reproduce this with the extra logging: https://gist.github.com/tamird/914d125022cfa7cc40248d07c2ce4455

@tamird
Contributor

tamird commented Nov 13, 2016

This reproduces in about 30 seconds on GCE with the following diff:

diff --git a/pkg/storage/client_raft_test.go b/pkg/storage/client_raft_test.go
index f1aac22..7c6be7f 100644
--- a/pkg/storage/client_raft_test.go
+++ b/pkg/storage/client_raft_test.go
@@ -1994,11 +1994,21 @@ func TestRaftAfterRemoveRange(t *testing.T) {
 // number of repetitions adds an unacceptable amount of test runtime).
 func TestRaftRemoveRace(t *testing.T) {
        defer leaktest.AfterTest(t)()
-       mtc := startMultiTestContext(t, 3)
+
+       const numStores = 10
+       mtc := startMultiTestContext(t, numStores)
        defer mtc.Stop()

-       rangeID := roachpb.RangeID(1)
-       mtc.replicateRange(rangeID, 1, 2)
+       const rangeID = roachpb.RangeID(1)
+
+       {
+               // Replicate the range to all stores in a single call because it's faster.
+               var storeIndexes []int
+               for i := 1; i < numStores; i++ {
+                       storeIndexes = append(storeIndexes, i)
+               }
+               mtc.replicateRange(rangeID, storeIndexes...)
+       }

        for i := 0; i < 10; i++ {
                mtc.unreplicateRange(rangeID, 2)

With the invocation (on GCE):

make stress PKG=./pkg/storage TESTS=TestRaftRemoveRace TESTTIMEOUT=10m TESTFLAGS=--vmodule=raft=4,raft_log_queue=1

The increased number of stores appears to increase the probability. From what I can tell, we're applying a snapshot that causes the loss of one or more raft log entries.

@tamird
Contributor

tamird commented Nov 13, 2016

I161113 02:37:58.934196 23842 storage/replica.go:2055  [s1,r1/1:/M{in-ax}] proposing REMOVE_REPLICA {NodeID:3 StoreID:3 ReplicaID:104}: [{NodeID:1 StoreID:1 ReplicaID:1} {NodeID:2 StoreID:2 ReplicaID:2} {NodeID:10 StoreID:10 ReplicaID:10} {NodeID:4 StoreID:4 ReplicaID:4} {NodeID:5 StoreID:5 ReplicaID:5} {NodeID:6 StoreID:6 ReplicaID:6} {NodeID:7 StoreID:7 ReplicaID:7} {NodeID:8 StoreID:8 ReplicaID:8} {NodeID:9 StoreID:9 ReplicaID:9}]
I161113 02:37:58.991150 10 storage/replica_raftstorage.go:445  [s1,r1/1:/M{in-ax}] generated snapshot fcb9694b at index 1748 in 75.302µs.
I161113 02:37:58.992114 1265 storage/store.go:2982  [s3,r1/104:/M{in-ax}] added to replica GC queue (peer suggestion)
I161113 02:37:58.995757 72 gossip/gossip.go:1071  [n1] node has connected to cluster via gossip
I161113 02:37:58.996343 10 storage/store.go:3125  [s1,r1/1:/M{in-ax}] streamed snapshot: kv pairs: 1936, log entries: 6
I161113 02:37:59.030709 23725 raft/raft.go:568  [s3,r1/?:{-}] ffffffffffffffff became follower at term 0
I161113 02:37:59.032901 23725 raft/raft.go:316  [s3,r1/?:{-}] newRaft ffffffffffffffff [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I161113 02:37:59.032940 23725 raft/raft.go:568  [s3,r1/?:{-}] ffffffffffffffff became follower at term 1
I161113 02:37:59.032975 23725 raft/raft.go:710  [s3,r1/?:{-}] ffffffffffffffff [term: 1] received a MsgSnap message with higher term from 1 [term: 6]
I161113 02:37:59.033001 23725 raft/raft.go:568  [s3,r1/?:{-}] ffffffffffffffff became follower at term 6
I161113 02:37:59.034584 23725 raft/raft.go:1125  [s3,r1/?:{-}] ffffffffffffffff [commit: 0, lastindex: 0, lastterm: 0] starts to restore snapshot [index: 1748, term: 6]
I161113 02:37:59.034627 23725 raft/log.go:288  [s3,r1/?:{-}] log [committed=0, applied=0, unstable.offset=1, len(unstable.Entries)=0] starts to restore snapshot [index: 1748, term: 6]
I161113 02:37:59.034664 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of 1 [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034693 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of 2 [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034719 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of a [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034746 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of 4 [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034776 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of 5 [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034803 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of 6 [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034845 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of 7 [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034876 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of 8 [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034906 23725 raft/raft.go:1135  [s3,r1/?:{-}] ffffffffffffffff restored progress of 9 [next = 1749, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
I161113 02:37:59.034930 23725 raft/raft.go:1102  [s3,r1/?:{-}] ffffffffffffffff [commit: 1748] restored snapshot [index: 1748, term: 6]
I161113 02:37:59.034958 23725 storage/replica_raftstorage.go:589  [s3,r1/?:{-}] applying preemptive snapshot at index 1748 (id=fcb9694b, encoded size=16, 1 rocksdb batches, 6 log entries)
I161113 02:37:59.065532 23725 storage/replica_raftstorage.go:592  [s3,r1/?:/M{in-ax}] applied preemptive snapshot in 0.031s
I161113 02:37:59.065725 275 raft/raft.go:568  [s3,r1/?:/M{in-ax}] 68 became follower at term 6
I161113 02:37:59.065802 275 raft/raft.go:316  [s3,r1/?:/M{in-ax}] newRaft 68 [peers: [1,2,4,5,6,7,8,9,a], term: 6, commit: 1748, applied: 1748, lastindex: 1748, lastterm: 6]
E161113 02:37:59.065896 275 raft/log.go:191  [s3,r1/?:/M{in-ax}] tocommit(1749) is out of range [lastIndex(1748)]. Was the raft log corrupted, truncated, or lost?
panic: tocommit(1749) is out of range [lastIndex(1748)]. Was the raft log corrupted, truncated, or lost? [recovered]
    panic: tocommit(1749) is out of range [lastIndex(1748)]. Was the raft log corrupted, truncated, or lost?

  • store 1 proposes REMOVE_REPLICA
  • store 1 generates a snapshot
  • store 3 GC's the replica
  • store 1 finishes sending the snapshot
  • store 3 starts applying the preemptive snapshot (evidenced by the line [s3,r1/?:{-}] newRaft ffffffffffffffff)
  • store 3 finishes applying the snapshot
  • store 3's replica gets raft ID 68 (hex; 104 in decimal -- this is the same as the previous replica's ID, which is pretty fishy: previous generations of this replica never reused a raft ID this way)
  • boom

@tamird
Contributor

tamird commented Nov 13, 2016

@tamird tamird self-assigned this Nov 13, 2016
@bdarnell
Contributor

Here's what's going on:

When we apply a preemptive snapshot, we nil out the replica's raft group, to be recreated by the next raft message. Normally, that's going to be a message from the new incarnation of the range.

However, in this case, we get a message for the old incarnation (which has been held up in some queue somewhere), so we recreate the raft group with its old id. This almost works (dropping and recreating the raft group is similar to what happens when a node restarts) but the snapshot process has discarded the uncommitted tail of the log (the snapshot itself only contains committed log entries, on the assumption that uncommitted entries can be replayed from the leader, and we delete everything related to the previous incarnation of the range except its HardState). When the leader asks this replica to commit an entry (which that replica had previously acknowledged writing to its log) and it's not there, it panics.

I think there are two solutions. One option would be to preserve the uncommitted tail of the log across preemptive snapshots (except to the extent that it conflicts with newer committed entries). I think this would be a fairly simple code change, but it feels dirty to combine log entries from two different sources like this. The alternative would be to ensure that after we apply a preemptive snapshot, the replica cannot be used under its old id any more. The simplest way to do this would be to refuse any preemptive snapshots while an older replica still exists (and queue it up for an immediate GC).

@petermattis
Collaborator

The alternative would be to ensure that after we apply a preemptive snapshot, the replica cannot be used under its old id any more. The simplest way to do this would be to refuse any preemptive snapshots while an older replica still exists (and queue it up for an immediate GC).

Not sure I understand this proposal. How would you know when the older replica is definitively gone and no more messages will be received for it?

@bdarnell
Contributor

When a replica is GCed, we write a tombstone with its replica ID, and that makes us ignore any future messages for that replica ID.
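
In sketch form (illustrative types and names; the real tombstone is a persisted proto in the storage package), the mechanism described here is:

package tombstone // illustrative sketch only

// raftTombstone stands in for the persisted tombstone: the next replica ID
// that is allowed to exist on this store for the range; anything below it
// has been GC'ed.
type raftTombstone struct {
    NextReplicaID int
}

// shouldDropMsg: once the tombstone is written, messages addressed to any
// older replica ID are ignored rather than resurrecting the raft group.
func shouldDropMsg(toReplicaID int, ts *raftTombstone) bool {
    return ts != nil && toReplicaID < ts.NextReplicaID
}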

@petermattis
Collaborator

Looks to me like the tombstone prevents the creation of a replica with an old replica ID, but it does not by itself cause messages with an older replica ID to be ignored on a newer replica; that is accomplished by Replica.setReplicaIDLocked.

I think preemptive snapshots created a hole in the handling of messages from old replicas. The preemptive snapshot can create the Replica, and because it is a preemptive snapshot we ignore the tombstone (see Store.tryGetOrCreateReplica). After the preemptive snapshot is applied, Replica.mu.replicaID is still 0, and a message for the old replica can arrive which Replica.setReplicaIDLocked allows to proceed. Is the fix as simple as something like:

func (s *Store) tryGetOrCreateReplica(...) {
...
    } else if ok {
        if replicaID != 0 && replicaID < tombstone.NextReplicaID {
            return nil, false, &roachpb.RaftGroupDeletedError{}
        } else if replicaID == 0 {
            replicaID = tombstone.NextReplicaID
        }
    }
...

@bdarnell
Contributor

No, we can't just use tombstone.NextReplicaID as our replica ID - that ID has (probably) been assigned to someone else (and I don't think we even hit this code path for the case we're discussing here). We need to A) respect the tombstone earlier in tryGetOrCreateReplica when we're calling setReplicaID and B) write a tombstone when a preemptive snapshot is applied over an older replica.

@petermattis
Collaborator

We need to A) respect the tombstone earlier in tryGetOrCreateReplica when we're calling setReplicaID

We don't want to load the tombstone on every call to tryGetOrCreateReplica. If we can't use it to initialize Replica.mu.replicaID, what about adding a Replica.mu.minReplicaID?

B) write a tombstone when a preemptive snapshot is applied over an older replica.

How can we be applying a preemptive snapshot over an older replica? I could have sworn we rejected preemptive snapshots if the replica already existed and was initialized, but it appears we don't.

@bdarnell
Contributor

We'd only need to load the tombstone in tryGetOrCreateReplica if we're going to call setReplicaID (that is, if the incoming message's replica ID differs from the one in our Replica object). But a minReplicaID field could work too.

We used to disallow preemptive snapshots when the raft group existed, but I changed that in #8613 when we stopped ignoring errors from preemptive snapshots. This was necessary to get the tests to pass, but there are probably other solutions we could use instead.
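
A rough sketch of the minReplicaID idea being discussed (illustrative stand-in types; the eventual fix differs in detail):

package replicaid // illustrative sketch only

// replica is an illustrative stand-in for storage.Replica.
type replica struct {
    mu struct {
        replicaID int
        // minReplicaID is the smallest replica ID this store may still use
        // for the range: seeded from the tombstone when the Replica is
        // created, advanced when an incarnation is destroyed.
        minReplicaID int
    }
}

type raftGroupDeletedError struct{}

func (raftGroupDeletedError) Error() string { return "raft group deleted" }

// setReplicaID rejects attempts to revive an older incarnation, which is
// exactly the message-from-the-old-replica case behind this panic.
func (r *replica) setReplicaID(replicaID int) error {
    if replicaID < r.mu.minReplicaID {
        return raftGroupDeletedError{}
    }
    r.mu.replicaID = replicaID
    return nil
}

// markDestroyed is called when an incarnation is GC'ed (and its tombstone
// written): no replica ID below nextReplicaID may be revived afterwards.
func (r *replica) markDestroyed(nextReplicaID int) {
    if nextReplicaID > r.mu.minReplicaID {
        r.mu.minReplicaID = nextReplicaID
    }
}

The commits referenced later in this thread introduce Replica.mu.minReplicaID along these general lines.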

@petermattis
Collaborator

@bdarnell Unless you've already started on this, I can take a crack at fixing it.

@bdarnell
Contributor

Go ahead; I'm working on #10602.

@bdarnell bdarnell assigned petermattis and unassigned bdarnell and tamird Nov 14, 2016
@petermattis
Collaborator

Reproduced on my laptop with:

~ make stress PKG=./storage TESTS='^TestRaftRemoveRace$' STRESSFLAGS="-maxfails 1 -stderr -p 16"
...
24 runs completed, 1 failures, over 7s

Same error as @tamird was seeing:

E161114 14:26:17.843869 231 raft/log.go:191  [s3,r1/?:/M{in-ax}] tocommit(170) is out of range [lastIndex(169)]. Was the raft log corrupted, truncated, or lost?
panic: tocommit(170) is out of range [lastIndex(169)]. Was the raft log corrupted, truncated, or lost? [recovered]
    panic: tocommit(170) is out of range [lastIndex(169)]. Was the raft log corrupted, truncated, or lost?

Now to fix.

petermattis added a commit to petermattis/cockroach that referenced this issue Nov 14, 2016 (and again, with the same message, on Nov 15, Nov 17, and Nov 21, 2016):
Introduce Replica.mu.minReplicaID which enforces the tombstone invariant
that we don't accept messages for previous Replica incarnations.

Make TestRaftRemoveRace more aggressive.

Fixes cockroachdb#9037
irfansharif added a commit to irfansharif/cockroach that referenced this issue Jun 7, 2017
Fixes cockroachdb#16376.

Under stress TestRaftRemoveRace failed with:

  --- FAIL: TestRaftRemoveRace (1.45s)
  client_test.go:1031: change replicas of r1 failed: quota pool no
  longer in use

Consider the following:
- 'add replica' commands get queued up on the replicate queue
- leader replica steps down as leader due to timeouts thus closing the
  quota pool
- commands get out of the queue, cannot acquire quota because the quota
  pool is closed and fail with an error indicating so

TestRaftRemoveRace fails as it expects all replica additions to go
through without failure.

To reproduce this the following minimal test (run under stress) is
sufficient where we lower RaftTickInterval and RaftElectionTimeoutTicks
to make it more likely that leadership changes take place.

  func TestReplicateQueueQuota(t *testing.T) {
      defer leaktest.AfterTest(t)()
      sc := storage.TestStoreConfig(nil)
      sc.RaftElectionTimeoutTicks = 2             // Default: 15
      sc.RaftTickInterval = 10 * time.Millisecond // Default: 200ms
      mtc := &multiTestContext{storeConfig: &sc}
      defer mtc.Stop()
      mtc.Start(t, 3)

      const rangeID = roachpb.RangeID(1)
      mtc.replicateRange(rangeID, 1, 2)

      for i := 0; i < 10; i++ {
          mtc.unreplicateRange(rangeID, 2)
          mtc.replicateRange(rangeID, 2)
      }
  }

The earlier version of TestRaftRemoveRace was written to reproduce the
failure seen in cockroachdb#9037; it was agnostic to raft leadership
changes.