ui: no data available but cluster appears to be running normally #18324
Comments
Screenshots of the pages for ranges 1, 61, and 3, please (or we can probably look at this cluster directly, right?). @nvanbenschoten, this version has the log spy, so the Raft problems I anticipate should be straightforward to debug. Perhaps #17524 again. |
We diagnosed #18327 as one of the problems in this cluster. @dianasaur323 to bump the node version to see if problems persist. |
The under-replicated ranges are in fact under-replicated; there are only two replicas for ~30 ranges. The reason the third replica was removed is always "store dead". The allocator simulation always suggests adding a store, yet this doesn't seem to be happening (@dianasaur323 mentioned seeing refused snapshots, which could be part of the problem). There may be more problems, but that is definitely one of them. |
There's also still #18339 which I hear is going to be fixed soon, so we'll have a better idea about the "no lease" ranges which looked healthy when I looked. |
Seeing this on the lease holder of one of the underreplicated ranges:
The cluster is running with defaults, so a Raft snapshot should be capped at 8 MB/s. |
Logs show many snapshot problems. It looks like all nodes are routinely refusing snapshots, and nothing ever gets done because the stores are throttled; even if they weren't, they'd refuse most things. I've also seen this:

E170910 02:38:29.352546 109 storage/queue.go:656 [raftsnapshot,n2,s2,r754/2:/Table/166/7/9{419/2…-879/2…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable

@dianasaur323 I'm curious, did you by any chance manually delete anything from the data directory? You would expect that error only in the wild when actively ingesting a backup. After that, the only explanation I have is that someone deleted the |
A rough idea about the frequency of these deletions. Note how it's always the same ranges -- that sounds like files were deleted manually.
|
Oh, monster duh -- this [the sideloading stuff] is expected with this change: #17787. Basically, the old file that was written at |
I've made an attempt at copying the files to their new location, using this snippet:
#!/usr/bin/env bash
# Copy one legacy sideloaded SSTable into the new sharded layout.
# Usage: <script> <filename-starting-with-numeric-id>
set -euo pipefail

id=$(echo "$1" | sed -E 's/^([0-9]+).*/\1/')  # leading numeric ID of the file
shard=$((id % 1000))                          # the new layout shards by ID % 1000
dest="sideloading/${shard}/${1}"
mkdir -p "$(dirname "$dest")"
cp -p "$1" "$dest"

Will check tomorrow whether the sideloading messages persist. If they do, I assume it's because I didn't copy into the right location. But hopefully they're gone and we can check whether they were somehow connected to this (pretty likely, since persistently failing snapshots could clog things up if enough ranges are affected). |
Hmm. Seems to continue, but the odd thing I'm noticing only now is that the replicas for which I see that error already have a new-style sideloading directory, and there's (usually) one SSTable in it. They don't even have the old-style directory, which makes it seem like they would be unaffected by the problem described in my last post. (Sample of 2, so this might not hold for all of them.) It's definitely awful that the error doesn't give us so much as a clue about which file it's looking for. I'll make sure to fix that for 1.1. But, regardless, it looks more as if the code is looking in the old place while the file is really in the new location. Or we deleted something during truncation we weren't supposed to delete - a one-off would be most likely here, but I didn't find one. |
@dianasaur323 I'd be interested in when you imported into this cluster and which version you were running. The change above was merged at

Hmm, I need to check what we are doing when a range splits. We should be copying (parts of) the sideloaded storage around. Pretty sure we don't; perhaps this is at work here (and even if it is, it's a pretty bad omission). |
@tschottdorf wow, you did a lot of work in the last couple hours.... I haven't been manually deleting anything, although our cluster right now is fully automated. I believe @nstewart has some cron jobs that spin up nodes if they die, and also pull the most recent binary on a daily basis. Should be the keep-alive.sh and upgrade-and-kill.sh files located in the root directory. I'm noticing that the upgrade-and-kill.sh file kills all cockroach nodes versus doing a rolling upgrade. Could that be causing a problem? |
Eh, too late to write comments. The RHS of a split starts with an empty log so the above is false. Still, I'm lacking a sensible theory as to what's going on with these missing files. |
No, or rather, if it is, then that's something that would need to be fixed. Fairly certain it shouldn't though. |
@dianasaur323 killall is local to the node and the cron for that command is offset by 5 mins so it should be a rolling upgrade |
@nstewart ah, excellent! thanks for clarifying that. |
(btw, if you're wondering how I made that screenshot, it's |
For r676, I checked the sideloading directory: a) it has only the new-style one, and b) there's only one SSTable in there, at index 21/term 7. Looking at the screenshot above, the snapshot for the trailing replica would contain that SSTable. Thinking that maybe it was looking for the wrong term, I made a few copies of the file at all relevant terms, but no impact. Well, #18405 will tell us what the problem is. |
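As context for "copies of the file at all relevant terms": sideloaded SSTables are presumably addressed by both log index and term, so a lookup under a different term misses the file. A small Go sketch of that idea; the directory layout and filename format here are assumptions for illustration, not taken from the issue:

package main

import (
	"fmt"
	"path/filepath"
)

// sideloadedFilename sketches a plausible on-disk name for a sideloaded
// SSTable, keyed by both raft log index and term. If the term used for
// the lookup differs from the term the file was written under, the file
// is effectively missing, which is why copying it at every candidate
// term was a reasonable thing to try.
func sideloadedFilename(dir string, index, term uint64) string {
	return filepath.Join(dir, fmt.Sprintf("i%d.t%d", index, term))
}

func main() {
	// The single SSTable mentioned above sits at index 21, term 7.
	fmt.Println(sideloadedFilename("sideloading/676", 21, 7))
}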
Bumped the cluster to include #18405. |
Ok, here's the updated message:
This is interesting. As I mentioned above, the file we have in that replica's sideloaded storage is |
Node 2, the other member of the Raft group which is up to date, also only has |
This is surprising. The empty entry at 22 indicates that the Raft leader proposed an empty entry here (as it does after leader election). A sideloaded proposal looks like the one at index 21. So the conjecture is that the actual sideloaded proposal is

This looks like we're seeing an empty entry but somehow mistaking it for a sideloaded entry. This would happen if it had the wrong command encoding version, but an empty entry has an empty version. |
My conjecture is that the fact that there is a sideloaded entry at 21 and then an empty one at 22 somehow causes the bug. For example, this would happen if we reuse the same |
Uh, actually:

func TestFoo(t *testing.T) {
	// Demonstrates that proto Unmarshal merges into the receiver instead
	// of resetting it: decoding an empty message leaves previously set
	// Data intact.
	var ent raftpb.Entry
	ent.Data = []byte("foo")
	if err := ent.Unmarshal([]byte{}); err != nil {
		t.Fatal(err)
	}
	// --- FAIL: TestFoo (0.00s)
	// store_test.go:61: {0 0 EntryNormal [102 111 111] []}
	if len(ent.Data) != 0 {
		t.Fatal(ent)
	}
}

That does explain it, because of this code:

cockroach/pkg/storage/store.go, lines 3430 to 3437 (at 1fd366d)
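For what it's worth, here is a minimal, self-contained sketch of that failure mode and of one way to avoid it (resetting the reused Entry before each Unmarshal). This is an illustration only -- the import path and the loops are assumptions for the sketch, not the actual store.go code or the eventual fix:

package main

import (
	"fmt"
	"log"

	"github.com/coreos/etcd/raft/raftpb" // etcd raft protobufs, as vendored at the time
)

func main() {
	withData, err := (&raftpb.Entry{Index: 21, Term: 7, Data: []byte("sideloaded")}).Marshal()
	if err != nil {
		log.Fatal(err)
	}
	// The zero-length encoding stands in for the empty entry proposed
	// after leader election (index 22 above).
	encoded := [][]byte{withData, {}}

	// Buggy pattern: one Entry is reused across decodes, so the second,
	// empty decode inherits the Data of the first.
	var ent raftpb.Entry
	for i, b := range encoded {
		if err := ent.Unmarshal(b); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("buggy decode %d: len(Data)=%d\n", i, len(ent.Data))
	}

	// Fixed pattern: reset the Entry before each decode, so the empty
	// entry comes out with empty Data.
	for i, b := range encoded {
		ent = raftpb.Entry{}
		if err := ent.Unmarshal(b); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("fixed decode %d: len(Data)=%d\n", i, len(ent.Data))
	}
}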
|
Having restarted with the fix, r767 is now operational. The system appears to be busy applying (actually applying) snapshots. Not seeing any more sideloading snapshot errors in any of the logs. There are still a bunch of occurrences of #18339, so that issue should be addressed ASAP. |
Everything replicated back up, but there's one single holdover:
... oops, didn't post this for an hour, and now it has upreplicated. Well, problems fixed, then, I suppose? Though it's not clear why the stores were throttled for that long. Any ideas @a-robinson? |
There are 4 reasons that a store may be rejecting snapshots:
1. It's draining before shutting down
2. It's currently receiving/applying a snapshot
3. It's currently removing a replica
4. It's too far behind on too many raft ranges.
Options 2, 3, and 4 could all easily be true on a node that just came up, and 3 and 4 might take a while before they stop being true. |
It took on the order of 3 hours before the upreplication got through. Wonder what we can do to make it easier to see why it wasn't getting through earlier. Could we add a reason for the store declaring itself throttled? Or perhaps it's discernible from the graphs. Either way, I just couldn't tell, so we should probably improve something.
|
We can't easily distinguish between 2 and 3 (and we do already include an extra message for 1), but differentiating 4 couldn't hurt. We could probably figure it out from graphs, but we don't need to require that extra step. |
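As a rough illustration of what "differentiating 4" could look like, a hedged Go sketch; the enum, names, and message format below are made up for the example, not CockroachDB's actual snapshot-reservation code:

package main

import "fmt"

// rejectionReason enumerates why a store might decline an incoming
// snapshot, mirroring the four reasons listed above. The point is only
// that reason 4 (too far behind on too many ranges) gets its own text,
// so it can be told apart from a generic "store throttled".
type rejectionReason int

const (
	reasonDraining rejectionReason = iota
	reasonApplyingSnapshot
	reasonRemovingReplica
	reasonTooManyBehindRanges
)

func rejectionMessage(r rejectionReason, behindCount int) string {
	switch r {
	case reasonDraining:
		return "store is draining"
	case reasonApplyingSnapshot, reasonRemovingReplica:
		// As noted above, these two are hard to tell apart cheaply,
		// so they share a message in this sketch.
		return "store is busy applying a snapshot or removing a replica"
	case reasonTooManyBehindRanges:
		return fmt.Sprintf("store is behind on %d raft ranges", behindCount)
	default:
		return "snapshot rejected"
	}
}

func main() {
	fmt.Println(rejectionMessage(reasonTooManyBehindRanges, 42))
}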
^- this is probably #18055

I found two more issues related to sideloading in this cluster.
|
Hmm, 2. is not clear to me. We do create the sideloaded storage a little early, but we replace it every time the replicaID changes (from

Update now that I have WiFi: ok, figured it out, I think. The code which created the sideloaded storage in |
When a preemptive snapshot is applied, we (have to) write "fat" entries into the log because we don't have a sideloaded storage location yet (no replicaID). However, when that node itself later tries to create a snapshot, it ends up in the situation in which an entry has the sideloading raft command version but isn't actually on disk (because it is already inlined). We weren't handling that case correctly, resulting in (the first half of) cockroachdb#18324 (comment). Remedied that and updated TestRaftSSTableSideloadingInline so that it fails before the fix and passes after.
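A hedged sketch of the situation that commit message describes: an entry whose command version says "sideloaded" but whose payload is already inline because it arrived via a preemptive snapshot. The types and helper below are simplified stand-ins for illustration, not the actual CockroachDB code or fix:

package main

import "fmt"

// entry is a simplified stand-in for a raft log entry whose command
// payload may either reference an SSTable on disk (the usual sideloaded
// case) or already carry the SSTable bytes inline (the "fat" entries
// written while applying a preemptive snapshot, before a replicaID and
// hence a sideloaded storage location exist).
type entry struct {
	sideloadedVersion bool   // command encoding says "sideloaded"
	payload           []byte // inline bytes, if any
}

// inlineForSnapshot returns the bytes to ship in an outgoing snapshot.
// The failure mode was roughly: assuming a sideloaded-version entry
// always has its payload on disk, and erroring out when it doesn't.
// This sketch instead passes already-inlined payloads through unchanged.
func inlineForSnapshot(e entry, loadFromDisk func() ([]byte, error)) ([]byte, error) {
	if !e.sideloadedVersion {
		return e.payload, nil
	}
	if len(e.payload) > 0 {
		// Already "fat": written during preemptive snapshot application.
		return e.payload, nil
	}
	return loadFromDisk()
}

func main() {
	fat := entry{sideloadedVersion: true, payload: []byte("sst bytes")}
	b, _ := inlineForSnapshot(fat, func() ([]byte, error) {
		return nil, fmt.Errorf("no sideloaded file on disk")
	})
	fmt.Printf("shipped %d bytes without touching disk\n", len(b))
}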
With #18462 in, I believe these problems have been fixed, though I had to restart one node which also moved Raft leadership to other nodes that didn't have the problem in the first place. The cluster now has node2 decommissioned:
|
(I'll recommission it just for fun) |
This issue has been pretty productive, so please keep doing what you're doing, @dianasaur323. Going to close this one, though, since we seem to be all good. |
BUG REPORT
This is running on the PM cluster (4 nodes, 2 in NYC, 1 in SF, 1 in AMS).
NYC1:
NYC2:
SF:
AMS:
What did you do? Loaded up the admin UI.
What did you expect to see? Expected to see data
What did you see instead? Screenshots below
The interesting thing is that we've had under-replicated ranges for a while. I'm not sure if this is a new issue, or an existing one just showing itself in a more extreme form.
cc @nstewart