ui: no data available but cluster appears to be running normally #18324

Closed
dianasaur323 opened this issue Sep 7, 2017 · 38 comments
@dianasaur323 (Contributor) commented Sep 7, 2017

BUG REPORT

This is running on the PM cluster (4 nodes, 2 in NYC, 1 in SF, 1 in AMS).

NYC1:

I170905 23:53:52.056927 76 util/log/clog.go:888  [config] file created at: 2017/09/05 23:53:52
I170905 23:53:52.056927 76 util/log/clog.go:888  [config] running on machine: pm-cluster-nyc1-01
I170905 23:53:52.056927 76 util/log/clog.go:888  [config] binary: CockroachDB CCL a9ef834 (linux amd64, built 2017/09/05 17:04:18, go1.8.3)
I170905 23:53:52.056927 76 util/log/clog.go:888  [config] arguments: [/usr/local/bin/cockroach start --certs-dir=./certs --advertise-host=192.241.150.167] 

NYC2:

I170906 09:57:30.690041 117 util/log/clog.go:888  [config] file created at: 2017/09/06 09:57:30
I170906 09:57:30.690041 117 util/log/clog.go:888  [config] running on machine: pm-cluster-nyc1-02
I170906 09:57:30.690041 117 util/log/clog.go:888  [config] binary: CockroachDB CCL a9ef834 (linux amd64, built 2017/09/05 17:04:18, go1.8.3)
I170906 09:57:30.690041 117 util/log/clog.go:888  [config] arguments: [/usr/local/bin/cockroach start --certs-dir=./certs --advertise-host=165.227.80.203 --join=192.241.150.167]

SF:

I170907 15:29:09.975391 131 util/log/clog.go:888  [config] file created at: 2017/09/07 15:29:09
I170907 15:29:09.975391 131 util/log/clog.go:888  [config] running on machine: pm-cluster-sfo2-01
I170907 15:29:09.975391 131 util/log/clog.go:888  [config] binary: CockroachDB CCL a9ef834 (linux amd64, built 2017/09/05 17:04:18, go1.8.3)
I170907 15:29:09.975391 131 util/log/clog.go:888  [config] arguments: [/usr/local/bin/cockroach start --certs-dir=./certs --advertise-host=138.197.195.129 --join=192.241.150.167]

AMS:

I170907 04:22:57.344517 106 util/log/clog.go:888  [config] file created at: 2017/09/07 04:22:57
I170907 04:22:57.344517 106 util/log/clog.go:888  [config] running on machine: pm-cluster-ams2-01
I170907 04:22:57.344517 106 util/log/clog.go:888  [config] binary: CockroachDB CCL a9ef834 (linux amd64, built 2017/09/05 17:04:18, go1.8.3)
I170907 04:22:57.344517 106 util/log/clog.go:888  [config] arguments: [/usr/local/bin/cockroach start --certs-dir=./certs --advertise-host=146.185.156.213 --join=192.241.150.167]
  • What did you do? Loaded up the admin UI.

  • What did you expect to see? Expected to see data

  • What did you see instead? Screenshots below

image

image

The interesting thing is that we've had under-replicated ranges for a while. I'm not sure if this is a new issue, or an existing one just showing itself in a more extreme form.

image

cc @nstewart

dianasaur323 added this to the 1.1 milestone Sep 7, 2017
@tbg (Member) commented Sep 7, 2017

Screenshots of the pages for ranges 1, 61, and 3, please (or we can probably look at this cluster directly, right?). @nvanbenschoten, this version has the log spy, so the Raft problems I anticipate should be straightforward to debug. Perhaps #17524 again.

@tbg (Member) commented Sep 7, 2017

We diagnosed #18327 as one of the problems in this cluster. @dianasaur323 to bump the node version to see if problems persist.

@dianasaur323 (Contributor Author) commented:

hmm... I don't think this problem is persisting anymore, although I still have quite a few ranges that have no lease or are under-replicated / slow. I wonder if this has to do with our geo-distributed cluster.

image

@tbg (Member) commented Sep 10, 2017

The under-replicated ranges are in fact under-replicated; there are only two replicas for ~30 ranges. The recorded reason for removing the third replica is always "store dead". The allocator simulation always suggests adding a store, yet that doesn't seem to be happening (@dianasaur323 mentioned seeing refused snapshots, which could be part of the problem).

There may be more problems, but that is definitely one of them.

@tbg (Member) commented Sep 10, 2017

There's also still #18339, which I hear is going to be fixed soon, so we'll have a better idea about the "no lease" ranges, which looked healthy when I checked.

@tbg (Member) commented Sep 10, 2017

Seeing this on the lease holder of one of the underreplicated ranges:

goroutine 109 [select]:
github.com/cockroachdb/cockroach/vendor/golang.org/x/time/rate.(*Limiter).waitN(0xc42aceff40, 0x7f27a344fe10, 0xc42c6cc720, 0x1, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/time/rate/rate.go:259 +0x370
github.com/cockroachdb/cockroach/vendor/golang.org/x/time/rate.(*Limiter).WaitN(0xc42aceff40, 0x2ac92a0, 0xc42c6cc720, 0x1, 0xc42c6cc720, 0x0)
	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/time/rate/rate_go17.go:20 +0x6d
github.com/cockroachdb/cockroach/pkg/storage.sendSnapshot(0x7f27a34dc9a8, 0xc42c6cc720, 0xc420019900, 0x7f27a344eac8, 0xc42de20020, 0x2ab1020, 0xc420698840, 0x49, 0x22, 0xc42c6ccb40, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3375 +0xca2
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).SendSnapshot(0xc4204b24b0, 0x7f27a34dc9a8, 0xc42c6cc720, 0xc420698840, 0x49, 0x22, 0xc42c6ccb40, 0xc4256e66e0, 0xc427743200, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:648 +0x2aa
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).sendSnapshot(0xc420c95400, 0x7f27a34dc9a8, 0xc42c6cc720, 0x100000001, 0xc400000003, 0x1d01a4f, 0x4, 0xc400000001, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_command.go:3770 +0x42e
github.com/cockroachdb/cockroach/pkg/storage.(*raftSnapshotQueue).processRaftSnapshot(0xc4201cd3c0, 0x7f27a34dc9a8, 0xc42c6cc720, 0xc420c95400, 0x3, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_snapshot_queue.go:116 +0xed
github.com/cockroachdb/cockroach/pkg/storage.(*raftSnapshotQueue).process(0xc4201cd3c0, 0x7f27a34dc9a8, 0xc42c6cc720, 0xc420c95400, 0x0, 0x0, 0x0, 0x2ad5200, 0xc423d10000)

The cluster is running with defaults, so a Raft snapshot should be capped at 8 MB/s.
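
For context, a minimal self-contained sketch (not CockroachDB's actual code; the 8 MB/s figure matches the default mentioned here, while the burst and chunk sizes are assumptions) of how a golang.org/x/time/rate limiter caps a byte stream, which is what the WaitN frame in the goroutine dump above is doing for each snapshot chunk:

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Cap outgoing bytes at 8 MB/s with a 256 KB burst (sizes are illustrative).
	limiter := rate.NewLimiter(rate.Limit(8<<20), 256<<10)

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	const chunk = 64 << 10 // pretend each snapshot chunk is 64 KB
	for i := 0; i < 16; i++ {
		// WaitN blocks until the limiter releases `chunk` bytes; this is
		// where a throttled snapshot sender spends its time.
		if err := limiter.WaitN(ctx, chunk); err != nil {
			fmt.Println("wait aborted:", err)
			return
		}
		// ... send the chunk over the stream ...
	}
	fmt.Println("sent 1 MB at roughly 8 MB/s")
}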

@tbg (Member) commented Sep 10, 2017

Logs show many snapshot problems. It looks like all nodes are routinely refusing snapshots, and nothing ever gets done: the stores are throttled, and even if they weren't, they'd refuse most snapshots anyway.

I've also seen this:

E170910 02:38:29.352546 109 storage/queue.go:656 [raftsnapshot,n2,s2,r754/2:/Table/166/7/9{419/2…-879/2…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable

@dianasaur323 I'm curious, did you by any chance manually delete anything from the data directory? In the wild you would expect that error only while actively ingesting a backup. Beyond that, the only explanation I have is that someone deleted the auxiliary directory (or, very occasionally, that a range saw no activity at all after the backup and only now gets a log truncation while a snapshot is active). Either way, just checking, because if you did manually delete things then that can prevent snapshots for that range essentially forever.

@tbg (Member) commented Sep 10, 2017

A rough idea of the frequency of these failures. Note how it's always the same ranges -- that suggests files were deleted manually.

E170910 02:44:53.190429 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:45:07.273912 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:45:16.677327 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:45:30.297467 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:45:44.046849 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:45:58.115813 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:46:11.536777 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
I170910 02:46:20.831292 123 storage/replicate_queue.go:231  [replicate,n4,s4,r686/1:/Table/166/1/22145{053…-259…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:46:25.736157 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
I170910 02:46:26.689174 123 storage/replicate_queue.go:231  [replicate,n4,s4,r686/1:/Table/166/1/22145{053…-259…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
I170910 02:46:32.743099 123 storage/replicate_queue.go:231  [replicate,n4,s4,r686/1:/Table/166/1/22145{053…-259…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:46:39.799586 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable
E170910 02:46:53.359401 126 storage/queue.go:656  [raftsnapshot,n4,s4,r731/1:/Table/166/1/2214{3821…-4024…}] snapshot failed: log truncation during snapshot removed sideloaded SSTable

@tbg (Member) commented Sep 10, 2017

Oh, monster duh -- this [the sideloading stuff] is expected with this change: #17787

Basically, a file that used to be written at ./auxiliary/<rangeID>.<replicaID>/X is now expected at ./auxiliary/sideloading/<rangeID%1000>/<rangeID>.<replicaID>/X.

@tbg (Member) commented Sep 10, 2017

I've made an attempt at copying the files to their new location, using this snippet:

# Run from each store's auxiliary directory: feed every old-style file to shard.sh.
cd ~/cockroach-data/auxiliary && ls && find [0-9]* -type f -print0 | xargs --null -n 1 ./shard.sh
#!/usr/bin/env bash

# shard.sh: copy an old-style sideloaded file ("<rangeID>.<replicaID>/<file>")
# into the new sharded layout ("sideloading/<rangeID%1000>/...").
id=$(echo "$1" | sed -E 's/^([0-9]+).*/\1/g')
shard=$(($id % 1000))

dest="sideloading/${shard}/${1}"
mkdir -p "$(dirname $dest)"
cp -p "$1" "$dest"

(no, that is not idiomatic).

Will check tomorrow whether the sideloading messages persist. I assume that if they do, it's because I didn't copy the files to the right location. But hopefully they're gone, and then we can check whether they were somehow connected to this issue (pretty likely, since a persistently failing snapshot could clog things up if it affects enough ranges).

@tbg (Member) commented Sep 10, 2017

Hmm. The errors seem to continue, but the odd thing I'm noticing only now is that the replicas for which I see that error already have a new-style sideloading directory, and there's (usually) one SSTable in it. They don't even have the old-style directory, which makes it seem like they should be unaffected by the problem described in my last post. (Sample of 2, so this might not hold for all of them.)

It's definitely awful that that error doesn't give us even a clue as to which file it's looking for. I'll make sure to fix that for 1.1. Regardless, it looks more as if the code is looking in the old place while the file is really in the new location. Or we deleted something during truncation that we weren't supposed to delete -- a one-off would be most likely here, but I didn't find one.

@tbg (Member) commented Sep 10, 2017

@dianasaur323 I'd be interested in when you imported into this cluster and which version you were running. The change above was merged on Thu Aug 31 07:56:46 2017 -0400, which I believe precedes the ranges I've looked at.

Hmm, I need to check what we are doing when a range splits. We should be copying (parts of the sideloaded storage) around. Pretty sure we don't, perhaps this is at work here (and even if it is, it's a pretty bad omission).

@dianasaur323 (Contributor Author) commented:

@tschottdorf wow, you did a lot of work in the last couple of hours... I haven't been manually deleting anything, although our cluster right now is fully automated. I believe @nstewart has some cron jobs that spin up nodes if they die and also pull the most recent binary on a daily basis; they should be the keep-alive.sh and upgrade-and-kill.sh files in the root directory. I'm noticing that upgrade-and-kill.sh kills all cockroach nodes rather than doing a rolling upgrade. Could that be causing a problem?

@tbg (Member) commented Sep 10, 2017

Hmm, I need to check what we are doing when a range splits. We should be copying (parts of the sideloaded storage) around. Pretty sure we don't, perhaps this is at work here (and even if it is, it's a pretty bad omission).

Eh, too late to write comments. The RHS of a split starts with an empty log so the above is false.

Still, I'm lacking a sensible theory as to what's going on with these missing files.

@tbg (Member) commented Sep 10, 2017

Could that be causing a problem?

No -- or rather, if it is, then that's something that would need to be fixed. Fairly certain it shouldn't, though.

@nstewart (Contributor) commented:

@dianasaur323 killall is local to the node, and the cron for that command is offset by 5 minutes, so it should be a rolling upgrade.

@dianasaur323 (Contributor Author) commented:

@nstewart ah, excellent! thanks for clarifying that.

@tbg (Member) commented Sep 10, 2017

Yikes, there's a range that's very clearly in deep trouble (not under-replicated though). This is again due to the sideloading files missing. I'm mildly confident that that is the main problem at work here.

image

@tbg (Member) commented Sep 10, 2017

(btw, if you're wondering how I made that screenshot, it's ./tunnel xxx.xxx.xxx.xxx:8080 '/#/reports/range/676' with https://github.com/tschottdorf/roacheteer)

@tbg (Member) commented Sep 10, 2017

For r676, I checked the sideloading directory: a) it has only the new-style one, and b) there's only one SSTable in there, at index 21 / term 7. Looking at the screenshot above, the snapshot for the trailing replica would contain that SSTable. Thinking that maybe it was looking for the wrong term, I made a few copies of the file at all relevant terms, but that had no impact. Well, #18405 will tell us what the problem is.

tbg added a commit to tbg/cockroach that referenced this issue Sep 10, 2017
@tbg (Member) commented Sep 10, 2017

Bumped the cluster to include #18405.

@tbg (Member) commented Sep 10, 2017

Ok, here's the updated message:

E170910 18:02:27.879579 140 storage/queue.go:656  [raftsnapshot,n3,s3,r676/1:/Table/165/{3/10223…-4}] snapshot failed: log truncation during snapshot removed sideloaded SSTable at index 22, term 8

This is interesting. As I mentioned above, the file we have in that replica's sideloaded storage is i21.t8. If a truncation had been at fault, we would expect to not see any files at indexes lower than the missing one (22). The fact that we see i21.t8 suggests that perhaps this is not a truncation-related problem.

@tbg (Member) commented Sep 10, 2017

node 2, the other up-to-date member of the raft group, also has only i21.t8 and nothing else.

@tbg (Member) commented Sep 10, 2017

/Local/RangeID/676/u/RaftLog/logIndex:21: Term:7 Index:21  by repl=(n3,s3):1 start=1504212874.096533018,0 epo=38 pro=1504212874.096535332,0
proposer_replica:<node_id:3 store_id:3 replica_id:1 > proposer_lease:<start:<wall_time:1504212874096533018 logical:0 > expiration:<wall_time:0 logical:0 > replica:<node_id:3 store_id:3 replica_id:1 > deprecated_start_stasis:<wall_time:0 logical:0 > proposed_ts:<wall_time:1504212874096535332 logical:0 > epoch:38 > max_lease_index:7 replicated_eval_result:<start_key:"\366\245\213\370\001\217X\375\003\022\224\354\214\260\000\001\210" end_key:"\366\245\213\370\002I\357\375\003\022\232\273\312I\000\001\210\000" block_reads:false state:<gc_threshold:<wall_time:0 logical:0 > stats:<contains_estimates:false last_update_nanos:0 intent_age:0 gc_bytes_age:0 live_bytes:0 live_count:0 key_bytes:0 key_count:0 val_bytes:0 val_count:0 intent_bytes:0 intent_count:0 sys_bytes:0 sys_count:0 > txn_span_gc_threshold:<wall_time:0 logical:0 > > is_lease_request:false is_freeze:false timestamp:<wall_time:1504212934211614036 logical:0 > is_consistency_related:false delta:<contains_estimates:false last_update_nanos:1504212934211614036 intent_age:0 gc_bytes_age:0 live_bytes:16730385 live_count:478011 key_bytes:14340330 key_count:478011 val_bytes:2390055 val_count:478011 intent_bytes:0 intent_count:0 sys_bytes:0 sys_count:0 > add_sstable:<crc32:1448830645 > > write_batch:<data:"\000\000\000\000\000\000\000\000\000\000\000\000" >

/Local/RangeID/676/u/RaftLog/logIndex:22: Term:8 Index:22 : EMPTY

/Local/RangeID/676/u/RaftLog/logIndex:23: Term:9 Index:23 : EMPTY

/Local/RangeID/676/u/RaftLog/logIndex:24: Term:10 Index:24 : EMPTY

/Local/RangeID/676/u/RaftLog/logIndex:25: Term:11 Index:25 : EMPTY

This is surprising. The empty entry at 22 indicates that the Raft leader proposed an empty entry here (as it does after leader election). A sideloaded proposal looks like the one at index 21. So the conjecture is that the actual sideloaded proposal is i21.t7 (which we do have on disk) but for some reason we're trying to treat i22 as a sideloaded proposal, which it is not.

This looks like we're seeing an empty entry but somehow mistaking it for a sideloaded entry. This would happen if it had the wrong command encoding version, but an empty entry has an empty version.

@tbg (Member) commented Sep 10, 2017

My conjecture is that the fact that there is a sideloaded entry at 21 and then an empty one at 22 somehow causes the bug. For example, this would happen if we reuse the same Entry struct but don't clear it. (But we do).

@tbg (Member) commented Sep 10, 2017

Uh, actually:

package storage

import (
	"testing"

	// raftpb as vendored by CockroachDB at the time; the exact import path
	// may differ in other setups.
	"github.com/coreos/etcd/raft/raftpb"
)

func TestFoo(t *testing.T) {
	var ent raftpb.Entry
	ent.Data = []byte("foo")
	// Unmarshaling an empty buffer succeeds but leaves the previously set
	// Data in place: the generated Unmarshal does not reset the message.
	if err := ent.Unmarshal([]byte{}); err != nil {
		t.Fatal(err)
	}
	// --- FAIL: TestFoo (0.00s)
	// store_test.go:61: {0 0 EntryNormal [102 111 111] []}
	if len(ent.Data) != 0 {
		t.Fatal(ent)
	}
}

That does explain it, because of this loop in cockroach/pkg/storage/store.go (lines 3430 to 3437 at 1fd366d):

var ent raftpb.Entry
for i := range logEntries {
	if err := ent.Unmarshal(logEntries[i]); err != nil {
		return err
	}
	if !sniffSideloadedRaftCommand(ent.Data) {
		continue
	}

@tbg (Member) commented Sep 10, 2017

((m *Entry).Unmarshal does not reset; proto.Unmarshal, which is what I looked at before, does.) That bug might lurk elsewhere as well, but I'll send a patch for this particular incarnation now.
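
A hedged sketch of the kind of fix, not the actual patch (the function name is a placeholder; logEntries and sniffSideloadedRaftCommand mirror the identifiers in the store.go excerpt above): scope the Entry to the loop iteration, or Reset it before each Unmarshal, so stale Data from a previous iteration can't survive a short input:

// Sketch only, not the actual patch.
func inspectLogEntries(logEntries [][]byte) error {
	for i := range logEntries {
		var ent raftpb.Entry // fresh value each iteration, so no stale Data survives
		if err := ent.Unmarshal(logEntries[i]); err != nil {
			return err
		}
		// Alternatively, keep a single ent and call ent.Reset() before each
		// Unmarshal: the generated Unmarshal fills in fields present in the
		// input but does not clear fields that are absent.
		if !sniffSideloadedRaftCommand(ent.Data) {
			continue
		}
		// ... handle the sideloaded entry ...
	}
	return nil
}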

@tbg (Member) commented Sep 10, 2017

Having restarted with the fix, r767 is now operational. The system appears to be busy applying (actually applying) snapshots. Not seeing any more sideloading snapshot errors in any of the logs.

There are still a bunch of occurrences of #18339, so that issue should be addressed ASAP.

tbg added a commit to tbg/cockroach that referenced this issue Sep 10, 2017
tbg added a commit to tbg/cockroach that referenced this issue Sep 10, 2017
@tbg (Member) commented Sep 11, 2017

Everything replicated back up, but there's one single holdover:

I170911 00:01:50.558255 117 storage/replica_raftstorage.go:516  [replicate,n3,s3,r779/1:/Table/172/{1-2}] generated preemptive snapshot dd348fc5 at index 127
I170911 00:01:50.643202 117 storage/replicate_queue.go:231  [replicate,n3,s3,r779/1:/Table/172/{1-2}] snapshot failed: (n1,s1):?: remote declined snapshot: reservation rejected
I170911 00:01:50.691113 117 storage/replica_raftstorage.go:516  [replicate,n3,s3,r779/1:/Table/172/{1-2}] generated preemptive snapshot e1e96a36 at index 127
I170911 00:01:50.850667 117 storage/replicate_queue.go:231  [replicate,n3,s3,r779/1:/Table/172/{1-2}] snapshot failed: (n4,s4):?: remote declined snapshot: reservation rejected
E170911 00:01:50.942205 117 storage/queue.go:656  [replicate,n3,s3,r779/1:/Table/172/{1-2}] 2 matching stores are currently throttled

... oops, didn't post this for an hour, and now it has upreplicated. Well, problems fixed, then, I suppose? Though it's not clear why the stores were throttled for that long. Any ideas @a-robinson?

@a-robinson (Contributor) commented:

There are 4 reasons that a store may be rejecting snapshots:

  1. It's draining before shutting down
  2. It's currently receiving/applying a snapshot
  3. It's currently removing a replica
  4. It's too far behind on too many raft ranges.

Options 2, 3, and 4 could all easily be true on a node that just came up, and 3 and 4 might take a while before they stop being true.
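
For illustration, a self-contained sketch (every name below is made up; nothing is taken from the real code) of attaching a distinct decline message to each of the four reasons above, which would make episodes like this easier to diagnose from the logs:

package main

import "fmt"

type declineReason string

const (
	declineNone      declineReason = ""
	declineDraining  declineReason = "store is draining"
	declineApplying  declineReason = "store is applying a snapshot"
	declineRemoving  declineReason = "store is removing a replica"
	declineTooBehind declineReason = "store is behind on too many raft ranges"
)

type storeState struct {
	draining, applyingSnapshot, removingReplica bool
	behindRanges, maxBehindRanges               int
}

// maybeDecline checks the four conditions in order and reports which one,
// if any, would cause an incoming snapshot reservation to be rejected.
func (s storeState) maybeDecline() declineReason {
	switch {
	case s.draining:
		return declineDraining
	case s.applyingSnapshot:
		return declineApplying
	case s.removingReplica:
		return declineRemoving
	case s.behindRanges > s.maxBehindRanges:
		return declineTooBehind
	}
	return declineNone
}

func main() {
	s := storeState{behindRanges: 12, maxBehindRanges: 8}
	fmt.Println(s.maybeDecline()) // store is behind on too many raft ranges
}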

@tbg (Member) commented Sep 11, 2017 via email

@a-robinson (Contributor) commented:

We can't easily distinguish between 2 and 3 (and we do already include an extra message for 1), but differentiating 4 couldn't hurt. We could probably figure it out from graphs, but we don't need to require that extra step.

@dianasaur323 (Contributor Author) commented:

hmm... checking my debug page shows this:

image

@tbg (Member) commented Sep 12, 2017

^- this is probably #18055

I found two more issues related to this sideloading in this cluster.

  1. When applying a preemptive snapshot, we (have to) write "fat" entries into the log because we don't have a sideloaded storage location yet (no replicaID). However, when that node itself later tries to create a snapshot, it looks at these fat proposals and mistakenly tries to "re-fatten" them from the sideloaded storage, which will not find the corresponding payload. This is easy to fix (see the sketch after this list).
  2. The sideloaded storage is created too early. I saw a replica that came into existence through a preemptive snapshot and later failed to create snapshots of its own (again, not finding a sideloaded payload). I looked manually: the replica had a sideloaded storage for replicaID == 0 that contained the file, but it didn't have a storage for its actual replicaID. Moving the file fixed it. So this is like an off-by-one, and though I'm still figuring out the exact sequence of events, I'm sure that too can be fixed. Testing it might take more work.
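
For the first item, a hedged sketch of the shape of the fix; entryIsAlreadyInline, loadSideloadedPayload, and inlinePayload are invented helpers, not the real identifiers. The idea: when inlining payloads for an outgoing snapshot, an entry whose payload is already inline should be passed through rather than looked up on disk.

// Sketch only; the helpers used here are placeholders.
func maybeInlineEntry(ent raftpb.Entry) (raftpb.Entry, error) {
	if !sniffSideloadedRaftCommand(ent.Data) {
		return ent, nil // not a sideloaded command at all
	}
	if entryIsAlreadyInline(ent) {
		// "Fat" entry, e.g. appended during preemptive snapshot application
		// before a sideloaded storage existed; nothing to fetch from disk.
		return ent, nil
	}
	payload, err := loadSideloadedPayload(ent) // reads i<index>.t<term> from the sideloaded dir
	if err != nil {
		return raftpb.Entry{}, err
	}
	return inlinePayload(ent, payload), nil
}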

@tbg (Member) commented Sep 12, 2017

Hmm, 2. is not clear to me. We do create the sideloaded storage a little early, but we replace it every time the replicaID changes (from tryGetOrCreateReplica). Additionally we never sideload anything except from snapshot application (which doesn't do it if replicaID == 0) and regular Raft append (which we would never do with an uninitialized replica). Mysterious.

Update now that I have WiFi: ok, figured it out, I think. The code which created the sideloaded storage in tryGetOrCreateReplica was using the incoming replicaID -- and in the case in which a preemptive snapshot is received by an existing replica, that would switch out the perfectly reasonable sideloaded storage with one at replicaID zero! Easy to fix too.

tbg added a commit to tbg/cockroach that referenced this issue Sep 12, 2017
When a preemptive snapshot is applied, we (have to) write "fat" entries into the
log because we don't have a sideloaded storage location yet (no replicaID).

However, when that node itself later tries to create snapshot, it ends up in the
situation in which an entry has the sideloading raft command version, but isn't
actually on disk (but instead already inlined). We weren't handling that case
correctly, resulting in (the first half of)
cockroachdb#18324 (comment).

Remedied that and updated TestRaftSSTableSideloadingInline so that it fails before
the fix and passes after.
tbg added a commit to tbg/cockroach that referenced this issue Sep 13, 2017
@tbg (Member) commented Sep 13, 2017

With #18462 in, I believe these problems have been fixed, though I had to restart one node which also moved Raft leadership to other nodes that didn't have the problem in the first place.

The cluster now has node2 decommissioned:

root@pm-cluster-nyc1-01:~# cockroach node status --certs-dir=certs --decommission
+----+-----------------------+---------+---------------------+---------------------+---------+-------------------+--------------------+-------------+
| id |        address        |  build  |     updated_at      |     started_at      | is_live | gossiped_replicas | is_decommissioning | is_draining |
+----+-----------------------+---------+---------------------+---------------------+---------+-------------------+--------------------+-------------+
|  1 | 192.241.150.167:26257 | f701865 | 2017-09-13 03:36:34 | 2017-09-12 06:06:03 | true    |               753 | false              | false       |
|  2 | 165.227.80.203:26257  | f701865 | 2017-09-13 03:36:32 | 2017-09-12 18:44:01 | true    |                 0 | true               | true        |
|  3 | 138.197.195.129:26257 | f701865 | 2017-09-13 03:36:35 | 2017-09-12 06:16:04 | true    |               753 | false              | false       |
|  4 | 146.185.156.213:26257 | 1e7b386 | 2017-09-13 03:36:35 | 2017-09-13 03:12:04 | true    |               753 | false              | false       |
+----+-----------------------+---------+---------------------+---------------------+---------+-------------------+--------------------+-------------+

@tbg (Member) commented Sep 13, 2017

(I'll recommission it just for fun)

@tbg (Member) commented Sep 13, 2017

This issue has been pretty productive -- please keep doing what you're doing, @dianasaur323. Going to close this one, though, since we seem to be all good.

tbg closed this as completed Sep 13, 2017
tbg added a commit to tbg/cockroach that referenced this issue Apr 16, 2018