stability: resurrecting registration cluster #6991
data backed up on each node to:
the output of
Blast. We don't seem to be getting into a stable enough state to actually apply the zone config change.
OK, I added swap on each machine and the snapshot for range 1 went through. SQL is usable again (including zone commands).
Looks like you picked the wrong node to run. Did that node have an extended period of downtime prior to this? I think this is just a case of the raft logs growing without bound while a node is down and there is no healthy node to repair onto.
One node in the registration cluster died:
Last runtime stats were
I think the machines have ~7 GB of RAM, so that points to an issue here. Some of the other nodes are similarly high:
with cockroach reporting just shy of 7gb RSS. The cluster should still be working with a node down. It clearly doesn't. The UI isn't accessible from the outside, so poking that way is a bit awkward. In any case, some raft groups are pretty long (I tried 1 which didn't exist and then 2 gave the following):
The version running is
Some more random tidbits from one of the nodes:
Rudimentary elinks-based poking on the debug/requests endpoint shows... well, just a lot of NotLeaderErrors (without a new lease holder hint). We need a way to access the admin port to make this debugging less painful. In light of all of the bugs that we've fixed since May 30, I also think we should update that version ASAP. I'm not sure what our protocol is wrt this cluster - can I simply do that?
I don't know what our protocol is either, but I would lean toward yes.
Ok. I'll pull a backup off the dataset and run last night's beta.
One node died a few minutes in with OOM, presumably due to snapshotting.
That raft log is ginormous. Why do we send the full raft log on snapshots?
I think this is probably a huge Raft log that was created prior to our truncation improvements, but which was picked up by the replication queue before the truncation queue. Maybe we should put a failsafe into snapshot creation (so that any snapshot which exceeds a certain size isn't even fully created)?
Seems easier to only snapshot the necessary tail of the raft log. For a snapshot, I think we only have to send anything past the applied index of the raft log, which should be very small. Ah, strike this. Now I recall that raft log truncation is itself a raft operation. Ok, I think putting a failsafe in place to avoid creating excessively large snapshots is reasonable. I'll file an issue.
It died again at the same range. I think that failsafe is worthwhile - it would give the truncation queue a chance to pick it up first. The failsafe could even aggressively queue the truncation.
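For concreteness, here is a minimal Go sketch of the failsafe being proposed, assuming a hypothetical size budget and truncation-queue hook; the names (maybeSnapshot, RaftLogSizeBytes, MaybeAdd) are illustrative stand-ins, not the actual cockroach APIs:

```go
package storage

import "errors"

// errSnapshotTooLarge is returned instead of materializing a huge snapshot.
var errSnapshotTooLarge = errors.New("refusing to create oversized snapshot; queued log truncation instead")

// maxSnapshotLogBytes is an arbitrary budget chosen for this sketch.
const maxSnapshotLogBytes = 64 << 20 // 64 MiB

// replica and truncationQueue are illustrative stand-ins for the real types.
type replica interface {
	RaftLogSizeBytes() int64 // approximate on-disk size of the Raft log
}

type truncationQueue interface {
	MaybeAdd(r replica) // ask the queue to consider truncating this replica's log soon
}

// maybeSnapshot applies the failsafe before doing any expensive work: if the
// Raft log is over budget, queue a truncation and bail out instead of
// building (and shipping) an enormous snapshot.
func maybeSnapshot(r replica, tq truncationQueue, generate func() error) error {
	if r.RaftLogSizeBytes() > maxSnapshotLogBytes {
		tq.MaybeAdd(r) // aggressively queue the truncation, as suggested above
		return errSnapshotTooLarge
	}
	return generate()
}
```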
I'm a bit out of ideas as to how to proceed right now. In an ideal world, I could restart the cluster with upreplication turned off and wait for the truncation queue to do its job.
Is this happening on just one node? Are all ranges fully replicated to other nodes? Can you simply nuke this one node?
There's very little visibility since I can't access the admin UI from outside. Anyone have experience setting up an ssh-tunnel-proxy? If one node tries to send that snapshot, chances are it's the same on the other nodes, or the range is under-replicated. In both cases, nuking the first node won't help. I also think I saw two nodes die already.
If you're running insecure you can do:
It's a secure cluster. I'll give it a try though.
Should still work with a secure cluster.
It simply works, great. Thanks @petermattis. Would you mind running
Where is this
It's my local clone of our non-public
Got it.
I realized that I hadn't actually managed to run the updated version because
Restarted one of the nodes. Magically that seems to have brought the first range back in the game. Snapshot sending time.
Still in critical state, though. Had to restart one of the nodes again (to resuscitate first range gossip). Sometimes things are relatively quiet, then large swaths of
I think those might have to do with us reporting "unreachable" to Raft every time the outgoing message queue is full (cc @tamird). Too bad I'm not running with the per-replica outboxes yet.
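A rough sketch of the mechanism described above, with stand-in types (only ReportUnreachable mirrors the real etcd/raft Node method; the rest is illustrative, not the actual cockroach transport code):

```go
package storage

// raftNode mirrors the one method of etcd/raft's Node interface that matters
// here: ReportUnreachable hints that messages to a peer are not being delivered.
type raftNode interface {
	ReportUnreachable(id uint64)
}

// raftMessage is a simplified stand-in for the real outgoing message type.
type raftMessage struct {
	To uint64 // destination replica ID
}

// sendRaftMessage tries to enqueue msg on the per-node outbox. If the
// buffered channel is already full, the message is dropped and Raft is told
// the peer is unreachable -- which is why a clogged outgoing queue shows up
// as a burst of "unreachable" reports.
func sendRaftMessage(outbox chan raftMessage, n raftNode, msg raftMessage) bool {
	select {
	case outbox <- msg:
		return true
	default:
		n.ReportUnreachable(msg.To)
		return false
	}
}
```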
As discovered in cockroachdb#6991 (comment), it's possible that we apply a Raft snapshot without writing a corresponding HardState since we write the snapshot in its own batch first and only then write a HardState. If that happens, the server is going to panic on restart: It will have a nontrivial first index, but a committed index of zero (from the empty HardState). This change prevents us from applying a snapshot when there is no HardState supplied along with it, except when applying a preemptive snapshot (in which case we synthesize a HardState).
See cockroachdb#6991. It's possible that the HardState is missing after a snapshot was applied (so there is a TruncatedState). In this case, synthesize a HardState (simply setting everything that was in the snapshot to committed). Having lost the original HardState can theoretically mean that the replica was further ahead or had voted, and so there's no guarantee that this will be correct. But it will be correct in the majority of cases, and some state *has* to be recovered. To illustrate this in the scenario in cockroachdb#6991: There, we (presumably) have applied an empty snapshot (no real data, but a Raft log which starts and ends at index ten as designated by its TruncatedState). We don't have a HardState, so Raft will crash because its Commit index zero isn't in line with the fact that our Raft log starts only at index ten. The migration sees that there is a TruncatedState, but no HardState. It will synthesize a HardState with Commit:10 (and the corresponding Term from the TruncatedState, which is five).
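Roughly, the synthesis described in that commit message amounts to the following sketch; the struct definitions are simplified stand-ins for the real proto types, not the actual implementation:

```go
package storage

// truncatedState and hardState are simplified stand-ins for the real proto
// types; only the fields needed for the sketch are shown.
type truncatedState struct {
	Index uint64 // last index covered by the snapshot, e.g. 10
	Term  uint64 // term of that index, e.g. 5
}

type hardState struct {
	Term   uint64
	Vote   uint64
	Commit uint64
}

// synthesizeHardState builds a HardState from the TruncatedState left behind
// by an applied snapshot, marking everything in the snapshot as committed.
// This can understate how far the replica had actually gotten (it may have
// voted or been further ahead), but some state has to be recovered.
func synthesizeHardState(ts truncatedState) hardState {
	return hardState{
		Term:   ts.Term,  // 5 in the example above
		Commit: ts.Index, // 10 in the example above
		// Vote is left unset; any earlier vote is unrecoverable here.
	}
}
```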
As discovered in cockroachdb#6991 (comment), it's possible that we apply a Raft snapshot without writing a corresponding HardState since we write the snapshot in its own batch first and only then write a HardState. If that happens, the server is going to panic on restart: It will have a nontrivial first index, but a committed index of zero (from the empty HardState). This change prevents us from applying a snapshot when there is no HardState supplied along with it, except when applying a preemptive snapshot (in which case we synthesize a HardState). Ensure that a new HardState does not break promises made by an existing one during preemptive snapshot application. Fixes cockroachdb#7619.
As discovered in cockroachdb#6991 (comment), it's possible that we apply a Raft snapshot without writing a corresponding HardState since we write the snapshot in its own batch first and only then write a HardState. If that happens, the server is going to panic on restart: It will have a nontrivial first index, but a committed index of zero (from the empty HardState). This change prevents us from applying a snapshot when there is no HardState supplied along with it, except when applying a preemptive snapshot (in which case we synthesize a HardState). Ensure that the new HardState and Raft log do not break promises made by an existing one during preemptive snapshot application. Fixes cockroachdb#7619.
storage: prevent loss of uncommitted log entries
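A hedged illustration of the "do not break promises" part, reusing the hardState stand-in from the sketch above: when a preemptive snapshot would install a synthesized HardState over an existing one, Term and Commit must not regress, and a vote cast in the surviving term must not be forgotten. This shows the invariant only; it is not the actual implementation.

```go
// mergeHardState keeps whichever values are "stronger": Term and Commit
// never move backwards, and a vote already cast for the surviving term
// is preserved rather than silently dropped.
func mergeHardState(existing, synthesized hardState) hardState {
	out := synthesized
	if existing.Term > out.Term {
		out.Term = existing.Term
	}
	if existing.Commit > out.Commit {
		out.Commit = existing.Commit
	}
	if existing.Term == out.Term && out.Vote == 0 {
		out.Vote = existing.Vote // a vote is a promise for that term; keep it
	}
	return out
}
```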
via @tschottdorf: raw data is preserved in s3, but has been dumped and imported into the new cluster.
@bdarnell: I'll be keeping track of actions and results here.
Quick summary:
The registration cluster is falling over repeatedly due to large snapshot sizes. Specifically, recipients of range 1 snapshots OOM during applySnapshot. E.g., on node 2 (ec2-52-91-3-164.compute-1.amazonaws.com): there is no corresponding "applied snapshot for range 1" message, and the stack trace does list an applySnapshot entry. Can't confirm from the trace that it is for that range (the range ID is not one of the simple arguments), but it most likely is. A similar pattern appeared multiple times. I will perform the following to try to resurrect the cluster: