stability: crash at startup: panic: [group 903] entries[1:12) is unavailable from storage #7389
I'll leave things as they are for now, but then I'll try reverting back to the previous binary (sha a51f9e4).
data and logs backed up on all nodes.
confirmed that rolling back to a51f9e4 works.
This looks related to (or the same as) #7386.
This is still broken.
@mberhault Still broken as in you're still seeing the same crashes?
yup. Here's a summary of the running sha and panic message (or lack of one) per node:
I'll look at the original test failure. It's not surprising that once we …

-- Tobias
The same thing is happening on gamma, and I'm pretty sure (90%) it happened without even having touched the data, using code that has Pete's change in it. Here's a summary of running binaries and panics, plus annotations of what I did here and there.
Documenting some of the post-mortem work:
I looked at some other dumps with @mberhault (preceding his upgrades, I think). From the dump:
That's not really a smoking gun since this range contains mostly time-series data, so we didn't really expect any other transactions to be anchored here. Still, let's look at the timestamps.
In the most likely interpretation, something went wrong when the range was rebalanced. Playing the same game for the other Replica (901) only reveals a split, but no rebalance. Possibly another node would have more. What's a bit puzzling about the above narrative is that this should really only clobber a Replica, not the whole Replica set. Going to look more.
Picking up from that last clue (all of the replica set is clobbered), I looked at the replica on the second node. The output of the dump:
The interesting bit is that the timestamp of the ChangeReplicas occurs three times: its timestamp 1466425098.* appears both in the last MVCC update and in the RangeDescriptor key version, and the stats are completely messed up (maybe explaining why there is only a single time-series value in the range). Considering these things makes it unlikely that the replica GC queue is at fault here. After all, the history of the RangeDescriptor shows three healthy nodes, then adding Node4, and finally removing it. We're looking at Node{1,2}, so there's no reason we should ever have been clobbered, negating earlier assumptions.
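To make that argument concrete, here's an illustrative sketch of the check implied above: the replica GC queue only has business removing the local replica if its store drops out of the descriptor history at some point. Everything here (descriptorVersion, everRemoved, and all values except the 1466425098 timestamp) is made up for illustration, not taken from the actual dump or code.

```go
package main

import "fmt"

// descriptorVersion is an illustrative stand-in for one MVCC version of a
// RangeDescriptor: the wall time of the write and the member store IDs.
type descriptorVersion struct {
	wallTime int64
	stores   []int
}

// everRemoved reports whether storeID disappears from membership at any
// point in the descriptor history (oldest first). Only in that case would
// the replica GC queue have had any reason to clobber the local replica.
func everRemoved(history []descriptorVersion, storeID int) bool {
	present := false
	for _, v := range history {
		in := false
		for _, s := range v.stores {
			if s == storeID {
				in = true
				break
			}
		}
		if present && !in {
			return true
		}
		present = present || in
	}
	return false
}

func main() {
	// Assumed shape of the history reconstructed above: three healthy
	// nodes, then Node4 added, then Node4 removed again.
	history := []descriptorVersion{
		{wallTime: 1466425000, stores: []int{1, 2, 3}},
		{wallTime: 1466425050, stores: []int{1, 2, 3, 4}},
		{wallTime: 1466425098, stores: []int{1, 2, 3}},
	}
	fmt.Println(everRemoved(history, 1)) // false: Node1 was never removed
	fmt.Println(everRemoved(history, 4)) // true:  only Node4 was, so only its replica should be GC'ed
}
```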
At this point, my impression is that we're simply looking at a very skinny range: it contains only a single key. We might be looking at a corruption of the internal state here that comes from a different angle. I'm looking at the Raft log (and internal Raft state) next to get more intel.
For posterity: the easiest way to grab all of the range prefix is a raw scan bounded by the hex-encoded prefix, where the gibberish hex was obtained by adding this throwaway test:

```go
func TestFoo(t *testing.T) {
	// Print the range-ID unreplicated key prefix (quoted) and fail so that
	// the output shows up in the test log.
	k := MakeRangeIDUnreplicatedPrefix(1)
	t.Fatalf("%q", string(k))
}
```

(with the range ID adjusted as needed).
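For reference, a standalone way to print that prefix without abusing a failing test. This is a minimal sketch under assumptions: the pre-reorganization import path github.com/cockroachdb/cockroach/keys and the MakeRangeIDUnreplicatedPrefix helper used above; 903 is simply the range from this issue.

```go
package main

import (
	"encoding/hex"
	"fmt"

	// Assumed import path for the 2016-era source tree.
	"github.com/cockroachdb/cockroach/keys"
)

func main() {
	// Hex-encode the range-ID local (unreplicated) key prefix for range 903
	// so it can be pasted into a raw scan of the store.
	k := keys.MakeRangeIDUnreplicatedPrefix(903)
	fmt.Println(hex.EncodeToString(k))
}
```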
Looking at 903 on rho1 (which panicked). From the dump:
If I look at the backup instead, I see discrepancies. It's not unlikely that similar discrepancies would trip up other older clusters (for example, before my refactorings some state was synthesized on demand, whereas after them any such state is expected to be persisted before the raft group needs it). It may be difficult to prove conclusively, but armed with a data dump and an old enough version we should be able to prove this, and hopefully we can add a migration strategy (writing the missing state when booting up the stores). I'll take a look at that tomorrow. #7310 is the change I think is responsible after all. If the aforementioned is true, #7404 really seems like an unrelated problem with similar symptoms that happened to pop up at about the same time.
The precise location which I think causes trouble is the FirstIndex computation. As mentioned before, the ranges we were looking at barely held any data. In particular, there's nothing that really ever touched them, and there was no need to truncate the logs. As a result, no TruncatedState was ever persisted; the old code would simply synthesize one on demand. With the new code, it wouldn't synthesize anything but simply return a zero value, which explains exactly the problem: it should return a FirstIndex of 11 (which is also, correctly, the first index present in the never-truncated log), but returns 0. Thus Raft tries to load the entries starting at index 1, which were never written, and we get the "entries[1:12) is unavailable from storage" panic from the title.
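To make the failure mode concrete, here's a self-contained sketch contrasting the old on-demand synthesis with the new behavior of trusting whatever is on disk. This is not the actual storage-package code; truncatedState and the initial index/term values of 10/5 are assumptions chosen to match the FirstIndex of 11 mentioned above.

```go
package main

import "fmt"

// truncatedState mirrors the persisted Raft log truncation marker: the last
// index (and its term) that has been discarded from the log.
type truncatedState struct {
	Index, Term uint64
}

// loadTruncatedState contrasts the two behaviors described above. The old
// code synthesized the initial truncation point whenever nothing had been
// persisted; the new code returns whatever is on disk, i.e. the zero value
// for a range whose log was never truncated.
func loadTruncatedState(persisted *truncatedState, synthesizeIfMissing bool) truncatedState {
	if persisted != nil {
		return *persisted
	}
	if synthesizeIfMissing {
		// Assumed initial log position; Index 10 is what yields the
		// expected FirstIndex of 11.
		return truncatedState{Index: 10, Term: 5}
	}
	return truncatedState{} // zero value: Index == 0
}

func main() {
	// A range whose log was never truncated has no persisted TruncatedState.
	oldTS := loadTruncatedState(nil, true)  // pre-#7310 behavior
	newTS := loadTruncatedState(nil, false) // post-#7310 behavior
	fmt.Println(oldTS.Index) // 10: the basis for the correct FirstIndex of 11
	fmt.Println(newTS.Index) // 0: FirstIndex collapses, and Raft asks for entries[1:12)
}
```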
I'm going to whip up a "migration strategy" (it's a bit silly): synthesize a TruncatedState at startup for those Raft groups which don't have one, before the corresponding Replica becomes active. The TruncatedState is part of the replicated state, but it should work out fine to write it on each individual replica, as we always overwrite this value blindly when a truncation is about to be applied on only some of the replicas. Will test with @mberhault.
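A rough sketch of what such a startup migration could look like. This is purely illustrative: store, migrateTruncatedState, and the initial index/term values are placeholders, not the real engine API.

```go
package main

import "fmt"

// truncatedState mirrors the persisted Raft log truncation marker.
type truncatedState struct {
	Index, Term uint64
}

// store stands in for the on-disk engine: the persisted TruncatedState per
// range ID, with absence meaning "this log was never truncated".
type store map[int64]*truncatedState

// Assumed initial log position; with Index 10 the replica's FirstIndex comes
// out as 11 again, matching what the old code used to synthesize.
const (
	raftInitialLogIndex = 10
	raftInitialLogTerm  = 5
)

// migrateTruncatedState sketches the startup migration described above: for
// every replica that has no persisted TruncatedState, blindly write the value
// the old code would have synthesized, before the replica is handed to Raft.
// Doing this per replica is fine because the value is overwritten blindly
// anyway whenever a real log truncation is applied.
func migrateTruncatedState(s store, rangeIDs []int64) {
	for _, id := range rangeIDs {
		if _, ok := s[id]; ok {
			continue // this range was truncated at some point; nothing to do
		}
		s[id] = &truncatedState{Index: raftInitialLogIndex, Term: raftInitialLogTerm}
	}
}

func main() {
	s := store{901: {Index: 25, Term: 6}} // 903 never persisted a TruncatedState
	migrateTruncatedState(s, []int64{901, 903})
	fmt.Println(*s[901], *s[903]) // {25 6} {10 5}: 903 is repaired, 901 untouched
}
```

Since the value is only written when it is absent, rerunning such a migration on every startup would be idempotent.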
This migration strategy should be able to salvage the currently borked beta clusters.
Add those which were not decoded while looking at cockroachdb#7389.
In cockroachdb#7310, we added (and enforced) assumptions about what was persisted to disk, but did not ensure that older versions were migrated properly. This in turn led to cockroachdb#7389. A migration is only necessary for ranges which had never been truncated, and so it should be fairly rare in practice. The migration simply writes the truncated state that would otherwise have been synthesized by the old code, which is technically a consistency violation, but should work well in practice because all nodes will write that state on startup when they are rebooted.
Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on them (one could argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.
Closing as the issue is sufficiently handled (and the more fundamental aspects are discussed separately in #7600).
Trying to upgrade rho to the beta candidate sha f447eb0. On restart, all nodes crashed with:

```
panic: [group 903] entries[1:12) is unavailable from storage
```

Here are the per-node panic messages: