storage: addSSTable checksum failure during restore #63297
Looks like the restore ground to a halt, with lots of messages about slow RPCs, some in the thousands of seconds, mostly involving n5, e.g.
We see some AddSSTable and AdminSplit requests hanging for a while too, which explains why the RESTORE hung. n5's logs also have lots of health alerts around various metrics, so it looks like that node got itself into a sad state.
(roachtest).restore2TB/nodes=10 failed on release-21.1@80709778b7c7de7b2b704972197022305bb3ca12:
Artifacts: /restore2TB/nodes=10
See this test on roachdash
On the first failure: yeah, what David said. Here's from
This does look a lot like #61396, which @aliher1911 is investigating.
The second failure:
cc @dt
Repro suggestions:
Note that you shouldn't use
Watch the failures roll in (hopefully...)
When the repro run is aborted or fails for whatever reason, you need to clean up the VMs manually. I do this:
I'm happy to review the branch before you spin up a repro attempt.
^- GOAWAY from upstream server
Renamed this issue so that the issue poster won't reuse it for future failures of this test, and moved it out of the KV backlog.
Closing for now until we can get another repro.
(roachtest).restore2TB/nodes=10 failed on release-21.1@389cbd4be0e9ce22ca7789cd61802f1f90392c97:
Artifacts: /restore2TB/nodes=10
Related:
See this test on roachdash
powered by pkg/cmd/internal/issues
Jira issue: CRDB-6514