Snap-up sector stalled in FinalizeReplicaUpdate, causing WindowPoST error and 100% loss of power in partition #8148
Comments
@benjaminh83 if you could answer some questions it will help me make progress on this. They are probably going to seem pretty basic since this is my first lotus miner debugging session, so thanks for bearing with me.
Do I correctly understand that the post scheduler usually detects corrupted data and automatically faults sectors, and that in this case that is not happening? So instead of posting with one sector marked as faulty, your post scheduler just doesn't post at all, and you could therefore lose the whole partition?
@ZenGround0 No problem. It's not always easy for us to get the details right, so what we have been doing makes total sense :) I will try to answer each of them here:
Other important info: (lotus-miner port 2345, lotus-worker 3456 and 3457)
So data is still located on the lotus-worker... This is the layout:
This should be fixed by #8177, please reopen if you encounter a similar problem
Checklist
Latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
I made a snap-up sector that hit an error while being transferred back to the miner. This could happen for various reasons, but the point is that the transfer did not succeed and the sector (sector 3) is now stuck in FinalizeReplicaUpdate. It is impossible to make any changes to the sector state, including abandoning the snap-up, like:
The failed snap-up sector is in turn causing a WindowPoST failure. A failure I cannot seem to escape!
I think this is rather concerning, as this could mean faulting a full partition. Good thing I'm on the calibration network, as my miner has currently lost all power and is getting slashed.
Logging Information
Repo Steps