Running window post locks with many 512MB sectors #5446
Comments
I have the same issue on calibration net. I described my symptoms here: https://filecoinproject.slack.com/archives/C01D42NNLMS/p1611706395036000 I was pretty sure this only occurs when actually sealing, but I might be wrong.
Some more information: I set up a minimal network in k8s and ran into this issue immediately upon starting a network. This problem is pretty consistent and occurs during When starting a network I see a double entry for
Full logs: https://gist.github.com/travisperson/e4cee85d94e47fcf537ce38b75517122 The code that appears to get caught up is the locking in bellman here
I see this issue on machines without a GPU, but without the "no gpu" flag set. We might actually have 2 different issues here with the same symptoms.
This issue has been resolved. The locking issue is addressed in bellperson; see filecoin-project/rust-fil-proofs#1380 for more details. Additionally, there is a maximum of 5 partitions per deadline. With 512MB sectors this limit is easy to hit, because each partition holds only 2 sectors.
I set up a network with 3 miners, each having 1024 sectors (512MB sector size). This results in 10-11 partitions per window, because 512MB sectors have 2 sectors per partition.
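The 10-11 figure can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the sector count and the 2 sectors per partition come from this report and the limit of 5 comes from the resolution comment, but the 48 windows (deadlines) per proving period and the even spread of partitions across them are assumptions not stated in this thread, and the function name is made up for illustration.

```python
import math

SECTORS_PER_MINER = 1024          # from this report
SECTORS_PER_PARTITION = 2         # 512MB sectors, from this report
DEADLINES_PER_PERIOD = 48         # assumption: not stated in this thread
MAX_PARTITIONS_PER_DEADLINE = 5   # limit quoted in the resolution comment


def partitions_per_deadline(sectors, sectors_per_partition, deadlines):
    """Return (total partitions, min per deadline, max per deadline),
    assuming partitions are spread as evenly as possible."""
    partitions = math.ceil(sectors / sectors_per_partition)
    low = partitions // deadlines
    high = math.ceil(partitions / deadlines)
    return partitions, low, high


partitions, low, high = partitions_per_deadline(
    SECTORS_PER_MINER, SECTORS_PER_PARTITION, DEADLINES_PER_PERIOD
)
print(f"{partitions} partitions per miner, {low}-{high} per window")
print("exceeds limit:", low > MAX_PARTITIONS_PER_DEADLINE)
```

Under these assumptions every window ends up with roughly twice the 5-partition limit mentioned in the resolution.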
Right now the miners lock up every once in a while when they try to run a window post and require a restart to get the chain to progress forward again.
The current impact of this issue is that we can't set up networks with 512MB miners.
More logs (includes goroutines): https://gist.github.com/travisperson/1712a7e5a2caa3472b8724ead455fc0c