Mined blocks became orphan blocks #5128
I have the same issue:
My miner says it has mined a block, but there is no record of it on filfox: https://filfox.info/en/tipset/294807. And my balance on my miner does not seem to have been credited with funds for mining the block.
Extra context after analyzing @rjan90's (daemon/miner interleaved, unfiltered, chain-specific) logs:
Just dropping my own logs into this too:
Adding more info on re-org stats for investigation
I have something more interesting:
This is the block that was missing from my parent set. It arrived 8 seconds after the supposed mining deadline for it. It was included by two other miners, but not by mine. Also, it's a bit odd that tipsets usually have many blocks, but this one had only 3. A similar issue was reported by others too:
Same miner caused it. Are we sure this is accidental and not deliberate?
Logging out the most delayed blocks from the last day
I have been analysing the logs of miners that reported their blocks not getting on chain. The problem isn’t that the mined block isn’t being propagated to the network and therefore not included on chain. The problem is that shortly after the miner wins the round and starts minting the block (fixed 6 second cutoff), a new block from the parent tipset arrives. The syncer component accepts the block, but the miner has already committed to the stale base. Now, this becomes a probabilistic dilemma.
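To make the race concrete, here is a minimal Go sketch, assuming arrival times are measured in seconds after the parent epoch's timestamp and using the 6-second cutoff mentioned above. `splitParents` and the block names are hypothetical, not Lotus APIs; the sketch only shows which parent blocks end up in the mining base versus which arrive after the cutoff yet are still accepted by the syncer.

```go
package main

import "sort"

// splitParents partitions the blocks of a parent epoch by when this node
// received them, in seconds after the parent epoch's timestamp (hypothetical
// helper). Blocks received at or before the propagation cutoff end up in the
// mining base; blocks received after it are still accepted by the syncer, but
// they are missing from the base the miner has already committed to.
func splitParents(arrivals map[string]float64, cutoff float64) (base, late []string) {
	for cid, t := range arrivals {
		if t <= cutoff {
			base = append(base, cid)
		} else {
			late = append(late, cid)
		}
	}
	// Map iteration order is random in Go; sort for deterministic output.
	sort.Strings(base)
	sort.Strings(late)
	return base, late
}
```

With arrivals of 1.2 s, 3.0 s and 8.0 s and a 6 s cutoff, the first two blocks form the base and the third is reported as late: a block the syncer accepts but the already-committed base lacks.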
We basically need to be on the same side as all other miners; otherwise we risk our block being rejected. Whether we pick 1 or 2, we may be on the wrong side. In all instances reported here, the miner happened to be on the wrong side, i.e. mining on top of a tipset that was missing a block that its own syncer had accepted. It’s a bit of a paradox, and I think the miner needs to be more intelligent here. Note: Ethereum deals with similar situations with the ommer construction. After discussing the above with @magik6k: winning PoSt is the most expensive part of mining a block (as also noted in the logs above), and the good news is that we can reuse it, because "randomness for winning PoSt comes from drand, not the VRF chain". So what we should consider here is:
The risk here is that recomputing the state could make our block publishing window slip, but it seems to be a fair tradeoff. If we want to make this more intelligent, we can track the time it took to compute the previous mining base state and use it to estimate how long a recomputation would take before making the decision. I suspect that with this logic, none of the cases of abandoned blocks reported in this thread would have happened.
Another note: while we suspected gossipsub block propagation to be the issue, it still needs to be elucidated why these blocks seemingly arrived late. Note that block validation is a blocking step in propagation, and there's a bit of state fetching and computation involved there, which we might be able to alleviate by caching worker keys and similar data.
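A worker-key cache along these lines might look like the sketch below. It is a hypothetical illustration, not the Lotus validation code: entries are keyed by miner address and parent state root (an assumption, so that a key change in a later state is never served stale), `lookup` stands in for the expensive state query, and eviction is left out for brevity.

```go
package main

import "sync"

// cacheKey scopes a cached worker key to one miner at one parent state root.
type cacheKey struct {
	Miner     string
	StateRoot string
}

// WorkerKeyCache memoizes an expensive worker-key lookup (hypothetical type).
type WorkerKeyCache struct {
	mu     sync.Mutex
	keys   map[cacheKey]string
	lookup func(miner, stateRoot string) (string, error) // expensive state query
}

func NewWorkerKeyCache(lookup func(miner, stateRoot string) (string, error)) *WorkerKeyCache {
	return &WorkerKeyCache{keys: make(map[cacheKey]string), lookup: lookup}
}

// Get returns the cached worker key, falling back to lookup on a miss.
// The lock is not held during lookup, so two concurrent misses for the same
// key may both query state; the results are identical, so this is harmless.
func (c *WorkerKeyCache) Get(miner, stateRoot string) (string, error) {
	k := cacheKey{Miner: miner, StateRoot: stateRoot}
	c.mu.Lock()
	if v, ok := c.keys[k]; ok {
		c.mu.Unlock()
		return v, nil
	}
	c.mu.Unlock()

	v, err := c.lookup(miner, stateRoot)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.keys[k] = v
	c.mu.Unlock()
	return v, nil
}
```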
Adding another case of a mined block becoming an orphan block, which I experienced during the night:
So, the specs for my miner/Lotus node: it is only used as a miner/Lotus node, not as a sealing machine.
This continues in v1.4.2 with the latest code updates:
The miner spec is:
I don't know what to look for in the daemon logs, but here are the entries around 2021-02-20T03:16:36, when this happened. The last four lines have this:
2021-02-20T03:17:11.851-0800 WARN sub sub/incoming.go:104 Received block with large delay 11 from miner f08399
@raulk Look at @William8Work's case. Here the delayed block came in after he had finished computing and published his block to the network:
Copying some conversation from the Slack channel, as it's being archived:
Looks like this happened to my miner again yesterday: a block was mined, but no rewards were credited. So I searched the logs; here is the miner log:
Here are the daemon logs:
@William8Work could you please add more logs from the moment that your miner won the round? We need to get the exact timeline of events.
@William8Work
@raulk here are additional logs from the daemon. Happy to provide more if you need them:
The miner log:
Had another case of an orphan block yesterday. Here are the interleaved daemon/miner logs:
This time the delayed block came in after my GenerateWinningPoSt had finished.
A new case of an orphan block this morning. This time it was a bit different, as there are no log entries of a block arriving with a large delay. Interleaved miner/Lotus logs:
Another repro from f010088 on Apr 16, 2021:
Mined another orphan block 😢. When does it stop?
I've had two occurrences in the last two weeks; the relevant logs are below. If I scan my Lotus log, I see many blocks coming in with large delays. Considering that the round-trip time to pretty much anywhere on the planet is ~500ms, I don't understand why some blocks from some miners arrive so late. All I can think is that they have very poor hardware and/or are doing it maliciously to cause block losses for others. Either way, miners with excessive, continuous delays should be penalised/slashed for such behaviour; it's not good for the network. https://filfox.info/en/tipset/617327
2021-03-22T11:06:36.815-0400 WARN sub sub/incoming.go:104 received block with large delay from miner {"block": "bafy2bzacecjs4lkhirqkjrq7hwpb77t6oh7qlc3g7i5ejd6alkqpelzoikthm", "delay": 6, "miner": "f019074"}
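To check which miners are repeatedly late, warnings in the JSON-field format quoted in this comment can be tallied per miner. A minimal Go sketch, assuming each line contains the text `received block with large delay from miner ` followed by a JSON object with `block`, `delay` and `miner` fields; `aggregateDelays` and `MinerDelayStats` are hypothetical names. High counts only show that blocks from a miner reached this node late; they do not by themselves prove poor hardware or malice.

```go
package main

import (
	"encoding/json"
	"sort"
	"strings"
)

// delayMarker is the text preceding the JSON fields in the warning format shown above.
const delayMarker = "received block with large delay from miner "

type delayFields struct {
	Block string `json:"block"`
	Delay int64  `json:"delay"`
	Miner string `json:"miner"`
}

// MinerDelayStats summarizes late blocks seen from one miner.
type MinerDelayStats struct {
	Miner    string
	Count    int
	MaxDelay int64
}

// aggregateDelays scans daemon log lines and returns per-miner counts, most
// frequent offenders first. Lines in other formats (such as the older
// "Received block with large delay 11 from miner f08399") are skipped.
func aggregateDelays(lines []string) []MinerDelayStats {
	byMiner := map[string]*MinerDelayStats{}
	for _, line := range lines {
		i := strings.Index(line, delayMarker)
		if i < 0 {
			continue
		}
		var f delayFields
		if err := json.Unmarshal([]byte(line[i+len(delayMarker):]), &f); err != nil {
			continue
		}
		s, ok := byMiner[f.Miner]
		if !ok {
			s = &MinerDelayStats{Miner: f.Miner}
			byMiner[f.Miner] = s
		}
		s.Count++
		if f.Delay > s.MaxDelay {
			s.MaxDelay = f.Delay
		}
	}
	out := make([]MinerDelayStats, 0, len(byMiner))
	for _, s := range byMiner {
		out = append(out, *s)
	}
	// Sort by count (descending), then by miner address for stable output.
	sort.Slice(out, func(a, b int) bool {
		if out[a].Count != out[b].Count {
			return out[a].Count > out[b].Count
		}
		return out[a].Miner < out[b].Miner
	})
	return out
}
```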
2021-05-05T12:07:06.318+0800 INFO storageminer storage/miner.go:256 Computing WinningPoSt ;[{SealProof:9 SectorNumber:77 Seal
I found some error messages from the Lotus daemon log:
My miner had another instance of this. A block was mined, but the Lotus daemon logged a block received with a large delay. Are we going to address this?
@William8Work we'll probably need to start working on a FIP and likely introduce some kind of penalty/slashing for miners that cause delays.
This is in the process of being investigated more thoroughly. Expect an update around the end of the week.
Something interesting for this issue: I suggest we have some mechanism to make sure a miner can generate a valid block for valid parents. E.g. I generate a block with the current parents at 6s, but if I receive a new parent after 6s, I can still generate a new block. Because two such blocks from the same miner at the same height have different parents, only the block with all parents will be accepted by the network.
This seems to be happening on the network all the time, as pointed out by @moliujian in this thread on Slack. Recap (according to the claim from the thread): This creates an incentive for miners to set the propagation cutoff to a larger value to game the system.
Found another orphan block last night (running version v1.13.0-rc2):
Can you guys confirm whether you also saw that block arrive late? So far, it appears to have been late on every node we checked (6 nodes operated by different miners), but it was still added to the tipset.
|
Hey everybody! 👋 Since the beginning of this thread, it has been quite clear that one of the biggest reasons for "orphan blocks" has been that a new block from the parent tipset arrives after the storage provider has already started minting its block (after the 6-second propagation cutoff). WinningPoSt used to be a lot slower in the early days of the network (10-15s), and given the need for a good buffer for the tipset compute time (~10s), it was decided before mainnet that the PropagationDelay would be set at 6 seconds. Example of an ancient winningPoSt:
Since then, we have seen significant improvements in winningPoSt computation times, as well as a generally lower rate of "orphan blocks". But SPs still experience orphan blocks from time to time due to a parent block coming in late. After gathering newer block-mining metrics, discussing potential solutions and getting confirmation from the researcher that changing this will be fine, we have decided to raise the default PropagationDelay to 10 seconds with this PR (coming in the v1.17.2 release). We feel confident that all systems (even when computing on CPU) will still be able to compute winningPoSt comfortably within the required time. We have also added the ability to set the PropagationDelay yourself with the

Clearing up a misunderstanding
I also want to clear up some confusion about what the

TL;DR: A block is released exactly at epoch time.

So, all in all, the default PropagationDelay is raised from 6 seconds to 10 seconds, and SPs can also fine-tune it even further. We expect this to fix a lot of the remaining unwanted "orphan blocks" people are seeing, so we are closing this issue. There may be events where hiccups in the physical internet network fabric let a block come in and be accepted right before the epoch ends. These events should be very rare, and unfortunately they are not something that Lotus can prevent.
Describe the bug
We mined some blocks, but we don't know why these blocks do not exist on chain.