Fix/loop-sync-mode #78
Conversation
Multiple clients have reported BAD BLOCKS with this data. Signed-off-by: meows <[email protected]>
Now the spanSearch algorithm is only used if the local head is within MaxHeaderFetch distance of the remote; span searches against a remote with a distant head don't make sense. The only thing this changes for the 'API' is that the log lines for the different ancestor search algorithms are now Debug instead of Trace. Signed-off-by: meows <[email protected]>
…genesis Signed-off-by: meows <[email protected]>
```go
return err
var origin = uint64(0)
if d.blockchain.CurrentHeader().Number.Uint64() > 0 {
	origin, err = d.findAncestor(p, latest)
```
Only negotiate a non-zero common ancestor if our current header (which should be common to all sync modes) is actually above genesis.
eth/downloader/downloader.go
Outdated
```go
if remoteHeight-localHeight < uint64(MaxHeaderFetch) {
	a, err := d.findAncestorSpanSearch(p, remoteHeight, localHeight, floor)
	if err == nil {
		if a > localHeight {
```
Never allow a "common ancestor" to be above our local head.
Again, PTAL at my logic.
eth/downloader/downloader.go
Outdated
```go
}

// Only attempt span search if local head is "close" to reported remote
if remoteHeight-localHeight < uint64(MaxHeaderFetch) {
```
Only attempt the spanSearch algorithm if we're within throwing distance of the remote. Otherwise it would only return blocks above our local head, if anything at all.
But please double check my logic here.
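Putting the two hunks together, here's a condensed sketch of the dispatch under review. It is not the verbatim patch: the binary-search helper name and the `errInvalidAncestor` rejection are assumptions on my part, since the diff elides what actually happens when the span result exceeds the local head.

```go
// Sketch only: condensed from the hunks above. The helper name and the
// error used below the span-search call are assumptions, not the patch.
func (d *Downloader) findAncestorSketch(p *peerConnection, remoteHeight, localHeight, floor uint64) (uint64, error) {
	// Only attempt span search if the local head is "close" to the
	// reported remote; a distant remote could only yield blocks above
	// our local head, if anything at all.
	if remoteHeight-localHeight < uint64(MaxHeaderFetch) {
		a, err := d.findAncestorSpanSearch(p, remoteHeight, localHeight, floor)
		if err == nil {
			if a > localHeight {
				// Never allow a "common ancestor" above our local head.
				return 0, errInvalidAncestor
			}
			return a, nil
		}
	}
	// Distant remote, or span search failed: fall back to binary search.
	return d.findAncestorBinarySearch(p, remoteHeight, floor)
}
```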
This reverts commit 5b23e88. See discussion in ethereum/go-ethereum#20791 for rationale.
Some tests apparently don't use a real downloader; this change keeps them from panicking. Signed-off-by: meows <[email protected]>
```go
	known = d.lightchain.HasHeader(h, n)
default:
	log.Crit("unknown sync mode", "mode", d.mode)
```
This is a new, theoretically no-op logic change, and like all audacious panics, it assumes it never should or will happen. The d.mode.String() call will return "unknown".
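For context, a minimal sketch of the stringer behavior being referenced, modeled on go-ethereum's SyncMode type (treat the exact constant set here as an assumption):

```go
package downloader

// SyncMode enumerates the downloader's synchronisation strategies
// (sketch modeled on go-ethereum; values assumed).
type SyncMode int

const (
	FullSync  SyncMode = iota // fetch and fully process every block
	FastSync                  // headers and bodies, state at the pivot
	LightSync                 // headers only
)

// String falls through to "unknown" for any unhandled mode, which is
// exactly what the log.Crit("unknown sync mode", ...) line would print.
func (mode SyncMode) String() string {
	switch mode {
	case FullSync:
		return "full"
	case FastSync:
		return "fast"
	case LightSync:
		return "light"
	default:
		return "unknown"
	}
}
```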
Looks legit.
These are from nearly right after the original ECIP1061 fork. Signed-off-by: meows <[email protected]>
This branch gives me the following on kotti:
I didn't observe that on #76.
Another node shows the same.
Thanks for the report! I'll take another look at it. @tzdybal can you lend another set of 👀 as well?
I'm tempted to add a …
I'm looking into this as well.
This line is in the first few lines of boot-kotti-core-polis.log.
Yeah, I keep getting …
For clarity: I mean it could have been caused by a number of things.
But @meowsbits, this is an entirely different issue! Let's fix the loop sync first. Why didn't we see loops with #76? It just came back after compiling this branch.
I've read these logs and don't see the loop sync issue there. (And the logs in this gist are different from those originally reported at #78 (comment).) @q9f, can you point to or highlight the lines in either of the logs in the gist that show the issue?
That's bad luck. I can't turn the bad-sync-loop switch on at will, and increasing the verbosity to 4 or 5 makes it impossible for me to see what's going on. I was just hoping it contained the sync loop.
Yeah, because I had to restart the node to increase verbosity.
Got it. Just FYI, you can use the Javascript console to change log verbosity at runtime.
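Something like this, assuming the node exposes the debug API and a standard IPC path (the datadir path below is an assumption):

```
$ geth attach /path/to/datadir/geth.ipc
> debug.verbosity(5)   // raise log verbosity live, no restart needed
```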
Ah, good to know.
LGTM in general, as this actually improves the situation.
There is, however, a small performance impact from searching the entire tree for common ancestors.
```
INFO [03-30|12:00:25.754] Imported new chain segment blocks=1 txs=0 mgas=0.000 elapsed=1.222ms mgasps=0.000 number=2118251 hash=df27b9…ccb312 dirty=287.70KiB
INFO [03-30|12:01:10.595] Deep froze chain segment blocks=2 elapsed=1.189s number=2028250 hash=5be331…dc36a9
INFO [03-30|12:02:14.851] Importing heavy sidechain segment blocks=2048 start=3735 end=5782
ERROR[03-30|12:14:07.815] Impossible reorg, please file an issue oldnum=3736 oldhash=cc596e…129218 newnum=3736 newhash=cc596e…129218
WARN [03-30|12:14:07.816] Synchronisation failed, dropping peer peer=00759cb5fd8a35c8 err="retrieved hash chain is invalid"
WARN [03-30|12:14:08.485] Header broke chain ordering number=2118276 hash=9267da…98f5df expected=2118252
WARN [03-30|12:14:08.489] Synchronisation failed, dropping peer peer=5b1fe926e0ed8818 err="action from bad peer ignored"
INFO [03-30|12:14:08.529] Chain reorg detected number=2118250 hash=036a16…fc4211 drop=1 dropfrom=df27b9…ccb312 add=2 addfrom=2a9bb4…7629c0
INFO [03-30|12:14:08.530] Imported new chain segment blocks=1 txs=0 mgas=0.000 elapsed=713.749ms mgasps=0.000 number=2118252 hash=2a9bb4…7629c0 age=13m30s dirty=287.70KiB
INFO [03-30|12:14:08.535] Imported new chain segment blocks=1 txs=0 mgas=0.000 elapsed=4.176ms mgasps=0.000 number=2118252 hash=a5c57c…7f6950 age=13m30s dirty=287.70KiB
```
^ here: 12 minutes.
Can you say more about what you mean here? This change actually improves the common ancestor networking step by skipping a useless span search for far-away peers, and by skipping the binary search when the local chain is at genesis.
OK, unsure then; I didn't really trace what it was doing for those 12 minutes. I agree that it improves the situation overall. But it appears that the node entirely stops operating while it tries to reorganize.
👏 ver 👏 bose 👏 logs 👏 please 👏 😹 🕹️ Are you using a service runner to manage your instances? (I like …)
@q9f You know that reorg from block 3735? The protocol is attempting that insert and returning a …
Just FYI, re 23cc3fa:
Fixes #75
Refines #76 by removing non-essential debug log statements, etc. I'm trying to minimize unnecessary refactorings or along-the-way improvements to limit the complexity of the science here.
Notes on the issue this resolves, and how:

- The issue can show up after an unclean shutdown (e.g. `kill -9`).
- Multiple clients have BAD BLOCKS reported, and I've added them to the ETC config.
- What happens when the database contains block data but the chain (the `blockchain`, actually) thinks its head is below it?

Blockchain head marking is a totally separate mechanism from the actual blockchain "state." It is a fallible map of the territory. If a blockchain head marker is rolled back, but block data above that new marker is not removed, then the database will contain "secret" data. In this case, the common ancestor negotiation will ask the database if a block is available, and if it is, that block will satisfy the existing algorithmic condition and be returned positively.
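To make that failure mode concrete, here is a toy, self-contained model (all names hypothetical; the real check is the HasBlock/HasHeader dispatch shown earlier):

```go
package main

import "fmt"

func main() {
	// The head *marker* and the block *database* are separate records.
	headMarker := uint64(100)         // where the chain thinks its head is
	highestStoredBlock := uint64(150) // "secret" data left by a rollback

	// The existing ancestor check asks only "do we have this block?".
	hasBlock := func(n uint64) bool { return n <= highestStoredBlock }

	candidate := uint64(120) // remote proposes this as a common ancestor
	if hasBlock(candidate) {
		// Satisfied, even though candidate > headMarker: the negotiation
		// would report a "common ancestor" above our own head.
		fmt.Printf("accepted ancestor %d above head %d\n", candidate, headMarker)
	}
	if candidate > headMarker {
		// The patch adds this missing guard: never trust an ancestor
		// above the local head.
		fmt.Println("fix: rejecting ancestor above local head")
	}
}
```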