core/bloombits: fix deadlock when matcher session hits an error #28184
Do you mean this issue occurred when the ancient block data was pruned? If so, go-ethereum doesn't prune any ancient data, so I think this issue can't be triggered.
Everyone is merging PRs right before I can review them :(
Yes, that's how the issue emerged in BSC. Even if the deadlock is not reproducible in Ethereum, the root cause is still in this repo's core/bloombits, and this patch will prevent unknown problems in the future. @fjl Thank you for merging this! I'll port this to other applicable blockchains.
…reum#28184) When MatcherSession encounters an error, it attempts to close the session. Closing waits for all goroutines to finish, including the 'distributor'. However, the distributor will not exit until all requests have returned. This patch fixes the issue by delivering the (empty) result to the distributor before calling Close().
…or (ethereum#28184)" This reverts commit 0e3ba93.
core/bloombits: fix a deadlock when a matcher session hits an error
Problem description
A deadlock occurs when a pruned node receives eth_getLogs for pruned blocks. We hit this deadlock in bsc and avalanche's subnet-evm. This deadlock didn't happen on Ethereum with this repo's geth, probably because the pruning mechanism is different.
As discussed below, however, the root cause is in core/bloombits and its code is the same as bsc. Any component consuming core/bloombits may hit this deadlock.
Root cause and Fix
Here are the goroutines causing the deadlock. Please note that this info is from bsc, so the function names are slightly different from those in the current geth.
1. GetLogs is waiting on s.closer.
2. s.closer is owned by the goroutine running MatcherSession.Multiplex, which is waiting on s.pend.
3. s.pend needs one more "Done" in Matcher.run, from a third goroutine that is itself waiting on channels.

The 2nd goroutine did close the s.quit channel before waiting on s.pend. When the 3rd goroutine received that signal, it set shutdown to nil, but at that time allocs was 1, so the loop in distributor continued.

What the 3rd goroutine is waiting for is data on m.deliveries. Once that arrives, the distributor returns, s.pend is marked done, and all goroutines resume.

The reason m.deliveries receives no data is that the 2nd goroutine is stuck in s.Close(), which it calls before s.deliverSections, the function that sends data to m.deliveries.

Therefore, the proposed fix is to call s.deliverSections in MatcherSession.Multiplex before calling s.Close().