eth: fix potential deadlock of pub/sub module #22132
Not sure about `drainLoop`. Seems like infinitely iterating will cause problems in the future; or rather, it's not the most elegant solution.
Thanks for your detailed explanation of the issue and the fix!
@@ -174,6 +174,17 @@ func (sub *Subscription) Unsubscribe() {
	// this ensures that the manager won't use the event channel which
	// will probably be closed by the client asap after this method returns.
	<-sub.Err()

drainLoop:
I'm not sure it's needed. After the uninstall request is processed, no further events will be sent to this filter. All accumulated events will be consumed by the filter handler itself.
But anyway, I agree it's ugly to prevent the deadlock in `Unsubscribe` with this approach.
The uninstall request is processed in `eventLoop`, while `eventLoop` may be busy pushing a hash into `sub.f.hashes`. However, if `sub.f.hashes` still has an unread hash, then a deadlock happens.
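The situation described in this comment can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions, not the go-ethereum code: the `subscription` type, `eventLoop`, `unsubscribe` and `runDemo` are hypothetical names, and only a single `hashes` channel is modelled. The point is that the unsubscribing side keeps reading pending events while it waits for the event loop to accept the uninstall request.

```go
package main

import "fmt"

// subscription is a hypothetical, simplified stand-in for a filter
// subscription; it is not the go-ethereum type.
type subscription struct {
	hashes chan string   // unbuffered: a send blocks until someone reads
	done   chan struct{} // closed by the event loop once uninstalled
}

// eventLoop handles one thing at a time: while it is blocked delivering
// a hash, it cannot look at the uninstall channel.
func eventLoop(events <-chan string, uninstall <-chan *subscription, sub *subscription) {
	for {
		select {
		case ev := <-events:
			sub.hashes <- ev
		case s := <-uninstall:
			close(s.done)
			return
		}
	}
}

// unsubscribe keeps reading pending hashes while it tries to hand over the
// uninstall request, so the event loop cannot stay stuck on a send.
// It returns how many hashes were discarded.
func unsubscribe(uninstall chan<- *subscription, sub *subscription) int {
	drained := 0
	for {
		select {
		case uninstall <- sub:
			<-sub.done
			return drained
		case <-sub.hashes:
			drained++
		}
	}
}

// runDemo queues three events with no consumer and then unsubscribes.
func runDemo() int {
	events := make(chan string, 3)
	for _, h := range []string{"0x01", "0x02", "0x03"} {
		events <- h
	}
	uninstall := make(chan *subscription)
	sub := &subscription{hashes: make(chan string), done: make(chan struct{})}
	go eventLoop(events, uninstall, sub)
	return unsubscribe(uninstall, sub)
}

func main() {
	fmt.Println("discarded hashes:", runDemo())
}
```

The number of discarded hashes varies between runs because `select` picks randomly among ready cases. If the `case <-sub.hashes` branch were removed, `unsubscribe` could block forever whenever the loop is in the middle of a send.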
f.hashes = append(f.hashes, ph...)
}
api.filtersMu.Unlock()
f.hashes = append(f.hashes, ph...)
It may introduce a data race.
The `GetFilterChanges` API resets `f.hashes` to nil, so the lock is necessary here.
You are right, I missed this.
I suggest adding a fine-grained lock in `filter` rather than using a global lock. What do you think?
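A fine-grained lock of this kind might look roughly like the sketch below. The `filter` struct, its `mu` field and the `add` and `takeChanges` methods are hypothetical names chosen for illustration; they are not taken from the go-ethereum source. The idea is that each filter guards its own `hashes` slice, so appending to one filter never contends with a lock held for unrelated work.

```go
package main

import (
	"fmt"
	"sync"
)

// filter is a hypothetical per-filter record. The mutex guards only this
// filter's hashes instead of every filter registered with the API.
type filter struct {
	mu     sync.Mutex
	hashes []string
}

// add appends newly seen hashes (the producer side).
func (f *filter) add(hs ...string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.hashes = append(f.hashes, hs...)
}

// takeChanges returns the accumulated hashes and resets the slice to nil,
// mirroring the reset that the review attributes to GetFilterChanges.
func (f *filter) takeChanges() []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	out := f.hashes
	f.hashes = nil
	return out
}

func main() {
	f := &filter{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			f.add(fmt.Sprintf("0x%02x", i))
		}(i)
	}
	wg.Wait()
	fmt.Println("collected:", len(f.takeChanges()))
}
```

Because both the append and the reset go through the same per-filter mutex, the data race raised above is avoided without holding a lock shared by every filter.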
Description
Fix a potential deadlock in the pub/sub module that may cause the chain to halt.
More details in #22131
Rationale
The deadlock dependency is:
Routine A: `NewPendingTransactionFilter` wants to lock `filtersMu` to consume hashes;
Routine B: `eventLoop` is waiting for Routine A to consume hashes so that it can push a new hash to the channel.
Routine C: `Unsubscribe` is holding the lock `filtersMu`, but it is waiting for Routine B to consume the `uninstall` channel.
Changes
There is no need to acquire the lock in the `NewPendingTransactionFilter` routine, so just loosen the logic.
Also fix the potential risk that the `Unsubscribe` routine does not drain all the `hash`, `header` and `log`.