Limit fanouts to the number of active peers, Credit: Equilibrium #2214
Comments
Before starting this ticket, we need to decide on the best design, because the suggested design is a lot of work.
Zebra has a large number of peers connected on mainnet, so it is unlikely that two of the 3 fanout requests will go to the same peer. We also made changes that make sure a broadcast only goes to half the peers, which means there are always peers available. (This issue really only happens on testnet.)
Motivation
Zebra fans out `getaddr`, `getdata`, and `mempool` messages to 3 peers. But this is useless when we only have 1-3 ready peers. This mainly happens on testnet, but it can also happen on busy mainnet nodes.
When Zebra sends a request, it stops the peer connection being used for other things until a response arrives or the request times out. This can really slow down the node, because those connections can't be used for syncing. We can observe these timeouts using `waiting for the peer service to become ready` logs.
Scheduling
We should try to fix the underlying `PeerSet` bugs first, and then see if Zebra still has hangs.
This is a complex refactor, so we should avoid doing it unless we are sure it is needed. Instead, we should try to find easier fixes for the underlying `PeerSet` issues.
We should re-evaluate the risk as part of the lightwalletd work.
Solution
Refactoring
This depends on the refactoring done as part of #3230.
- Add a `PeerSet::route_fanout` method that routes a request to multiple peers. The fanout number should be limited to the number of `ready` peer services (should it be ready and unready peer services?)
- Have the `PeerSet` decide which requests to fan out
Existing fanouts
- `GetAddr` fanout in the candidate set
- `GetData` fanouts in the syncer
- `MempoolTransactionIds` fanout in the mempool crawler
Tests
- `PeerService` recovers from a fanned-out request, i.e., the peers handling the fanned-out requests become ready again. (We might need to fan out multiple times until we know we have sent at least one request to each peer, and then check if the `PeerSet` becomes ready after a while.)
- `PeerService` handles errors during fanout requests
Alternatives
The foundation could run some Zebra or `zcashd` nodes on testnet, to increase the number of available testnet peers.
Put a watch channel in the peer set that is updated with the total number of connected peers (ready and unready), pass that channel to the crawler and syncer, and use it to limit their fanouts.
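A minimal sketch of the watch channel alternative, to show how small the change could be. Everything here is hypothetical and not part of Zebra's API: `PeerCount`, `DEFAULT_FANOUT`, `set`, and `limited_fanout` are made-up names. The sketch also swaps the watch channel for a shared atomic counter so that it only needs the standard library. A real implementation would probably use an async watch channel instead.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical default fanout, matching the "3 peers" described in this issue.
const DEFAULT_FANOUT: usize = 3;

/// Hypothetical handle shared between the peer set, the crawler, and the syncer.
/// An atomic counter stands in for the watch channel suggested above.
#[derive(Clone, Default)]
struct PeerCount(Arc<AtomicUsize>);

impl PeerCount {
    /// Called by the peer set whenever the number of connected peers
    /// (ready and unready) changes.
    fn set(&self, connected: usize) {
        self.0.store(connected, Ordering::SeqCst);
    }

    /// Called by the crawler or syncer before fanning out.
    /// Returns the default fanout, capped at the current peer count.
    /// A result of 0 means there are no peers, so the caller should skip the fanout.
    fn limited_fanout(&self) -> usize {
        DEFAULT_FANOUT.min(self.0.load(Ordering::SeqCst))
    }
}
```

The peer set would keep one `PeerCount` and hand clones to the crawler and syncer. With 1 connected peer `limited_fanout()` returns 1, and with 10 or more it returns the usual 3. This does not answer the open question of whether the limit should count only ready peers.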