[CHIA-1638] Pace block requests #18729
Purpose:
This patch addresses a problem during long sync, where the syncing node may send `request_blocks` at a rate exceeding the peer's outbound rate limit for `respond_blocks`. As a result, the peer stops responding and the syncing node closes the connection. Over time, all peers may get disconnected and syncing stalls. Restarting the sync takes at least 30 seconds plus weight proof validation.

The root of this problem is that the rate limit for `request_blocks` is not aligned with the rate limit for `respond_blocks`. They are 500 msgs/minute and 100 msgs/minute respectively (but then scaled to 30%). It never makes sense to send more requests than the peer is willing to respond to, so these limits should really be the same.

This patch hard-codes the expected outbound rate limit of the peer and paces requests so they never exceed that limit. This keeps the sync steady, albeit slow: the rate is at most one request every 2 seconds.
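The 2-second figure follows from the `respond_blocks` limit: 100 msgs/minute scaled to 30% leaves 30 responses per minute, i.e. one every 2 seconds. A small Python sketch of that arithmetic (the constant and function names are illustrative, not taken from the Chia codebase):

```python
# Hypothetical names; the numbers are the ones quoted in the PR description.
RESPOND_BLOCKS_PER_MINUTE = 100  # peer's nominal outbound limit for respond_blocks
RATE_LIMIT_PERCENT = 30          # limits are scaled to 30%


def effective_limit_per_minute(nominal: int, percent: int) -> int:
    """Messages per minute the peer will actually send after scaling."""
    return nominal * percent // 100


def min_request_interval(per_minute: int) -> float:
    """Smallest spacing (seconds) between requests that stays within per_minute."""
    return 60 / per_minute


responses_per_minute = effective_limit_per_minute(RESPOND_BLOCKS_PER_MINUTE, RATE_LIMIT_PERCENT)
interval = min_request_interval(responses_per_minute)
print(f"{responses_per_minute} respond_blocks/minute -> one request every {interval:g} s")
# prints "30 respond_blocks/minute -> one request every 2 s"
```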
We already send requests to multiple peers, if we have more than one. This patch tracks, per peer, the timestamp at which it's OK to send the next request. Sometimes this timestamp is in the past. This happens if one peer stalls for a long time and we "miss" the time to send a request to another peer (or to the same peer, for that matter).
The rate limit is enforced over 60-second windows, so we allow "catching up" by only incrementing the timestamp by the minimum interval (2 seconds). However, if a peer takes too long to respond, we penalize it by bumping its timestamp to the current time. This creates a weak affinity for requesting more from faster peers.
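A minimal sketch of the per-peer pacing described above, assuming a monotonic clock (injectable for testing) and a hypothetical `slow_threshold` for what counts as taking "too long" to respond; the PR text does not give that value. `BlockRequestPacer`, its methods, and the `earliest_peer` selection rule are illustrative names, not the PR's actual code.

```python
import time
from typing import Callable, Dict, Hashable, Iterable, Optional


class BlockRequestPacer:
    """Per-peer pacing of block requests (illustrative sketch).

    Stores, for each peer, the earliest time at which the next request may be sent.
    """

    def __init__(
        self,
        interval: float = 2.0,         # minimum spacing, from the presumed peer rate limit
        slow_threshold: float = 10.0,  # hypothetical: response time that counts as "too long"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._interval = interval
        self._slow_threshold = slow_threshold
        self._clock = clock
        self._next_ok: Dict[Hashable, float] = {}

    def wait_time(self, peer: Hashable) -> float:
        """Seconds to wait before a request to `peer` is allowed (0.0 if allowed now)."""
        now = self._clock()
        return max(0.0, self._next_ok.get(peer, now) - now)

    def on_request_sent(self, peer: Hashable) -> None:
        """Advance the peer's timestamp by exactly one interval.

        If the stored timestamp is in the past (a missed slot), the new value may
        still be in the past, so the next request can go out immediately and the
        peer "catches up" on unused slots.
        """
        now = self._clock()
        self._next_ok[peer] = self._next_ok.get(peer, now) + self._interval

    def on_response(self, peer: Hashable, elapsed: float) -> None:
        """Penalize a slow peer by moving its timestamp up to the current time."""
        if elapsed > self._slow_threshold:
            now = self._clock()
            # Drop any accumulated catch-up credit, but never move the timestamp backwards.
            self._next_ok[peer] = max(self._next_ok.get(peer, now), now)

    def earliest_peer(self, peers: Iterable[Hashable]) -> Optional[Hashable]:
        """Hypothetical selection rule: pick the peer whose next slot comes first."""
        now = self._clock()
        candidates = list(peers)
        if not candidates:
            return None
        return min(candidates, key=lambda p: self._next_ok.get(p, now))
```

The only per-peer state is one float. Because `on_request_sent` advances from the stored timestamp rather than from the current time, a peer whose slot was missed can catch up with back-to-back requests; `on_response` removes that credit from slow peers, which is what biases later requests toward faster ones.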
There are still issues with our concept of rate limits. For instance, there is a configuration option to scale the rate limits, but the effective limits are never communicated over the protocol, so there's no way of knowing whether a peer has tweaked its limits.
Current Behavior:
During long sync, we request blocks as fast as we can (with a single request outstanding at a time). This risks pushing peers over the limit, which stalls the sync and forces it to restart.
New Behavior:
During long sync, we pace the block requests to peers to never exceed the (presumed) rate limit for block requests.
Testing Notes:
Manually tested on my node. I sync about 3x faster.