
[CHIA-1638] Pace block requests #18729

Merged
merged 4 commits into main from pace-block-requests on Nov 1, 2024
Conversation

@arvidn (Contributor) commented Oct 17, 2024

Purpose:

This patch addresses a problem during long-sync, where the syncing node may send request_blocks messages at a rate exceeding the peer's outbound rate limit for respond_blocks. The result is that the peer stops responding and the syncing node closes the connection. Over time, all peers may get disconnected and syncing stalls. Restarting the sync takes at least 30 seconds plus weight proof validation.

The root of this problem is that the rate limit for request_blocks is not aligned with the rate limit for respond_blocks: they are 500 msgs/minute and 100 msgs/minute respectively (both then scaled to 30%). It never makes sense to send more requests than the peer is willing to respond to, so these limits should really be the same.

This patch hard-codes the expected outbound rate limit of the peer and paces requests to never exceed that limit, thus maintaining a steady (albeit slow) sync. The rate is at most one request every 2 seconds.
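For illustration, the 2-second interval follows directly from the figures quoted above. This is a sketch using assumed constant names (they are not the patch's actual identifiers):

```python
# Derive the pacing interval from the peer's presumed outbound limit
# for respond_blocks (values taken from the description above).
RESPOND_BLOCKS_LIMIT_PER_MINUTE = 100  # base rate limit for respond_blocks
SCALE_PERCENT = 30                     # limits are scaled to 30%

# Effective limit the peer will actually enforce: 30 msgs/minute.
effective_limit = RESPOND_BLOCKS_LIMIT_PER_MINUTE * SCALE_PERCENT // 100

# Minimum interval between requests to stay under that limit: 2.0 seconds.
min_request_interval = 60.0 / effective_limit
print(min_request_interval)
```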

We already send requests to multiple peers, if we have more than one. This patch keeps track of a timestamp, per peer, marking when it's OK to send the next request to that peer. Sometimes this timestamp can be in the past. This happens if one peer stalls for a long time and we "miss" the time to send a request to another peer (or the same peer, for that matter).

The rate limit is enforced over 60-second windows, so we allow "catching up" by incrementing the timestamp by only the rate-limit minimum (2 seconds). However, if a peer takes too long to respond, we penalize it by bumping its timestamp to the current time. This creates a weak affinity for requesting more from faster peers.
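The per-peer bookkeeping described above can be sketched roughly as follows. This is a hypothetical illustration; the class and method names are invented here, not taken from the patch:

```python
class RequestPacer:
    """Sketch of per-peer request pacing: one timestamp per peer marking
    when the next request may be sent to that peer."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        # peer id -> earliest time the next request may be sent
        self.next_request_time: dict[str, float] = {}

    def wait_time(self, peer_id: str, now: float) -> float:
        # Seconds until we may send to this peer (0 if the slot is already
        # in the past, i.e. we "missed" an opportunity and may catch up).
        return max(0.0, self.next_request_time.get(peer_id, now) - now)

    def on_request_sent(self, peer_id: str, now: float) -> None:
        # Advance by the minimum interval only. If the slot was in the
        # past, the new slot may still be <= now, which lets us send the
        # next request immediately and catch up within the 60-second
        # enforcement window.
        prev = self.next_request_time.get(peer_id, now)
        self.next_request_time[peer_id] = prev + self.min_interval

    def on_slow_response(self, peer_id: str, now: float) -> None:
        # Penalize a stalled peer: bump its slot to the current time,
        # forfeiting any accumulated catch-up budget. Faster peers keep
        # their budget, giving them a weak request affinity.
        self.next_request_time[peer_id] = now
```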

There are still issues with our concept of rate limits. For instance, there is a configuration option to scale the rate limits, but the effective limits are never communicated over the protocol, so there is no way of knowing whether a peer has tweaked its limits.

Current Behavior:

During long sync, we request blocks as fast as we can (with a single request outstanding at a time), risking pushing peers over their rate limit, stalling, and having to restart the sync.

New Behavior:

During long sync, we pace block requests to each peer so as to never exceed the (presumed) rate limit for block requests.

Testing Notes:

Manually tested on my node. I sync about 3x faster.

@arvidn arvidn added the Changed Required label for PR that categorizes merge commit message as "Changed" for changelog label Oct 18, 2024
This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions github-actions bot added the merge_conflict Branch has conflicts that prevent merge to main label Oct 18, 2024
@arvidn arvidn changed the title Pace block requests [CHIA-1638] Pace block requests Oct 18, 2024
@github-actions github-actions bot removed the merge_conflict Branch has conflicts that prevent merge to main label Oct 18, 2024

Conflicts have been resolved. A maintainer will review the pull request shortly.

@github-actions github-actions bot added the merge_conflict Branch has conflicts that prevent merge to main label Oct 22, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.


Conflicts have been resolved. A maintainer will review the pull request shortly.

@github-actions github-actions bot removed the merge_conflict Branch has conflicts that prevent merge to main label Oct 25, 2024
@arvidn arvidn marked this pull request as ready for review October 25, 2024 15:42
@arvidn arvidn requested a review from a team as a code owner October 25, 2024 15:42
chia/full_node/full_node.py — 3 review threads (resolved)
@altendky altendky dismissed their stale review October 31, 2024 13:22

concerns have been addressed

@arvidn arvidn requested a review from emlowe October 31, 2024 13:34
@arvidn arvidn closed this Nov 1, 2024
@arvidn arvidn reopened this Nov 1, 2024
emlowe
emlowe previously approved these changes Nov 1, 2024
@arvidn arvidn added ready_to_merge Submitter and reviewers think this is ready and removed coverage-diff labels Nov 1, 2024
@github-actions github-actions bot added the merge_conflict Branch has conflicts that prevent merge to main label Nov 1, 2024

github-actions bot commented Nov 1, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.


github-actions bot commented Nov 1, 2024

Conflicts have been resolved. A maintainer will review the pull request shortly.

@github-actions github-actions bot removed the merge_conflict Branch has conflicts that prevent merge to main label Nov 1, 2024

github-actions bot commented Nov 1, 2024

File                         Coverage  Missing Lines
chia/full_node/full_node.py  84.0%     1179, 1190, 1194, 1202

Total: 26 lines changed, 4 lines missing, 84% coverage

@pmaslana pmaslana merged commit 12a089f into main Nov 1, 2024
362 of 363 checks passed
@pmaslana pmaslana deleted the pace-block-requests branch November 1, 2024 19:10
Labels
Changed Required label for PR that categorizes merge commit message as "Changed" for changelog ready_to_merge Submitter and reviewers think this is ready
5 participants