Improve catchup #2375

Merged
merged 4 commits into from
Dec 10, 2024

Conversation

@jbearer jbearer commented Dec 7, 2024

Anticipating some problems we might run into as more nodes start pointing at more peers for catchup:

  • HTTP timeouts can be really bad: waiting out a timeout on each slow peer can make us take a really long time (i.e. more than a view) to even get through all the peers in our list
  • There is currently no intelligent way of ordering peers, so a really unreliable peer early in the list can cause performance issues

This PR:

  • Implements an aggressive and adaptive timeout mechanism so no one faulty peer can cause us to time out before trying to catch up from other peers
  • Ranks peers based on request failure rate so we always try our most reliable peers first
  • Adds metrics for catchup requests
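
To make the three points above concrete, here is a minimal sketch of how per-peer bookkeeping could work. It is not the code in this PR: `PeerStats`, `rank_peers`, the `MIN_TIMEOUT`/`MAX_TIMEOUT` bounds, and the halve-on-failure, double-on-success policy are all assumptions chosen for illustration.

```rust
use std::time::Duration;

// Hypothetical timeout bounds; the real values are not given in this PR description.
const MIN_TIMEOUT: Duration = Duration::from_millis(100);
const MAX_TIMEOUT: Duration = Duration::from_secs(2);

/// Hypothetical per-peer bookkeeping (illustrative names, not the PR's types).
#[derive(Debug, Clone)]
struct PeerStats {
    url: String,
    /// Total requests sent to this peer; doubles as a simple metric.
    requests: u64,
    /// Requests that failed or timed out; doubles as a simple metric.
    failures: u64,
    /// Timeout to use for the next request to this peer.
    timeout: Duration,
}

impl PeerStats {
    fn new(url: &str) -> Self {
        Self {
            url: url.to_string(),
            requests: 0,
            failures: 0,
            timeout: MAX_TIMEOUT,
        }
    }

    /// Fraction of requests that failed; a peer with no history counts as reliable.
    fn failure_rate(&self) -> f64 {
        if self.requests == 0 {
            0.0
        } else {
            self.failures as f64 / self.requests as f64
        }
    }

    /// Record the outcome of one request and adapt the timeout (assumed policy):
    /// halve it on failure so a faulty peer costs less next time,
    /// double it on success, always staying within the bounds above.
    fn record(&mut self, success: bool) {
        self.requests += 1;
        if success {
            self.timeout = (self.timeout * 2).min(MAX_TIMEOUT);
        } else {
            self.failures += 1;
            self.timeout = (self.timeout / 2).max(MIN_TIMEOUT);
        }
    }
}

/// Order peers so the lowest failure rate comes first.
/// The sort is stable, so peers with equal rates keep their configured order.
fn rank_peers(peers: &mut [PeerStats]) {
    peers.sort_by(|a, b| a.failure_rate().total_cmp(&b.failure_rate()));
}
```

Under these assumptions, a caller would call `record` after every catchup request and `rank_peers` before choosing whom to ask next, so a peer that keeps failing both drops down the list and gets a shorter timeout when it is tried again.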

@jbearer jbearer enabled auto-merge (squash) December 10, 2024 00:58
@jbearer jbearer merged commit c52924e into main Dec 10, 2024
21 of 22 checks passed
@jbearer jbearer deleted the jb/catchup branch December 10, 2024 01:09