fix(sync): fix testnet syncer loop on large Orchard blocks #4286

Merged 5 commits on May 4, 2022

@teor2345 commented May 4, 2022

Motivation

Testnet block 1860741 contains a 600 kB Orchard transaction, which can take a long time to verify.

This causes a syncer loop:

  • the syncer requests that block again while it is still verifying,
  • which causes a duplicate request error,
  • which restarts the syncer loop,
  • which re-verifies the same block.

Eventually, all the CPUs are busy verifying the same block, and Zebra does not make any sync progress.

Edit: actually, I think only one verification runs at a time. The CPU usage bug could have a different cause.
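
The loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Zebra's syncer code: the function name `ticks_until_verified`, the tick-based timing, and the idea that a restart resets verification progress to zero are all hypothetical simplifications. It shows why a block that takes longer to verify than the retry interval never finishes if every duplicate request restarts the syncer.

```rust
/// Toy model of the syncer loop (hypothetical, not Zebra's real code).
///
/// - `ticks_needed`: how many ticks the slow block needs to verify.
/// - `retry_every`: the syncer re-requests the block every this many ticks
///   (must be non-zero).
/// - `ignore_duplicates`: if false, a duplicate request error restarts the
///   syncer, and verification of the block starts over.
///
/// Returns the tick at which the block finishes verifying, or `None` if it
/// has not finished after `max_ticks`.
fn ticks_until_verified(
    ticks_needed: u32,
    retry_every: u32,
    max_ticks: u32,
    ignore_duplicates: bool,
) -> Option<u32> {
    let mut progress = 0u32;

    for tick in 1..=max_ticks {
        // The block is still in flight, so a re-request is a duplicate.
        let duplicate_request = tick % retry_every == 0;

        if duplicate_request && !ignore_duplicates {
            // Old behaviour in this model: the error restarts the syncer,
            // which throws away the work done so far.
            progress = 0;
        }

        // One tick of verification work.
        progress += 1;
        if progress >= ticks_needed {
            return Some(tick);
        }
    }

    None
}
```

In this model, `ticks_until_verified(5, 3, 100, false)` never completes, while the same block with `ignore_duplicates` set to `true` finishes on tick 5. A block that verifies faster than the retry interval is unaffected either way.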

Solution

  • Ignore the duplicate request error, and allow the block to verify
    • Make all download errors into enum variants
    • Check all syncer request errors to see if we want to ignore them
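
A minimal sketch of the approach in the list above, assuming hypothetical names: `DownloadError`, its variants, `SyncAction`, and `handle_request_error` are illustrative and are not taken from the Zebra codebase. The idea is that once every download error is an enum variant, the syncer can match on the variant and decide whether to ignore the error or restart.

```rust
use std::fmt;

/// Hypothetical download and verify errors, one enum variant per failure mode.
#[derive(Debug, Clone, PartialEq, Eq)]
enum DownloadError {
    /// The block is already queued for download or verification.
    DuplicateRequest,
    /// The peer did not deliver the block in time.
    DownloadTimeout,
    /// The block failed verification, with a reason.
    Invalid(String),
}

impl fmt::Display for DownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DownloadError::DuplicateRequest => {
                write!(f, "block is already queued for download or verification")
            }
            DownloadError::DownloadTimeout => write!(f, "block download timed out"),
            DownloadError::Invalid(reason) => write!(f, "invalid block: {reason}"),
        }
    }
}

impl std::error::Error for DownloadError {}

/// What the syncer does after a request fails (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum SyncAction {
    /// Ignore the error and keep waiting for in-flight blocks.
    Continue,
    /// Drop in-flight work and restart the sync loop.
    Restart,
}

/// Check a syncer request error to see if we want to ignore it.
fn handle_request_error(error: &DownloadError) -> SyncAction {
    match error {
        // The block is already being verified, so let it finish.
        DownloadError::DuplicateRequest => SyncAction::Continue,
        // Every other error still restarts the syncer in this sketch.
        DownloadError::DownloadTimeout | DownloadError::Invalid(_) => SyncAction::Restart,
    }
}
```

Matching exhaustively on the enum, without a catch-all `_` arm, means the compiler flags this function whenever a new error variant is added, so each new error gets an explicit ignore-or-restart decision.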

Related fixes:

  • Check genesis errors to decide whether to log them as info or warn messages

We might also need to increase the verification timeout on slower machines (PR #4156).

Review

This bug blocks syncing on testnet past block 1860741, so it is a high priority.

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

Better syncer errors

@teor2345 added the labels C-bug (Category: This is a bug), NU-5 (Network Upgrade: NU5 specific tasks), P-High 🔥, I-hang (A Zebra component stops responding to requests), and I-heavy (Problems with excessive memory, disk, or CPU usage) May 4, 2022
@teor2345 requested a review from a team as a code owner May 4, 2022 02:16
@teor2345 self-assigned this May 4, 2022
@teor2345 requested review from oxarbitrage and removed request for a team May 4, 2022 02:16
@teor2345 changed the title from "Fix testnet sync" to "Fix testnet syncer loop on large Orchard blocks" May 4, 2022
@teor2345 changed the title from "Fix testnet syncer loop on large Orchard blocks" to "fix(sync): fix testnet syncer loop on large Orchard blocks" May 4, 2022
@oxarbitrage left a comment

Looks good, should help.

mergify bot added a commit that referenced this pull request May 4, 2022
@mergify bot merged commit 56f766f into main May 4, 2022
@mergify bot deleted the fix-testnet-sync branch May 4, 2022 22:04
@dconnolly left a comment

👍
