Increase block validation timeouts #4156

Merged: 4 commits, May 5, 2022

@jvff
Contributor

jvff commented Apr 20, 2022

Motivation

The full synchronization test currently times out after running for 6 hours. One possible cause is that block validation may time out a few times during synchronization, which adds a few minutes of delay.

Solution

Increase the UTXO lookup timeout and the block validation timeout to see if the synchronization finishes earlier.
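
To illustrate why a longer timeout can help a slow machine, here is a minimal standalone sketch using only the Rust standard library. It is not Zebra's code: the constants, the `run_with_timeout` helper, and `slow_validation` are hypothetical names and values chosen for illustration. The same simulated validation fails under a short timeout and succeeds under a longer one.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical timeout values for illustration only.
/// These are assumptions, not Zebra's actual constants.
pub const SHORT_VALIDATION_TIMEOUT: Duration = Duration::from_millis(50);
pub const LONG_VALIDATION_TIMEOUT: Duration = Duration::from_millis(2000);

/// Error returned when the work did not produce a result in time.
#[derive(Debug, PartialEq, Eq)]
pub struct TimedOut;

/// Runs `work` on a separate thread and waits up to `timeout` for its result.
///
/// For simplicity, a panic in `work` (which disconnects the channel)
/// is also reported as `TimedOut`.
pub fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Result<T, TimedOut>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver is dropped if the caller gave up.
        let _ = sender.send(work());
    });
    receiver.recv_timeout(timeout).map_err(|_| TimedOut)
}

/// Hypothetical stand-in for a slow block validation, such as one running
/// on a debug build. It sleeps for `delay`, then reports one validated block.
pub fn slow_validation(delay: Duration) -> u32 {
    thread::sleep(delay);
    1
}
```

With a 300 ms simulated validation, `run_with_timeout(SHORT_VALIDATION_TIMEOUT, ...)` gives up before the work finishes, while `run_with_timeout(LONG_VALIDATION_TIMEOUT, ...)` returns the result. In a real node each timed-out validation also costs a retry, which is the delay this PR tries to avoid.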

Review

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

jvff added 2 commits April 20, 2022 22:29

  • Avoid block validation failures because UTXOs aren't available on time.
  • Attempt to reduce the synchronization restarts and consequently improve performance.
@teor2345
Contributor

This could be a bug in the recently merged PR #4149, or an intermittent error:

WARNING: Default device-name for disk name [zebrad-cache-28dc985-mainnet-checkpoint] will be [zebrad-cache-28dc985-mainnet-checkpoint] because it is being mounted to a container with [--container-mount-disk]
ERROR: (gcloud.compute.instances.create-with-container) Could not fetch resource:

  • The resource 'projects/zealous-zebra/global/images/zebrad-cache-main-e5f00c5-v23-mainnet-checkpoint' is not ready

https://github.com/ZcashFoundation/zebra/runs/6103901143?check_suite_focus=true#step:7:95

Maybe @gustavovalverde can help diagnose?

@jvff
Contributor Author

jvff commented Apr 21, 2022

This did not improve performance enough to solve the issue (#4155).

This PR can be closed, but it might be useful for reference because it contains the workaround I used when synchronization gets stuck on a single block. This can happen on debug builds or on release builds using trace-level logging. It could also happen on some lower-end computers.

@teor2345
Contributor

> This did not improve performance enough to solve the issue (#4155).
>
> This PR can be closed, but it might be useful for reference because it contains the workaround I used when synchronization gets stuck on a single block. This can happen on debug builds or on release builds using trace-level logging. It could also happen on some lower-end computers.

I'd like to fix this bug for low-end computers, if we can do it without breaking the full sync test.

Can we leave this PR in draft, fix the sync speed, and then re-test to make sure it doesn't make the sync worse?

@teor2345
Contributor

@Mergifyio update

@mergify
Contributor

mergify bot commented Apr 26, 2022

update

✅ Branch has been successfully updated

@teor2345
Contributor

I am updating this branch to delete the clean action, which causes other actions to fail.

@teor2345 added labels May 4, 2022: C-bug (Category: This is a bug), NU-5 (Network Upgrade: NU5 specific tasks), P-High 🔥, I-hang (A Zebra component stops responding to requests), I-slow (Problems with performance or responsiveness)
@teor2345
Contributor

teor2345 commented May 4, 2022

@Mergifyio update

@mergify
Contributor

mergify bot commented May 4, 2022

update

✅ Branch has been successfully updated

@teor2345
Contributor

teor2345 commented May 4, 2022

I'm running a full sync test here:
https://github.com/ZcashFoundation/zebra/actions/runs/2267417739

If it takes around 5h50m, we should merge this PR, because it fixes part of the bug in PR #4286.

@teor2345 assigned teor2345 and unassigned jvff May 4, 2022
@oxarbitrage
Contributor

So the full sync test was skipped. Can we relaunch it to check how long it takes?

@teor2345
Contributor

teor2345 commented May 4, 2022

> So the full sync test was skipped. Can we relaunch it to check how long it takes?

The test I ran manually took 5h47m, so it is about the same speed.

@teor2345 marked this pull request as ready for review May 4, 2022 21:54
@teor2345 requested review from a team as code owners May 4, 2022 21:54
@teor2345 requested review from oxarbitrage and removed request for a team May 4, 2022 21:54
@teor2345
Contributor

teor2345 left a comment

Required to fix bugs like #4286.

mergify bot added a commit that referenced this pull request May 4, 2022
mergify bot added a commit that referenced this pull request May 4, 2022
@mergify bot merged commit 79d5828 into main May 5, 2022
@mergify bot deleted the increase-block-validation-timeouts branch May 5, 2022 00:01
Labels: C-bug (Category: This is a bug), I-hang (A Zebra component stops responding to requests), I-slow (Problems with performance or responsiveness), NU-5 (Network Upgrade: NU5 specific tasks)