fix(tests): Update timeout for Zebra sync tests #4918

Merged
merged 4 commits into from
Aug 24, 2022

oxarbitrage
Contributor

@oxarbitrage oxarbitrage commented Aug 17, 2022

Motivation

Currently Zebra CI is failing due to multiple timeouts.

Close #4910

Solution

Increase the full sync timeout from 16 hours to 20 hours.
Increase the update sync timeout from 1-3 hours to 11 hours.
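As a rough illustration of the new limits, the two timeouts could be written as Rust `Duration` constants. The constant and function names below are hypothetical and are not taken from the Zebra code base; only the hour values come from this PR description.

```rust
use std::time::Duration;

// Hypothetical names for illustration; only the hour values come from the PR.
const SECONDS_PER_HOUR: u64 = 60 * 60;

/// Previous full sync limit (16 hours).
pub const OLD_FULL_SYNC_TIMEOUT: Duration = Duration::from_secs(16 * SECONDS_PER_HOUR);

/// New full sync limit (20 hours).
pub const FULL_SYNC_TIMEOUT: Duration = Duration::from_secs(20 * SECONDS_PER_HOUR);

/// New update sync limit (11 hours).
pub const UPDATE_SYNC_TIMEOUT: Duration = Duration::from_secs(11 * SECONDS_PER_HOUR);

/// Returns true if a test that has run for `elapsed` has exceeded `timeout`.
pub fn has_timed_out(elapsed: Duration, timeout: Duration) -> bool {
    elapsed > timeout
}
```

Under these assumptions, a full sync that takes 17 hours would exceed the old limit but fit within the new one.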

Review

Anyone can review. If CI passes, I think we should merge this quickly, as it is blocking all the other PRs.

Reviewer Checklist

  • CI fully passes

Follow Up Work

Update the checkpoints.

@oxarbitrage oxarbitrage requested a review from a team as a code owner August 17, 2022 18:51
@oxarbitrage oxarbitrage requested review from conradoplg and removed request for a team August 17, 2022 18:51

codecov bot commented Aug 17, 2022

Codecov Report

Merging #4918 (225f520) into main (52fa867) will decrease coverage by 0.05%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #4918      +/-   ##
==========================================
- Coverage   79.09%   79.03%   -0.06%     
==========================================
  Files         309      309              
  Lines       38781    38781              
==========================================
- Hits        30673    30651      -22     
- Misses       8108     8130      +22     

@teor2345
Contributor

I'm still seeing a timeout in the Zebra update to tip test, even with the updated checkpoints in PR #4919:

2022-08-18T23:00:06.180256Z INFO {net="Main"}: zebrad::components::sync::progress: estimated progress to chain tip sync_percent=99.754% current_height=Height(1773526) network_upgrade=Nu5 remaining_sync_blocks=4378 time_since_last_state_block=0s
Error:
0: stdout of command did not contain any matches for the given regex
Location:
/opt/zebrad/zebra-test/src/command.rs:779
Command:
"/opt/zebrad/target/release/zebrad" "-c" "/tmp/zebrad_testsBaar2c/zebrad.toml" "start"

Match Regex:
[
"finished initial sync to chain tip, using gossiped blocks .sync_percent.=.*100\.",
]

https://github.com/ZcashFoundation/zebra/runs/7918830518?check_suite_focus=true#step:6:143
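The numbers in the log line above are consistent with a simple ratio, which helps explain why the expected "100" message never appeared before the timeout. The sketch below assumes that the sync percentage is the current height divided by the estimated tip height (current height plus remaining blocks). That formula and the function name are assumptions made for illustration, not a copy of Zebra's progress code.

```rust
/// Estimates sync progress as a percentage.
///
/// Assumption: the estimated tip height is `current_height + remaining_blocks`.
pub fn sync_percent(current_height: u32, remaining_blocks: u32) -> f64 {
    let current = f64::from(current_height);
    let estimated_tip = current + f64::from(remaining_blocks);
    100.0 * current / estimated_tip
}
```

With `current_height=1773526` and `remaining_sync_blocks=4378` from the log, this ratio rounds to the logged `sync_percent=99.754%`, so the node was still 4378 blocks short of the tip when the test stopped waiting.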

Contributor

@teor2345 teor2345 left a comment

We also need to increase timeouts for the failing Zebra update to tip and lightwalletd full sync tests.

We probably need to update the timeouts for all the lightwalletd tests that sync Zebra's cached state. Some of them won't run until the lightwalletd full sync passes, but they'll have the same issue.

@teor2345 teor2345 changed the title from "fix(tests): Update timeout for full_sync_test" to "fix(tests): Update timeout for Zebra sync tests" Aug 22, 2022
@teor2345 teor2345 force-pushed the update-timeout branch 2 times, most recently from b4914d1 to 7860d6e Compare August 22, 2022 23:56
@teor2345 teor2345 added labels Aug 22, 2022:

  • C-bug (Category: This is a bug)
  • P-Critical 🚑
  • I-slow (Problems with performance or responsiveness)
  • I-integration-fail (Continuous integration fails, including build and test failures)
  • C-testing (Category: These are tests)
  • A-rpc (Area: Remote Procedure Call interfaces)
  • lightwalletd (any work associated with lightwalletd)
teor2345 previously approved these changes Aug 23, 2022
Contributor

@teor2345 teor2345 left a comment

Alfredo and I updated all the timeouts for the failing tests.

If CI passes, this will merge automatically.

@teor2345 teor2345 requested a review from a team as a code owner August 23, 2022 06:34
Contributor

@teor2345 teor2345 left a comment

The regex for the 1740k job was wrong: it expected Zebra to sync to 1740k, but our cached state is now past 1740k.

This makes the update sync workflows run their whole Zebra sync in that job, then fail with a GitHub timeout.

Instead, we want to finish that job when we reach 1740k or later.
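The difference between the two conditions can be sketched in plain Rust. This is not the regex from the PR. It is a minimal illustration that parses the `current_height=Height(...)` field shown in the log earlier in this conversation and compares it against a target of 1,740,000 (assuming "1740k" means that height). All function and constant names are hypothetical.

```rust
/// Assumed meaning of "1740k": block height 1,740,000.
pub const TARGET_HEIGHT: u32 = 1_740_000;

/// Extracts the height from a log field like `current_height=Height(1773526)`.
/// Returns `None` if the field is missing or malformed.
pub fn parse_current_height(line: &str) -> Option<u32> {
    let marker = "current_height=Height(";
    let start = line.find(marker)? + marker.len();
    let rest = &line[start..];
    let end = rest.find(')')?;
    rest[..end].parse().ok()
}

/// Old behaviour (sketch): only a height equal to the target finishes the job.
pub fn reached_exact_target(line: &str) -> bool {
    parse_current_height(line) == Some(TARGET_HEIGHT)
}

/// Desired behaviour (sketch): the target height or anything later finishes the job.
pub fn reached_target_or_later(line: &str) -> bool {
    parse_current_height(line).map_or(false, |height| height >= TARGET_HEIGHT)
}
```

A cached state that already starts past the target never satisfies the first check, so the job keeps syncing until the GitHub timeout. The second check succeeds on the first progress line it sees.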

mergify bot added a commit that referenced this pull request Aug 23, 2022
@teor2345 teor2345 merged commit 9fb8742 into main Aug 24, 2022
@teor2345 teor2345 deleted the update-timeout branch August 24, 2022 00:06
@teor2345
Contributor

I merged this manually, because:

  • it succeeded the first time, but failed in mergify with an intermittent panic
  • we have a plan for diagnosing and fixing the panic

Development

Successfully merging this pull request may close these issues.

Quick fix for sync test timeouts in CI
2 participants