Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ci): Stop running multiple full syncs on different branches #6664

Merged
merged 3 commits into from
May 11, 2023

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented May 11, 2023

Motivation

We are accidentally running about 8 simultaneous full syncs on testnet, which is slowing down testnet.

Specifications

https://docs.github.com/en/actions/using-jobs/using-concurrency

https://docs.github.com/en/actions/learn-github-actions/contexts#needs-context

Solution

Fix long job concurrency:

  • Only run one checkpoint/full/lwd sync at a time, by removing the branch from the concurrency group label
  • Allow a manual sync to run at the same time as automatic syncs, by adding the manual sync setting to the label

Fix cache-using job conditions:

  • If cache generation jobs are skipped due to concurrency, and the cache is missing, also skip their dependent jobs, rather than failing them when the disk isn't found
  • This is implemented by requiring that the disk was found in the cache, or the disk was previously generated in this workflow run by a successful job

Related fixes:

  • Remove outdated concurrency and fix docs on the lightwalletd send transactions test, which now runs on every PR

Review

This is urgent because we're putting a lot of load on testnet.

See my comment about manual testing of this fix below.

Reviewer Checklist

  • Will the PR name make sense to developers?
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?

Follow Up Work

I'm going to cancel all the extra jobs now, and just leave the one in this PR, the release PR #6632, and the one that's about to finish.

@teor2345 teor2345 added C-bug Category: This is a bug A-devops Area: Pipelines, CI/CD and Dockerfiles P-Critical 🚑 I-slow Problems with performance or responsiveness I-remote-node-overload Zebra can overload other nodes on the network C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG labels May 11, 2023
@teor2345 teor2345 self-assigned this May 11, 2023
@teor2345 teor2345 requested a review from a team as a code owner May 11, 2023 04:22
@teor2345 teor2345 requested review from gustavovalverde and removed request for a team May 11, 2023 04:22
@teor2345 teor2345 marked this pull request as draft May 11, 2023 04:44
@teor2345 teor2345 marked this pull request as ready for review May 11, 2023 05:03
@teor2345
Copy link
Contributor Author

I did a manual test of these changes. In this workflow, the full testnet sync was skipped due to concurrency, and so the testnet checkpoint generation was also skipped:
https://github.com/ZcashFoundation/zebra/actions/runs/4944180938/jobs/8839453807

This is the correct behaviour. (Previously the checkpoint generation job would fail.)

@teor2345 teor2345 added the I-cost Zebra infrastructure costs label May 11, 2023
Copy link
Member

@gustavovalverde gustavovalverde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this as a bugfix, but I have a "feeling" that we should move the testnet sync to the main branch, and just trigger it on PRs on-demand (using a label or comment).

@mergify mergify bot merged commit 8c8ac4a into main May 11, 2023
@mergify mergify bot deleted the stop-multi-full-sync branch May 11, 2023 11:15
@teor2345
Copy link
Contributor Author

I agree with this as a bugfix, but I have a "feeling" that we should move the testnet sync to the main branch,

it is already just running on the main branch

and just trigger it on PRs on-demand (using a label or comment).

it only runs on PRs when there is no cached state disk, or when manually triggered using workflow dispatch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-devops Area: Pipelines, CI/CD and Dockerfiles C-bug Category: This is a bug C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG I-cost Zebra infrastructure costs I-remote-node-overload Zebra can overload other nodes on the network I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants