Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ci): Retry launching failed docker test instances #7703

Closed
wants to merge 8 commits into from

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Oct 9, 2023

Motivation

We can't merge PRs because sometimes launching the Docker test image fails.

This PR stops #7659 being a critical issue by allowing PRs to merge anyway. So we can keep that ticket open, but fix it over the next week or two while still merging other PRs.

Action Reference

The retry-step arguments are listed here:
https://github.com/marketplace/actions/retry-step

Complex Code or Requirements

I deleted some parts of the script that seemed to be redundant, and I deleted the redundant "no lightwalletd path" step. Instead, we always mount both paths, some tests don't use the lightwalletd path.

(It's also possible for tests to use the lightwalletd paths as temporary storage, but we never see them do it because their states aren't written as a cache image.)

Solution

  • Make create instance and launch Docker test into a single step
  • Retry that step up to 10 times if it fails

Review

This is blocking all the other PRs merging reliably. It's not a long-term solution, but it would let us merge other PRs while we're fixing the actual bug.

To reduce the diff, I used one-space indentation rather than two-space indentation. I'll fix that in another PR.

Reviewer Checklist

  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

Follow Up Work

Fix the underlying bug, and once we're sure it's fixed, remove the retries.

@teor2345 teor2345 added C-bug Category: This is a bug A-devops Area: Pipelines, CI/CD and Dockerfiles P-Critical 🚑 I-integration-fail Continuous integration fails, including build and test failures C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG labels Oct 9, 2023
@teor2345 teor2345 self-assigned this Oct 9, 2023
@teor2345 teor2345 requested a review from a team as a code owner October 9, 2023 05:13
@teor2345 teor2345 requested review from oxarbitrage and removed request for a team October 9, 2023 05:13
@teor2345
Copy link
Contributor Author

teor2345 commented Oct 9, 2023

If we need to wait or initalise the without-cached-state disk, we can do that inside the ssh command. But I'm not sure if that's needed, because there's no reference to /dev/sdb until we initialise it.

Using block devices also isn't recommended, so that could be part of our issue here:
https://docs.docker.com/storage/volumes/#block-storage-devices

Here's a manual Zebra full sync for testnet, and lightwalletd full sync for mainnet:

older buggy syncs

Here's a manual Zebra full sync for testnet, and lightwalletd full sync for mainnet, with a shell command bug!

Here's a manual Zebra full sync for testnet, and lightwalletd full sync for mainnet, with a bug that stops the checkpoint syncs working:

@gustavovalverde
Copy link
Member

This should no longer be needed after #7690

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-devops Area: Pipelines, CI/CD and Dockerfiles C-bug Category: This is a bug C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG I-integration-fail Continuous integration fails, including build and test failures
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants