Add a few retries around failure points in release scripting #1353
/remove-priority important-soon
Simple retries might help us limp along. But long term we likely must run our own container image registry and mirror external content. Requiring the internet to be consistent/coherent in order to build is always going to be problematic. Failing to build because we can't
Sounds good, I agree!
@Verolop we've just discussed that there are a number of issues relatively similar to this one, so I've tweaked the description slightly to cover the more general case. There are at most half a dozen points where a minimal retry loop in the shell script could make us much more likely to survive these random transient failures and save tonnes of release time and effort.
(i.e., rather than closing a bunch of issues and creating a new one, we're re-using/re-focusing this one for broader impact)
An idea that came to mind: what if we add a krel subcommand for pushing the git objects? It seems fairly straightforward, and we could remove the bash bits from anago. Then I'd like to enhance the logging via logrus and maybe add some pre-checks, for example making the call fail only in certain cases and assuming that "everything is OK" if the tag is already present remotely. WDYT?
This is the type of decomposition we need. The anago bash bits doing that push can be removed, and anago can instead call a more robust pusher.
Related, on the topic of fail/retry/continue resilience:
Does it still make sense to introduce the retries at this point, or should we just go ahead with @saschagrunert's idea?
We definitely still need some retries in anago too.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
This was mostly addressed in #1595. Unless there are more suggestions and/or comments, I think we can close this one.
Agreed. If more things come up, we can create a new issue.
/close
@cpanato: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What would you like to be added:
Add some retries around failure points in the release scripting before failing completely. E.g.:
Why is this needed:
When a step fails, we often just need to hack a workaround and re-run the release process, which is very time- and resource-consuming. Adding retries would allow these attempts to take place within the same run.