fix: Fixed hanging http template manual retry with previous outputs. Fixes #11889 #12122
…ixes argoproj#11889 Signed-off-by: Wesley Scholl <[email protected]>
cc @toyamagu-2021 Could you help review this?
…rgoproj#11889 Signed-off-by: Wesley Scholl <[email protected]>
Apologies for the multiple commits; I'm having GPG key issues in GitHub Codespaces.
…#11889 Signed-off-by: GitHub <[email protected]>
Thank you for submitting the PR.
We should delete the unnecessary file (likely from Codespaces?) and improve the logging, but this PR fixes #11889.
FYI: the root cause is described in the original issue.
Signed-off-by: GitHub <[email protected]>
Removed the modifications from
Output from the CLI:
webpack 5.89.0 compiled with 17 warnings in 82357 ms
Done in 85.07s.
# Pack UI into a Go file
/home/vscode/go/bin/staticfiles -o server/static/files.go ui/dist/app
Huh, we do have an exact version specified in the Makefile; odd that it produced different output.
woc.log.Warnf("[SPECIAL][DEBUG] returning but assumed validity before")
woc.log.Errorf("[DEBUG] Was unable to obtain node for %s", nodeID)
return err
woc.log.Info("Continuing to retry")
We should definitely be more specific here: anyone reading this message in the Controller's logs would find it pretty ambiguous.
woc.log.Info("Continuing to retry")
woc.log.Debugf("Unable to obtain node for %s during TaskSet Reconciliation. Expected during a retry, continuing", nodeID)
I also think debug level makes more sense here, since this is now an expected error.
After further testing, I discovered that the outputs from the failed pods were saved and not cleared. As mentioned here,
Adding this line at the beginning of
ArgoWorkflowsManualRetryClearedOutputsFixed.mov
This issue has been fixed in >= v3.5.5.
ArgoMultiStepFailRetrySucccess720.mov
Superseded by #12620 based on this issue comment. Thanks all!
This PR Fixes #11889.
Motivation
Created the initial issue and indicated that I'd be willing to submit a PR to fix this issue.
Changes
As suggested here, replaced return err with continue to allow TaskSets to be reconciled when manually retrying with previous outputs.
Screenshots
Step Workflow With Exit Hook:
ManualRetryWithPreviousOutputsFix720.mov
Step Workflow Without Exit Hook:
Single Workflow With Exit Hook:
Single Workflow Without Exit Hook:
Multistep Workflow With Exit Hook:
Multistep Workflow Without Exit Hook:
Verification
Tested and verified manual retry does not hang for: