Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Support OOMKilled with container-set. Fixes #10063 #11484

Conversation

alexec
Copy link
Contributor

@alexec alexec commented Jul 31, 2023

Fixes #10063

@alexec alexec linked an issue Jul 31, 2023 that may be closed by this pull request
3 tasks
@JPZ13
Copy link
Member

JPZ13 commented Jul 31, 2023

Tagging @isubasinghe. Can you check this PR accounting for the expected changes you might push with your container set work?

@JPZ13 JPZ13 requested a review from isubasinghe July 31, 2023 13:20
Signed-off-by: Alex Collins <[email protected]>
@@ -70,24 +70,38 @@ func NewEmissaryCommand() *cobra.Command {
return fmt.Errorf("failed to unmarshal template: %w", err)
}

// setup signal handlers
signals := make(chan os.Signal, 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from lower in the file.

@@ -7,6 +7,10 @@ import (
"github.com/argoproj/argo-workflows/v3/util/errors"
)

var (
Term = os.Interrupt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is to os.Term because Windows does not have that. We fake it.

Signed-off-by: Alex Collins <[email protected]>
Signed-off-by: Alex Collins <[email protected]>
Signed-off-by: Alex Collins <[email protected]>
Comment on lines 97 to 107
data, err := os.ReadFile(filepath.Clean(varRunArgo + "/ctr/" + y + "/exitcode"))
if os.IsNotExist(err) {
time.Sleep(time.Second)
continue
}
exitCode, err := strconv.Atoi(string(data))
if err != nil {
return fmt.Errorf("failed to read exit-code of dependency %q: %w", y, err)
}
if exitCode != 0 {
return fmt.Errorf("dependency %q exited with non-zero code: %d", y, exitCode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a rebase here

@alexec alexec marked this pull request as ready for review August 2, 2023 01:17
@terrytangyuan terrytangyuan enabled auto-merge (squash) August 3, 2023 11:53
@terrytangyuan
Copy link
Member

Can you resolve conflicts?

Copy link
Member

@isubasinghe isubasinghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested manually myself, verified to fix #10063.
Approved

@terrytangyuan terrytangyuan enabled auto-merge (squash) August 4, 2023 11:00
@terrytangyuan terrytangyuan merged commit b26f5b8 into master Aug 4, 2023
@terrytangyuan terrytangyuan deleted the 10063-pod-hangs-when-container-in-containerset-is-oom-killed branch August 4, 2023 21:15
@caelan-io
Copy link
Member

caelan-io commented Aug 25, 2023

@alexec Hi Alex! I'm not sure if you saw, but @terrytangyuan posted about difficulty merging this into the patch release branch. (see notes below from the cherry-pick release ticket)

Are you able to look into this and submit a PR to the v3.4.11 release branch?

Thanks again for pushing this fix! Since users had wanted it in 3.4.10, we wanted to see if it's possible to squeeze into 3.4.11.

These cannot be cherry-picked since the v3.4.10 release branch diverges from master branch too much and have non-trivial conflicts to resolve. If you'd like to get them into this release, please submit a PR to the release branch by the end of next Tuesday. https://github.com/argoproj/argo-workflows/tree/release-3.4.10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pod hangs when container in ContainerSet is OOM Killed
6 participants