job_monitor: OOMKilled job is sometimes considered successful #396

Closed
mdonadoni opened this issue Aug 2, 2023 · 0 comments · Fixed by #397

In some cases, jobs killed after running out of memory are considered successful by the job monitor, and their status is set to finished. As a result, the workflow keeps running even though it should stop.

Workflow that reproduces the issue in REANA:

version: 0.9.0
inputs: {}
workflow:
  type: serial
  specification:
    steps:
      - environment: 'ubuntu:22.04'
        commands:
          - echo before; tail /dev/zero; echo after

This happens because a pod can reach the Succeeded phase, with exit code 0, even though the container's termination reason is OOMKilled. For example, with the following pod:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    image: ubuntu:22.04
    command:
      - bash
      - "-c"
      - "echo before; tail /dev/zero; echo after;"
    resources:
      requests:
        memory: 64M
      limits:
        memory: 64M
  restartPolicy: Never

Final status:

$ kubectl get pod/test -o yaml
[...]
status:
  conditions:
    [...]
  containerStatuses:
  - containerID: [...]
    image: docker.io/library/ubuntu:22.04
    imageID: [...]
    lastState: {}
    name: test
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: [...]
        exitCode: 0
        finishedAt: "2023-08-02T14:50:32Z"
        reason: OOMKilled
        startedAt: "2023-08-02T14:50:32Z"
  phase: Succeeded
  [...]
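
The status above suggests that the pod phase alone is not enough to decide whether a job succeeded. A minimal sketch of a stricter check follows, assuming the pod status is available as a plain dictionary shaped like the YAML output above. The function names and the "failed" and "running" status strings are hypothetical and chosen for illustration; this is not necessarily how the job monitor or the linked fix handles the case.

```python
from typing import Any, List, Mapping


def oom_killed_containers(pod_status: Mapping[str, Any]) -> List[str]:
    """Return the names of containers whose termination reason is OOMKilled."""
    names = []
    # Look at both init containers and regular containers, if present.
    for key in ("initContainerStatuses", "containerStatuses"):
        for container in pod_status.get(key) or []:
            terminated = (container.get("state") or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(container.get("name", "<unknown>"))
    return names


def job_status(pod_status: Mapping[str, Any]) -> str:
    """Map a pod status to a job status (hypothetical status names).

    An OOMKilled container takes precedence over the pod phase, so a pod
    in the Succeeded phase is still reported as failed.
    """
    if oom_killed_containers(pod_status):
        return "failed"
    phase = pod_status.get("phase")
    if phase == "Succeeded":
        return "finished"
    if phase == "Failed":
        return "failed"
    # Pending, Running, Unknown or missing phase: treat as still in progress.
    return "running"
```

With the final status shown above (phase Succeeded, reason OOMKilled, exit code 0), job_status would return "failed" instead of "finished", because the termination reason is checked before the phase.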