Workspace pod FailedScheduling event not being properly caught #977

Closed
AObuchow opened this issue Nov 14, 2022 · 5 comments · Fixed by #978

AObuchow commented Nov 14, 2022

Description

In some cases, a workspace deployment's pod will trigger the FailedScheduling event and DWO will not be able to properly catch it.
So far, this seems to occur when a workspace requests more memory or CPU than is available on any node on an OpenShift cluster.

What I have observed is that, although DWO sees the pod's FailedScheduling event, the event.count is set to 0. DWO then checks whether event.count has reached the threshold required to report an error. Since the threshold for FailedScheduling is 1 (i.e. event.count must be at least 1 for an error to be reported) and the count never leaves 0, the error is never reported and the workspace eventually times out.
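
For illustration, here is a minimal sketch of that kind of threshold check and why a count stuck at 0 never trips it. This is not DWO's actual code; the function and map names are hypothetical, and the threshold values (1 for FailedScheduling, 3 for FailedMount) are taken from this issue's description.

```go
package main

import "fmt"

// failedEventThresholds maps event reasons to the number of occurrences
// required before the workspace is failed. The values come from this issue's
// description; the names here are hypothetical, not DWO's actual code.
var failedEventThresholds = map[string]int32{
	"FailedScheduling": 1,
	"FailedMount":      3,
}

// shouldFailWorkspace reports whether the event has occurred at least as many
// times as the threshold configured for its reason.
func shouldFailWorkspace(reason string, count int32) bool {
	threshold, tracked := failedEventThresholds[reason]
	if !tracked {
		return false
	}
	return count >= threshold
}

func main() {
	// On the affected OpenShift versions the FailedScheduling event arrives
	// with count == 0, so the check never trips and the workspace times out.
	fmt.Println(shouldFailWorkspace("FailedScheduling", 0)) // false
	fmt.Println(shouldFailWorkspace("FailedScheduling", 1)) // true
}
```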

I have been trying to narrow down the exact cause of this bug, but have been unsuccessful thus far.
However, here are my current findings:

  • This issue seems to be consistently reproducible on OpenShift 4.11 and 4.12. On OpenShift 4.10, the FailedScheduling event is sometimes properly caught, and in other cases, the workspace times out.
    • When this bug occurs (on OpenShift 4.10, 4.11 & 4.12) the event.count is set to 0 and event.series is nil (so event.series.count is unusable)
  • This issue does not occur on Minikube v1.27.1 with Kubernetes v1.25.2
    • When testing, the event.count was set to 1 upon first encountering the FailedScheduling event, and the count incremented with each occurrence of the event; event.series was nil.
  • The event.count field has been deprecated in Kubernetes v1.25, and the event.series.count field is supposed to be used instead. However, in my testing on Minikube, the event.series field was still nil (a defensive sketch for reading the count from either field is shown after this list).
    • The new event API is k8s.io/api/events/v1 (and the deprecated one which is currently in use in DWO comes from k8s.io/api/core/v1)
      • I have tried using k8s.io/api/events/v1 but encountered the same behaviour as k8s.io/api/core/v1: on affected OpenShift versions, event.count is set to 0 and event.series is nil. There is also auto-generated code to convert back and forth between core v1 events and events v1 (here and here)
  • The default Kubernetes scheduler (which is also used by OpenShift, unless configured otherwise) seems to use the following event recorder implementation, which does not initialize the event count field (thus, we'd expect event.count to be initialized to 0, but oddly, this is not the case on Kubernetes v1.25).
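
One defensive option (a sketch only, not DWO's current behaviour; getEventCount is a hypothetical helper) would be to derive the occurrence count from whichever of event.series.count or the deprecated event.count is populated, and to treat an event with neither field set as a single occurrence:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// getEventCount derives an occurrence count from a core/v1 Event, preferring
// the newer event.series.count over the deprecated event.count field. If
// neither is populated (as observed on the affected OpenShift versions), it
// assumes the event occurred at least once. This is a sketch, not DWO's
// current behaviour.
func getEventCount(ev *corev1.Event) int32 {
	if ev.Series != nil && ev.Series.Count > 0 {
		return ev.Series.Count
	}
	if ev.Count > 0 {
		return ev.Count
	}
	// The event was delivered but carries no usable count; treat it as one occurrence.
	return 1
}

func main() {
	// Simulate the problematic case: FailedScheduling with Count == 0 and no Series.
	ev := &corev1.Event{Reason: "FailedScheduling", Count: 0}
	fmt.Println(getEventCount(ev)) // 1
}
```

With something like this, the existing threshold comparison would at least see a non-zero count for the affected FailedScheduling events.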

Other notes:

  • @ibuziuk reported that the FailedScheduling event was being caught correctly (on CRC 4.10.14, I believe) when it was caused by a disk-pressure issue. I am not sure whether there is something particular about disk-pressure warnings that causes them to be caught, or if this is a case of the FailedScheduling event sometimes being caught on OpenShift 4.10.
  • I haven't been able to find any Kubernetes or OpenShift bugs directly related to this issue. However, I did find elastic/beats#31126 (Handle K8s events with invalid count or invalid Timestamps), which describes a similar issue with the event count being set to 0.
    • I considered using a similar approach to this PR; however, it could still break DWO's detection of the FailedMount event, as we allow 3 FailedMount events to occur before reporting an error.
    • I'm not sure if it is easy to reproduce the FailedMount event for testing purposes, but if its event count is being correctly set and incremented, then the above approach could suffice.
  • We could add extra checks (after checking pod events) on pod.status.conditions to see if the PodScheduled condition is set to False with the Unschedulable reason, and fail the workspace if this condition is found (a rough sketch is shown after this list). However, this seems less fine-grained than checking pod events. Perhaps there is another way to determine that a pod cannot be scheduled, other than checking pod events and pod status conditions?
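
As a rough illustration of that last idea, a check against pod.status.conditions could look something like the sketch below. isPodUnschedulable is a hypothetical helper, not existing DWO code; it relies only on the PodScheduled, ConditionFalse, and Unschedulable constants from k8s.io/api/core/v1.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// isPodUnschedulable reports whether the pod's PodScheduled condition is False
// with the Unschedulable reason, which is what the scheduler sets when no node
// can satisfy the pod's resource requests. It also returns the condition message.
func isPodUnschedulable(pod *corev1.Pod) (bool, string) {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodScheduled &&
			cond.Status == corev1.ConditionFalse &&
			cond.Reason == corev1.PodReasonUnschedulable {
			return true, cond.Message
		}
	}
	return false, ""
}

func main() {
	pod := &corev1.Pod{
		Status: corev1.PodStatus{
			Conditions: []corev1.PodCondition{{
				Type:    corev1.PodScheduled,
				Status:  corev1.ConditionFalse,
				Reason:  corev1.PodReasonUnschedulable,
				Message: "0/6 nodes are available: 6 Insufficient cpu.",
			}},
		},
	}
	if failed, msg := isPodUnschedulable(pod); failed {
		fmt.Println("workspace should be failed:", msg)
	}
}
```

The workspace reconcile loop could then fail the workspace with the returned message whenever this check reports true.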

How To Reproduce

Apply either of the following DevWorkspaces to an OpenShift 4.11 or 4.12 cluster (on 4.10, this issue may or may not occur), and observe that they time out rather than failing immediately:

```yaml
kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: theia-next-high-cpu
spec:
  started: true
  template:
    projects:
      - name: web-nodejs-sample
        git:
          remotes:
            origin: "https://github.com/che-samples/web-nodejs-sample.git"
    components:
      - name: theia
        plugin:
          uri: https://che-plugin-registry-main.surge.sh/v3/plugins/eclipse/che-theia/next/devfile.yaml
          components:
            - name: theia-ide
              container:
                env:
                  - name: THEIA_HOST
                    value: 0.0.0.0
                memoryRequest: 2Gi
                memoryLimit: 16Gi
                cpuRequest: 4000m
                cpuLimit: 8000m
    commands:
      - id: say-hello
        exec:
          component: theia-ide
          commandLine: echo "Hello from $(pwd)"
          workingDir: ${PROJECTS_ROOT}/project/app
```

```yaml
kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: theia-next-high-mem
spec:
  started: true
  template:
    projects:
      - name: web-nodejs-sample
        git:
          remotes:
            origin: "https://github.com/che-samples/web-nodejs-sample.git"
    components:
      - name: theia
        plugin:
          uri: https://che-plugin-registry-main.surge.sh/v3/plugins/eclipse/che-theia/next/devfile.yaml
          components:
            - name: theia-ide
              container:
                env:
                  - name: THEIA_HOST
                    value: 0.0.0.0
                memoryRequest: 32Gi
                memoryLimit: 64Gi
                cpuRequest: 500m
                cpuLimit: 2000m
    commands:
      - id: say-hello
        exec:
          component: theia-ide
          commandLine: echo "Hello from $(pwd)"
          workingDir: ${PROJECTS_ROOT}/project/app
```

When checking the workspace pod with kubectl/oc or the OpenShift UI, you can see that the pod is pending and that an event similar to the following has occurred:
```
0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 6 Insufficient cpu. preemption: 0/6 nodes are available: 3 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
```


Expected behaviour

The FailedScheduling event should be caught and reported, and the workspace should be failed immediately.

Additional context

Downstream issue

AObuchow added the bug (Something isn't working) label on Nov 14, 2022

ibuziuk commented Nov 15, 2022

@AObuchow thank you for your investigation. I believe we need to report the issue to the OpenShift Bugzilla if the event count is not being initialized / incremented correctly.

On the DWO end, in order to mitigate the issue, I would propose adding something like UnrecoverablePodConditions and tracking Unschedulable in a similar fashion to how we track UnrecoverableEvents (a hypothetical sketch follows the status snippet below). @l0rd @amisevsk wdyt?

```yaml
status:
  phase: Pending
  conditions:
    - type: PodScheduled
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2022-11-15T13:56:48Z'
      reason: Unschedulable
      message: >-
        0/15 nodes are available: 3 node(s) had taint
        {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3
        node(s) had taint {node-role.kubernetes.io/master: }, that the pod
        didn't tolerate, 9 Insufficient memory.
  qosClass: Burstable
```
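
For illustration, a hypothetical sketch of what such a configurable check could look like. UnrecoverablePodConditions is not an existing DWO configuration field; all names and values here are illustrative only.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// unrecoverablePodConditions is a hypothetical, configurable list of pod
// condition reasons that should fail the workspace immediately, mirroring how
// unrecoverable events are tracked today.
var unrecoverablePodConditions = []string{"Unschedulable"}

// checkPodConditions returns a non-empty message when the pod reports a False
// condition whose reason is in the unrecoverable list.
func checkPodConditions(pod *corev1.Pod) string {
	for _, cond := range pod.Status.Conditions {
		if cond.Status != corev1.ConditionFalse {
			continue
		}
		for _, reason := range unrecoverablePodConditions {
			if cond.Reason == reason {
				return fmt.Sprintf("unrecoverable pod condition %s: %s", cond.Reason, cond.Message)
			}
		}
	}
	return ""
}

func main() {
	pod := &corev1.Pod{Status: corev1.PodStatus{Conditions: []corev1.PodCondition{{
		Type:    corev1.PodScheduled,
		Status:  corev1.ConditionFalse,
		Reason:  "Unschedulable",
		Message: "0/15 nodes are available: 9 Insufficient memory.",
	}}}}
	if msg := checkPodConditions(pod); msg != "" {
		fmt.Println(msg)
	}
}
```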

amisevsk commented

I don't recall why we didn't initially have an implementation that looks at pod conditions as well as deployment conditions, but this approach makes sense to me.

AObuchow added commits to AObuchow/devworkspace-operator that referenced this issue on Nov 15, 2022

l0rd commented Nov 15, 2022

Checking the pod condition looks like a great idea.

I also think that checking whether event.count is higher than a threshold is not required. A FailedScheduling event should be reported to the developer by the dashboard as early as possible (and the developer can then decide to wait, retry, or cancel).

amisevsk commented

> I also think that checking whether event.count is higher than a threshold is not required. A FailedScheduling event should be reported to the developer by the dashboard as early as possible (and the developer can then decide to wait, retry, or cancel).

The reason we have the threshold mechanism is that we were seeing random workspace failures due to occasional FailedMount events -- see #778. However, if the count is never being incremented anymore, that functionality likely does not work anyway.

Another solution here might be to use thresholds of 0 instead of 1; that would amount to not checking the threshold at all.


AObuchow commented Feb 16, 2023

The fix for this issue (0cad9a0) is effectively made obsolete by the fix for #987. However, 0cad9a0 has the downside that the user will be unable to ignore the FailedScheduling event if they choose to do so (because the PodUnschedulable condition will still be caught).

Thus, I think it's worth reverting the work done in 0cad9a0, as it has no added benefit and has the disadvantage of preventing users from ignoring the FailedScheduling event.

See #1046
