operatorhubio-catalog is scheduled to run on a MS Windows worker node #1119

Closed
HansK-p opened this issue Nov 10, 2019 · 5 comments

HansK-p commented Nov 10, 2019

Bug Report

**What did you do?**
This is an AKS cluster running K8s 1.14.8 with multiple node pools enabled: one Linux node pool and one MS Windows node pool. The single node in the MS Windows node pool is tainted:

taints:
  - effect: NoSchedule
    key: os
    value: Win2019

The operator has been installed with an unmodified version of the install script, that is:

/install.sh 0.12.0
What we see is that the pod operatorhubio-catalog-h4zqh is scheduled to run on the MS Windows node, which of course doesn't work.

**What did you expect to see?**
I would expect the pod operatorhubio-catalog-h4zqh to end up on a Linux node, especially as the MS Windows node is tainted.

**What did you see instead? Under which circumstances?**
The pod operatorhubio-catalog-h4zqh ended up on a MS Windows worker node.

**Environment**
* operator-lifecycle-manager version:

 0.12

* Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

* Kubernetes cluster kind: Azure AKS cluster with support for multiple node pools and an active MS Windows node-pool.

**Possible Solution**
Make sure that the deployment of the pod operatorhubio-catalog-h4zqh respects taints and/or uses a node selector mechanism (as is already done for the deployments in the same namespace).

**Additional context**
HansK-p changed the title from "operatorhubio-catalog ends up on MS Windows worker node" to "operatorhubio-catalog is scheduled to run on a MS Windows worker node" on Nov 10, 2019
exdx (Member) commented Nov 10, 2019

Hi @HansK-p, thanks for bringing this issue up. This is something we've been discussing this past week as we see more clusters that mix Linux and Windows node pools. We will look into this and provide a fix.

exdx (Member) commented Nov 18, 2019

@HansK-p now that I'm looking at the issue more closely, what you're describing is basically a dedicated node with a specific taint, and only pods which have tolerations which match would be scheduled, per https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#example-use-cases.

The catalog source pod was scheduled onto the Windows node because catalog source pods have their toleration operator set to Exists, which is essentially a wildcard toleration. Exists acts as a wildcard for the taint value, so the catalog pod tolerates any taint that its toleration matches, regardless of the taint's value.
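
To illustrate why an Exists toleration behaves like a wildcard, here is a minimal Python sketch of the matching rule as the Kubernetes taints-and-tolerations documentation describes it. It is not OLM or scheduler code: the `Taint` and `Toleration` classes and the `tolerates` function are hypothetical names made up for this example, and the exact tolerations on the catalog pod are not shown in this thread, so the keyless `wildcard` toleration below is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    operator: str = "Equal"  # "Equal" or "Exists"
    key: str = ""            # empty key (with Exists) matches every taint key
    value: str = ""
    effect: str = ""         # empty effect matches every taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint (illustrative only)."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        # Exists ignores the taint value entirely.
        return True
    return tol.value == taint.value


# The taint from the bug report above.
windows_taint = Taint(key="os", effect="NoSchedule", value="Win2019")

# Assumed shapes of tolerations, for comparison.
wildcard = Toleration(operator="Exists")                  # no key: matches everything
keyed = Toleration(operator="Exists", key="os")           # any value of key "os"
unrelated = Toleration(key="dedicated", value="infra")    # Equal on a different key

print(tolerates(wildcard, windows_taint))   # True
print(tolerates(keyed, windows_taint))      # True
print(tolerates(unrelated, windows_taint))  # False
```

Under these assumptions, the os=Win2019:NoSchedule taint does not keep a pod with an Exists toleration off the node, which is consistent with the behaviour reported here.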

We can add a NodeSelector to the pod, which would prevent the issue from recurring (since the selected label is not present on the Windows node).

HansK-p (Author) commented Nov 18, 2019

I saw the Exists toleration, but didn't know that it was that "effective". The pod was scheduled on the MS Windows node even after I marked the node as unschedulable (cordon). This was slightly frustrating...

Adding a NodeSelector to the pod sounds like a good idea. It should never be scheduled to run on anything but a Linux node (unless it actually works). I assume the NodeSelector will be:

  nodeSelector:
    beta.kubernetes.io/os: linux
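
A small Python sketch of how a node selector like the one above filters nodes, for illustration only. A node is eligible only if it carries every key/value pair in the selector, and this check is independent of tolerations. The function name `matches_node_selector` and the `agentpool` label values are hypothetical; the sketch also assumes the Windows node is labelled `beta.kubernetes.io/os: windows`.

```python
from typing import Dict


def matches_node_selector(node_selector: Dict[str, str],
                          node_labels: Dict[str, str]) -> bool:
    """Return True if the node has every label the selector requires."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


# The selector proposed in the comment above.
selector = {"beta.kubernetes.io/os": "linux"}

# Hypothetical label sets for the two node pools in the report.
linux_node = {"beta.kubernetes.io/os": "linux", "agentpool": "linuxpool"}
windows_node = {"beta.kubernetes.io/os": "windows", "agentpool": "winpool"}

print(matches_node_selector(selector, linux_node))    # True
print(matches_node_selector(selector, windows_node))  # False
```

With such a selector in place, the Windows node is filtered out before taints and tolerations matter, so a wildcard toleration would no longer be enough to land the pod there.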

exdx (Member) commented Nov 19, 2019

Thanks @HansK-p. We've merged the NodeSelector fix and will QA it. In the meantime, if you need the fix in your cluster, try editing the pod spec and adding the NodeSelector manually; that should help.

exdx (Member) commented Dec 10, 2019

I'm going to close this, as we have developed and backported a fix that will ship in the upcoming release cycle. Please feel free to reopen if you see this issue again 🐛

exdx closed this as completed on Dec 10, 2019