component Deployment affinity not set #2361
Comments
I agree we should probably do this.
/priority important-soon
PTAL: #2377
I think we should backport this, as it can be considered a bug.
We're using the control plane node label as a proxy for "this node has the IAM instance profile the CAPA controller needs." When CAPA uses bootstrap credentials, it can run on any node. In these two cases, required affinity to control plane nodes is not necessary, and in the EKS case, the lack of control plane nodes prevents CAPA from running. Having a label that tells us which nodes have the IAM instance profile would help. But regardless of what label we use, I think we'll want to use required anti-affinity. In bootstrap and EKS clusters, zero nodes get labeled "cannot run CAPA," and in a standard EC2 cluster, all non-control-plane nodes get labeled "cannot run CAPA."
@dlipovetsky I was following until the anti-affinity part. Why use anti-affinity instead of an affinity with some new labels?
My mistake. I forgot that there is no concept of node anti-affinity 😞
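For illustration, a minimal sketch of the labels-plus-affinity idea discussed above; the label key `capa.example.com/has-iam-instance-profile` is a made-up placeholder, not a label CAPA defines, and nodes would have to be labeled out of band:

```yaml
# Hypothetical: nodes that carry the IAM instance profile the CAPA controller
# needs are labeled by the operator, and the controller Deployment requires
# that label via node affinity.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: capa.example.com/has-iam-instance-profile  # hypothetical label key
          operator: Exists
```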
@dlipovetsky @dkoshkin - I like the idea of using node labels and affinity. From the EKS side this would work, as we can create a node group with labels if we want to segregate CAPA. We could also consider taints/tolerations (we have #2399 to add taint support to managed machine pools).
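As a sketch of that taint/toleration alternative, assuming a dedicated node group tainted with a made-up key (nothing here is an agreed-upon name):

```yaml
# Hypothetical: the dedicated node group is created with a taint such as
#   capa.example.com/dedicated=true:NoSchedule
# and the CAPA controller Deployment tolerates it so only it can land there.
tolerations:
- key: capa.example.com/dedicated   # hypothetical taint key
  operator: Equal
  value: "true"
  effect: NoSchedule
```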
+1 to labels and affinity.
Let's discuss this at the May 31st meeting; I added it to the agenda.
We're probably going to have to use a combination of tolerations and node affinity:

```yaml
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: node-role.kubernetes.io/master
          operator: Exists
```

We should probably introduce an env var in the definitions to specify the control-plane node label name, defaulting it to `node-role.kubernetes.io/master`:

```yaml
tolerations:
- effect: NoSchedule
  key: ${K8S_MASTER_LABEL:=node-role.kubernetes.io/master}
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: ${K8S_MASTER_LABEL:=node-role.kubernetes.io/master}
          operator: Exists
```

I could do this as part of #2570 as I am making changes to the managers yaml anyway.
/assign |
@richardcase does it make sense to add both labels, similar to the example below? https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/2377/files#diff-44a79e28b76ffb84ad06c92f1ddc7296566adf4986fea1ee4e73db546c35fcebR28-R31
@dkoshkin - yes, I think that's a good idea, and it's what the existing PR has (I missed that). Based on that we could make the label configurable, e.g. `key: ${K8S_CP_LABEL:=node-role.kubernetes.io/control-plane}`, and leave the old master key as is, without envsubst (i.e. …).
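For illustration, a sketch of what the combined scheduling constraints might look like under that suggestion, with only the control-plane key parameterized; this mirrors the discussion above rather than the manifest that was eventually merged:

```yaml
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
- effect: NoSchedule
  key: ${K8S_CP_LABEL:=node-role.kubernetes.io/control-plane}
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: node-role.kubernetes.io/master
          operator: Exists
    - weight: 1
      preference:
        matchExpressions:
        - key: ${K8S_CP_LABEL:=node-role.kubernetes.io/control-plane}
          operator: Exists
```

Tolerating and preferring both label keys lets the same manifest schedule onto clusters that use either the older master label or the newer control-plane label.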
PR created by: /assign vespian |
/kind bug
What steps did you take and what happened:
Download https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.6.5/infrastructure-components.yaml
Both deployments have:
But there is no affinity or node-selector to run on the control-plane.
Following the quickstart, there will be two IAM policies: one used by the workers and one by the control plane.
When running `clusterctl move`, the controllers could land on worker nodes.
What did you expect to happen:
The controller (and webhook?) should run on the control-plane nodes with elevated permissions.
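For illustration, a minimal sketch of the kind of scheduling constraint this report is asking for, assuming the control-plane label used by v0.6.x clusters is `node-role.kubernetes.io/master`; this shows the expected shape, not the fix that was merged:

```yaml
# Added to the controller Deployment's pod template spec to keep it on
# control-plane nodes, which carry the elevated IAM instance profile:
nodeSelector:
  node-role.kubernetes.io/master: ""
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
```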
Anything else you would like to add:
Environment:
- Cluster-api-provider-aws version: v0.6.5 (https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/tag/v0.6.5)
- Kubernetes version: (use `kubectl version`):
- OS (e.g. from `/etc/os-release`):