
Unable to set a custom nodeAffinity via values.yaml #495

Closed
comjf opened this issue Oct 7, 2021 · 13 comments
Labels
Priority: Low · stalebot-ignore · Status: Work in Progress · Type: Bug

Comments


comjf commented Oct 7, 2021

Problem:
Unable to add a custom .Values.affinity because of a duplicate nodeAffinity definition.

This is because a default affinity is already defined in your Helm deployment.yaml template.
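
For illustration, this is roughly the template shape that produces the duplicate key. It's a guess at the structure (the helper name is hypothetical), not the chart's actual code:

    # deployment.yaml (sketch): the hard-coded default and the user-supplied
    # .Values.affinity are both rendered under the same affinity: key, so
    # nodeAffinity ends up appearing twice in the output.
    affinity:
      {{- include "aws-node-termination-handler.nodeAffinity" . | nindent 8 }}
    {{- with .Values.affinity }}
      {{- toYaml . | nindent 8 }}
    {{- end }}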

Bug-in-action:
my-values.yaml

emitKubernetesEvents: true
checkASGTagBeforeDraining: true
enablePrometheusServer: true
enableSqsTerminationDraining: true
ignoreDaemonSets: true
managedAsgTag: aws-node-termination-handler/managed/cluster
queueURL: something-valid
serviceAccount:
  create: true
  name: node-termination-handler
  annotations:
    eks.amazonaws.com/role-arn: "something-valid"
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role
          operator: In
          values:
          - system

Helm-generated output (using 0.15.3 on a dry-run install: helm install . --dry-run --debug -f ../my-values.yaml --generate-name):

    spec:
      priorityClassName: "system-node-critical"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/os"
                  operator: In
                  values:
                    - linux
                - key: "kubernetes.io/arch"
                  operator: In
                  values:
                    - amd64
                    - arm64
                    - arm
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role
                operator: In
                values:
                - system
...

As you can see, there are duplicate keys, and what ends up happening is that Helm drops the default (your) nodeAffinity in favor of the values-provided one. On my deployed pod, only that second affinity is listed.

In other words, this node-termination-handler bug doesn't affect our current deployment; however, this behavior fully breaks when using newer versions of fluxcd/flux2, and more specifically kustomize 4, which now errors out on duplicate YAML keys.

I know y'all probably don't care too much about some tool on top of Helm, but I believe more people will run into this, and there is an underlying bug in the way the template is written, so I figured it was best to report.

Proposed Solution
Why do we even need the hard-coded affinity in the deployment.yaml template? I can see how it makes a difference for the DaemonSet approach, but for the SQS listener... is it really needed?

I tried to reconcile the two so you could pass in .Values.affinity and have it merge with your default hard-coded affinity, but I'm having difficulty getting that to work because of the _helpers template and the includes used to build the default affinity. If the default really is needed, I'm open to suggestions on how to fix this.
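
A rough sketch of the kind of merge I mean, with an assumed helper name (illustrative only, not working chart code):

    # deployment.yaml (sketch): do the merge in the template so that only one
    # nodeAffinity key is ever rendered. Note that Sprig's mergeOverwrite does a
    # deep map merge but replaces list values wholesale, so nodeSelectorTerms
    # supplied via .Values.affinity would simply win over the default's terms.
    affinity:
      {{- $default := fromYaml (include "aws-node-termination-handler.defaultAffinity" .) }}
      {{- toYaml (mergeOverwrite $default (.Values.affinity | default dict)) | nindent 8 }}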

Thanks!

bwagner5 added the Type: Bug label on Oct 8, 2021
bwagner5 (Contributor) commented Oct 8, 2021

We definitely care about this error, and thank you for reporting! We can probably move the nodeAffinity to a node selector, which should be easier to merge with custom Helm values, but I think you're right that we only need it on the DaemonSet for IMDS mode. We'll take a closer look at this soon and get to a solution!
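
For illustration, the rendered pod spec could end up looking something like this if the hard-coded OS affinity became a plain node selector (a sketch of the idea only; note that nodeSelector can only express exact matches, so the multi-arch In list above has no direct nodeSelector equivalent):

    nodeSelector:
      kubernetes.io/os: linux
      # ...plus any user-supplied .Values.nodeSelector entries merged in here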

ngoyal16 (Contributor) commented:

@bwagner5 I am happy to work on this. I need some guidance on which files need to change for this.

bwagner5 (Contributor) commented:

@ngoyal16 thanks for the willingness to help out with this!

I believe we can remove the arch affinity completely on the DaemonSet (both Windows and Linux) and on the Deployment. We can then put an OS node selector on the DaemonSet for Linux and Windows respectively.

That will get rid of the hard-coded affinities, which should clear up the duplicate nodeAffinity key and spare us from having to deal with a complicated merge.
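
A rough sketch of what that could look like in the Linux DaemonSet template (illustrative only, not the chart's actual code):

    nodeSelector:
      kubernetes.io/os: linux
    {{- with .Values.nodeSelector }}
      {{- toYaml . | nindent 8 }}
    {{- end }}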

jillmon added the Priority: Low and Status: Work in Progress labels on Oct 19, 2021
ngoyal16 (Contributor) commented:

@bwagner5 I have fixed this in the Deployment and the Windows DaemonSet, since those only contain the arch and OS selectors in nodeAffinity.

But the Linux DaemonSet contains one more affinity rule, which prevents it from running on the Fargate compute type. Any idea how we can fix this?
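
For reference, the extra rule in question is the Fargate exclusion; in the Linux DaemonSet's nodeAffinity it looks roughly like this (paraphrased, not copied from the chart):

    - key: eks.amazonaws.com/compute-type
      operator: NotIn
      values:
        - fargate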

ngoyal16 (Contributor) commented:


@snay2, can you help me with this?

snay2 (Contributor) commented Oct 22, 2021

@ngoyal16 The reason we don't want NTH to get scheduled onto a Fargate node is that the deployment fails because it's a DaemonSet. More details here: aws/eks-charts#198

I'd suggest pursuing the following questions:

  1. Is the problem with Fargate and daemonsets still an issue today (>1 year later)?
  2. Is there another way we can restrict NTH from being scheduled onto a Fargate node, other than an affinity?
  3. If not, and we do indeed need to keep the anti-Fargate affinity, then we will need to merge our nodeAffinity definition with whatever gets passed in via .Values.affinity. Does YAML support this kind of merge? If not, are there backwards-compatible ways we can restructure the inputs from values.yaml (perhaps by making them more granular) so that the template can include our affinity seamlessly alongside the values the customer passes in? See the note below this list.
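
One wrinkle for question 3: nodeSelectorTerms are ORed, while the matchExpressions inside a single term are ANDed. A merged template would therefore need to append the Fargate exclusion into each user-supplied term's matchExpressions rather than add it as a separate term. Roughly (illustrative YAML only):

    # Exclusion ANDed into the user's term (keeps the Fargate exclusion effective):
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role
        operator: In
        values: [system]
      - key: eks.amazonaws.com/compute-type
        operator: NotIn
        values: [fargate]

    # Exclusion added as a second term (terms are ORed, so a Fargate node that
    # matches the first term would still be eligible):
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role
        operator: In
        values: [system]
    - matchExpressions:
      - key: eks.amazonaws.com/compute-type
        operator: NotIn
        values: [fargate]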

I'll be offline next week, but hopefully this gives you a starting point, and my team can contribute their ideas as well.

ngoyal16 (Contributor) commented:

@snay2 Can't we add the following key/value pair to the node selector for the DaemonSet?

eks.amazonaws.com/compute-type: ec2

We only need it on EC2 nodes.
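
For reference, that would look like the snippet below in the DaemonSet pod spec. One thing to verify first: this only helps if EC2-backed nodes actually carry an eks.amazonaws.com/compute-type=ec2 label; if the label exists only on Fargate nodes, a required nodeSelector like this would keep the DaemonSet from scheduling anywhere, and a NotIn-style exclusion (like the current affinity) would still be needed.

    nodeSelector:
      eks.amazonaws.com/compute-type: ec2   # assumes EC2 nodes carry this label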

akhfa commented Nov 17, 2021

Actually, this also happens with nodeSelector, for example with these Helm values for IMDS mode:

nodeSelector:
  node-role.kubernetes.io/spot-worker: true

It makes the rendered nodeSelector have two entries:

      nodeSelector:
        kubernetes.io/os: linux
        node-role.kubernetes.io/spot-worker: true

ngoyal16 (Contributor) commented Nov 18, 2021

@akhfa I think nodeSelector allows multiple key/value pairs, so that shouldn't be a problem.

github-actions (bot) commented:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this issue to never become stale, please ask a maintainer to apply the "stalebot-ignore" label.

github-actions bot added the stale label on Dec 18, 2021
github-actions (bot) commented:

This issue was closed because it has become stale with no activity.

comjf (Author) commented Jan 4, 2022

I believe this needs to be re-opened as it's an acknowledged bug?

snay2 reopened this on Jan 4, 2022
snay2 added the stalebot-ignore label and removed the stale label on Jan 4, 2022
stevehipwell (Contributor) commented:

@comjf Could you try the updated chart on the main branch (note the breaking changes in the PR)? This chart hasn't been released yet, but it should solve your issue; if it doesn't, it'd be good to know.
