
Unable to set a custom nodeAffinity via values.yaml #495

Closed
comjf opened this issue Oct 7, 2021 · 13 comments
Labels
Priority: Low · stalebot-ignore · Status: Work in Progress · Type: Bug

Comments


comjf commented Oct 7, 2021

Problem:
Unable to add a custom .Values.affinity because of a duplicate nodeAffinity definition.

This is because a default affinity is already defined in your Helm deployment.yaml template.
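
For illustration, this is roughly the template shape that produces the duplicate key. It's a guess at the structure (the helper name is hypothetical), not the chart's actual code:

    # deployment.yaml (sketch): the hard-coded default and the user-supplied
    # .Values.affinity are both rendered under the same affinity: key, so
    # nodeAffinity ends up appearing twice in the output.
    affinity:
      {{- include "aws-node-termination-handler.nodeAffinity" . | nindent 8 }}
    {{- with .Values.affinity }}
      {{- toYaml . | nindent 8 }}
    {{- end }}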

Bug-in-action:
my-values.yaml

emitKubernetesEvents: true
checkASGTagBeforeDraining: true
enablePrometheusServer: true
enableSqsTerminationDraining: true
ignoreDaemonSets: true
managedAsgTag: aws-node-termination-handler/managed/cluster
queueURL: something-valid
serviceAccount:
  create: true
  name: node-termination-handler
  annotations:
    eks.amazonaws.com/role-arn: "something-valid"
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role
          operator: In
          values:
          - system

Helm-generated output (using 0.15.3 on a dry-run install: helm install . --dry-run --debug -f ../my-values.yaml --generate-name):

    spec:
      priorityClassName: "system-node-critical"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/os"
                  operator: In
                  values:
                    - linux
                - key: "kubernetes.io/arch"
                  operator: In
                  values:
                    - amd64
                    - arm64
                    - arm
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role
                operator: In
                values:
                - system
...

As you can see, there are duplicate keys, and what ends up happening is that Helm drops the default (your) nodeAffinity in favor of the values-provided one. On my deployed pod, only that second affinity is listed.

In other words, this node-termination-handler bug doesn't affect our current deployment; however, this behavior fully breaks when using newer versions of fluxcd/flux2, and more specifically kustomize 4, which now errors out on duplicate YAML keys.

I know y'all probably don't care too much about some tool on top of Helm, but I believe more people will run into this, and there is an underlying bug in the way the template is written, so I figured it was best to report.

Proposed Solution
Why do we even need the hard-coded affinity in the deployment.yaml template? I can see how it makes a difference for the DaemonSet approach, but for the SQS listener... is it really needed?

I tried to reconcile the two so you could pass in .Values.affinity and have it merge with your default hard-coded affinity, but I'm having difficulty getting that to work because of the _helpers template and the includes used to build the default affinity. If the default really is needed, I'm open to suggestions on how to fix this.
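
A rough sketch of the kind of merge I mean, with an assumed helper name (illustrative only, not working chart code):

    # deployment.yaml (sketch): do the merge in the template so that only one
    # nodeAffinity key is ever rendered. Note that Sprig's mergeOverwrite does a
    # deep map merge but replaces list values wholesale, so nodeSelectorTerms
    # supplied via .Values.affinity would simply win over the default's terms.
    affinity:
      {{- $default := fromYaml (include "aws-node-termination-handler.defaultAffinity" .) }}
      {{- toYaml (mergeOverwrite $default (.Values.affinity | default dict)) | nindent 8 }}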

Thanks!

bwagner5 added the Type: Bug label on Oct 8, 2021
bwagner5 (Contributor) commented Oct 8, 2021

We definitely care about this error, and thank you for reporting! We can probably move the nodeAffinity to a node selector, which should be easier to merge with custom Helm values, but I think you're right that we only need it on the DaemonSet for IMDS mode. We'll take a closer look at this soon and get to a solution!
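
For illustration, the rendered pod spec could end up looking something like this if the hard-coded OS affinity became a plain node selector (a sketch of the idea only; note that nodeSelector can only express exact matches, so the multi-arch In list above has no direct nodeSelector equivalent):

    nodeSelector:
      kubernetes.io/os: linux
      # ...plus any user-supplied .Values.nodeSelector entries merged in here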

ngoyal16 (Contributor) commented:

@bwagner5 I am happy to work on this. I need some guidance on which files need to change for this.

bwagner5 (Contributor) commented:

@ngoyal16 thanks for the willingness to help out with this!

I believe we can remove the arch affinity completely on the DaemonSet (both Windows and Linux) and on the Deployment. We can then put an OS node selector on the DaemonSet for Linux and Windows respectively.

That will get rid of the hard-coded affinities, which should clear up the duplicate nodeAffinity key and spare us from having to deal with a complicated merge.
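
A rough sketch of what that could look like in the Linux DaemonSet template (illustrative only, not the chart's actual code):

    nodeSelector:
      kubernetes.io/os: linux
    {{- with .Values.nodeSelector }}
      {{- toYaml . | nindent 8 }}
    {{- end }}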

jillmon added the Priority: Low and Status: Work in Progress labels on Oct 19, 2021
ngoyal16 (Contributor) commented:

@bwagner5 I have fixed this in the Deployment and the Windows DaemonSet, since those only contain the arch and OS selectors in nodeAffinity.

But the Linux DaemonSet contains one more affinity rule, which prevents it from running on the Fargate compute type. Any idea how we can fix this?
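
For reference, the extra rule in question is the Fargate exclusion; in the Linux DaemonSet's nodeAffinity it looks roughly like this (paraphrased, not copied from the chart):

    - key: eks.amazonaws.com/compute-type
      operator: NotIn
      values:
        - fargate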

ngoyal16 (Contributor) commented:


@snay2, can you help me with this?

snay2 (Contributor) commented Oct 22, 2021

@ngoyal16 The reason we don't want NTH to get scheduled onto a Fargate node is that the deployment fails because it's a DaemonSet. More details here: aws/eks-charts#198

I'd suggest pursuing the following questions:

  1. Is the problem with Fargate and daemonsets still an issue today (>1 year later)?
  2. Is there another way we can restrict NTH from being scheduled onto a Fargate node, other than an affinity?
  3. If not, and we do indeed need to keep the anti-Fargate affinity, then we will need to merge our nodeAffinity definition with whatever gets passed in via .Values.affinity. Does YAML support this kind of merge? If not, are there backwards-compatible ways we can restructure the inputs from values.yaml (perhaps by making them more granular) so that the template can include our affinity seamlessly alongside the values the customer passes in? See the note below this list.
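
One wrinkle for question 3: nodeSelectorTerms are ORed, while the matchExpressions inside a single term are ANDed. A merged template would therefore need to append the Fargate exclusion into each user-supplied term's matchExpressions rather than add it as a separate term. Roughly (illustrative YAML only):

    # Exclusion ANDed into the user's term (keeps the Fargate exclusion effective):
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role
        operator: In
        values: [system]
      - key: eks.amazonaws.com/compute-type
        operator: NotIn
        values: [fargate]

    # Exclusion added as a second term (terms are ORed, so a Fargate node that
    # matches the first term would still be eligible):
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role
        operator: In
        values: [system]
    - matchExpressions:
      - key: eks.amazonaws.com/compute-type
        operator: NotIn
        values: [fargate]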

I'll be offline next week, but hopefully this gives you a starting point, and my team can contribute their ideas as well.

ngoyal16 (Contributor) commented:

@snay2 Can't we add the following key/value pair to the node selector for the DaemonSet?

eks.amazonaws.com/compute-type: ec2

We only need it on EC2 nodes.
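
For reference, that would look like the snippet below in the DaemonSet pod spec. One thing to verify first: this only helps if EC2-backed nodes actually carry an eks.amazonaws.com/compute-type=ec2 label; if the label exists only on Fargate nodes, a required nodeSelector like this would keep the DaemonSet from scheduling anywhere, and a NotIn-style exclusion (like the current affinity) would still be needed.

    nodeSelector:
      eks.amazonaws.com/compute-type: ec2   # assumes EC2 nodes carry this label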

akhfa commented Nov 17, 2021

Actually, this also happens with nodeSelector, for example with these Helm values for IMDS mode:

nodeSelector:
  node-role.kubernetes.io/spot-worker: true

It makes the rendered nodeSelector have two entries:

      nodeSelector:
        kubernetes.io/os: linux
        node-role.kubernetes.io/spot-worker: true

ngoyal16 (Contributor) commented Nov 18, 2021

@akhfa I think nodeSelector allows multiple key/value pairs, so that shouldn't be a problem.

github-actions (bot) commented:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this issue to never become stale, please ask a maintainer to apply the "stalebot-ignore" label.

github-actions bot added the stale label on Dec 18, 2021
github-actions (bot) commented:

This issue was closed because it has become stale with no activity.

comjf (Author) commented Jan 4, 2022

I believe this needs to be re-opened as it's an acknowledged bug?

snay2 reopened this on Jan 4, 2022
snay2 added the stalebot-ignore label and removed the stale label on Jan 4, 2022
stevehipwell (Contributor) commented:

@comjf Could you try the updated chart on the main branch (note the breaking changes in the PR)? This chart hasn't been released yet, but it should solve your issue; if it doesn't, it'd be good to know.
