
Support CNI Custom Networking #519

Closed
1 of 4 tasks
dippynark opened this issue Sep 19, 2019 · 7 comments


dippynark commented Sep 19, 2019

I'm submitting a...

  • [ ] bug report
  • [x] feature request
  • [ ] support request
  • [ ] kudos, thank you, warm fuzzy

to add the ability to configure CNI custom networking

What is the current behavior?

Currently the module does not allow custom Kubernetes configuration to be applied before the worker nodes are created, in the way that the aws-auth config is.

For CNI custom networking to work on a particular node, the aws-node daemonset must be patched (e.g. the AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG env var set), an ENIConfig applied, and the kubelet parameters (--node-labels) modified before any Pod not using the host's networking namespace is scheduled to it.

Currently it's possible for some Pods not using host networking (e.g. the CoreDNS Pods applied by EKS) to land on a node before the above has happened, so nodes can come up in different states (i.e. some using the node's subnet for Pod IPs, some using a secondary subnet, and some using both) depending on where these Pods land. Problem nodes (i.e. any using the node's subnet for Pod IPs) then need to be terminated.
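To make the steps concrete, here is a minimal sketch of the manual sequence above, driven from Terraform with a null_resource and kubectl. The availability zone name, subnet ID and security group ID are placeholders (in practice there is one ENIConfig per availability zone), and it assumes kubectl is already pointed at the new cluster:

```hcl
# Sketch only: assumes kubectl is configured against the new cluster and that
# the secondary (Pod) subnet and security group IDs come from elsewhere.
resource "null_resource" "cni_custom_networking" {
  provisioner "local-exec" {
    command = <<-EOT
      # 1. Enable ENIConfig-based custom networking on the aws-node daemonset
      #    and point it at the node label that selects which ENIConfig to use.
      kubectl -n kube-system set env daemonset aws-node \
        AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true \
        ENI_CONFIG_LABEL_DEF=k8s.amazonaws.com/eniConfig
      # 2. Apply an ENIConfig (one per availability zone) pointing at the
      #    secondary subnet that Pod IPs should come from.
      cat <<'EOF' | kubectl apply -f -
      apiVersion: crd.k8s.amazonaws.com/v1alpha1
      kind: ENIConfig
      metadata:
        name: eu-west-1a
      spec:
        subnet: subnet-0123456789abcdef0
        securityGroups:
          - sg-0123456789abcdef0
      EOF
    EOT
  }
}
```

Note that nothing in this sketch guarantees the patch runs before Pods are scheduled to the workers, which is exactly the ordering problem described above.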

What's the expected behavior?

CNI custom networking should be configurable in the module, and the creation of the worker nodes should be ordered so that the necessary steps happen before Pods are scheduled to them.

Are you able to fix this problem and submit a PR? Link here if you have already.

We (myself and others) would be keen to submit a fix, but this is not yet done. Our proposed solution would be to do something similar to how the aws-auth config is applied, but we wanted to gather feedback on how narrow the aim of this PR should be (i.e. should it address just the CNI-specific configuration, or should arbitrary resources be allowed?).

I feel the former (just the CNI config) would be best, as most configuration doesn't have the race condition described above.

Any other relevant info

There may be a recommended way of addressing this problem in the current state of the module, but I have not come across it yet.

Adding the --node-labels flag to the kubelet is not an issue, as it can already be solved by modifying pre_userdata; I've just included it here for completeness (see the sketch below).
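As a sketch of that existing workaround, assuming the module's worker_groups input and its kubelet_extra_args key (the variable names and label value are examples only):

```hcl
# Sketch only: a worker group whose kubelet labels the node so the CNI picks
# the ENIConfig named after the availability zone.
module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "example"
  vpc_id       = var.vpc_id      # example variable
  subnets      = var.subnet_ids  # example variable

  worker_groups = [
    {
      name               = "workers-eu-west-1a"
      instance_type      = "m5.large"
      kubelet_extra_args = "--node-labels=k8s.amazonaws.com/eniConfig=eu-west-1a"
    }
  ]
}
```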

max-rocket-internet (Contributor) commented Sep 25, 2019

Related: #316

max-rocket-internet marked this as a duplicate and then as not a duplicate of #316 Sep 25, 2019
@max-rocket-internet (Contributor)

Hey @dippynark 👋

Essentially, we try to resist requests like this one because:

  1. It adds too much complexity to the module. This module is probably already the most (!??) complex in the registry
  2. It can probably be achieved by wrapping this module in another module (or handled outside of this module)
  3. There are MUCH better tools for this (Helm, Addons, Kustomize, etc.)
  4. Where does it end? Which CNIs? What about Helm/Addons/storage-classes/SSM/insights/etc?

You can see some other examples referenced in this issue: #99

Please convince me otherwise though 😄

@dippynark (Author)

  1. I agree it's already quite complex and this PR won't help with that
  2. If you use this module to provision worker nodes, you can't (afaik) guarantee the worker nodes come up with the correct CNI configuration unless you terminate all the worker nodes after the ENI config is applied as described above, which is why I feel this module is the only sensible place this can be handled
  3. Not too sure how those tools can help in this case: they can apply the ENI config, but they cannot solve the ordering problem of applying it before any Pods are scheduled to the node. IMO it's the implementation of the AWS CNI plugin that is at fault, as its configuration converges to a different state depending on when the ENI config is applied, but I'm not sure that is going to change any time soon
  4. This only concerns the AWS CNI plugin

I feel this is different to what's discussed in #99, as it's not functionality you could add as a wrapper; it requires the module to know about the ordering requirements. Our plan is to maintain a fork with this ability. I definitely understand the desire to keep complexity to a minimum here.

@max-rocket-internet (Contributor)

I feel this is different to what's discussed in #99

I don't just mean discussion in that issue, I also mean in all the referenced issues.

This only concerns the AWS CNI plugin

I know but look at all the other requests we get 🙂

Not too sure how those tools can help in this case

aws/amazon-vpc-cni-k8s#336 😅

If you use this module to provision worker nodes, you can't (afaik) guarantee the worker nodes come up with the correct CNI configuration

What about if a dependency is created like this: cluster > external thing > all worker groups, where external thing is something outside of the module that runs and sets up the CNI after the cluster is created?

Maybe we could make that dependency in a nice and elegant way that would enable you to run some kubectl patch xxx commands to edit the CNI daemonset? It could also be super handy for other things.

I think what we don't want is to tie this module in any way to the release process of the CNI or any other unrelated things, for example getting PRs every time there's a new release or setting in the CNI.
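To sketch what that external thing could look like today, assuming the module's cluster_id output is used to order it after the control plane (the missing piece being that the module's own worker groups cannot be made to depend on it from outside):

```hcl
# Sketch only: runs after the EKS control plane exists, but nothing makes the
# module's worker groups wait for it, which is the ordering gap discussed above.
resource "null_resource" "patch_cni" {
  # Implicit dependency on the cluster via a module output.
  triggers = {
    cluster_id = module.eks.cluster_id
  }

  provisioner "local-exec" {
    command = "kubectl -n kube-system set env daemonset aws-node AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true"
  }
}
```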

@devonkinghorn

I would really like this feature as well, but maybe it would be better to pressure Amazon to allow CNI variables to be defined at cluster creation time instead of after cluster creation. It seems hacky for AWS to suggest patching and then terminating instances in their EKS workshop.

Then it would be a simple update.

@max-rocket-internet (Contributor)

Closing. You can use the Helm chart: https://github.com/aws/eks-charts/tree/master/stable/aws-vpc-cni

Or do something outside of this module.
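For reference, a sketch of the Helm route with the Terraform helm provider; the chart name and repository come from the link above, while the env value keys are assumptions to be checked against the chart's values.yaml:

```hcl
# Sketch only: manages the VPC CNI via the aws-vpc-cni chart instead of
# patching the EKS-installed daemonset by hand.
resource "helm_release" "aws_vpc_cni" {
  name       = "aws-vpc-cni"
  namespace  = "kube-system"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-vpc-cni"

  # Assumed value keys; confirm against the chart's values.yaml.
  set {
    name  = "env.AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG"
    value = "true"
  }
  set {
    name  = "env.ENI_CONFIG_LABEL_DEF"
    value = "k8s.amazonaws.com/eniConfig"
  }
}
```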

@github-actions (bot)

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2022