
deployment: Implicitly generate the worker ConfigMap name #640

Merged
merged 1 commit into kubernetes-sigs:master on Nov 12, 2021

Conversation

eliaskoromilas
Contributor

fixes #627 (comment)

  1. "node-feature-discovery.fullname" is used to prefix the worker ConfigMap name
  2. worker.configmapName Helm parameter is dropped
  3. This change is documented.
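
Roughly, the template change amounts to the following (a sketch, not the exact diff; only the generated metadata.name reflects this PR, while the data section and the values key holding the config are illustrative):

```yaml
# templates/nfd-worker-conf.yaml -- a sketch, not the exact diff; the data
# section and the .Values key holding the config are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  # Derived from the release via the chart's fullname helper, which makes
  # the dropped worker.configmapName parameter unnecessary.
  name: {{ include "node-feature-discovery.fullname" . }}-worker-conf
data:
  nfd-worker.conf: |-
    {{- .Values.config | nindent 4 }}
```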

Comments:

Since the template filename is still nfd-worker-conf.yaml, it makes sense to have the name suffix -worker-conf.

Generally, we should consider renaming the template file, to e.g. configmap.yaml, and drop that resource name suffix. This would make that ConfigMap resource the single source of information for any (future) nfd pod configuration (worker, master, etc.). Should I proceed with this here?

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 3, 2021
@k8s-ci-robot
Contributor

Hi @eliaskoromilas. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 3, 2021
@k8s-ci-robot k8s-ci-robot requested review from kad and zvonkok November 3, 2021 09:51
@eliaskoromilas
Contributor Author

/assign @marquiz

@marquiz
Contributor

marquiz commented Nov 3, 2021

Thanks @eliaskoromilas for this

Generally, we should consider renaming the template file, to e.g. configmap.yaml, and drop that resource name suffix. This would make that ConfigMap resource the single source of information for any (future) nfd pod configuration (worker, master, etc.). Should I proceed with this here?

I think this sounds reasonable, I don't oppose 😄 Maybe in a separate PR, though. What I'm just thinking is that we should change the kustomize-based deployment at the same time (to keep these in sync).

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 3, 2021
@eliaskoromilas
Contributor Author

eliaskoromilas commented Nov 3, 2021

What I'm just thinking is that we should change the kustomize-based deployment at the same time (to keep these in sync).

The problem is that in kustomize deployments everything is almost statically defined... and then patched. So everything is pretty much allowed.

The default overlay uses the worker-config component, and that is why the documentation states that users may edit configmap/nfd-worker-conf. Of course, in a custom (non-default) deployment this may not hold.
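
For reference, the wiring looks roughly like this (a sketch from memory; the exact paths, file names and generator options in the repository may differ):

```yaml
# deployment/components/worker-config/kustomization.yaml -- sketch from memory;
# referenced from the default overlay via its `components:` list.
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

configMapGenerator:
  # Generates the configmap/nfd-worker-conf that the documentation tells
  # users they may edit.
  - name: nfd-worker-conf
    behavior: create
    files:
      - nfd-worker.conf=nfd-worker.conf.example

generatorOptions:
  # Keep the name stable (no content-hash suffix) so it can be edited in place.
  disableNameSuffixHash: true
```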

I think that the kustomize deployment reflects the project's initial intention to serve as a K8s extension that needs to be deployed just once in a cluster. Helm changes that and transforms NFD into a K8s application that can be deployed any number of times (even in the same namespace), as different NFD instances, each (probably) serving different purposes.

Having both kustomize and Helm as deployment options makes the project's intentions unclear. For example, /etc/kubernetes/node-feature-discovery/* volume mounts are not much aligned with the multi-instance deployments that Helm offers. My opinion is that dropping kustomize support would allow the project to evolve and better design decisions to be made.

Let me know if you want me to elaborate more with examples.

@marquiz
Contributor

marquiz commented Nov 10, 2021

I think that the kustomize deployment reflects the project's initial intention to serve as a K8s extension that needs to be deployed just once in a cluster.

IMO this is still the intention and objective. Nodes are "cluster scoped", so are their features/configuration and so should NFD be. Multiple parallel NFD deployments should be a rare exception. There are use cases that currently cannot be nicely covered with one NFD deployment.

Helm changes that and transforms NFD into a K8s application that can be deployed any number of times (even in the same namespace), as different NFD instances, each (probably) serving different purposes.

Trying to move away from this e.g. with #468

For example, /etc/kubernetes/node-feature-discovery/* volume mounts are not much aligned with the multi-instance deployments that Helm offers.

Hmm, that is true 🧐 Maybe there should be an option to disable these mounts or use /etc/kubernetes/node-feature-discovery-{{ .Values.instance }}/ if instance has been specified?
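
A minimal sketch of how the worker DaemonSet template could do that (the .Values.instance parameter and the volume/path names here are assumptions for illustration, not the chart's current API):

```yaml
# Sketch only: .Values.instance and the volume/path names are assumptions
# used for illustration, not the chart's current parameters.
volumes:
  - name: features-d
    hostPath:
      # Suffix the host path with the instance name, if one is set, so that
      # parallel NFD deployments don't share the same hooks/feature files.
      path: /etc/kubernetes/node-feature-discovery{{ if .Values.instance }}-{{ .Values.instance }}{{ end }}/features.d/
```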

My opinion is that dropping kustomize support would allow the project to evolve and better design decisions to be made.

I wouldn't drop it. Makes it easy to deploy without any other tools than kubectl. The selection of deployment option(s)/method(s) really should not restrict the evolution of the project, dictate technical decisions or hold us back. We can loosen the coupling between kustomize and helm deployments if it causes pain.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eliaskoromilas, marquiz

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 10, 2021
@eliaskoromilas
Contributor Author

eliaskoromilas commented Nov 10, 2021

IMO this is still the intention and objective. Nodes are "cluster scoped", so are their features/configuration and so should NFD be.

I realize now that my problem is not the kustomize deployment but the Helm one. Helm doesn't "support" cluster-scoped charts, nor cluster-scoped chart dependencies.

This means that Helm apps, like the NVIDIA GPU Operator or the InAccel FPGA Operator, need to deploy their own NFD instances (with custom configuration) to make sure they will get the nodes labelled.

I didn't mean to question the project's intentions; I got confused by the available Helm options/parameters, which do not imply cluster-scoped usage.

Trying to move away from this e.g. with #468

But such chart packages, which want to expose custom labels, still can't depend on something they can't add as a dependency.

I wouldn't drop it. Makes it easy to deploy without any other tools than kubectl. The selection of deployment option(s)/method(s) really should not restrict the evolution of the project, dictate technical decisions or hold us back.

I agree. I just wasn't sure about the project's objective. I believe, though, that offering strict deployment options (usually) helps users rather than the opposite.

We can loosen the coupling between kustomize and helm deployments if it causes pain.

I believe this is inevitable. For example, many of the different overlays (default, default-combined, etc.) that the kustomize deployment supports are not (yet?) modeled in the Helm chart.

@marquiz
Contributor

marquiz commented Nov 12, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 12, 2021
@k8s-ci-robot k8s-ci-robot merged commit b26a12c into kubernetes-sigs:master Nov 12, 2021
@marquiz
Contributor

marquiz commented Nov 12, 2021

I realize now that my problem is not the kustomize deployment but the Helm one. Helm doesn't "support" cluster-scoped charts, nor cluster-scoped chart dependencies.

This means that Helm apps, like the NVIDIA GPU Operator or the InAccel FPGA Operator, need to deploy their own NFD instances (with custom configuration) to make sure they will get the nodes labelled.

Currently, this is a problem.

I didn't mean to question the project's intentions; I got confused by the available Helm options/parameters, which do not imply cluster-scoped usage.

No worries, I appreciate that somebody asks these questions ☺️ It makes me really think about the scope/direction, too.

Trying to move away from this e.g. with #468

But such chart packages, which want to expose custom labels, still can't depend on something they can't add as a dependency.

Could they depend on NFD operator? Could the operators deploy cluster-scoped CRs?

#468 is probably not a full solution, yet. But hopefully a big piece of it.

I wouldn't drop it. Makes it easy to deploy without any other tools than kubectl. The selection of deployment option(s)/method(s) really should not restrict the evolution of the project, dictate technical decisions or hold us back.

I agree. I just wasn't sure about the project's objective. I believe, though, that offering strict deployment options (usually) helps users rather than the opposite.

By "strict", do you mean little configurability or just one support deployment method (e.g. kustomize vs. helm vs. operator)?

We can loosen the coupling between kustomize and helm deployments if it causes pain.

I believe this is inevitable. For example, many of the different overlays (default, default-combined, etc.) that the kustomize deployment supports are not (yet?) modeled in the Helm chart.

That's true. And I don't think we should model them there. I kind of dislike the multitude of overlays we currently have, but decided to keep them for backwards compatibility when switching over to kustomize. Maybe they should be moved to samples/ 🤔

@eliaskoromilas
Contributor Author

eliaskoromilas commented Nov 15, 2021

Could they depend on NFD operator? Could the operators deploy cluster-scoped CRs?

#468 is probably not a full solution, yet. But hopefully a big piece of it.

That's true, OLM operators are meant for that. OLM supports dependency resolution, which means that other OLM operators may add specific versions of the CRDs that NFD owns (or even the whole NFD operator package) as a (cluster-scoped) dependency.

Helm, on the other hand, doesn't offer great support for CRDs, and lifecycle management becomes too complicated.
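
For context, an OLM bundle declares such a dependency in its metadata/dependencies.yaml, roughly like this (a sketch; the group/version/kind values below are illustrative, check the NFD operator bundle for the real ones):

```yaml
# metadata/dependencies.yaml in a dependent operator's OLM bundle -- a sketch;
# the group/version/kind below are illustrative, not taken from the NFD bundle.
dependencies:
  # Require whichever operator owns the NodeFeatureDiscovery CRD to be
  # installed cluster-wide before this operator can be installed.
  - type: olm.gvk
    value:
      group: nfd.kubernetes.io
      kind: NodeFeatureDiscovery
      version: v1
```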

By "strict", do you mean little configurability or just one support deployment method (e.g. kustomize vs. helm vs. operator)?

I mean that we should probably accept the different deployment options for what they are. For example:

  • kustomize as the simplest "non-packaged" deployment method, aligned with the "deploy once" principle of the project, with sane defaults, but manual configuration and upgrades.
  • OLM operator (maybe) as the preferred way to deploy NFD cluster-scoped resources. Configuration is done using NFD CRDs, that other operators may add as a dependency (which is a very important feature).
  • Helm chart as a nice declarative way to deploy/revise NFD on any cluster, with easy configuration management and upgrades, but (currently) not ideal for other Helm apps to add as a dependency (since we may end up with the same NFD functionality deployed more than once in the same cluster).

Having said that, it seems that Helm is the NFD deployment option with the least clear purpose.

  • Helm can be used to deploy cluster-scoped resources (like NFD does); there isn't a real restriction, and many charts do it. The problems usually appear when somebody deploys such charts twice.
  • Other Helm apps may also want to add NFD as a dependency. Actually, custom/local NFD sources look like they were designed for that reason. What needs to be done here is to make sure that there is a configuration for adding NFD as a Helm chart dependency (see the Chart.yaml sketch below) in a way that it won't overlap with other NFD Helm chart dependencies and/or other NFD instances that may be present in the cluster (e.g. the volume mounts that configure the local source are an example of such an overlap, and this PR is another).
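
For illustration, adding NFD as a chart dependency would look roughly like this (a sketch; the repository URL, version constraint and condition key are illustrative):

```yaml
# Chart.yaml of a dependent chart -- a sketch of adding NFD as a dependency;
# the repository URL, version constraint and condition key are illustrative.
dependencies:
  - name: node-feature-discovery
    version: ">=0.10.0"
    repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
    # Lets users turn the bundled NFD off when the cluster already runs one.
    condition: node-feature-discovery.enabled
```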

What I propose is not to drop one method in favor of another, but to offer a clear strategy for all the available deployment methods.

@marquiz
Contributor

marquiz commented Nov 16, 2021

Really good discussion. And also insightful for me 🙄 For example, there are aspects that I haven't even thought much about earlier, like dependencies and cluster-scoped resources with Helm.

I mean that we should probably accept the different deployment options for what they are.

I agree.

  • kustomize as the simplest "non-packaged" deployment method, aligned with the "deploy once" principle of the project, with sane defaults, but manual configuration and upgrades.
  • OLM operator (maybe) as the preferred way to deploy NFD cluster-scoped resources. Configuration is done using NFD CRDs, that other operators may add as a dependency (which is a very important feature).
  • Helm chart as a nice declarative way to deploy/revise NFD on any cluster, with easy configuration management and upgrades, but (currently) not ideal for other Helm apps to add as a dependency (since we may end up with the same NFD functionality deployed more than once in the same cluster).

Well put. We should probably try to incorporate that^ somewhere in the documentation 😊

Having said that, it seems that Helm is the NFD deployment option with the least clear purpose.

I think the original motivation for adding a Helm chart was to offer an "official" alternative to the multitude of downstream NFD Helm charts (and to try to convince downstream to use ours).

  • Helm can be used to deploy cluster-scoped resources (like NFD does); there isn't a real restriction, and many charts do it. The problems usually appear when somebody deploys such charts twice.

Regarding #468, we should be a bit safer as we would only be deploying the CRD (definition), not any CRs (objects). But the nfd-topology-updater can and will cause problems if deployed multiple times in parallel. However, it's disabled by default.

  • Other Helm apps may also want to add NFD as a dependency. Actually, custom/local NFD sources look like they were designed for that reason. What needs to be done here is to make sure that there is a configuration for adding NFD as a Helm chart dependency in a way that it won't overlap with other NFD Helm chart dependencies and/or other NFD instances that may be present in the cluster (e.g. the volume mounts that configure the local source are an example of such an overlap, and this PR is another).

Yes, this needs to be considered. With parallel deployments the -instance flag should be used. Maybe the mounts could be prefixed too, i.e. /etc/kubernetes/node-feature-discovery-$INSTANCE/? Created #650 to track this.

What I propose is not to drop one method in favor of another, but to offer a clear strategy for all the available deployment methods.

👍 Created #651 to keep this in mind

@marquiz marquiz mentioned this pull request Dec 22, 2021