Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helm: enable clusterapi namespace autodiscovery #5762

Conversation

jackfrancis
Copy link
Contributor

@jackfrancis jackfrancis commented May 17, 2023

What type of PR is this?

What this PR does / why we need it:

This PR enables usage of the clusterapi provider's namespace node autodiscovery configuration in the helm chart, by filtering for Machines by Cluster namespace (Machine and Cluster are Cluster API CRDs that represent a node and a Kubernetes cluster, respectively).

I've reworked the combinatoric helm template foo a bit so hopefully it's a bit easier to wrangle going forward (I think there's no way to avoid it being sort of horrible).

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/helm-charts labels May 17, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 17, 2023
@jackfrancis
Copy link
Contributor Author

/assign @elmiko for a first pass to make sure I'm doing something sane here

@elmiko
Copy link
Contributor

elmiko commented May 17, 2023

thanks @jackfrancis , just starting to take a look

@jackfrancis
Copy link
Contributor Author

On the other hand maybe just passing in clusterNamespace in the auto discovery config for Cluster API doesn't work like I think it does?

On a cluster I have running right now:

--node-group-auto-discovery="[clusterapi:clusterNamespace=capz-e2e-3dyctz]"

I've got a cluster in that namespace, and its MachineDeployment is properly annotated:

$ k get clusters -n capz-e2e-3dyctz
NAME                 PHASE         AGE   VERSION
capz-e2e-3dyctz-ha   Provisioned   22m   
$ k get machinedeployment/capz-e2e-3dyctz-ha-md-0 -n capz-e2e-3dyctz -o yaml | grep 'autoscaler'
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "500"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"

However, the cluster-autoscaler logs suggest that the machines in the above MachineDeployment aren't in a discovered node group:

I0517 18:24:34.147506       1 pre_filtering_processor.go:57] Node capz-e2e-3dyctz-ha-md-0-qrz8p should not be processed by cluster autoscaler (no node group config)
I0517 18:24:34.147541       1 pre_filtering_processor.go:57] Node capz-e2e-3dyctz-ha-md-0-t7qm5 should not be processed by cluster autoscaler (no node group config)

If it's not clear, I was expecting clusterNamespace to be a 1st class auto discovery configuration, like "add all clusters that match this namespace query". Is that incorrect?

Thanks!

@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from c81107a to 249a822 Compare May 17, 2023 19:17
@jackfrancis
Copy link
Contributor Author

Nevermind! I was using the wrong clusterapi key for --node-group-auto-discovery, should be namespace= not clusterNamespace=

@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from 249a822 to 87d1b3a Compare May 17, 2023 19:52
If you do not configure node group auto discovery, cluster autoscaler will attempt
to match nodes against any scalable resources found in any namespace and belonging
to any Cluster.
You must configure node group auto discovery to inform cluster autoscaler which cluster in which to find for scalable node groups.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elmiko FYI, I think this original statement is not correct. If you don't provide autoDiscovery configuration via the helm chart, you get an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, i have a feeling there might a bug around the autodiscovery mechanisms. it probably bears some investigation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at this https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller.go#L419 it leads me to believe that if unconfigured then it should be watching everything. but perhaps the permissions for the rbac are not allowing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't attempted to get further beyond the helm chart validation, so I haven't actually tried to get it to work using the runtime.

But because we can only pass in one kubeconfig, and that kubeconfig is only going to work with a single cluster (thinking of kubeconfig-incluster context here, so CA is installed on the mgmt cluster and managing a remote workload cluster) why would you pass in a specific workload cluster kubeconfig secret but not declare the specific workload cluster that that secret pairs with?

Maybe this scenario makes sense for a self-managed CAPI cluster? (CAPI control plane installed onto the workload cluster)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would you pass in a specific workload cluster kubeconfig secret but not declare the specific workload cluster that that secret pairs with?

+1, seems like it should be specified explicitly

@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from 87d1b3a to 6e22e2b Compare May 17, 2023 19:55
@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from 6e22e2b to f43baf9 Compare May 23, 2023 16:01
@@ -1,4 +1,4 @@
{{- if or .Values.autoDiscovery.clusterName .Values.autoscalingGroups -}}
{{- if or ( or .Values.autoDiscovery.clusterName .Values.autoDiscovery.namespace .Values.autoDiscovery.labels ) .Values.autoscalingGroups }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug, AFAICT this condition needs to be the same condition that encapsulates the main template matter in the deployment template. (See L1 in charts/cluster-autoscaler/templates/deployment.yaml below)

@@ -109,21 +109,52 @@ Return true if the priority expander is enabled
{{- end -}}

{{/*
Return the autodiscoveryparameters for clusterapi.
autoDiscovery.clusterName for clusterapi.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the gnarly diff in this file. Essentially with the introduction of namespace to the combinatorics below, it made sense to decompose this a bit. Not a non-horrible way to do this using go templates, at least I couldn't figure out one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think I've ever seen a nice way to do this, but this is at least an easier to grok end result.

@jackfrancis
Copy link
Contributor Author

not sure why the helm docs CI is complaining (I did make changes there but I think that should be allowed)

@gjtempleton
Copy link
Member

Ah, that's something I've been meaning to update the docs on as it keeps on catching people out, the helm-docs action generates the README.md from the README.md.gotmpl, so changes to the text portions need to be made there.

@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from f43baf9 to 45472cd Compare May 23, 2023 23:00
@jackfrancis
Copy link
Contributor Author

@gjtempleton thanks for the pro tip, done!

@jackfrancis
Copy link
Contributor Author

@gjtempleton I wonder if this change in my PR that merged yesterday (which passed CI) is throwing a wrench in this PR?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 13, 2023
@mwielgus
Copy link
Contributor

Please rebase the PR.

@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from 45472cd to bb49bdb Compare June 26, 2023 23:44
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2023
@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from bb49bdb to fdb0840 Compare June 26, 2023 23:53
@jackfrancis
Copy link
Contributor Author

@mwielgus rebased, thx for the nudge

@gjtempleton can't seem to figure out how to make helm-docs happy on this one...

@gjtempleton
Copy link
Member

@jackfrancis the fact the README.md hasn't been regenerated with new variable in the values file included in it is causing the failing test:

$ helm-docs charts/. && git diff --unified=0 | cat
INFO[2023-06-28T17:44:19+01:00] Found Chart directories [charts/cluster-autoscaler]
INFO[2023-06-28T17:44:19+01:00] Generating README Documentation for chart charts/cluster-autoscaler
diff --git a/charts/cluster-autoscaler/README.md b/charts/cluster-autoscaler/README.md
index cc18f7920..b054affee 100644
--- a/charts/cluster-autoscaler/README.md
+++ b/charts/cluster-autoscaler/README.md
@@ -347,0 +348 @@ vpa:
+| autoDiscovery.namespace | string | `nil` |  |

@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch 2 times, most recently from f9c7359 to 59fa7d6 Compare June 28, 2023 18:34
@jackfrancis
Copy link
Contributor Author

@gjtempleton brew install norwoodj/tap/helm-docs && helm-docs did the trick :)

@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from 59fa7d6 to e154d5f Compare June 30, 2023 00:07
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2023
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 15, 2023
@elmiko
Copy link
Contributor

elmiko commented Nov 16, 2023

@jackfrancis happy to help get this merged, is a rebase and review all that's needed?

@Shubham82
Copy link
Contributor

@jackfrancis, Please rebase the PR so that it will merge.

@jackfrancis jackfrancis force-pushed the helm-chart-clusterapi-clusterNamespace branch from e154d5f to dfcf7d6 Compare February 12, 2024 17:51
@k8s-ci-robot k8s-ci-robot added area/provider/cluster-api Issues or PRs related to Cluster API provider and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 12, 2024
Copy link
Member

@gjtempleton gjtempleton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, one query.

@@ -8,7 +8,7 @@ To verify that cluster-autoscaler has started, run:

##############################################################################
#### ERROR: You must specify values for either ####
#### autoDiscovery.clusterName or autoscalingGroups[] ####
#### autoDiscovery or autoscalingGroups[] ####
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, whitespace:

Suggested change
#### autoDiscovery or autoscalingGroups[] ####
#### autoDiscovery or autoscalingGroups[] ####

Comment on lines +43 to -50
Return labels, including instance and name.
*/}}
{{- define "cluster-autoscaler.labels" -}}
{{ include "cluster-autoscaler.instance-name" . }}
app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we meaning to dump this or is this a rebase artefact?

@@ -109,21 +109,52 @@ Return true if the priority expander is enabled
{{- end -}}

{{/*
Return the autodiscoveryparameters for clusterapi.
autoDiscovery.clusterName for clusterapi.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think I've ever seen a nice way to do this, but this is at least an easier to grok end result.

Copy link
Contributor

@mwielgus mwielgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 18, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mwielgus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 18, 2024
@k8s-ci-robot k8s-ci-robot merged commit 616cfb6 into kubernetes:master Mar 18, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/helm-charts area/provider/cluster-api Issues or PRs related to Cluster API provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants