
[cluster-autoscaler][clusterapi] Remove internal types in favor of unstructured #3312

Merged: 2 commits into kubernetes:master on Aug 19, 2020

Conversation

@detiber (Member) commented on Jul 10, 2020

Currently we define internal types for use with Cluster API and serialize/deserialize to/from these internal types as we interact with Cluster API resources. I believe this causes confusion, since the internal types do not necessarily represent the external Cluster API types we are interacting with (which may span multiple versions). It also makes support for fields that may or may not exist in a subset of Cluster API versions more difficult to maintain, since we attempt to abstract these differences away as part of the serialization/deserialization process.

This PR removes the use of internal type representations altogether in favor of using Unstructured directly.
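
For illustration (not code from this PR), a minimal sketch of what unstructured field access looks like using the generic accessors from k8s.io/apimachinery; the helper names readReplicas and setReplicas are assumptions, not names used in the change:

```go
package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// readReplicas pulls spec.replicas out of an arbitrary scalable Cluster API
// resource (MachineSet, MachineDeployment, ...) without converting it to a
// typed struct, so version-specific fields can simply be present or absent.
func readReplicas(u *unstructured.Unstructured) (int64, error) {
	replicas, found, err := unstructured.NestedInt64(u.Object, "spec", "replicas")
	if err != nil {
		return 0, fmt.Errorf("reading spec.replicas: %w", err)
	}
	if !found {
		return 0, fmt.Errorf("%s %s has no spec.replicas", u.GetKind(), u.GetName())
	}
	return replicas, nil
}

// setReplicas writes the desired replica count back onto the unstructured object.
func setReplicas(u *unstructured.Unstructured, replicas int64) error {
	return unstructured.SetNestedField(u.Object, replicas, "spec", "replicas")
}
```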

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 10, 2020
@detiber (Member Author) commented on Jul 10, 2020

/assign @elmiko @JoelSpeed

@elmiko (Contributor) commented on Jul 10, 2020

Jason, this looks cool. i'm still reviewing and testing, but i'm amazed at how much we can remove by switching to unstructured. nicely done!

@elmiko (Contributor) left a comment:

i was trying to get this change built so i could test on a local cluster but i keep hitting this:

$ make clean build
rm -f cluster-autoscaler
CGO_ENABLED=0 GO111MODULE=off GOOS=linux go build --ldflags "-s"  ./...
# k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/component-base/cli/flag
vendor/k8s.io/component-base/cli/flag/ciphersuites_flag.go:51:51: undefined: tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
vendor/k8s.io/component-base/cli/flag/ciphersuites_flag.go:52:51: undefined: tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
make: *** [Makefile:25: build] Error 2

i tried to clean extra files, and also looked at rebuilding the vendor dir, any advice?

@@ -142,22 +146,32 @@ func BuildClusterAPI(opts config.AutoscalingOptions, do cloudprovider.NodeGroupD
 	}

 	// Grab a dynamic interface that we can create informers from
-	dc, err := dynamic.NewForConfig(externalConfig)
+	managementClient, err := dynamic.NewForConfig(externalConfig)
Contributor:

i thought we were separating this work into the follow up commit?

@detiber (Member Author):

In this instance I just renamed the variable to be more descriptive of its use; there is no separation between the kubeconfigs used by the clients with this change.

Contributor:

ok, cool. i figured that was the case but wanted to be sure.
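
For context on the diff above, a minimal sketch (not code from this PR) of how a dynamic client is created and informers are derived from it, which is what the renamed managementClient is used for; the helper name newManagementInformerFactory and the 10-minute resync period are illustrative assumptions:

```go
package example

import (
	"time"

	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
)

// newManagementInformerFactory builds a dynamic client from the given rest.Config
// and wraps it in a shared informer factory, so Cluster API resources can be
// watched as unstructured objects without typed clients.
func newManagementInformerFactory(cfg *rest.Config) (dynamic.Interface, dynamicinformer.DynamicSharedInformerFactory, error) {
	managementClient, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return nil, nil, err
	}
	factory := dynamicinformer.NewDynamicSharedInformerFactory(managementClient, 10*time.Minute)
	return managementClient, factory, nil
}
```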

@elmiko (Contributor) commented on Jul 13, 2020

just a follow-up: i am seeing this build failure on master as well, so either i have a big problem or the repo does.

edit: i created #3315 just in case

@detiber (Member Author) commented on Jul 13, 2020

> i was trying to get this change built so i could test on a local cluster but i keep hitting this:
>
> $ make clean build
> rm -f cluster-autoscaler
> CGO_ENABLED=0 GO111MODULE=off GOOS=linux go build --ldflags "-s"  ./...
> # k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/component-base/cli/flag
> vendor/k8s.io/component-base/cli/flag/ciphersuites_flag.go:51:51: undefined: tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
> vendor/k8s.io/component-base/cli/flag/ciphersuites_flag.go:52:51: undefined: tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
> make: *** [Makefile:25: build] Error 2
>
> i tried to clean extra files, and also looked at rebuilding the vendor dir, any advice?

Interesting. I'm not seeing any build errors locally; have you tried the build-in-docker make target? I'm not sure if it's directly related, but I've seen similar cipher-suite mismatch errors when trying to build k/k with the wrong version of Go.

@elmiko (Contributor) commented on Jul 13, 2020

> Interesting. I'm not seeing any build errors locally; have you tried the build-in-docker make target? I'm not sure if it's directly related, but I've seen similar cipher-suite mismatch errors when trying to build k/k with the wrong version of Go.

i haven't tried the build in docker yet; it was only a few commits ago that things broke, and i was just trying to rebase your stuff on top of the old commit. probably easier to just build in docker lol

@elmiko (Contributor) commented on Jul 13, 2020

ok, got my build working.

i tested this against a CAPD cluster (joined): i tried scaling up and down with a workload, deleting the workload, and also adding a new MachineDeployment and scaling that.

looks good so far; i didn't hit any errors operating the autoscaler.

/area provider/cluster-api
/lgtm

@k8s-ci-robot k8s-ci-robot added area/provider/cluster-api Issues or PRs related to Cluster API provider lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 13, 2020
@JoelSpeed (Contributor) left a comment:

Found a couple of comments that might need an update, otherwise LGTM

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2020
@JoelSpeed (Contributor) commented:

Looks good, thanks!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2020
@enxebre (Member) commented on Jul 17, 2020

thanks a lot @detiber! This looks great. I'm planning to have a deeper look at this on Monday.
Since this is a big refactor, kube 1.19 should be code-frozen by now, and given the unfortunate lack of e2e testing we have atm for this provider, I'd suggest we hold the merge until the CA main branch opens for 1.20 and proceed with any backports if needed. Does that make sense to you?

@detiber (Member Author) commented on Jul 17, 2020

@enxebre I would like to avoid holding off until 1.20, only because this helps enable the other functionality for running against workload clusters that exist in a separate management cluster vs only against self-managed clusters. That said, I will defer to the larger community to make that decision.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2020
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 20, 2020
Remove internal types for Cluster API and replace with unstructured access
@detiber (Member Author) commented on Aug 5, 2020

@enxebre now that the 1.19 release has been cut, any objections to proceeding with this?

@enxebre (Member) commented on Aug 6, 2020

@MaciekPytel is the main branch targeting 1.20 now?
thanks @detiber!
/approve

PTAL @JoelSpeed @elmiko

@JoelSpeed (Contributor) commented:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 6, 2020
@detiber (Member Author) commented on Aug 6, 2020

> @MaciekPytel is the main branch targeting 1.20 now?

I'm not Maciek, but yes, the main branch is targeting 1.20; the cluster-autoscaler-release-1.19 branch has already been cut along with the 1.19.0 release.

@detiber (Member Author) commented on Aug 6, 2020

@MaciekPytel this should be good to go now; it needs cluster-autoscaler approval for the vendor changes.

/assign @MaciekPytel

@elmiko (Contributor) commented on Aug 6, 2020

thanks Jason!

@ncdc (Member) commented on Aug 17, 2020

👋 hi, are there any updates on getting this reviewed for approval? Thanks!

@elmiko (Contributor) commented on Aug 17, 2020

i think we are good on this from the CAPI side, just need an autoscaler owner to give approval. cc @mwielgus @MaciekPytel

@ncdc (Member) commented on Aug 18, 2020

@elmiko it doesn't look like we need root OWNERS approval, just cluster-autoscaler. Do you think we should ask someone in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/OWNERS?

@elmiko (Contributor) commented on Aug 18, 2020

> @elmiko it doesn't look like we need root OWNERS approval, just cluster-autoscaler. Do you think we should ask someone in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/OWNERS?

++, thanks for pointing that out Andy =)

cc @aleksandra-malinowska @feiskyer @vivekbagade

@feiskyer (Member) commented:

/approve

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, feiskyer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2020
@k8s-ci-robot k8s-ci-robot merged commit 5159dae into kubernetes:master Aug 19, 2020
benmoss pushed a commit to benmoss/autoscaler that referenced this pull request Sep 28, 2020
[cluster-autoscaler][clusterapi] Remove internal types in favor of unstructured
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
This ensured that access to replicas during scale-down operations was never stale by reading from the API server (kubernetes#3104).
This honoured that behaviour while moving to the unstructured client (kubernetes#3312).
This regressed that behaviour while trying to reduce the API server load (kubernetes#4443).
This put the never-stale replicas behaviour back at the cost of loading the API server again (kubernetes#4634).

Currently, on e.g. a 48-minute cluster, it makes ~1.4k GET requests to the scale subresource.
This PR tries to satisfy both goals: non-stale replicas during scale down without overloading the API server. To achieve that, it lets targetSize, which is called on every autoscaling cluster-state loop, come from cache.

Also note that the scale-down implementation has changed: https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
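
To illustrate the caching approach described in that commit message, a hypothetical sketch (not the provider's actual code; the nodeGroupSketch type and cachedScalableResource field are invented names) of answering TargetSize from a cached unstructured object rather than a live API call:

```go
package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// nodeGroupSketch caches the last-seen unstructured scalable resource
// (e.g. a MachineSet or MachineDeployment) so that TargetSize can be
// answered from cache on every cluster-state loop instead of issuing a
// GET against the API server each time.
type nodeGroupSketch struct {
	cachedScalableResource *unstructured.Unstructured // refreshed by an informer, not per call
}

// TargetSize returns spec.replicas from the cached object.
func (n *nodeGroupSketch) TargetSize() (int, error) {
	replicas, found, err := unstructured.NestedInt64(n.cachedScalableResource.Object, "spec", "replicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, fmt.Errorf("%s %s has no spec.replicas set",
			n.cachedScalableResource.GetKind(), n.cachedScalableResource.GetName())
	}
	return int(replicas), nil
}
```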
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/cluster-api Issues or PRs related to Cluster API provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.