Add support for extended resource definition in GCE MIG template #62

Open
wants to merge 460 commits into
base: datadog-master-8.0
from

Conversation

@zaymat zaymat commented Oct 6, 2022

Signed-off-by: Mayeul Blanzat [email protected]

Which component this PR applies to?

cluster-autoscaler

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds support for extended resources in GCE MIG template.

Today, the cluster-autoscaler on GCE only supports scaling decisions based on the following resources: CPU, Memory, EphemeralStorage and GPU. However, Kubernetes allows defining an arbitrary number of resources through the Extended Resource API. This is useful when instances have special resources that need to be shared between pods. One example is network bandwidth, which pods cannot request except through extended resources.
Some other implementations of the CloudProvider interface, like AWS or Azure, already support scaling decisions based on extended resources.

This PR adds the possibility to define extended resources for a node group on GCE, so that the cluster-autoscaler can account for them when taking scaling decisions. This is done through the extended_resources key inside the AUTOSCALER_ENV_VARS variable set on a MIG template.

Example:

AUTOSCALER_ENV_VARS: kube_reserved=<...>;<...>;extended_resources=foo=10,bar=1M,foobar=2G
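
To make the format above concrete, here is a minimal, self-contained sketch of how such a value could be parsed. It is not the code from this PR: the helper names (`extractVar`, `parseQuantity`, `parseExtendedResources`) are hypothetical, the `kube_reserved` value in `main` is a placeholder, and the quantity parser only handles plain integers with the `k`, `M` and `G` decimal suffixes. A real implementation would more likely rely on `resource.ParseQuantity` from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decimalSuffixes covers only a small subset of Kubernetes quantity suffixes
// (assumption made to keep the sketch dependency-free).
var decimalSuffixes = map[string]int64{
	"k": 1_000,
	"M": 1_000_000,
	"G": 1_000_000_000,
}

// parseQuantity converts strings such as "10", "1M" or "2G" to an integer count.
func parseQuantity(s string) (int64, error) {
	digits, mult := s, int64(1)
	for suffix, m := range decimalSuffixes {
		if strings.HasSuffix(s, suffix) {
			digits, mult = strings.TrimSuffix(s, suffix), m
			break
		}
	}
	n, err := strconv.ParseInt(digits, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid quantity %q: %w", s, err)
	}
	return n * mult, nil
}

// extractVar returns the value stored under key in a ';'-separated list of
// key=value items. Only the first '=' splits key from value, so values may
// themselves contain '='.
func extractVar(envVars, key string) (string, bool) {
	for _, item := range strings.Split(envVars, ";") {
		kv := strings.SplitN(item, "=", 2)
		if len(kv) == 2 && strings.TrimSpace(kv[0]) == key {
			return kv[1], true
		}
	}
	return "", false
}

// parseExtendedResources reads the extended_resources key and returns a map
// from resource name to quantity. A missing key yields an empty map.
func parseExtendedResources(envVars string) (map[string]int64, error) {
	out := map[string]int64{}
	raw, found := extractVar(envVars, "extended_resources")
	if !found || raw == "" {
		return out, nil
	}
	for _, pair := range strings.Split(raw, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 || kv[0] == "" {
			return nil, fmt.Errorf("malformed extended resource %q", pair)
		}
		q, err := parseQuantity(kv[1])
		if err != nil {
			return nil, err
		}
		out[kv[0]] = q
	}
	return out, nil
}

func main() {
	// The kube_reserved value is a placeholder for illustration only.
	env := "kube_reserved=cpu=100m;extended_resources=foo=10,bar=1M,foobar=2G"
	res, err := parseExtendedResources(env)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res["foo"], res["bar"], res["foobar"])
}
```

Splitting each item on the first `=` only is what lets the nested `name=quantity` pairs survive inside the outer `extended_resources=...` entry.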

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


k8s-ci-robot and others added 30 commits July 5, 2022 03:04
…e-fix

GCE: Always add boot disk annotations to templates
feat: use non-root user for base-image
Adds a new flag `--balance-label` which allows users to balance between
node groups exclusively via labels.

This gives users the flexibility to specify the similarity logic
themselves when --balance-similar-node-groups is in use.
…le-version

update cloud-provider-azure version for azure imports
…on_Doc

Deduplicate Migration Doc from README.
…rom cloud provider that are still registered within Kubernetes"
IsCustomMachine didn't take machine types with family prefix
(e.g. n2-custom-2-2816) into account.
Revert "Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes"
CA: GCE: implement GetMachineFamily, fix IsCustomMachine
This ensured that access to replicas during scale down operations was never stale by accessing the API server kubernetes#3104.
This honoured that behaviour while moving to unstructured client kubernetes#3312.
This regressed that behaviour while trying to reduce the API server load kubernetes#4443.
This put back the never stale replicas behaviour at the cost of loading back the API server kubernetes#4634.

Currently, on e.g. a 48 minutes cluster, it does 1.4k GET requests to the scale subresource.
This PR tries to both keep replicas non-stale during scale down and prevent the API server from being overloaded. To achieve that, it lets targetSize, which is called on every autoscaling cluster state loop, come from cache.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
…-updater-flags

chore: Document params for VPA recommender & updater (similar to CA's FAQs)
…as-exoscale-documentation

exoscale provider: Update cluster autoscaler documentation
elmiko and others added 21 commits September 29, 2022 14:22
this change removes some unused values and adjusts the names in the unit
tests to better reflect usage.
cleanup unused constants in clusterapi provider
…mple-spec

Update the example spec of civo cloudprovider
Updated the golang version for the GitHub workflows.
Containers in recommendation can be different from containers in pod:

- A new container can be added to a pod. At first there will be no
  recommendation for the container
- A container can be removed from pod. For some time recommendation will contain
  recommendation for the old container
- Container can be renamed. Then there will be recommendation for container
  under its old name.

Add tests for what VPA does in those situations.
Containers in recommendation can be different from containers in pod:

- A new container can be added to a pod. At first there will be no
  recommendation for the container
- A container can be removed from pod. For some time recommendation will contain
  recommendation for the old container
- Container can be renamed. Then there will be recommendation for container
  under its old name.

Add tests for what VPA does in those situations, when limit range exists.
add example for multiple recommenders
…pod-recommendation-mismatch

E2e test admission pod recommendation mismatch
[gce]: skip instances on validation error
CA - AWS - Instance List Update 2022-09-16
…ls-insecure

magnum: add an option to create insecure TLS connections
…y_and_Preemption_links

Corrected the links for Priority in k8s API and Pod Preemption in k8s.
@zaymat zaymat force-pushed the mayeul/add-extended-resource-support-in-gce branch 2 times, most recently from 8ec94a5 to 17f8faf Compare October 10, 2022 15:09
@zaymat zaymat force-pushed the mayeul/add-extended-resource-support-in-gce branch from 17f8faf to 15bb504 Compare October 11, 2022 10:03
This commit adds the possibility to define extended resources for a node group on GCE,
so that the cluster-autoscaler can account for them when taking scaling decisions.

This is done through the `extended_resources` key inside the AUTOSCALER_ENV_VARS variable set on a MIG template.

Signed-off-by: Mayeul Blanzat <[email protected]>
@zaymat zaymat force-pushed the mayeul/add-extended-resource-support-in-gce branch from 15bb504 to e286a95 Compare October 11, 2022 12:34
…add more tests

* Malformed extended resource definition should not fail the template building function. Instead, log the error and ignore extended resources
* Remove useless existence check
* Add tests around the extractExtendedResourcesFromKubeEnv function
* Add a test case to verify that malformed extended resource definition does not fail the template build function
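
The "log and ignore" behaviour described in the first bullet can be sketched as follows. This is only an illustration under stated assumptions: `nodeTemplate`, `parseExtendedResourceList` and `buildTemplate` are hypothetical names, quantities are kept as raw strings, and a single malformed entry causes all extended resources to be dropped while the base capacity is still returned.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// nodeTemplate is a hypothetical stand-in for the node template built from a MIG.
type nodeTemplate struct {
	Capacity map[string]string
}

// parseExtendedResourceList parses "name=quantity,name=quantity" and fails on
// the first malformed pair. Quantities are not validated in this sketch.
func parseExtendedResourceList(raw string) (map[string]string, error) {
	out := map[string]string{}
	if raw == "" {
		return out, nil
	}
	for _, pair := range strings.Split(raw, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 || kv[0] == "" || kv[1] == "" {
			return nil, fmt.Errorf("malformed extended resource %q", pair)
		}
		out[kv[0]] = kv[1]
	}
	return out, nil
}

// buildTemplate always returns a usable template. If the extended resource
// definition cannot be parsed, the error is logged and the template keeps
// only its base capacity.
func buildTemplate(cpu, memory, rawExtended string) nodeTemplate {
	t := nodeTemplate{Capacity: map[string]string{"cpu": cpu, "memory": memory}}
	extended, err := parseExtendedResourceList(rawExtended)
	if err != nil {
		log.Printf("ignoring extended resources: %v", err)
		return t
	}
	for name, quantity := range extended {
		t.Capacity[name] = quantity
	}
	return t
}

func main() {
	ok := buildTemplate("4", "16Gi", "foo=10,bar=1M")
	bad := buildTemplate("4", "16Gi", "foo=10,bar")
	fmt.Println(len(ok.Capacity), len(bad.Capacity))
}
```

The design choice here is that a typo in an optional setting degrades scaling accuracy for that node group rather than preventing the autoscaler from building the template at all.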

Signed-off-by: Mayeul Blanzat <[email protected]>