✨ Support setting maxHealthyPercentage to configure ASG instance refresh #5140

Merged

fiunchinho
Contributor

@fiunchinho fiunchinho commented Oct 8, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

It allows configuring the MaxHealthyPercentage setting of the ASG instance refresh, giving better control over how instances are added and terminated.

Special notes for your reviewer:
This is adding a new field to the AWSMachinePool CRD, and it's not a breaking change.

Checklist:

  • squashed commits
  • includes documentation
  • includes emojis
  • adds unit tests
  • adds or updates e2e tests

Release note:

Support setting maxHealthyPercentage to configure ASG instance refresh
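
For illustration, here is a minimal sketch of how an optional field like this can be added without breaking existing objects. The `RefreshPreferences` type below is a trimmed-down, hypothetical stand-in, not the actual CRD type: the `*int64` field types, the JSON tags, and the `render` and `ptr` helpers are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RefreshPreferences is a hypothetical, reduced stand-in for the refresh
// settings on AWSMachinePool. Field types and JSON tags are assumptions.
type RefreshPreferences struct {
	// MinHealthyPercentage is the percentage of capacity that must remain
	// healthy during an instance refresh.
	MinHealthyPercentage *int64 `json:"minHealthyPercentage,omitempty"`
	// MaxHealthyPercentage is the new optional setting. A nil pointer means
	// "not set", so existing objects serialize exactly as before.
	MaxHealthyPercentage *int64 `json:"maxHealthyPercentage,omitempty"`
}

// ptr returns a pointer to v, which is convenient for optional fields.
func ptr(v int64) *int64 { return &v }

// render serializes the preferences to JSON for display.
func render(p RefreshPreferences) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// An object that never sets the new field is unchanged on the wire.
	fmt.Println(render(RefreshPreferences{}))

	// Both values set explicitly.
	fmt.Println(render(RefreshPreferences{
		MinHealthyPercentage: ptr(90),
		MaxHealthyPercentage: ptr(110),
	}))
}
```

Because both fields are pointers tagged `omitempty`, an unset value is omitted from the serialized object rather than written as zero, which is why adding the field is backwards compatible.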

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 8, 2024
@k8s-ci-robot k8s-ci-robot added needs-priority size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 8, 2024
@fiunchinho fiunchinho force-pushed the support-maxhealthypercentage branch from ee5f9ee to 5022162 Compare October 8, 2024 09:26
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 8, 2024
@fiunchinho fiunchinho force-pushed the support-maxhealthypercentage branch from 5022162 to a649112 Compare October 8, 2024 09:35
@fiunchinho fiunchinho force-pushed the support-maxhealthypercentage branch from a649112 to 7ef2f81 Compare October 8, 2024 10:20
@@ -171,7 +171,20 @@ type RefreshPreferences struct {
// The amount of capacity as a percentage in ASG that must remain healthy
// during an instance refresh. The default is 90.
// +optional
// +kubebuilder:validation:Minimum=0
Member

Adding new validations to an existing field may break some users.

Just to confirm: if a user specified 101 as the value for this field prior to this change, would AWS return an error? (I assume so, but want to check.)

Contributor Author

I haven't tested it, but I'd assume so, according to AWS docs. I could leave this out of this PR if you prefer.

Member

It may be safer to leave it out, unless we explicitly test it.

Contributor Author

I just removed it, can you take a look again?

return allErrs
}

if r.Spec.RefreshPreferences.MaxHealthyPercentage != nil && r.Spec.RefreshPreferences.MinHealthyPercentage == nil {
Member

We could also consider using CEL instead of new validation webhook checks.

However, we don't have any CEL at present, so it's fine to stay consistent.
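
As a rough sketch of the check discussed in this thread, the snippet below rejects a spec that sets `MaxHealthyPercentage` without `MinHealthyPercentage`, mirroring the condition in the diff line above. The `RefreshPreferences` type, the `validateRefreshPreferences` function, and the `errMinRequired` error are hypothetical stand-ins written with the standard library only; the real webhook presumably returns a field error list instead. The CEL marker in the comment is likewise an assumption about how the same rule might be expressed, not something this PR adds.

```go
package main

import (
	"errors"
	"fmt"
)

// RefreshPreferences is a hypothetical, reduced stand-in for the CRD type.
//
// With CEL, the same constraint could hypothetically be declared on the type
// with a kubebuilder marker along these lines:
//
//	+kubebuilder:validation:XValidation:rule="!has(self.maxHealthyPercentage) || has(self.minHealthyPercentage)",message="minHealthyPercentage is required when maxHealthyPercentage is set"
type RefreshPreferences struct {
	MinHealthyPercentage *int64
	MaxHealthyPercentage *int64
}

// errMinRequired is an illustrative error value, not the project's wording.
var errMinRequired = errors.New("minHealthyPercentage must be set when maxHealthyPercentage is set")

// validateRefreshPreferences applies the webhook-style check: setting the
// maximum without the minimum is rejected; everything else passes.
func validateRefreshPreferences(p *RefreshPreferences) error {
	if p == nil {
		return nil // nothing configured, nothing to validate
	}
	if p.MaxHealthyPercentage != nil && p.MinHealthyPercentage == nil {
		return errMinRequired
	}
	return nil
}

func main() {
	minPct, maxPct := int64(90), int64(110)

	// Only the maximum is set: rejected.
	fmt.Println(validateRefreshPreferences(&RefreshPreferences{MaxHealthyPercentage: &maxPct}))

	// Both values are set: accepted.
	fmt.Println(validateRefreshPreferences(&RefreshPreferences{
		MinHealthyPercentage: &minPct,
		MaxHealthyPercentage: &maxPct,
	}))
}
```

Keeping the check in Go matches the existing webhook validations, while the CEL form would move it into the CRD schema so the API server could enforce it without calling the webhook.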

@fiunchinho fiunchinho force-pushed the support-maxhealthypercentage branch from 7ef2f81 to e0b119c Compare October 8, 2024 15:01
@fiunchinho fiunchinho requested a review from richardcase October 8, 2024 15:01
@richardcase
Member

Until the e2e tests pass:

/hold

But from my side:

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 8, 2024
@richardcase
Member

/test ?

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 8, 2024
@k8s-ci-robot
Contributor

@richardcase: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-aws-build
  • /test pull-cluster-api-provider-aws-build-docker
  • /test pull-cluster-api-provider-aws-test
  • /test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-aws-apidiff-main
  • /test pull-cluster-api-provider-aws-e2e
  • /test pull-cluster-api-provider-aws-e2e-blocking
  • /test pull-cluster-api-provider-aws-e2e-clusterclass
  • /test pull-cluster-api-provider-aws-e2e-conformance
  • /test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-aws-e2e-eks
  • /test pull-cluster-api-provider-aws-e2e-eks-gc
  • /test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-aws-apidiff-main
  • pull-cluster-api-provider-aws-build
  • pull-cluster-api-provider-aws-build-docker
  • pull-cluster-api-provider-aws-test
  • pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@richardcase
Member

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

@fiunchinho
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@richardcase
Member

The e2e tests are failing because the AMIs are no longer available 😢 We have 2 options:

  1. Wait for the new AMIs to be published and then rebase this PR. This may take anywhere from a few days to 1 week.
  2. Merge without the e2e tests passing.

Personally, I'd lean towards 2), but it would be good to get others' views on this. @nrb @dlipovetsky @AndiDog any thoughts?

@damdo
Member

damdo commented Oct 10, 2024

/test pull-cluster-api-provider-aws-e2e

@fiunchinho
Contributor Author

I don't think this PR will break the e2e tests since it's adding a new field. Having said that, waiting 1 week for this to get merged is fine by me.

Contributor

@AndiDog AndiDog left a comment

/lgtm
/test pull-cluster-api-provider-aws-e2e

No need to block this PR, IMO, since it's scoped and a very low-risk change.

@fiunchinho fiunchinho force-pushed the support-maxhealthypercentage branch from e0b119c to 9a7ec73 Compare October 14, 2024 09:17
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2024
@fiunchinho
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot
Contributor

@fiunchinho: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-aws-e2e | 9a7ec73 | link | false | /test pull-cluster-api-provider-aws-e2e |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@nrb
Contributor

nrb commented Oct 14, 2024

We have two other approvals, and I agree that we should not gate this on fixing the AMIs.

/lgtm
/approve
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb, richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 69aaac9 into kubernetes-sigs:main Oct 14, 2024
18 of 19 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants