-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Include machines in deleting status when calculate machineset replicas #3434
🐛 Include machines in deleting status when calculate machineset replicas #3434
Conversation
Hi @jzhoucliqr. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
controllers/mdutil/util.go
Outdated
@@ -521,7 +536,7 @@ func NewMSNewReplicas(deployment *clusterv1.MachineDeployment, allMSs []*cluster | |||
return 0, err | |||
} | |||
// Find the total number of machines | |||
currentMachineCount := GetReplicaCountForMachineSets(allMSs) | |||
currentMachineCount := GetTotalReplicaCountForMachineSets(allMSs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can assume that a MachineSet's status.replicas includes deleting or not yet ready Machines, then it seems like it should be okay to use GetActualReplicaCountForMachineSets rather than needing to take the max here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The max is for the case that, for two machinesets, one spec=0,status=2, the other spec=1,status=0. Then at this moment total count should be 3 instead of 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I'm wondering if we need to avoid increasing the spec for the new machineSet until things have finished ensuring that the old machineSet has finished stabilizing if increasing would exceed the max surge... I guess I'm wondering if this is the correct place to try to do that enforcement, especially since using the max of spec.replicas and status.replicas seems to be more of an approximation of trying to do the right thing vs giving assurance that we are doing the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand this. Do you mean we could allow exceed max surge here in MS.spec, and do enforce within MS controller that won't actually create the Machine object if it already exceed surge? If this understanding is correct, I'm not sure how MS controller will be able to do that, because maxSurge and total number of replicas only available in MD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Jason was suggesting if we need a check in the webhook validator to ensure that the status is up to date before allowing any type of resize. Probably worth a different issue though
/ok-to-test |
/test pull-cluster-api-verify |
be3034d
to
adcf5fa
Compare
/retest |
adcf5fa
to
276cd4c
Compare
/retest |
controllers/mdutil/util.go
Outdated
// Use max(spec.Replicas,status.Replicas) to cover the cases that: | ||
// 1. Scale up, where spec.Replicas increased but no machine created yet, so spec.Replicas > status.Replicas | ||
// 2. Scale down, where spec.Replicas decreased but machine not deleted yet, so spec.Replicas < status.Replicas | ||
func GetSumOfMaxOfSpecAndStatusReplicaCountForMachineSets(machineSets []*clusterv1.MachineSet) int32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this make it better or worse ..
c899fc4
to
babff3f
Compare
LGTM pending squash |
2fcdc20
to
01a3e3c
Compare
Thanks a lot @vincepri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
/assign @ncdc @fabriziopandini
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vincepri The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
/retest |
/milestone v0.3.10 |
Moving this to v0.4 /milestone v0.4.0 |
@vincepri do you think we if we can merge this in to v0.3.11 ? There is no API change, only minor behavior change. |
@jzhoucliqr Apologies missed your message, let's rebase and merge for v0.4.0 first and get another round of review. If the fix is appropriate, we should backport it to |
/assign |
01a3e3c
to
58fb498
Compare
Thanks @vincepri . Rebase done. |
58fb498
to
5da4031
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are super minor nits that are non-blocking if we'd like to get this in without additional revisions, given the PR has been sitting for almost a month (so sorry about this!!!). @detiber did you get all your questions/comments answered & are you ok to proceed with this PR?
5da4031
to
214a31a
Compare
214a31a
to
bec5e12
Compare
/retest |
/lgtm |
What this PR does / why we need it:
This PR fix the issue that MachineDeployment RollingUpdate may launch machines more than (MD.Spec.Replicas + maxSurge). It include machines in deleting status when calculate MachineSet status replicas.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #3417