🐛 Consistent ordering for deletion priority #6300
Conversation
Welcome @thedadams!

Hi @thedadams. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from aad2560 to 7dae429.
Thanks! The change seems reasonable to me.

I'm curious: have you seen this actually happen? I.e., two different machines being returned in different calls in the case of a tie? Regardless, I'm assuming there's already some hidden determinism preventing this from happening?

/ok-to-test
Yes, we have seen the behavior I described in the description: scale a deployment up by two; while the machines are provisioning, scale down by one. Because machines missing a node ref and machines that are already deleting have the same priority (we use delete-oldest by default), both provisioning machines get deleted, and another machine gets provisioned once they are gone.
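To illustrate the tie problem with a self-contained toy (hypothetical names and priorities, not the Cluster API code): `sort.Slice` makes no promise about the relative order of tied elements, and the list order coming back from the API server can vary between reconciles, so without a tiebreaker the "first" machine can differ from call to call:

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	priority := map[string]int{"machine-a": 5, "machine-b": 5} // tied priorities

	// Without a tiebreaker, the comparator says nothing about ties, so the
	// "highest priority" machine simply follows whatever order the list came
	// in, which can differ between syncReplicas calls.
	for _, machines := range [][]string{
		{"machine-a", "machine-b"},
		{"machine-b", "machine-a"},
	} {
		sort.Slice(machines, func(i, j int) bool {
			return priority[machines[j]] < priority[machines[i]] // high to low
		})
		fmt.Println("without tiebreaker, victim:", machines[0]) // depends on input order
	}

	// With a name fallback, the result is deterministic regardless of input order.
	for _, machines := range [][]string{
		{"machine-a", "machine-b"},
		{"machine-b", "machine-a"},
	} {
		sort.Slice(machines, func(i, j int) bool {
			if priority[machines[i]] == priority[machines[j]] {
				return machines[i] < machines[j]
			}
			return priority[machines[j]] < priority[machines[i]]
		})
		fmt.Println("with tiebreaker, victim:", machines[0]) // always machine-a
	}
}
```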
/lgtm
/lgtm
@fabriziopandini @vincepri Is there anything I should be doing for this PR to move forward?
I wonder a bit if we could fix this more holistically by treating machines that are already deleting differently. But as this PR already fixes a known bug:
@thedadams Everything is fine, it just takes a bit (PTO) :)
I'd be cautious about that, since there are many nuances where we could accidentally impact existing API behaviour. I agree this is the right bugfix with a reduced impact surface, and further discussion can happen separately.
Yeah, me too. I looked at the surrounding code, and that's why I definitely prefer "just" a bugfix for now.
```diff
-	return m.priority(m.machines[j]) < m.priority(m.machines[i]) // high to low
+	priorityI, priorityJ := m.priority(m.machines[i]), m.priority(m.machines[j])
+	if priorityI == priorityJ {
+		return m.machines[i].Name < m.machines[j].Name
```
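For context, here's a minimal sketch of what the full comparator looks like with this change applied; the `Machine` struct and integer priorities are simplified stand-ins for the real Cluster API types, not the exact upstream code:

```go
package main

import (
	"fmt"
	"sort"
)

// Machine is a simplified stand-in for the Cluster API machine type.
type Machine struct {
	Name string
}

// sortableMachines orders machines from highest to lowest deletion priority,
// falling back to name so that ties resolve the same way on every call.
type sortableMachines struct {
	machines []*Machine
	priority func(*Machine) int
}

func (m sortableMachines) Len() int      { return len(m.machines) }
func (m sortableMachines) Swap(i, j int) { m.machines[i], m.machines[j] = m.machines[j], m.machines[i] }

func (m sortableMachines) Less(i, j int) bool {
	priorityI, priorityJ := m.priority(m.machines[i]), m.priority(m.machines[j])
	if priorityI == priorityJ {
		// Tie-break on name: machines with equal deletion priority must sort
		// identically across syncReplicas calls, otherwise a different
		// machine can be picked for deletion on each reconcile.
		return m.machines[i].Name < m.machines[j].Name
	}
	return priorityJ < priorityI // high to low
}

func main() {
	ms := sortableMachines{
		machines: []*Machine{{Name: "machine-b"}, {Name: "machine-a"}},
		priority: func(*Machine) int { return 10 }, // both machines tie
	}
	sort.Sort(ms)
	fmt.Println(ms.machines[0].Name) // always machine-a
}
```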
Add a comment above this line to explain why we're falling back to name?
I added a comment, as you suggested.
Force-pushed from 7dae429 to 0c0db0c.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED.

This pull request has been approved by: vincepri

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel.
If two machines had the same deletion priority, then on subsequent syncReplicas calls different machines could be deleted. For example, when using the Oldest or Newest deletion priorities, machines that are already deleting and machines without a node ref have the same priority, which could lead to multiple machines getting deleted across different calls to syncReplicas.

That is, if a MachineSet is scaled down by 1, one machine would get deleted, but a second call to syncReplicas could then delete a machine with no node ref instead of the machine that is already deleting.
What this PR does / why we need it:
By ensuring consistent ordering for machines with the same deletion priority, we can be sure that the correct number of machines gets deleted when scaling down a MachineDeployment.
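As a rough, hypothetical illustration of how the ordering gets consumed (the helper name and types here are mine; the real selection happens inside the MachineSet controller's syncReplicas):

```go
package main

import (
	"fmt"
	"sort"
)

// Machine is a simplified stand-in for the Cluster API machine type.
type Machine struct{ Name string }

// machinesToDelete is a hypothetical helper: with the deterministic ordering,
// the same 'diff' machines are selected on every reconcile until their
// deletions actually complete.
func machinesToDelete(machines []*Machine, diff int, priority func(*Machine) int) []*Machine {
	sort.Slice(machines, func(i, j int) bool {
		pi, pj := priority(machines[i]), priority(machines[j])
		if pi == pj {
			return machines[i].Name < machines[j].Name
		}
		return pj < pi // high to low
	})
	return machines[:diff]
}

func main() {
	// All three machines tie, e.g. already-deleting and missing-node-ref
	// machines sharing the same priority.
	tied := func(*Machine) int { return 10 }
	ms := []*Machine{{Name: "machine-c"}, {Name: "machine-a"}, {Name: "machine-b"}}
	for _, m := range machinesToDelete(ms, 1, tied) {
		fmt.Println("delete:", m.Name) // always machine-a
	}
}
```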
Which issue(s) this PR fixes
Fixes #6299