-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
always drain before reboot #294
Conversation
This is the same logic our rebooter does (we don't use kured) A cordon out of band does not mean a drain but if it did drain, the drain will be trivial/fast. However, if it was not drained yet, it is critical to drain before rebooting. /lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this.
Hi, the code looks fine to me, but could I ask you to update the commit description to say what it does and the thinking behind it? "idempotent" is valid as a justification, but not as a description. A word like "unconditional" is more descriptive of what is happening, or just spell it out - "drain node before reboot even if unschedulable". You could also put the text of the PR description into the commit message. Thanks |
cmd/kured/main.go
Outdated
@@ -344,7 +344,6 @@ func rebootAsRequired(nodeID string, window *timewindow.TimeWindow, TTL time.Dur | |||
if err != nil { | |||
log.Fatal(err) | |||
} | |||
nodeMeta.Unschedulable = node.Spec.Unschedulable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an unnecessary assignment. The reason for this is to detect if the node was already cordoned such that when pod restarts it will remain cordoned. (Don't uncordon the node just because of the reboot).
Now, if that is no longer intended, you would want to also change the code above such that it always uncordons the node and does not look at this metadata to make that choice (since, well, that metadata is clearly never set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to touch the nodes if they have been cordonned? I would say no.
This is indeed a good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a different question.
There are two things:
- If the node is already cordoned and we are going to reboot it, we should leave it cordoned after reboot
- If the node is already cordoned, do we just ignore it
Before this PR, the behavior was the same as (1) above except is had the problem of not draining the node before rebooting. The point of this change was to not skip the drain if we are going to reboot, which is the simple removal of the conditional. We still need to collect the cordoned state of the node such that it remains so after reboot.
However, if we want to make a new policy (I claim a breaking change) that cordoned nodes are not rebooted, then the logic needs to be very different and the uncordon above should not be gated based on the metadata in the lock.
I claim that the core behavior of the code was correct and acting like (1) and that the main change was to also drain any node that was going to reboot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Michael is right we don't want to change this value, I've reverted it. It's actually passed by reference to the acquire()
func, so we need to make sure that the nodeMeta
instance has the correct Unschedulable
property value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Michael-Sinz That was indeed my understanding, sorry if I haven't been verbose. And yes, I was still asking that new question on top. But let's ignore it for now.
@jackfrancis ok, thanks!
Thanks for the comments, will do! |
2c66190
to
1f76c6f
Compare
@bboreham I've updated the commit message (and the PR title as well), thanks again for the comments! |
@evrardjp I'm not super familiar with these tests, but it doesn't appear that the CI failure for helm + 1.18 E2E is related to this changeset. May I kindly ping a maintainer to re-run the test to queue this up for merge? Thanks again! Lemme know if there's anything else I can do on this PR. |
Yes it might happen that this test is failing, depending on where the github action runs. As you can see, it's overall good. IMO, there is no need to retrigger the test, it's good enough for me. Sadly I am not aware of a way to retrigger the failing test (except for the "re-trigger" button, which appears only for those having write access to the repo IIRC). For me, this PR is ready. |
@dholbach Any thoughts on a final review and merge of this change? |
@bboreham Is there anything more we want to do to consider this for a merge? Should we retry the helm chart + 1.18 test job? |
I'm trying to find someone to rerun the test case -- I would prefer to see it green before it's merged. (Apparently I can merge it but I can't rerun a test 🤷) |
To short-circuit this, running tests in my personal GH: https://github.com/dholbach/kured/actions/runs/633086975 |
Looking good! 🚀 |
This changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property. Because kubectl drain is idempotent, we shouldn't have to worry about whether the node has already been set to Unschedulable (perhaps due to a prior, unsuccessful loop of the kured reboot cycle): we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if a cordon is successful), we can be sure that we aren't going to be thrashing this node w/ respect to scheduled pods. This also fixes an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, kured will no longer reboot the node's underlying VM/machine while it is actively running pods.
1f76c6f
to
93c8242
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, based on Michael's review.
This PR changes the pre-reboot drain functionality so that it always runs, regardless of the value of the
Unschedulable
node property. Ostensibly this "skip if unschedulable" foo was added due to the fact that the value ofUnschedulable
will be set to false as a side-effect of a kured reboot operation, and if we're in a "retry" loop here, we shouldn't have to "drain again".However, because kubectl drain is idempotent, we shouldn't have to worry about any of that: we can run it over and over again. And because this
drain
func actually does a cordon + drain (and it only performs the drain if a cordon is successful), we can be sure that we aren't going to be thrashing this node w/ respect to scheduled pods.And in fact, the current implementation of "only drain if node is marked Unschedulable" presents an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, we will reboot the node's underlying VM/machine while it is actively running pods.
Fixes #18
PS 👋 thanks for kured! :)