
always drain before reboot #294

Merged (1 commit, Mar 9, 2021)

Conversation

jackfrancis
Collaborator

@jackfrancis jackfrancis commented Jan 11, 2021

This PR changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property. Ostensibly this "skip if unschedulable" logic was added because the value of Unschedulable will be set to false as a side-effect of a kured reboot operation, and if we're in a "retry" loop here, we shouldn't have to "drain again".

However, because kubectl drain is idempotent, we shouldn't have to worry about any of that: we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if the cordon is successful), we can be sure that we aren't going to be thrashing this node with respect to scheduled pods.

And in fact, the current implementation of "only drain if node is marked Unschedulable" presents an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, we will reboot the node's underlying VM/machine while it is actively running pods.
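To make the edge case concrete, here is a minimal Go sketch (hypothetical names, not kured's actual code) contrasting the old conditional drain with the unconditional drain this PR introduces:

```go
package main

import "fmt"

// node models the single field relevant to the drain decision.
type node struct {
	name          string
	unschedulable bool
}

// shouldDrainOld is the pre-PR behavior: skip the drain when the node
// is already marked Unschedulable (e.g. from a prior kured retry loop,
// or from an out-of-band cordon).
func shouldDrainOld(n node) bool {
	return !n.unschedulable
}

// shouldDrainNew is the post-PR behavior: always drain before reboot.
// Because kubectl drain is idempotent, re-draining an already-drained
// node is a cheap no-op, so the unconditional path is safe.
func shouldDrainNew(n node) bool {
	return true
}

func main() {
	// A node cordoned out-of-band that still has pods running:
	// the old logic would reboot it without evicting those pods.
	n := node{name: "node-1", unschedulable: true}
	fmt.Println(shouldDrainOld(n)) // false: pods survive into the reboot
	fmt.Println(shouldDrainNew(n)) // true: pods are evicted first
}
```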

Fixes #18

PS 👋 thanks for kured! :)

@Michael-Sinz

This is the same logic our rebooter does (we don't use kured)

A cordon done out-of-band does not imply a drain, but if the node was already drained, the drain will be trivial/fast. However, if it was not drained yet, it is critical to drain before rebooting.

/lgtm

Collaborator

@evrardjp evrardjp left a comment


I like this.

@bboreham
Contributor

Hi, the code looks fine to me, but could I ask you to update the commit description to say what it does and the thinking behind it?

"idempotent" is valid as a justification, but not as a description. A word like "unconditional" is more descriptive of what is happening, or just spell it out - "drain node before reboot even if unschedulable".

You could also put the text of the PR description into the commit message.

Thanks

@@ -344,7 +344,6 @@ func rebootAsRequired(nodeID string, window *timewindow.TimeWindow, TTL time.Dur
if err != nil {
log.Fatal(err)
}
nodeMeta.Unschedulable = node.Spec.Unschedulable


This is not an unnecessary assignment. The reason for this is to detect whether the node was already cordoned, so that when the pod restarts the node will remain cordoned. (Don't uncordon the node just because of the reboot.)

Now, if that is no longer intended, you would want to also change the code above such that it always uncordons the node and does not look at this metadata to make that choice (since, well, that metadata is clearly never set).

Collaborator


Do we really want to touch the nodes if they have been cordoned? I would say no.

This is indeed a good point.


This is a different question.

There are two things:

  1. If the node is already cordoned and we are going to reboot it, we should leave it cordoned after reboot
  2. If the node is already cordoned, do we just ignore it

Before this PR, the behavior was the same as (1) above, except it had the problem of not draining the node before rebooting. The point of this change was to not skip the drain if we are going to reboot, which is the simple removal of the conditional. We still need to collect the cordoned state of the node so that it remains so after reboot.

However, if we want to make a new policy (I claim a breaking change) that cordoned nodes are not rebooted, then the logic needs to be very different and the uncordon above should not be gated based on the metadata in the lock.

I claim that the core behavior of the code was correct and acting like (1) and that the main change was to also drain any node that was going to reboot.
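Point (1) — preserving a pre-existing cordon across the reboot — can be sketched like this (hypothetical names, assuming the nodeMeta capture shown in the diff above):

```go
package main

import "fmt"

// nodeMeta records, before kured cordons the node for a reboot, whether
// the node was already Unschedulable (i.e. cordoned out-of-band).
type nodeMeta struct {
	Unschedulable bool
}

// afterReboot decides what to do with the cordon once the node is back:
// only lift a cordon that kured itself added for the reboot.
func afterReboot(meta nodeMeta) string {
	if meta.Unschedulable {
		// Cordoned before kured touched it: leave it cordoned.
		return "leave cordoned"
	}
	return "uncordon"
}

func main() {
	fmt.Println(afterReboot(nodeMeta{Unschedulable: true}))
	fmt.Println(afterReboot(nodeMeta{Unschedulable: false}))
}
```

Dropping the `nodeMeta.Unschedulable` assignment would make the first branch unreachable, which is why the capture still matters even after the drain becomes unconditional.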

Collaborator Author


Michael is right, we don't want to change this value; I've reverted it. It's actually passed by reference to the acquire() func, so we need to make sure that the nodeMeta instance has the correct Unschedulable property value.

Collaborator

@evrardjp evrardjp Jan 13, 2021


@Michael-Sinz That was indeed my understanding, sorry if I haven't been verbose. And yes, I was still asking that new question on top. But let's ignore it for now.

@jackfrancis ok, thanks!

@jackfrancis
Collaborator Author


Thanks for the comments, will do!

@jackfrancis jackfrancis changed the title idempotent drain always drain before reboot Jan 12, 2021
@jackfrancis
Collaborator Author

@bboreham I've updated the commit message (and the PR title as well), thanks again for the comments!

@jackfrancis
Collaborator Author

@evrardjp I'm not super familiar with these tests, but it doesn't appear that the CI failure for helm + 1.18 E2E is related to this changeset. May I kindly ping a maintainer to re-run the test to queue this up for merge?

Thanks again! Lemme know if there's anything else I can do on this PR.

@evrardjp
Collaborator

evrardjp commented Jan 20, 2021

Yes, it might happen that this test fails, depending on where the GitHub action runs. As you can see, it's overall good. IMO there is no need to re-trigger the test; it's good enough for me. Sadly I am not aware of a way to re-trigger the failing test (except for the "re-trigger" button, which appears only for those having write access to the repo, IIRC).

For me, this PR is ready.

@jackfrancis
Collaborator Author

@dholbach Any thoughts on a final review and merge of this change?

@jackfrancis
Collaborator Author

@bboreham Is there anything more we want to do to consider this for a merge? Should we retry the helm chart + 1.18 test job?

@jackfrancis
Collaborator Author

@dholbach do we think that @evrardjp's comments from Jan 20 are sufficient to lgtm and merge this? If you're able to retry the failed 1.18 test (doesn't seem to be testing anything related to this change -- no helm chart changes are included here), that would probably be a good idea.

Thanks!

@squaremo

squaremo commented Mar 8, 2021

I'm trying to find someone to rerun the test case -- I would prefer to see it green before it's merged. (Apparently I can merge it but I can't rerun a test 🤷)

@dholbach
Member

dholbach commented Mar 8, 2021

To short-circuit this, running tests in my personal GH: https://github.com/dholbach/kured/actions/runs/633086975

@dholbach
Member

dholbach commented Mar 8, 2021

Looking good! 🚀

This changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property.

Because kubectl drain is idempotent, we shouldn't have to worry about whether the node has already been set to Unschedulable (perhaps due to a prior, unsuccessful loop of the kured reboot cycle): we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if the cordon is successful), we can be sure that we aren't going to be thrashing this node with respect to scheduled pods.

This also fixes an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, kured will no longer reboot the node's underlying VM/machine while it is actively running pods.
Member

@dholbach dholbach left a comment


Approved, based on Michael's review.

Successfully merging this pull request may close these issues.

Should always drain before reboot