Configure kubelet.service to avoid crashlooping before config is present #1352

sysrich · 2020-06-10T14:49:24Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

The current documentation makes a number of references to the kubelet crashlooping until it is configured, e.g. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

This has caused confusion for users who notice the resulting errors (e.g. kubernetes/kubernetes#83936).

The crashloop is unnecessary, as kubelet.service can be configured to attempt to start only when a config.yaml is present. This PR introduces that requirement.

Which issue(s) this PR fixes:

Fixes kubernetes/kubernetes#83936

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?

kubelet.service will no longer attempt to start until /var/lib/kubelet/config.yaml exists, preventing CrashLooping before the kubelet is configured.

k8s-ci-robot · 2020-06-10T14:49:32Z

Welcome @sysrich!

It looks like this is your first PR to kubernetes/release 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/release has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2020-06-10T14:49:32Z

Hi @sysrich. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2020-06-10T14:49:39Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sysrich
To complete the pull request process, please assign hoegaarden
You can assign the PR to them by writing /assign @hoegaarden in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

neolit123 · 2020-06-10T15:28:43Z

thanks for the PR @sysrich
we had discussions about this but i don't think we have a tracking issue in k/kubeadm:
https://github.com/kubernetes/kubeadm
(@rosti can correct me about this)

one issue with this change is that it pins the kubelet config to a path, so consumers of the kubelet service outside of kubeadm usage will get a service that never starts unless they write to that particular path.

perhaps it would be better to simply not start the service post installation and require users (or kubeadm) to start it manually. i'm sure there is a way to do that with systemd.

/hold for review
/sig cluster-lifecycle
/assign @rosti
/priority important-longterm

neolit123 · 2020-06-10T15:28:52Z

/kind feature

neolit123 · 2020-06-10T20:54:44Z

please have a look at kubernetes/kubeadm#2178 which proposes an alternative route.

EDIT: although i've just added some caveats, so this might actually be the better route.

neolit123 · 2020-06-10T22:20:16Z

@kubernetes/sig-release hello, i have a question: if we make a breaking change with an action-required, in say a kubelet DEB package, does this mean that this change will land in all the new PATCH releases or only in the latest MINOR release?

i remember we used to have some conditional Go code that was able to control some aspects of what lands in what version, which was not ideal and seemed like a workaround for the lack of branches in k/release.

depending on the response to the above question this PR might be better than the proposal in kubernetes/kubeadm#2178

tpepper · 2020-06-11T01:37:42Z

I don't think we have a strong documented policy or enforcement on this. But given that we support upgrades from 1.X to 1.(X+1), if some migration/mitigation code were added to the (X+1) branch in a particular patch release, it would need to be conditionally active for that and all subsequent patch releases on that branch.

For example we have 1.17.6 and a user of that can upgrade to 1.18.3. If a bug is mitigated in 1.18.4 a user upgrading from 1.17.6 should get the mitigation. If we subsequently release 1.18.5, we can't require this user to upgrade from 1.17.6 to 1.18.4 first and only then 1.18.5.
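The rule in this example can be sketched as a small predicate. This is only an illustration of the policy as described above, not code from k/release; the function names `parse` and `mitigation_active` are hypothetical, and versions on later minor branches are deliberately left out of scope.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def mitigation_active(version, introduced_in):
    """Hypothetical rule: active in the introducing patch release and in
    every later patch release on the same minor branch."""
    v, i = parse(version), parse(introduced_in)
    same_branch = v[:2] == i[:2]
    return same_branch and v[2] >= i[2]


# A mitigation first shipped in 1.18.4, checked against the versions above.
for candidate in ("1.17.6", "1.18.3", "1.18.4", "1.18.5"):
    print(candidate, mitigation_active(candidate, "1.18.4"))
```

Under this rule a user going from 1.17.6 directly to 1.18.5 still gets the mitigation, without having to stop at 1.18.4 first.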

Have I understood the question correctly?

BenTheElder · 2020-06-11T02:20:02Z

@tpepper I think @neolit123 is asking about mechanically if the tooling here can gate what versions this change lands in.

@neolit123 I'm pretty sure changes here are going to be picked up in all future releases on all versions IIRC ...

cc @justaugustus

Given that, I agree that this change is desirable, but currently somewhat problematic.

To circle back to Tim's comment ... as a matter of policy I think for a change like this ideally we would put it into the next minor release forward only, and have an "action required" release note.

This brings me back to kubernetes/kubernetes#88832 ...

BenTheElder · 2020-06-11T02:23:15Z

Really this file should be in the main repo, both so it can be version controlled with the sources & release branching and so that people not using an official release have a reference systemd unit.

This has forced pretty much all cluster lifecycle projects to fork the systemd unit into their own repo instead of just writing drop-ins with their customizations, which also means we don't even test this file ...

tpepper · 2020-06-11T02:58:03Z

The location of the packaging automation, and whether it should be forked or carry conditional logic, has been a protracted discussion. In the meantime it lives in k/release, unforked and occasionally with conditional logic.

But 100% agree: packages' content should be coming from non-k/release repos (eg: k/k, k/kubelet, k/kubeadm). Systemd unit files are package content.

BenTheElder · 2020-06-11T05:13:26Z

Back on the topic of the change, does this actually work as intended?
So far what I've read suggests that it simply won't start if the file doesn't exist but it won't watch for it to exist and start it then ..?

For non-kubeadm users this is probably another issue then, even if they generate the file at the standard path, their kubelet may never start?

rosti · 2020-06-11T10:02:38Z

According to systemd documentation (see here) this would only cause the kubelet service to be skipped. It won't enter crash loop state and it won't be marked as "failed".

However, no file watching would be performed. Hence, the service must be restarted manually once the condition is satisfied for it to be reevaluated.
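A toy model of the behavior described above, in Python. It is not systemd code and does not use any systemd API; the class `ConditionalUnit` and its state names are hypothetical, chosen only to show that the condition is evaluated when a start is requested and not when the file later appears.

```python
import os
import tempfile


class ConditionalUnit:
    """Toy model of a service guarded by a path-existence condition."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.state = "inactive"

    def start(self):
        # The condition is checked only at the moment a start is requested.
        if not os.path.exists(self.config_path):
            # Skipped, not failed: nothing here triggers a restart loop.
            self.state = "skipped"
        else:
            self.state = "running"
        return self.state


with tempfile.TemporaryDirectory() as tmp:
    config = os.path.join(tmp, "config.yaml")
    unit = ConditionalUnit(config)

    first = unit.start()         # config is missing
    open(config, "w").close()    # config is written later
    after_write = unit.state     # nothing re-evaluates the condition
    second = unit.start()        # an explicit (re)start picks it up

print(first, after_write, second)
```

In the model, writing the file changes nothing on its own; only the second explicit `start()` moves the unit to running, which is the manual restart step mentioned above.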

With that in mind, this highlights that there are two types of expectations WRT the kubelet service:

1. The service is always enabled
2. The service is crash looping

kubeadm, for instance, depends only on 1 to be true, but does not require 2 (since kubeadm restarts the service as needed). Hence, this change won't break kubeadm.
Other folks may depend on both 1 & 2 and this would be a breaking change for them.

For me, crash looping has always been an odd choice. I am not aware of any Linux distro that packages daemons as automatically enabled services that are missing config and therefore crash looping.
If there are no defaults and users are expected to provide config (either manually or via an additional tool like kubeadm), then the service should be installed disabled. It's the responsibility of users and/or external tools (like kubeadm) to enable the service after it's properly configured.

With that in mind, and since this change can already break some users, I would advocate for making things cleaner by simply disabling the service at install time.

neolit123 · 2020-06-11T13:20:45Z

> But 100% agree: packages' content should be coming from non-k/release repos (eg: k/k, k/kubelet, k/kubeadm). Systemd unit files are package content.

since both kubeadm and kubelet are in k/k we should have the specs in there for the time being.
k/kubelet uses staging, so it is technically branched/tagged the same way as k/k ATM.
k/kubeadm is not branched at all ATM.

this was discussed in kubernetes/kubernetes#71677

the (less desired) alternative was to start branching k/release the same way as k/k:
#857

neolit123 · 2020-06-11T13:24:19Z

crashlooping aside, something to bring to everyone's attention is that the kubelet is removing some/most of its flags that are present in the 10-kubeadm file in the near future.

if such a kubelet change lands in e.g. k8s 1.21, we must release the change to 10-kubeadm only for version 1.21 of the packages.
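As a rough sketch of what that kind of version gating means, assuming the change should apply from the named minor version onward (the comment above only names 1.21 itself): the helper names `minor_of` and `dropin_variant` and the labels "legacy" and "updated" are hypothetical and do not correspond to anything in kubepkg or k/release.

```python
def minor_of(version):
    """Return (major, minor) as ints from a version like '1.21' or '1.21.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def dropin_variant(package_version, first_changed="1.21"):
    """Pick which hypothetical 10-kubeadm variant a package version ships.

    Assumption: the change applies to `first_changed` and later minors only,
    so older release lines keep the flags their kubelet still accepts.
    """
    if minor_of(package_version) >= minor_of(first_changed):
        return "updated"
    return "legacy"


for v in ("1.20.7", "1.21.0", "1.22.1"):
    print(v, dropin_variant(v))
```

The point of the sketch is only that the decision has to key off the package's Kubernetes version, whether that happens through conditional logic or through per-release branches.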

saschagrunert · 2020-06-11T13:50:26Z

> if such a kubelet change lands in e.g. k8s 1.21, we must release the change to 10-kubeadm only for version 1.21 of the packages.

Changes to the spec templates with respect to different Kubernetes versions can be applied by the logic inside kubepkg.

Should we only apply the changes in this PR if Kubernetes gets installed via kubeadm? Can’t users specify the kubelet configuration manually? Can we either:

Let the user know that this file has to be present to be able to start the kubelet, or
Apply the change to the unit file only when installed via kubeadm (maybe via a custom spec target)

saschagrunert · 2020-06-11T13:52:24Z

/ok-to-test

neolit123 · 2020-06-11T13:57:52Z

> Changes to the spec templates with respect to different Kubernetes versions can be applied by the logic inside kubepkg.

this is what we used to do before kubepkg. arguably, it feels like a workaround and having the specs branched per k8s release is more sane.

> Should we only apply the changes in this PR if Kubernetes gets installed via kubeadm? Can’t users specify the kubelet configuration manually? Can we either:

the change in this PR is generally not desirable for both kubeadm and non-kubeadm users.
if we want to stop the crashloop behavior of the service, ideally we should go with supplying it disabled by default. #1352 (comment)

sysrich · 2020-06-15T15:41:56Z

I'd be fine with having it disabled by default. Is there an issue/PR already in progress somewhere which I have failed to find?

neolit123 · 2020-06-15T15:43:28Z

issue is tracked here for the time being kubernetes/kubeadm#2178

saschagrunert · 2020-08-05T18:20:19Z

As discussed in person with @sysrich some time ago, let’s close this for now and look forward to the long-term solution.

/close

k8s-ci-robot · 2020-08-05T18:20:32Z

@saschagrunert: Closed this PR.

In response to this:

> As discussed in person with @sysrich some time ago, let’s close this for now and look forward to the long-term solution.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

BenTheElder · 2020-08-08T02:52:14Z

is an alternative plan being actively pursued?

…

neolit123 · 2020-08-10T11:28:31Z

added a comment in kubernetes/kubeadm#2178

Configure kubelet.service to avoid crashlooping before config is present

2ebf530

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 10, 2020

k8s-ci-robot requested review from idealhack and listx June 10, 2020 14:49

k8s-ci-robot added the sig/release Categorizes an issue or PR as relevant to SIG Release. label Jun 10, 2020

k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 10, 2020

k8s-ci-robot assigned rosti Jun 10, 2020

k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority labels Jun 10, 2020

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jun 10, 2020

neolit123 mentioned this pull request Jun 10, 2020

change the kubelet service crash loop behavior kubernetes/kubeadm#2178

Open

2 tasks

BenTheElder mentioned this pull request Jun 11, 2020

systemd specs should be in-repo kubernetes/kubernetes#88832

Open

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 11, 2020

k8s-ci-robot closed this Aug 5, 2020

This was referenced Feb 13, 2021

packaging specs should be per-kubernetes version and in-tree (k/k) #1913

Open

eliminate kubelet crashloop 🙃 kubernetes-sigs/kind#2072

Merged
