Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configue kubelet.service to avoid crashlooping before config is present #1352

Closed
wants to merge 1 commit into from

Conversation

sysrich
Copy link

@sysrich sysrich commented Jun 10, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

The current documentation makes a number of references to how kubelet will be crashlooping until it is configured. eg https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

This has certainly caused some confusion for users who notice the errors also (eg. kubernetes/kubernetes#83936)

This is unnecessary as the kubelet.service can be configured to only attempt to start when there is a config.yaml provided. This PR introduces that requirement

Which issue(s) this PR fixes:

Fixes kubernetes/kubernetes#83936

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?

kubelet.service will no longer attempt to start until /var/lib/kubelet/config.yaml exists, preventing CrashLooping before the kubelet is configured.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 10, 2020
@k8s-ci-robot
Copy link
Contributor

Welcome @sysrich!

It looks like this is your first PR to kubernetes/release 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/release has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @sysrich. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 10, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sysrich
To complete the pull request process, please assign hoegaarden
You can assign the PR to them by writing /assign @hoegaarden in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from idealhack and listx June 10, 2020 14:49
@k8s-ci-robot k8s-ci-robot added the sig/release Categorizes an issue or PR as relevant to SIG Release. label Jun 10, 2020
@neolit123
Copy link
Member

thanks for the PR @sysrich
we had discussions about this but i don't think we have a tracking issue in k/kubeadm:
https://github.com/kubernetes/kubeadm
(@rosti can correct me about this)

one issue with this change is that it pins the kubelet config to a path, so consumer of the kubelet service outside of kubeadm usage will get a service that never starts unless they write that particular path.

perhaps it would be better to simply not start the service post installation and require users (or kubeadm) to start it manually. i'm sure there is a way to do that with systemd.

/hold for review
/sig cluster-lifecycle
/assign @rosti
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 10, 2020
@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority labels Jun 10, 2020
@neolit123
Copy link
Member

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jun 10, 2020
@neolit123
Copy link
Member

neolit123 commented Jun 10, 2020

please have a look at kubernetes/kubeadm#2178 which proposes an alternative route.

EDIT: although i've just added some caveats, so this might actually be the better route.

@neolit123
Copy link
Member

@kubernetes/sig-release hello, i have a question: if we make a breaking change with an action-required, in say a kubelet DEB package, does this mean that this change will land in all the new PATCH releases or only in the latest MINOR release?

i remember we used to have some conditional Go code that was able to control some aspects of what lands in what version, which was not ideal and seemed as a workaround for the lack of branches in k/release.

depending on the response to the above question this PR might be better than the proposal in kubernetes/kubeadm#2178

@tpepper
Copy link
Member

tpepper commented Jun 11, 2020

I don't think we have a strong documented policy or enforcing on this. But given that we support upgrades from 1.X to 1.(X+1), if there was some migration/mitigation code added to the (X+1) branch in a particular patch release, it should need to be conditionally active for that and all subsequent patch releases on that branch.

For example we have 1.17.6 and a user of that can upgrade to 1.18.3. If a bug is mitigated in 1.18.4 a user upgrading from 1.17.6 should get the mitigation. If we subsequently release 1.18.5, we can't require this user to upgrade from 1.17.6 to 1.18.4 first and only then 1.18.5.

Have I understood the question correctly?

@BenTheElder
Copy link
Member

@tpepper I think @neolit123 is asking about mechanically if the tooling here can gate what versions this change lands in.

@neolit123 I'm pretty sure changes here are going to be picked up in all future releases on all versions IIRC ...

cc @justaugustus

Given that, I agree that this change is desirable, but currently somewhat problematic.

To circle back to Tim's comment ... as a matter of policy I think for a change like this ideally we would put it into the next minor release forward only, and have an "action required" release note.

This brings me back to kubernetes/kubernetes#88832 ...

@BenTheElder
Copy link
Member

Really this file should be in the main repo, both so it can be version controlled with the sources & release branching and so that people not using an official release have a reference systemd unit.

This has forced pretty much all cluster lifecycle projects to fork the systemd unit into their own repo instead of just writing drop-ins with their customizations, which also means we don't even test this file ...

@tpepper
Copy link
Member

tpepper commented Jun 11, 2020

Location of packaging automation and whether it should be forked or have conditional logic has been a protracted discussion, and in the meantime that is in k/release not forked and occasionally with conditional logic.

But 100% agree: packages' content should be coming from non-k/release repos (eg: k/k, k/kubelet, k/kubeadm). Systemd unit files are package content.

@BenTheElder
Copy link
Member

Back on the topic of the change, does this actually work as intended?
So far what I've read suggests that it simply won't start if the file doesn't exist but it won't watch for it to exist and start it then ..?

For non-kubeadm users this is probably another issue then, even if they generate the file at the standard path, their kubelet may never start?

@rosti
Copy link
Contributor

rosti commented Jun 11, 2020

According to systemd documentation (see here) this would only cause the kubelet service to be skipped. It won't enter crash loop state and it won't be marked as "failed".

However, no file watching would be performed. Hence, the service must be restarted manually once the condition is satisfied for it to be reevaluated.

With that in mind, this highlights that there are two types of expectations WRT the kubelet service:

  1. The service is always enabled
  2. The service is crash looping

kubeadm, for instance, depends only on 1 to be true, but does not require 2 (since kubeadm restarts the service as needed). Hence, this change won't break kubeadm.
Other folks may depend on both 1 & 2 and this would be a breaking change for them.

For me, crash looping has always been an odd choice. I am not aware of any Linux distro that packages daemons as automatically enabled services that are missing config and therefore crash looping.
If there are no defaults and users are expected to provide config (either manually or via an additional tool like kubeadm), then the service should be installed disabled. It's the responsibility of users and/or external tools (like kubeadm) to enable the service after it's properly configured.

With that in mind and since this change can already break part of the users, I would advocate on making things more clean by simply disabling the service at install time.

@neolit123
Copy link
Member

neolit123 commented Jun 11, 2020

But 100% agree: packages' content should be coming from non-k/release repos (eg: k/k, k/kubelet, k/kubeadm). Systemd unit files are package content.

since both kubeadm and kubelet are in k/k we should have the specs in there for the time being.
k/kubelet uses staging, so it technically branched/tagged as k/k ATM.
k/kubeadm is not branched at all ATM.

this was discussed in kubernetes/kubernetes#71677

the (less desired) alternative was to start branching k/release the same way as k/k:
#857

@neolit123
Copy link
Member

neolit123 commented Jun 11, 2020

crashlooping aside, something to everyone's attention is that the kubelet is removing some/most of it's flags that are present in the 10-kubeadm file in the near future.

if such a kubelet change lands in e.g. k8s 1.21, we must release the change to 10-kubeadm only for version 1.21 of the packages.

@saschagrunert
Copy link
Member

saschagrunert commented Jun 11, 2020

if such a kubelet change lands in e.g. k8s 1.21, we must release the change to 10-kubeadm only for version 1.21 of the packages.

Changes to the spec templates with respect to different Kubernetes versions can be applied by the logic inside kubepkg.

Should we only apply the changes in this PR if Kubernetes gets installed via kubeadm? Can’t users specify the kubelet configuration manually? Can we either:

  • Let the user know that this file has to be present to be able to start the kubelet, or
  • Apply the change to the unit file only when installed via kubeadm (maybe via a custom spec target)

@saschagrunert
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 11, 2020
@neolit123
Copy link
Member

neolit123 commented Jun 11, 2020

Changes to the spec templates with respect to different Kubernetes versions can be applied by the logic inside kubepkg.

this is what we used to do before kubepkg. arguably, it feels like a workaround and having the specs branched per k8s release is more sane.

Should we only apply the changes in this PR if Kubernetes gets installed via kubeadm? Can’t users specify the kubelet configuration manually? Can we either:

the change in this PR is generally not desirable for both kubeadm and non-kubeadm users.
if we want to stop the crashloop behavior of the service, ideally we should go with supplying it disabled by default. #1352 (comment)

@sysrich
Copy link
Author

sysrich commented Jun 15, 2020

I'd be fine with having it disabled by default. Is there an issue/PR already in progress somewhere which I have failed to find?

@neolit123
Copy link
Member

issue is tracked here for the time being kubernetes/kubeadm#2178

@saschagrunert
Copy link
Member

As discussed some time ago personally with @sysrich, let’s close this for now and look forward for the long term solution.

/close

@k8s-ci-robot
Copy link
Contributor

@saschagrunert: Closed this PR.

In response to this:

As discussed some time ago personally with @sysrich, let’s close this for now and look forward for the long term solution.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@BenTheElder
Copy link
Member

BenTheElder commented Aug 8, 2020 via email

@neolit123
Copy link
Member

added a comment in kubernetes/kubeadm#2178

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/release Categorizes an issue or PR as relevant to SIG Release. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
7 participants