Configure kubelet Shutdown Grace Period #276

Merged: 5 commits, Apr 6, 2023

Member

@mnitchev mnitchev commented Apr 5, 2023

What this PR does / why we need it

This PR configures both shutdownGracePeriod and shutdownGracePeriodCriticalPods in the kubelet configuration. The total grace period is set to 5 minutes, and the critical pods grace period is set to 1 minute. This means that kubelet spends the first 4 minutes terminating regular pods and only starts terminating critical pods (pods with the priority class system-cluster-critical or system-node-critical) in the last 1 minute of the shutdownGracePeriod. This is important since our CNI (cilium) is a critical pod that needs to be terminated last.
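For reviewers, the resulting timeline can be sketched in a few lines of Python. This is an illustration only, not code from this PR: the function name `shutdown_phases` is hypothetical, and it assumes the documented kubelet behaviour that regular pods receive `shutdownGracePeriod` minus `shutdownGracePeriodCriticalPods`.

```python
from datetime import timedelta


def shutdown_phases(grace_period: timedelta, critical_grace_period: timedelta) -> dict:
    """Split the total shutdown grace period into its two phases.

    Hypothetical helper for illustration; kubelet does this internally.
    """
    if critical_grace_period > grace_period:
        raise ValueError("critical pods grace period cannot exceed the total grace period")
    return {
        # Phase 1: regular pods are terminated first.
        "regular_pods": grace_period - critical_grace_period,
        # Phase 2: critical pods (e.g. the CNI) get the remaining time.
        "critical_pods": critical_grace_period,
    }


# Values from this PR: 5 minutes total, 1 minute reserved for critical pods.
phases = shutdown_phases(timedelta(minutes=5), timedelta(minutes=1))
print(phases["regular_pods"])   # 0:04:00
print(phases["critical_pods"])  # 0:01:00
```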

This feature is implemented using systemd Inhibitor Locks. The maximum inhibit delay is controlled by the InhibitDelayMaxSec setting in logind. On our AWS nodes the default is set to 30 seconds, which is shorter than the 5-minute grace period, so we need to override it. This is why /lib/systemd/logind.conf.d/zzz-kubelet-graceful-shutdown.conf is mounted on the nodes (the zzz prefix ensures it is always the last file evaluated).
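The constraint behind the override can be sketched as follows: logind delays a shutdown for at most InhibitDelayMaxSec, so that value should be at least as large as the kubelet grace period. The helper names and the 300-second value below are assumptions for illustration; the actual contents of the drop-in file are in the diff of this PR.

```python
def inhibit_delay_covers(inhibit_delay_max_sec: int, shutdown_grace_period_sec: int) -> bool:
    """Return True if logind will hold the shutdown long enough for kubelet."""
    return inhibit_delay_max_sec >= shutdown_grace_period_sec


def render_logind_dropin(inhibit_delay_max_sec: int) -> str:
    """Render a logind.conf drop-in (hypothetical content, [Login] section)."""
    return f"[Login]\nInhibitDelayMaxSec={inhibit_delay_max_sec}\n"


GRACE_PERIOD_SEC = 5 * 60  # shutdownGracePeriod from this PR

# The 30-second default on our AWS nodes is too short for a 5-minute grace period.
print(inhibit_delay_covers(30, GRACE_PERIOD_SEC))   # False
# Assumed override value: equal to the grace period.
print(inhibit_delay_covers(300, GRACE_PERIOD_SEC))  # True
print(render_logind_dropin(300), end="")
```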

Checklist

  • Update changelog in CHANGELOG.md.

References

  1. Graceful Node Shutdown docs
  2. Graceful Node Shutdown Proposal docs (there's some more info on how it works in here)
  3. logind.conf docs
  4. systemd inhibit docs

Trigger e2e tests

/test create
/test upgrade
/run cluster-test-suites

@tityosbot

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@mnitchev mnitchev changed the title Pr shutdown grace period Configure kubelet Shutdown Grace Period Apr 5, 2023
@mnitchev
Member Author

mnitchev commented Apr 5, 2023

/test all

@tityosbot

@mnitchev: No jobs can be run with /test all.
The following commands are available to trigger jobs:

  • /test create
  • /test upgrade

In response to this:

/test all

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mnitchev
Member Author

mnitchev commented Apr 5, 2023

/test create
/test upgrade

@mnitchev mnitchev marked this pull request as ready for review April 5, 2023 09:41
@mnitchev mnitchev requested a review from a team April 5, 2023 09:41
Contributor

@AndiDog AndiDog left a comment

LGTM apart from comments

Review comment on CHANGELOG.md (outdated, resolved)
@mnitchev mnitchev requested a review from AndiDog April 5, 2023 14:31
@mnitchev mnitchev enabled auto-merge (squash) April 6, 2023 07:23
@mnitchev
Member Author

mnitchev commented Apr 6, 2023

/test create
/test upgrade

@mnitchev mnitchev merged commit 9d27381 into master Apr 6, 2023
@mnitchev mnitchev deleted the pr-shutdown-grace-period branch April 6, 2023 07:58
3 participants