KEP-3960: Introducing Sleep Action for PreStop Hook #3961

Merged · 11 commits · Jun 16, 2023
Changes from 5 commits
6 changes: 6 additions & 0 deletions keps/prod-readiness/sig-node/3960.yaml
# The KEP must have an approver from the
# "prod-readiness-approvers" group
# of http://git.k8s.io/enhancements/OWNERS_ALIASES
kep-number: 3960
alpha:
approver: ""
361 changes: 361 additions & 0 deletions keps/sig-node/3960-pod-lifecycle-sleep-action/README.md
# KEP-3960: Introducing Sleep Action for PreStop Hook

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [User Stories (Optional)](#user-stories-optional)
- [Story 1](#story-1)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Implementation](#implementation)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit tests](#unit-tests)
- [Integration tests](#integration-tests)
- [e2e tests](#e2e-tests)
- [Graduation Criteria](#graduation-criteria)
- [Alpha](#alpha)
- [Beta](#beta)
- [GA](#ga)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Upgrade](#upgrade)
- [Downgrade](#downgrade)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- [Monitoring Requirements](#monitoring-requirements)
- [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] e2e Tests for all Beta API Operations (endpoints)
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP proposes the addition of a new sleep action for the PreStop lifecycle hook in Kubernetes, allowing containers to pause for a specified duration before termination. This enhancement aims to provide a more straightforward way to manage graceful shutdowns and improve the overall lifecycle management of containers.

## Motivation

Currently, Kubernetes supports two types of actions for PreStop hooks: exec and httpGet. Although these actions offer flexibility, they often require additional scripting or custom solutions to achieve a simple sleep functionality. A built-in sleep action would provide a more user-friendly and native solution for scenarios where a container needs to pause before shutting down, such as:

- Ensuring that the container gracefully releases resources and connections.
- Allowing a smooth transition in load balancers or service meshes.
- Providing a buffer period for external monitoring and alerting systems.

### Goals

- Allow containers to perform cleanup or shutdown actions before being terminated, by sleeping for a specified duration in the preStop hook.
- Improve the overall reliability and availability of Kubernetes applications by providing a way for containers to gracefully terminate.

### Non-Goals

- This KEP does not aim to replace other Kubernetes features that can be used to perform cleanup actions, such as init containers or sidecar containers.
- This KEP does not aim to provide a way to pause or delay pod termination indefinitely.

## Proposal

We propose adding a new sleep action for the PreStop hook, which will pause the container for a specified duration before termination. The API changes will include the following:

- Extending the LifecycleHandler object to support a new Sleep field.
- Adding a SleepAction object with a Seconds field that specifies the sleep duration in seconds.

### User Stories (Optional)

#### Story 1
As a Kubernetes user, I want to configure my container to sleep for a specific duration during graceful termination, and I want to do it without needing a sleep binary in my image.
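
For illustration, a pod using the proposed field could be constructed as in the sketch below. The `Sleep`/`SleepAction` names follow the API shape proposed in this KEP and do not yet exist in the published client libraries; the grace period, container name, and image are arbitrary examples.

```go
import v1 "k8s.io/api/core/v1"

// podWithSleepPreStop returns a pod whose container pauses for 5 seconds
// before termination, leaving the rest of the 30s grace period for normal
// SIGTERM handling.
func podWithSleepPreStop() *v1.Pod {
	grace := int64(30)
	return &v1.Pod{
		Spec: v1.PodSpec{
			TerminationGracePeriodSeconds: &grace,
			Containers: []v1.Container{{
				Name:  "app",
				Image: "nginx",
				Lifecycle: &v1.Lifecycle{
					PreStop: &v1.LifecycleHandler{
						Sleep: &v1.SleepAction{Seconds: 5},
					},
				},
			}},
		},
	}
}
```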

### Risks and Mitigations

N/A

## Design Details

### Implementation

- Adding a SleepAction object with a Seconds field that specifies the sleep duration in seconds.
```go
type SleepAction struct {
	// Seconds is the number of seconds to sleep.
	Seconds int32
}
```

- Adding a Sleep field to the LifecycleHandler struct, which represents the duration in seconds that the container should sleep before being terminated during the preStop hook.
```go
type LifecycleHandler struct {
	// Sleep pauses further lifecycle progress for a defined time period.
	Sleep *SleepAction
}
```

- When Kubernetes executes the preStop hook with the sleep action, it will simply sleep for the specified number of seconds.
```go
func (hr *handlerRunner) Run(ctx context.Context, containerID kubecontainer.ContainerID, pod *v1.Pod, container *v1.Container, handler *v1.LifecycleHandler) (string, error) {
	switch {
	case handler.Exec != nil:
		...
	case handler.HTTPGet != nil:
		...
	case handler.Sleep != nil:
		hr.runSleepHandler(ctx, handler.Sleep.Seconds)
		return "", nil
	default:
		...
	}
}

func (hr *handlerRunner) runSleepHandler(ctx context.Context, seconds int32) {
	time.Sleep(time.Duration(seconds) * time.Second)
}
```

Member: There should be some sort of check for whether the container is already gone. As a high-level implementation for the KEP this may be fine.

Member Author: Sorry, but can you show me an example of how to check the container status? I'm not very familiar with this 😢
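
One possible way to address the comment above is to have the sleep abort early when the surrounding context is cancelled, for example because the container is already gone or the grace period was cut short. The following is only a sketch under that assumption, not the final implementation (the return value and cancellation handling are illustrative):

```go
func (hr *handlerRunner) runSleepHandler(ctx context.Context, seconds int32) error {
	timer := time.NewTimer(time.Duration(seconds) * time.Second)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		// The caller cancelled (e.g. the container is already gone or the
		// grace period was shortened), so stop sleeping immediately.
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}
```
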
### Test Plan

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

##### Unit tests

- Test that the runSleepHandler function sleeps for the correct duration when given a valid duration value.
- Test that the runSleepHandler function returns without error when given a valid duration value.
- Test that the validation returns an error when given an invalid duration value (e.g., a negative value).
- Test that the validation returns an error when the given duration is longer than the termination grace period (see the validation sketch after this list).
- Test that the runSleepHandler function returns immediately when given a duration of zero.
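
For illustration, the validation cases above could be exercised against a helper shaped roughly like the sketch below (using `field` from k8s.io/apimachinery/pkg/util/validation/field and the proposed `core.SleepAction` type). The function name, field path, and error messages are assumptions for this sketch, not the final implementation.

```go
// validateSleepAction checks the constraints described above: the sleep must
// be positive and must fit within the pod's termination grace period.
func validateSleepAction(sleep *core.SleepAction, gracePeriodSeconds int64, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	if sleep.Seconds <= 0 {
		allErrs = append(allErrs, field.Invalid(fldPath.Child("seconds"), sleep.Seconds, "must be greater than 0"))
	}
	if int64(sleep.Seconds) > gracePeriodSeconds {
		allErrs = append(allErrs, field.Invalid(fldPath.Child("seconds"), sleep.Seconds, "must not exceed terminationGracePeriodSeconds"))
	}
	return allErrs
}
```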

##### Integration tests
N/A

##### e2e tests
- Basic functionality
1. Create a simple pod with a container that runs a long-running process.
2. Add a preStop hook to the container configuration, using the new sleepAction with a specified sleep duration (e.g., 5 seconds).
3. Delete the pod and observe the time it takes for the container to terminate.
4. Verify that the container sleeps for the specified duration before it is terminated.

- Sleep duration boundary testing
1. Create a simple pod with a container that runs a long-running process.
2. Add a preStop hook to the container configuration, using the new sleepAction with various sleep durations, including 1 second (the minimum allowed value) and values slightly above the minimum (to test edge cases).
3. For each sleep duration, delete the pod and observe the time it takes for the container to terminate.
4. Verify that the container sleeps for the specified duration before it is terminated.

- Interaction with termination grace period
1. Create a simple pod with a container that runs a long-running process.
2. Add a preStop hook to the container configuration, using the new sleepAction with a specified sleep duration (e.g., 5 seconds).
3. Set the termination grace period to various values, including:
- Equal to the sleep duration
- Greater than the sleep duration
- Greater than the sleep duration, but reduced to less than the sleep duration at runtime
4. For each termination grace period value, delete the pod and observe the time it takes for the container to terminate.
5. Verify that the container is terminated after the min(sleep, grace).

### Graduation Criteria

#### Alpha

- Feature implemented behind a feature flag
- Initial unit/integration tests completed and enabled

#### Beta

- Gather feedback from developers and surveys
- Additional e2e tests are completed

#### GA

- No negative feedback
- No bug issues reported

### Upgrade / Downgrade Strategy

#### Upgrade
The previous PreStop behavior will not be broken; users can continue to use their existing hooks as they are.
To use this enhancement, users need to enable the feature gate and add a sleep action to their preStop hook.

#### Downgrade
The kube-apiserver will ignore the sleep field in the preStop hook.

### Version Skew Strategy

N/A
Member:

Disagree - it actually matters. It's definitely possible [even more, it will happen for sure at least for a moment] that the FG will be enabled only in one component and not the other.

Ideally, I would like to see a matrix of:

- kube-apiserver enabled/disabled
- kubelet enabled/disabled

with a description of what exactly we expect.

Member Author:

Updated, but I'm not sure what will happen in the scenario where only the kube-apiserver enables this feature: will the create request pass validation and be successfully processed, or will it be rejected?

Member:

@wojtek-t this goes to the discussion about alpha/beta that I started and have not followed up on :)

Should the kubelet even have the gate? We are not consistent.

IOW:

  1. The feature gate only exists to prevent use of the field in the API. Once accepted on a pod, the feature is on for that pod. Kubelet treats the field like it is GA.

  2. The feature gate prevents new use of the field in the API and nullifies the effect on existing uses. If gate is disabled, Kubelet will see the field and ignore it.

What do you think makes more sense?

Member:

The semantic that I believe works best is close to (2) [although I didn't fully understand "and nullifies ..." part]
I believe what we should target is:

If FG is disabled:
(a) any attempt to set the field for an object doesn't succeed - the field is silently dropped - the "dropFields" strategy: https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/pod/strategy.go#LL89C10-L89C31
(b) if the field was set while the FG was enabled, it stays set
[the above is what kube-apiserver is doing]

(c) if the FG is disabled, even if Kubelet (or in general any other component) observes the field it ignores its existence

[for (c) there are exceptions - the nice example appeared in Sidecar KEP and computing resources in scheduler, but I personally treat them as exceptions - by default I consider the above the desired semantic]

@thockin - do you have any concerns about this?

Member:

For this specific field, that seems somewhat acceptable because it only matters exactly once (when the pod is deleted), although it is pretty surprising that the API says it is on but it's really not.

But imagine a field which is used in real-time on a long-lived resource like Service or Deployment. The object was admitted, the feature was used, then the gate was disabled, the API still says the feature is on, but suddenly it stopped working. Worse - it could stop working on some components and not others (e.g. disabled on some kube-proxies and not others).

That makes testing MUCH more complicated.

Now imagine an enum sort of field or a loosening of validation, where a new value was allowed by the gate, and then the gate is disabled, and the value is no longer allowable. Do we fall back on some next-closest value without updating the API? That seems terrible.

Member:

As for the real-time example - I would actually say that the "stops working" (or "partially stops working") is still what I want. If I'm disabling a FG, I'm doing it on purpose, so I actually want this feature to really stop working, as this is presumably causing some troubles.
I definitely agree that the current support for it is poor, but I still think it's what I want.
And eventually the FeatureGate KEP will address the issue [you register a hook for disabling FG that is clearing this field, or doing whatever else is better in a given case]

Re the enum example - this is a harder one. The only ones that I've seen were effectively "yet another type of optional behavior", so disabling meant "you no longer have this, so you don't get anything", which is kind of the same case as the above.
If it isn't an "optional behavior", but rather "some behavior is required", that would become super tricky and may justify an exception. But I guess it depends on the specific case.

So I guess the summary of my answer is: if I'm requesting disabling of a FG, I actually expect the feature to become disabled. And I acknowledge the fact that if someone was using it, they may get affected/broken, but I'm disabling the FG for a reason.

@deads2k @johnbelamaric - FYI - this is an interesting discussion

Member:

@jpbetz also :)

Member Author:

So, in this KEP, should the kubelet have the gate?

If yes:

1. Only the apiserver enables the FG:
   - validation will pass, but the kubelet will ignore this field when the pod is terminating?
2. Only the kubelet enables the FG:
   - new pods will fail validation, but existing pods will execute the sleepAction.

If no:

- The FG only controls validation; once a container with a sleepAction is set, it will always sleep before terminating, regardless of whether the FG still holds.

Am I understanding this correctly?

Member:

This is correct.

And what I'm saying is that we should go with the first option.
I know it has drawbacks, but I think the gains from it outweigh those drawbacks. It seems that Tim is far from being convinced on the generic case, but he seems to be ok for this particular case, so let's go for it here.

Member:

ACK
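
To make the agreed apiserver-side behavior concrete, here is a rough sketch of the drop-disabled-fields approach described in (a)/(b) above, modeled on the existing strategy.go / dropDisabledFields pattern and assuming the internal `api` types and the `utilfeature`/`features` packages used elsewhere in kube-apiserver. The function names and exact placement are illustrative assumptions, not the final implementation.

```go
// dropDisabledSleepActionFields clears PreStop.Sleep on incoming objects when
// the PodLifecycleSleepAction gate is off, unless the old object already used
// the field (so previously accepted objects keep validating).
func dropDisabledSleepActionFields(podSpec, oldPodSpec *api.PodSpec) {
	if utilfeature.DefaultFeatureGate.Enabled(features.PodLifecycleSleepAction) || sleepActionInUse(oldPodSpec) {
		return
	}
	for i := range podSpec.Containers {
		c := &podSpec.Containers[i]
		if c.Lifecycle != nil && c.Lifecycle.PreStop != nil {
			c.Lifecycle.PreStop.Sleep = nil
		}
	}
}

func sleepActionInUse(podSpec *api.PodSpec) bool {
	if podSpec == nil {
		return false
	}
	for i := range podSpec.Containers {
		c := &podSpec.Containers[i]
		if c.Lifecycle != nil && c.Lifecycle.PreStop != nil && c.Lifecycle.PreStop.Sleep != nil {
			return true
		}
	}
	return false
}
```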


## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [x] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: PodLifecycleSleepAction
- Components depending on the feature gate: kubelet, kube-apiserver
- [ ] Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control
plane?
- Will enabling / disabling the feature require downtime or reprovisioning
of a node?

###### Does enabling the feature change any default behavior?

No

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

The feature can be disabled in Alpha and Beta versions by restarting kube-apiserver with the feature-gate off. In terms of Stable versions, users can choose to opt-out by not setting the sleep field.
Contributor:

If users have set the sleep field and then the feature is disabled, it will just be ignored and the old behavior (i.e. no sleep) would apply?

Member (@charles-chenzz, Jun 14, 2023):

> then the feature is disabled

IIUC, you mean the feature is enabled first, then the sleep field is set up, and then it is restarted/disabled? If that is the case, we want it to be ignored and the old behavior to apply.

Member Author:

> If users have set the sleep field and then the feature is disabled, it will just be ignored and the old behavior (i.e. no sleep) would apply?

Yes, in that case, the preStop hook will not take effect.

Member:

For most API changes, the feature gate controls the admission of new uses to the system, and not the actual implementation logic.

In other words, if you enable the feature, use the feature, then disable the feature - it keeps working, but no NEW uses of the feature work. This is not universally true; we actually do not have a consistent rule here.


###### What happens if we reenable the feature if it was previously rolled back?

New pods with a sleep action in the preStop hook can be created.

###### Are there any tests for feature enablement/disablement?

No

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

The change is opt-in; it doesn't impact already running workloads. But problems with the updated validation logic may cause crashes in the apiserver.
Contributor:

How would the user determine this is the cause of crashes in the apiserver? Will there be any tests that help prevent this from making it into the release?

Member Author:

I think this is misleading ("crash"), let me update it later. If a pod with a sleepAction was created and the feature is then disabled, and this pod is recreated/updated by a user, the pod's YAML won't pass validation.
In this case, an error will be returned pointing out the invalid field, rather than a "crash in the apiserver".

Member:

It's a hard rule that previously accepted objects must not later fail validation. When it comes to actual API review, we will ensure that :)

Member Author:

> It's a hard rule that previously accepted objects must not later fail validation. When it comes to actual API review, we will ensure that :)

Then I think we can safely say that this feature will not impact already running workloads?

Member:

yes


###### What specific metrics should inform a rollback?

N/A

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Tested manually.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Yes

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

Inspect the preStop hook configuration of the workloads.

###### How can someone using this feature know that it is working for their instance?

- [ ] Events
- Event Reason:
- [ ] API .status
- Condition name:
- Other field:
- [x] Other (treat as last resort)
- Details: Check the logs of the container during termination, check the termination duration.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [ ] Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- [x] Other (treat as last resort)
- Details: Check the logs of the container during termination, check the termination duration.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

N/A

### Dependencies

N/A

###### Does this feature depend on any specific services running in the cluster?

No

### Scalability

###### Will enabling / using this feature result in any new API calls?

No

###### Will enabling / using this feature result in introducing new API types?

No

###### Will enabling / using this feature result in any new calls to the cloud provider?

No

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

No

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?
- In general, if the API server and/or etcd is unavailable, Kubernetes will be unable to coordinate container termination and the preStop hook may not be executed at all. This could result in the container being terminated abruptly without the opportunity to perform any necessary cleanup actions.

- If the sleep action is enabled for the preStop hook, it will still attempt to sleep for the specified duration before the container is terminated. However, if the API server and/or etcd is unavailable, Kubernetes may be unable to send the SIGTERM signal to the container, which could cause the container to continue running beyond the specified sleep period.

###### What are other known failure modes?

N/A

###### What steps should be taken if SLOs are not being met to determine the problem?

N/A

## Implementation History

- 2023-04-22: Initial draft KEP

## Drawbacks

N/A

## Alternatives

N/A