Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-3960: Introducing Sleep Action for PreStop Hook #3961
KEP-3960: Introducing Sleep Action for PreStop Hook #3961
Changes from 5 commits
5415316
38bdda7
703df90
b16afb5
e540aa1
4182346
d0d3567
4cd8ae0
c54cb31
2f53fd3
153d611
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be some sort of checks that the container is already gone. As a high level implementation for the KEP this may be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sry, but can you show me an example of how to check the conainter status, I'm not very familiar with this 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disagree - it actually matters. It's definitely possible [even more, it will happen for sure at least for a moment] that the FG will be enabled only in one component and not the other.
Ideally, I would like to see matrix of:
with description of what exactly we expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, but I'm not sure what will happen in the scenario
only the kube-apiserver enable this feature
, will the creating request pass the validation and successsfully processed or will it be rejected?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wojtek-t this goes to the discussion about alpha/beta that I started and have not followed up on :)
Should the kubelet even have the gate? We are not consistent.
IOW:
The feature gate only exists to prevent use of the field in the API. Once accepted on a pod, the feature is on for that pod. Kubelet treats the field like it is GA.
The feature gate prevents new use of the field in the API and nullifies the effect on existing uses. If gate is disabled, Kubelet will see the field and ignore it.
What do you think makes more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantic that I believe works best is close to (2) [although I didn't fully understand "and nullifies ..." part]
I believe what we should target is:
If FG is disabled:
(a) any attempt to set the field for an object doesn't succeed - the field is silently dropped - the "dropFields" strategy: https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/pod/strategy.go#LL89C10-L89C31
(b) if the field was set once the FG was enabled, it stays to be set
[the above is what kube-apiserver is doing]
(c) if the FG is disabled, even if Kubelet (or in general any other component) observes the field it ignores its existence
[for (c) there are exceptions - the nice example appeared in Sidecar KEP and computing resources in scheduler, but I personally treat them as exceptions - by default I consider the above the desired semantic]
@thockin - do you have any concerns about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this specific field, that seems somewhat acceptable because it only matters exactly once (when pod is deleted), although it is pretty surprising that the API says it is on but its really not.
But imagine a field which is used in real-time on a long-lived resource like Service or Deployment. The object was admitted, the feature was used, then the gate was disabled, the API stil says the feature is on, but suddenly it stopped working. Worse - it could stop working on some components and not others (e.g. disabled on some kube-proxy and noth others).
That makes testing MUCH more complicated.
Now imagine an enum sort of field or a loosening of validation, where a new value was allowed by the gate, and then the gate is disabled, and the value is no longer allowable. Do we fall back on some next-closest value without updating the API? That seems terrible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the real-time example - I would actually say that the "stops working" (or "partially stops working") is still what I want. If I'm disabling a FG, I'm doing it on purpose, so I actually want this feature to really stop working, as this is presumably causing some troubles.
I definitely agree that the current support for it is poor, but I still think it's what I want.
And eventually the FeatureGate KEP will address the issue [you register a hook for disabling FG that is clearing this field, or doing whatever else is better in a given case]
Re enum example - this is harder one. The only ones that I've seen where effectively "yet another type of optional behavior", so disabling meant "you no longer have this, so you don't get anything". Which is kind of the same case as the above.
If it won't be an "optional behavior", but rather "some behavior is required" that would become super tricky and that may justify an exception. But I guess it depends on the specific case.
So I guess the summary of my answer is: if I'm requesting disabling a FG, I actually expect the feature to become disabled. And I acknowledge the fact that if someone was using it, they may got affected/broken, but I'm disabling FG for a reason.
@deads2k @johnbelamaric - FYI - this is an interesting discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpbetz also :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in this kep, should the kubelet have the gate?
if yes:
if no:
Am I understanding this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THis is correct.
And what I'm saying is that we should go with the first option.
I know it has drawbacks, but I think the gains from it outweigh those drawbacks. It seems that Tim is far from being convinced on the generic case, but he seems to be ok for this particular case, so let's go for it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If users have set the sleep field and then the feature is disabled, it will just be ignored and the old behavior (i.e. no sleep) would apply?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, you mean enabled first then setup the sleep field, then restart/disabled it? if that was the case, we want it be ignored and old behavior apply
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in that case, the prestop hook will not take effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For most API changes, the feature gate controls the admission of new uses to the system, and not the actual implementation logic.
In other words, if you enable the feature, use the feature, then disable the feature - it keeps working, bt no NEW uses of the feature work. This is not universally true, we actually do not have a consistent rule here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would the user determine this is the cause of crashes in the apiserver? Will there be any tests that help prevent this from making it into the release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is misleading (crash), let me update later. If a pod with sleepAction was created and the featue is disabled. And this pod is recreated/updated by a user, the pod's yaml won't pass the validation.
In this case, an error will occur to point out the wrong field, instead of the "crash in the apiserver"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a hard rule that previously accepted objects must not later fail validation. When it comes to actual API review, we will ensure that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I think, we can safely say that this feature will not impact already running workloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes