KEP-753: add PRR questionnaire answers for beta #4255
matthyx commented Oct 1, 2023
- One-line PR description: add PRR questionnaire answers for beta
- Issue link: Sidecar Containers #753
- Other comments:
Signed-off-by: Matthias Bertschy <[email protected]>
#### Beta

- Implement proper termination ordering.
- Provide defaults for `restartPolicy` field on init containers, `nil` is not
If that's true, why did we use a pointer?
this has to merge for beta:
kubernetes/kubernetes#120176
edit: I was wrong, sorry
I think the PR is not relevant to this.
We may have something like `inheritPodRestartPolicy` by default?
I'd say nil is OK to indicate it is inherited from the pod's restartPolicy.
If we do that, initContainers will become sidecars for all Pods created by ReplicaSets.
We should default to Never
WDYT?
It depends on the pod's restartPolicy.
If the pod has restartPolicy of 'Never', its regular init containers will never restart.
If the pod has restartPolicy of 'OnFailure', or 'Always', its regular init containers will restart only on failure.
Not sure how to name it. How about just calling it 'Default' and documenting the details? (just like the 'Default' of PodDNSPolicy)
@SergeyKanzhelev WDYT?
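The rule described in this comment can be written down as a small Go sketch. The type and function names below are illustrative stand-ins, not the kubelet's actual code; only the three restart policy values come from the discussion.

```go
package main

import "fmt"

// RestartPolicy is a simplified stand-in for the pod-level restartPolicy values.
type RestartPolicy string

const (
	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
)

// regularInitContainerRestarts is a hypothetical helper. It reports whether a
// regular (non-sidecar) init container is restarted after it exits, given the
// pod-level policy and whether the container failed.
func regularInitContainerRestarts(podPolicy RestartPolicy, failed bool) bool {
	if podPolicy == RestartPolicyNever {
		// Pod restartPolicy 'Never': regular init containers never restart.
		return false
	}
	// 'OnFailure' and 'Always': regular init containers restart only on failure.
	return failed
}

func main() {
	policies := []RestartPolicy{RestartPolicyNever, RestartPolicyOnFailure, RestartPolicyAlways}
	for _, p := range policies {
		fmt.Printf("pod=%s failed=true restart=%v\n", p, regularInitContainerRestarts(p, true))
		fmt.Printf("pod=%s failed=false restart=%v\n", p, regularInitContainerRestarts(p, false))
	}
}
```

Under this reading, an unset container-level `restartPolicy` simply means "follow the table above".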
I see, so it basically means "I'm not a sidecar". Hmm, I would still prefer `nil` then...
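A minimal Go sketch of the "`nil` means not a sidecar" reading follows. The `Container` struct is a trimmed, hypothetical mirror of the API type and `isSidecar` is an illustrative helper name; the sketch only shows why a pointer field can distinguish "unset" from an explicit `Always`.

```go
package main

import "fmt"

// ContainerRestartPolicy is a simplified stand-in for the container-level policy type.
type ContainerRestartPolicy string

const ContainerRestartPolicyAlways ContainerRestartPolicy = "Always"

// Container is a trimmed, hypothetical mirror of an init container entry.
// RestartPolicy is a pointer so that "not set" (nil) stays distinguishable
// from any explicit value.
type Container struct {
	Name          string
	RestartPolicy *ContainerRestartPolicy
}

// isSidecar treats an init container as a sidecar only when the field is
// explicitly set to Always; nil means "regular init container".
func isSidecar(c Container) bool {
	return c.RestartPolicy != nil && *c.RestartPolicy == ContainerRestartPolicyAlways
}

func main() {
	always := ContainerRestartPolicyAlways
	containers := []Container{
		{Name: "migrate-db"},                        // nil: regular init container
		{Name: "log-shipper", RestartPolicy: &always}, // explicit Always: sidecar
	}
	for _, c := range containers {
		fmt.Printf("%s sidecar=%v\n", c.Name, isSidecar(c))
	}
}
```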
and validate the declared limits?
-->

No.
Can't sidecars consume these resources?
If you take into account that before this KEP sidecars were already running as "regular" containers in the Pod, I don't consider them as being added.
Also requests and limits are respected.
/cc @SergeyKanzhelev
5368472 to 92404ba
You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->
Can you please link to tests? [In the upgrade/downgrade testing you only point to tests that you would like to create; I would like to see those tests.]
Friendly ping
apparently we don't have one... I'm going to push for it now
Is it fine if we add it to the code after the PRR merges, but inside the k/k beta PR?
Yes, but please:
- add a sentence here that this test will be added before graduating the feature to beta in k/k
- add it explicitly to beta graduation criteria
###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

Pods that don't feature sidecars are not affected by the KEP.

Pods with sidecars might take a long time to exit and exceed the TGPS, a new
This should be generally moved to the above question.
This question is about metrics. Can you please think through metrics that we can monitor?
@@ -1238,12 +1504,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

TBD
Even if not running the test now, can you please describe the test that you will run here?
#3658 is a nice example
- [ ] Other (treat as last resort)
- Details:
- [X] Other (treat as last resort)
- Details: `kubectl describe pod <pod-name>` will show the new field `.spec.initContainers[i].restartPolicy` for the sidecar containers.
That doesn't mean the feature is working - that means that I wanted it to work.
What we probably want to check is if the init container with restartPolicy set is running together with the regular containers.
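One way to picture the check suggested here is the Go sketch below. It assumes simplified, hypothetical status structs (`ContainerStatus`, `PodStatus`) rather than the real API types, and the helper name `sidecarsRunningWithMain` is made up for illustration. The idea: the feature is working only if the sidecar init containers are observed running at the same time as the regular containers, not merely declared in the spec.

```go
package main

import "fmt"

// ContainerStatus is a simplified, hypothetical stand-in for a per-container
// status entry; only the fields needed for this check are modeled.
type ContainerStatus struct {
	Name    string
	Running bool
}

// PodStatus is a simplified, hypothetical stand-in for the pod status.
type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// sidecarsRunningWithMain reports whether every named sidecar init container
// is running while every regular container is running as well.
func sidecarsRunningWithMain(sidecars map[string]bool, st PodStatus) bool {
	if len(sidecars) == 0 || len(st.ContainerStatuses) == 0 {
		return false
	}
	seen := 0
	for _, s := range st.InitContainerStatuses {
		if !sidecars[s.Name] {
			continue // regular init container, expected to have exited
		}
		if !s.Running {
			return false
		}
		seen++
	}
	if seen != len(sidecars) {
		return false // at least one declared sidecar has no status yet
	}
	for _, s := range st.ContainerStatuses {
		if !s.Running {
			return false
		}
	}
	return true
}

func main() {
	st := PodStatus{
		InitContainerStatuses: []ContainerStatus{
			{Name: "setup", Running: false},
			{Name: "log-shipper", Running: true},
		},
		ContainerStatuses: []ContainerStatus{{Name: "app", Running: true}},
	}
	fmt.Println(sidecarsRunningWithMain(map[string]bool{"log-shipper": true}, st))
}
```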
@@ -1446,8 +1769,12 @@ For each of them, fill in the following information by copying the below templat
- Testing: Are there any tests for failure mode? If not, describe why.
-->

None.
That is not something I can easily believe.
Even if we haven't observed any, there are definitely failure modes that we should mention.
0962b6d to f71978b
Few more comments.
I would also like to see some resolution on #4183 before merging this one...
to reject such Pods when the feature gate is disabled to keep Downgrade safe.

**Note**, For the control plane and kubelet we will implement logic to reject Pods
with sidecar containers when feature gate got turned off.
**Note**, We have implemented logic for the control plane and kubelet to reject Pods
Sorry for being pedantic here, but can you link to that here?
We were going back and forth on that for Alpha release and what happened was actually not what we initially agreed on, so I would like to ensure that this time [which is much more important given Beta will be on by default] we actually do that right
kube-apiserver drops the restartPolicy field:
https://github.com/kubernetes/kubernetes/blob/f19b62fc0914b38941922afefd1e34eb55f87ee7/pkg/api/pod/util.go#L554-L560
I think we wrote down contradictory statements at that time.
In v1.28, if the FG is disabled,
- the apiserver just drops the restartPolicy field https://github.com/kubernetes/kubernetes/blob/f19b62fc0914b38941922afefd1e34eb55f87ee7/pkg/api/pod/util.go#L554-L560
- the scheduler does not schedule any node for the pod with sidecars https://github.com/kubernetes/kubernetes/blob/f19b62fc0914b38941922afefd1e34eb55f87ee7/pkg/scheduler/framework/plugins/noderesources/fit.go#L256-L262
- the kubelet does not admit the pod with sidecars https://github.com/kubernetes/kubernetes/blob/f19b62fc0914b38941922afefd1e34eb55f87ee7/pkg/kubelet/lifecycle/predicate.go#L78-L91
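The apiserver and kubelet behaviours listed above can be sketched in Go as follows. This is a simplified illustration, not the linked Kubernetes code: the structs and the helpers `usesInitRestartPolicy`, `dropSidecarFields` and `kubeletAdmit` are hypothetical names. The "keep the field if the existing object already uses it" condition reflects the "if it wasn't set before" detail raised later in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// ContainerRestartPolicy is a simplified stand-in for the container-level policy type.
type ContainerRestartPolicy string

const ContainerRestartPolicyAlways ContainerRestartPolicy = "Always"

// Container and PodSpec are trimmed, hypothetical mirrors of the API types.
type Container struct {
	Name          string
	RestartPolicy *ContainerRestartPolicy
}

type PodSpec struct {
	InitContainers []Container
}

// usesInitRestartPolicy reports whether any init container sets restartPolicy.
func usesInitRestartPolicy(spec *PodSpec) bool {
	if spec == nil {
		return false
	}
	for _, c := range spec.InitContainers {
		if c.RestartPolicy != nil {
			return true
		}
	}
	return false
}

// dropSidecarFields sketches the apiserver side: with the gate off, clear the
// field on incoming specs unless the stored (old) object already used it.
func dropSidecarFields(gateEnabled bool, newSpec, oldSpec *PodSpec) {
	if gateEnabled || newSpec == nil || usesInitRestartPolicy(oldSpec) {
		return
	}
	for i := range newSpec.InitContainers {
		newSpec.InitContainers[i].RestartPolicy = nil
	}
}

// kubeletAdmit sketches the kubelet side: with the gate off, refuse pods that
// still carry sidecar-style init containers.
func kubeletAdmit(gateEnabled bool, spec *PodSpec) error {
	if !gateEnabled && usesInitRestartPolicy(spec) {
		return errors.New("pod uses init container restartPolicy but the feature gate is disabled")
	}
	return nil
}

func main() {
	always := ContainerRestartPolicyAlways
	spec := &PodSpec{InitContainers: []Container{{Name: "proxy", RestartPolicy: &always}}}

	fmt.Println("kubelet, gate off:", kubeletAdmit(false, spec))
	dropSidecarFields(false, spec, nil) // create request: no old object
	fmt.Println("field still set after apiserver drop:", usesInitRestartPolicy(spec))
}
```

The scheduler behaviour from the list (leaving such pods unscheduled) is not modeled here.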
fixed
@@ -1297,18 +1604,28 @@ These goals will help you determine what you need to measure (SLIs) in the next
question.
-->

- sidecar init containers are running and restarted as expected
SLOs are effectively metrics+thresholds
Can you please update this section and formulate it more in the context of SLIs below?
39e6869 to 7fe8891
Two small comments - other than that it LGTM (though it will still also require SIG level approval).
with sidecar containers when feature gate got turned off.
**Note**, We have implemented logic for the kubelet to reject Pods
with sidecar containers when feature gate is turned off. For the control plane
we simply ignore the new field to maintain Pod scheduling.
Let's be more explicit about what exactly is happening, as mentioned by @gjkim42
For the control plane - kube-apiserver is dropping the field (if it wasn't set before) and kube-scheduler is keeping pods with the field set unschedulable.
nit: linking the code you pasted in this comment thread:
#4255 (comment)
would be extremely helpful here too
Signed-off-by: Matthias Bertschy <[email protected]>
@wojtek-t added the few nits. Thanks for your patience and kindness, it's not always easy to juggle between work, OSS contributions and family life.
Sure - thanks a lot for bearing with me and for pushing this work - it's super important and I'm really happy to see progress there. Given the deadline I'm going to approve the PRR part without waiting for SIG-level approval - hopefully you will figure that out later today.
/lgtm
For SIG-level approval
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: matthyx, mrunalp, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here