Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-753: add PRR questionnaire answers for beta #4255

Merged
merged 2 commits into from
Oct 5, 2023

Conversation

matthyx
Copy link
Contributor

@matthyx matthyx commented Oct 1, 2023

  • One-line PR description: add PRR questionnaire answers for beta
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 1, 2023
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 1, 2023
#### Beta

- Implement proper termination ordering.
- Provide defaults for `restartPolicy` field on init containers, `nil` is not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's true, why did we use a pointer?

Copy link
Contributor Author

@matthyx matthyx Oct 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has to merge for beta:
kubernetes/kubernetes#120176

edit: I was wrong, sorry

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR is not relevant to this.

We may have something like inheritPodRestartPolicyby default?
I'd say nil is OK to indicate it is inherited from the pod's restartPolicy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, initContainers will become sidecars for all Pods created by ReplicaSets.
We should default to Never WDYT?

Copy link
Member

@gjkim42 gjkim42 Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the pod's restartPolicy.

If the pod has restartPolicy of 'Never', its regular init containers will never restart.

If the pod has restartPolicy of 'OnFailure', or 'Always', its regular init containers will restart only on failure.

Not sure how to name it. How about just call it 'Default' and document the detail...? (just like the 'Default' of PodDNSPolicy)

@SergeyKanzhelev WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so it basically means "I'm not a sidecar". Hmm, I would still prefer nil then...

keps/sig-node/753-sidecar-containers/README.md Outdated Show resolved Hide resolved
and validate the declared limits?
-->

No.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't sidecars consume these resources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take into account that before this KEP sidecars were already running as "regular" containers in the Pod, I don't consider them as being added.
Also requests and limits are respected.

@bart0sh
Copy link
Contributor

bart0sh commented Oct 1, 2023

/cc @SergeyKanzhelev

You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please link to tests [in the upgrade/downgrade testing you only point to tets that you would like to create, I would like to see those tests]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly ping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently we don't have one... I'm going to push for it now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fine if we add it to the code after the PRR merges but inside the k/k beta PR ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but please:

  • add a sentence here that this test will be added before graduating the feature to beta in k/k
  • add it explicitly to beta graduation criteria

###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

Pods that don't feature sidecars are not affected by the KEP.

Pods with sidecars might take a long time to exit and exceed the TGPS, a new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be generally moved to the above question.

This question is about metrics. Can you please think through metrics that we can monitor?

@@ -1238,12 +1504,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

TBD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if not running the test now, can you please describe the test that you will run here?
#3658 is a nice example

- [ ] Other (treat as last resort)
- Details:
- [X] Other (treat as last resort)
- Details: `kubectl describe pod <pod-name>` will show the new field `.spec.initContainers[i].restartPolicy` for the sidecar containers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't mean the feature is working - that means that I wanted it to work.

What we probably want to check is if the init container with restartPolicy set is running together with the regular containers.

@@ -1446,8 +1769,12 @@ For each of them, fill in the following information by copying the below templat
- Testing: Are there any tests for failure mode? If not, describe why.
-->

None.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not something I can easily believe.
Even if we didn't observe, there are definitely failure modes that we should mention..

@SergeyKanzhelev

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments.

I would also like to see some resolution on #4183 before merging this one...

to reject such Pods when the feature gate is disabled to keep Downgrade safe.

**Note**, For the control plane and kubelet we will implement logic to reject Pods
with sidecar containers when feature gate got turned off.
**Note**, We have implemented logic for the control plane and kubelet to reject Pods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being pedantic here, but can you link to that here?

We were going back and forth on that for Alpha release and what happened was actually not what we initially agreed on, so I would like to ensure that this time [which is much more important given Beta will be on by default] we actually do that right

xref: https://github.com/kubernetes/enhancements/pull/3968/files#diff-4db34302ceb08799a81f2a75eaccc05ee8c29524d34b3a4b02f8c09dcf26247bR1103

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@gjkim42 gjkim42 Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly ping

@@ -1297,18 +1604,28 @@ These goals will help you determine what you need to measure (SLIs) in the next
question.
-->

- sidecar init containers are running and restarted as expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SLOs are effectively metrics+thresholds

Can you please update this section and formulate it more in the context of SLIs below?

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comment - other than that it LGTM (though it will still also require SIG level approval).

with sidecar containers when feature gate got turned off.
**Note**, We have implemented logic for the kubelet to reject Pods
with sidecar containers when feature gate is turned off. For the control plane
we simply ignore the new field to maintain Pod scheduling.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more explicit what exactly is happening as mentioned by @gjkim42

For the control plane - kube-apiserver is dropping the field (if it wasn't set before) and kube-scheduler is keeping pods with the field set unschedulable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: linking the code you pasted in this comment thread:
#4255 (comment)

would be extremely helpful here too

You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but please:

  • add a sentence here that this test will be added before graduating the feature to beta in k/k
  • add it explicitly to beta graduation criteria

Signed-off-by: Matthias Bertschy <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 5, 2023
@matthyx
Copy link
Contributor Author

matthyx commented Oct 5, 2023

@wojtek-t added the few nits. Thanks for your patience and kindness, it's not always easy to juggle between work, OSS contributions and family life.

@wojtek-t
Copy link
Member

wojtek-t commented Oct 5, 2023

@wojtek-t added the few nits. Thanks for your patience and kindness, it's not always easy to juggle between work, OSS contributions and family life.

Sure - thanks a lot for bearing with me and for pushing this work - it's super important and I'm really happy to see a progress there.

Given the deadline I'm going to approve the PRR part without waiting for SIG-level approval - hopefully you will figure that our later today.

/lgtm
/approve PRR

For SIG-level approval
/assign @dchen1107 @mrunalp @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 5, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matthyx, mrunalp, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2023
@k8s-ci-robot k8s-ci-robot merged commit 8d8dae2 into kubernetes:master Oct 5, 2023
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 5, 2023
@Priyankasaggu11929 Priyankasaggu11929 mentioned this pull request Oct 6, 2023
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

9 participants