Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Allowing Probes to sidecar containers #11634

Closed
wants to merge 2 commits into from

Conversation

salaboy
Copy link
Member

@salaboy salaboy commented Jul 7, 2021

Fixes #8949

This PR is work in progress
I am waiting for my environment to be ready to be able to run e2e tests

Proposed Changes

  • removing validation for Probes in sidecar containers
  • checking that probes are maintained in the sidecar containers when they are set

Release Note

This change in the validation of Knative Revisions allows sidecar containers now to define liveness and readiness probes that will be performed by the kubelet. 

@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Jul 7, 2021
@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label Jul 7, 2021
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: salaboy
To complete the pull request process, please assign julz after the PR has been reviewed.
You can assign the PR to them by writing /assign @julz in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 7, 2021
@knative-prow-robot
Copy link
Contributor

Welcome @salaboy! It looks like this is your first PR to knative/serving 🎉

@knative-prow-robot knative-prow-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 7, 2021
@knative-prow-robot
Copy link
Contributor

Hi @salaboy. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@markusthoemmes
Copy link
Contributor

/ok-to-test

@knative-prow-robot knative-prow-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 7, 2021
@markusthoemmes markusthoemmes changed the title Allowing Probes to sidecar containers [do not merge] [WIP] Allowing Probes to sidecar containers Jul 7, 2021
@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 7, 2021
@markusthoemmes
Copy link
Contributor

@salaboy changed the PR title to start with [WIP], which will actively block merging.

@salaboy
Copy link
Member Author

salaboy commented Jul 7, 2021

@markusthoemmes thanks!

@codecov
Copy link

codecov bot commented Jul 7, 2021

Codecov Report

Merging #11634 (f9f79ae) into main (98a184b) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head f9f79ae differs from pull request most recent head f0625f8. Consider uploading reports for the commit f0625f8 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main   #11634      +/-   ##
==========================================
- Coverage   87.87%   87.83%   -0.05%     
==========================================
  Files         190      190              
  Lines        9255     9245      -10     
==========================================
- Hits         8133     8120      -13     
- Misses        874      875       +1     
- Partials      248      250       +2     
Impacted Files Coverage Δ
pkg/apis/serving/k8s_validation.go 97.92% <ø> (-0.04%) ⬇️
pkg/reconciler/nscert/nscert.go 71.60% <100.00%> (-1.34%) ⬇️
pkg/reconciler/configuration/configuration.go 84.61% <0.00%> (-2.31%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6459485...f0625f8. Read the comment docs.

@salaboy
Copy link
Member Author

salaboy commented Jul 7, 2021

/retest

@salaboy
Copy link
Member Author

salaboy commented Jul 8, 2021

@dprotaso @evankanderson I've created a simple PR with a test showing that the probes are maintained if the validation is removed from the validation web hook. The question still remains, do we want the sidecar containers probes to be performed by the kubelet?

I feel that I am missing context here to make a decision on that. In my head it makes some sense to only manage the "serving" container. But at the same time I do realise that I don't fully understand the entire lifecycle of KServices, which might be requiring the queue proxy container to run the probes for sidecars as well.

@dprotaso
Copy link
Member

dprotaso commented Jul 8, 2021

cc @julz who was poking at the probing earlier this year

The question still remains, do we want the sidecar containers probes to be performed by the kubelet? ... which might be requiring the queue proxy container to run the probes for sidecars as well.

There were a few reasons to have the queue proxy do the probing

  1. We can perform sub-second probing to become 'Ready' faster - there's now a KEP trying to introduce this upstream
  2. Faster readiness in 'mesh mode' - Istio IP tables tweaking would break the kubelet probing our container during startup time

I don't think we have a test harness to exercise these cases - I believe @markusthoemmes was poking at cold start latency that might have some insights.

The other factor is the activator does it's own probing of the queue proxy - so it might forward a request that has a 'ready' serving container but not a 'ready' sidecar. This sorta leads me to think we do need the queue proxy to signal readiness on behalf of both containers (via it's ready endpoint).

Since multi-container is alpha now I'm ok (if others are in agreement as well) to merge this change and we can address what I mentioned above in a follow up issue.

@julz
Copy link
Member

julz commented Jul 8, 2021

So a couple things make me nervous about this: our current semantics around readiness probes are.. not ideal. It's not clear whether we'd expect these probes to follow the same (very arguably wrong) behaviour, or whether these should wait till we fix that behaviour. See #10764 and #10765 - it'd e.g. be odd if probes were ignored for the main container when periodSeconds > 0, but not for sidecars, but extending this weird behaviour to apply to sidecars too also seems unfortunate.

We'd also probably want to extend our existing readiness tests to cover multi-container. I've been actively working on improving our coverage in these tests recently, which has historically been quite sparse (and actually we're already blocking a PR on finishing improving these tests..). I think we'd want to finish adding reasonable coverage for readiness before changing too much more about that path.

For these reasons I'd lean towards suggesting we nail down what behaviour we think one readiness probe should have, and make sure we have coverage for it, before we land support for multiple probes.

On the other hand I think I'd personally be a lot more relaxed about liveness probes, which I think have less odd/unclear semantics, and which we do less messing with as part of the QP optimisation, so it may make sense to start with only those.

Since multi-container is alpha

I think they're technically beta aren't they?

@salaboy
Copy link
Member Author

salaboy commented Jul 9, 2021

@julz @dprotaso thanks for all the background on this, if we are all ok with Liveness probes in sidecar containers for now, I can make the changes to remove the validation and add more tests to make sure that the probes are maintained for side car containers.

I totally agree with what @julz said about clearly defining readiness probes first and then applying the same for sidecar containers. @julz thanks a lot for all that context with the links. I will take a look at the conformance tests inside Serving to understand more about how probes are tested.

I've created this PR to start the discussion, but I am happy also if you folks want to close it.

@julz
Copy link
Member

julz commented Jul 12, 2021

Sounds good @salaboy - I think if you wanted to pick up the readiness side of this again after I finish adding coverage for readiness probes in general to the test suite that would be great. In the meantime I think loosening the liveness probe validation part seems totally reasonable.

@knative-prow-robot
Copy link
Contributor

@salaboy: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 24, 2021
@dprotaso
Copy link
Member

I'm going to close this out for now - feel free to reopen when you start working on it again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/API API objects and controllers cla: yes Indicates the PR's author has signed the CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow sidecars to specify probe port
5 participants