sidecar: Address all concerns raised #1980
This was requested by Sergey here: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
As requested by Sergey: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
One decision to revisit was whether we really wanted to change the pod phase. It was clearly agreed here that we won't: kubernetes#1913 (comment) This commit just removes the section discussing that (it is already agreed) and updates the KEP to reflect it. Signed-off-by: Rodrigo Campos <[email protected]>
This was added due to a comment from Istio long ago[1], but they don't need this anymore[2]. Furthermore, our use cases at Kinvolk also work just fine without this. [1]: kubernetes/community#2148 (comment) [2]: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
This commit just moves some sections to caveats, with a clarification on why it seems enough to do this for each. Signed-off-by: Rodrigo Campos <[email protected]>
These concerns have been discussed:

* "No container type standard". As pointed out, this was already discussed and agreed in the past.
* "Is this change really worth doing?". It seems it is, given the community response, and the problems stated there can only be solved at the Kubernetes layer.
* "Why does this design seem extensible?". It didn't present any major concerns, and it does seem extensible.
* "Sidecar containers won't be available during the initContainers phase". We had a VC with Sergey and Joseph and agreed that it is out of scope for this KEP.

Signed-off-by: Rodrigo Campos <[email protected]>
As discussed in SIG-node on 08.09.2020 and suggested by Sergey, we can just document the caveat that it might take 4s longer to kill a pod with sidecar containers. This is very simple, even simpler than the alternatives listed. Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
This was suggested by Sergey here: kubernetes#1913 (comment) Sadly, that branch is merged and the commit suggestion can no longer be applied. Credit goes to Sergey anyway :-) Signed-off-by: Rodrigo Campos <[email protected]>
In the previous PR, Istio devs commented that some things were not accurate. This commit just updates the text to (hopefully) correctly reflect that now. Removed the paragraph about this removing the need for an initContainer due to the comment here: kubernetes#1913 (comment) I thought it was okay to insert the iptables rules within the sidecar proxy container, but it is not: that requires more permissions (capabilities) on the sidecar proxy container, which is not considered acceptable by Istio devs. Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
After the last changes to the KEP, this doesn't apply anymore: the podPhase is not modified. Furthermore, the preStop hooks are sent once, but the PoC wasn't updated to reflect that. To avoid confusion, call it out there. Signed-off-by: Rodrigo Campos <[email protected]>
Otherwise, there would be two sections with caveats in their name. Also, this is more similar to the latest KEP template. The diff looks tricky just because the text is being moved two sections below; however, git creates a diff showing the unchanged section moved up instead. Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
Oh, I thought the bot would assign to @SergeyKanzhelev and @sjenning too. Sorry, let me try with the cc command now.
This was requested by Tim some months ago: kubernetes/kubernetes#79649 (comment) I didn't get that it should be removed, I just thought it should not be set by default. But Joseph Irving just clarified that for me: kubernetes#1980 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
LGTM
> In other words, Istio has an alternative to configure the traffic blackhole without an initContainer. But the other problems and hacks mentioned remain, though.

> Such rules are usually inserted as an initContainer (trying to run last, to
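To make the quoted KEP text concrete: service meshes typically redirect a pod's traffic to the proxy by running iptables from an init container with elevated capabilities. A minimal sketch of that pattern follows; the image name and proxy port are illustrative assumptions, not taken from any specific mesh implementation:

```yaml
# Illustrative only: an init container that redirects all outbound TCP
# traffic to a proxy listening on port 15001. The image name and port
# number are hypothetical placeholders.
initContainers:
- name: traffic-redirect
  image: example.com/iptables-init:latest   # hypothetical image
  securityContext:
    capabilities:
      add: ["NET_ADMIN"]   # needed to modify iptables, as discussed above
  command:
  - sh
  - -c
  - iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001
```

This is exactly the privileged injection that the Red Hat comment below objects to, which is why a CNI-based setup is preferred there.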
One thing that may be important is that no matter what ordering we do, it's still broken to some extent. While you may be able to bypass the sidecar, you lose everything it gives you. This includes routing rules (you may see unexpected traffic), telemetry (no visibility) and, typically/ideally, mTLS: when using a service mesh, all pods will enforce mTLS, so you will not be able to communicate with any pods in the mesh.
I know this is out of scope but I just wanted to clarify the workarounds are not sufficient.
Right, good point, thanks! I'll see how to clarify the text to highlight that those are the workarounds in use today, but they are not enough.
But sure, it is out of scope for this KEP :)
I want to capture my latest thoughts on sidecar concepts, and get a path forward. Here is my latest thinking: I think it's important to ask if the introduction of sidecar containers will actually address an end-user requirement or just shift a problem and further constrain adoption of sidecars themselves by pod authors. To help frame this exercise, I will look at the proposed use of sidecar containers in the service mesh community.

User story

The proposed solution is the introduction of sidecar containers that change the pod lifecycle:
The issue with the proposed solution meeting the user story is as follows:
The real requirement is that the proxy container MUST stop last, even among sidecars, if those sidecars require the network. Similar to the behavior observed with init containers (users externalize run-once setup from their main application container), the introduction of sidecar containers will result in more elements of the application being externalized into sidecars, but those elements will still want to be part of the mesh when they require the network. Hence, we are just shifting, not solving, the problem. Given the above gaps, I feel we are not actually solving a primary requirement that would drive improved adoption of a service mesh (ensure all traffic is mTLS from my pod) to meet auditing. Alternative proposal:
A few other notes based on Red Hat's experience with service mesh: Red Hat does not support injection of privileged sidecar containers and will always require the CNI approach. In this flow, the CNI runs, multus runs, iptables are set up, and then init containers start. The iptables rules are set up, but no proxy is running, so init containers lose connectivity. Users are unhappy that init containers are not participating in the mesh. Users should not have to sacrifice usage of an init container (or any aspect of the pod lifecycle) to fulfill auditor requirements. The API should be flexible enough to support graceful introduction at the right level of an intra-pod lifecycle graph, transparent to the user. Proposed next steps:
/cc @thockin @smarterclayton @dchen1107 @sjenning @liggitt given past involvement in this space.
Agree! We might as well tackle this general problem vs. doing it step by step with baggage added along the way.
I agree @derekwaynecarr. I think that in order to fully satisfy the use cases mentioned, we are gravitating toward systemd-level semantics where there is just an ordered graph of You could basically collapse init containers into the normal containers map and add two fields to This is flexible enough to accommodate a Admittedly this would be quite the undertaking, and there is API compatibility to consider.
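The graph-based API sketched in the comment above might look something like the following. This is purely a hypothetical illustration: the `dependsOn` and `oneshot` field names are invented for this sketch and are not part of any Kubernetes API.

```yaml
# Hypothetical API sketch: init containers collapsed into the ordinary
# containers list, with two new fields expressing a startup graph.
# None of these fields exist in Kubernetes today.
spec:
  containers:
  - name: iptables-setup
    oneshot: true              # run to completion, like an init container
  - name: proxy
    dependsOn: ["iptables-setup"]
  - name: app
    dependsOn: ["proxy"]       # started only once the proxy is ready
```

With readiness gating on each edge, shutdown could simply walk the graph in reverse, which would cover both the "proxy stops last" and the "init container participates in the mesh" cases discussed in this thread.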
I have also been thinking about this. There are a number of open issues, feature requests, etc. that all circle around the topic of pod and container lifecycle. I've been a vocal opponent of a complex API here, but it's clear that what we have is inadequate. When we consider init containers x sidecar containers, it is clear we will inevitably need an init-sidecar. Some (non-exhaustive) other related topics:
We need to make it possible for admission controllers to inject sidecars without knowing about each other, but we also need to be careful how much we throw on end-users or cluster-admins. The solution is very unclear to me. We need to handle multiple uncoordinated sidecars that each demand to be "first". We need to compose with loose coupling.
This is a big enough topic that we almost certainly need to explore multiple avenues before we can have confidence in any one. Assigning to myself, but only so I can track email better :)
The dependency idea would also allow for doing an init container, then a sidecar network plugin, then more init containers, etc., which has some nice features. Also, the readiness checks and oneshot could all play together with the dependencies so the next steps aren't started before ready. So, as a user experience, I think that API might be very nice. Implementation-wise, there are probably lots of edge cases to carefully consider.
@derekwaynecarr this is a great idea, to set up a working group to move this forward in a bigger scope. One topic I suggest we cover early on in the discussions is whether we need to address the existing pain point of injecting sidecars in Jobs in 1.20. This KEP intentionally limited the scope to just this: formalizing what people are already trying to do today with workarounds. From the Google side we also would love the bigger scope of the problem to be addressed, but hope to address some immediate pain points early if possible, either in the current scope or a slightly bigger one.
@SergeyKanzhelev I would speculate that the dominant consumer of the Job scenario is a job that requires participation in a mesh to complete its task, and since I don't see much point in solving the mesh use case (which I view as the primary motivator for defining sidecar semantics) for only one workload type, I would rather ensure a pattern that solves the problem in light of our common experience across the mesh and k8s communities.
@derekwaynecarr to be honest, I'm doing this and for our use case that is not a problem. Our use cases for Jobs are not just about service mesh, but for example containers that upload files after the processing has finished. For pods that run indefinitely, we have like 7-8 sidecars and only one is a service mesh, and we don't need them during the initContainers phase. We might want that in the future, though; but it is not the main use case for us now. This is why I think solving the problems for sidecar containers as we know them today (start after initContainers) has merit: it solves race conditions on start/shutdown and allows Jobs to use them. And as we talked about on Slack, if we decide to add other types later, this "type: sidecar" won't create a maintenance burden, as it maps 1-1 onto richer semantics (like "DependsOn": if we add that, containers of type sidecar simply "DependsOn" containers of type standard). As I said, I'm fine working on any alternative. I don't think that service mesh is the only use case; in fact we don't need it, and I'd rather have this in 1.20 than wait a long time for an extended scope, given that it doesn't present downsides if pod startup becomes a directed graph later. But I'll be happy to work on any alternative we agree on :-) I'll send an email tomorrow (9pm here now :)) to coordinate a meeting, as we talked about on Slack: https://kubernetes.slack.com/archives/C0BP8PW9G/p1600966907009200. Thanks again!
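For reference, the per-container marker this KEP proposes looks roughly like the following (simplified; the exact field placement comes from the KEP, while the container names and images are placeholders):

```yaml
# Sketch of the KEP's proposal: a container-level lifecycle type.
# Containers marked Sidecar start before, and stop after, the standard
# containers; a Job's pod can then terminate once the standard
# containers have exited.
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: myapp
    image: example.com/myapp:latest        # placeholder image
  - name: proxy
    image: example.com/proxy:latest        # placeholder image
    lifecycle:
      type: Sidecar
```

As the comment notes, if a `DependsOn`-style graph were added later, `type: Sidecar` could be defined as sugar for "all standard containers depend on this one".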
@rata we can discuss, for sure; we each have unique use cases and needs. My personal opinion is that incrementally designing an ordered init flow is going to cause future pain (like we now feel with init containers) versus just designing for flexible user-defined ordering. I think @sjenning's proposal is worth contrasting, to see if the cost for that approach is any different. Even if we do what was proposed here, we know the mesh community will not be satisfied, and so we will have another set of tweaks just to iterate toward a generic outcome. Curious if others feel differently.
Thanks! I'm updating here too, as some of those assigned get emails here :-) I've just sent an email to arrange a meeting and coordinate how to proceed (here is a copy of the mail; I sent it to kubernetes-dev, sig-node and sig-arch):
@derekwaynecarr please feel free to close or merge this PR. I think merging is better because the KEP in master now has a lot of open questions, and for me it is simpler to continue from the document after this PR (some things are cleaned up, the sections modified in this PR are adapted to the new KEP format, etc.), as some things will need to be re-done if we don't merge. Thanks again!
I think we should merge this but add a note saying this KEP has been rejected; it sounds like any way forward would involve a complete re-write, so instead we could just create new KEPs that reference this as a rejected alternative.
@rata @Joseph-Irving I agree merging the incremental updates makes sense for future reference. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, rata

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Hi!
As we talked about in the last SIG Node meeting, here is a PR addressing all the concerns with the previously proposed suggestions (in some cases with something simpler, as also suggested in the last SIG Node call :)).
The complete diff looks big and is hard to follow (I thought about using smaller PRs, but got many git conflicts). I tried to split it into different logical changes so it can be easily reviewed. If that way of reviewing is not to your preference, another way to see the final result easily is to use the "show this file" GitHub option. If you want me to split the changes into different PRs anyway, just let me know and I can maybe split this into more than one PR to open sequentially :)
Thanks again for your time and patience :)
cc @Joseph-Irving