-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sidecar: Add design callouts and alternatives discussed in the past 2 years #1913
Conversation
Ups, forgot to cc derek to the bot too. /cc @derekwaynecarr |
|
||
An example in the open source world about this is the [Istio CNI | ||
plugin][istio-cni-plugin]. This was created as an alternative for the | ||
initContainer hack that the service mesh needs to do. The initContainer will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init container is not just a hack to avoid the startup problem, but also a permissions issue. Even if we could guarantee the service mesh starts first, we would still retain the init container, to avoid running as root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. The capabilities for NET_ADMIN
and NET_RAW
are added to the initContainer only, so the sidecar proxy can run just without them. Thanks for the clarification, will update the text
alternative requires that nodes have the CNI plugin installed, effectively | ||
coupling the service mesh app with the infrastructure. | ||
|
||
This KEP removes the need for a service mesh to use either an initContainer or a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true, see above comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for pointing it out.
I think it still makes sense to discuss the other callouts in the PR, so will update this part later to reflect what you pointed out. Thanks! :)
* Recommend users to delay starting their apps by using a script to wait for | ||
the service mesh to be ready. The goal of a service mesh to augment the app | ||
functionality without modifying it, that goal is lost in this case. | ||
* To guarantee that traffic goes via the services mesh, an initContainer is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My view of the problem is close but slightly different
- If we set up iptables first (before pod starts with istio-cni, for example, or as the first init container), all init containers will fail as traffic is blackholed since the sidecar is not running
- If we set up iptables as last init container, then previous init containers will not have traffic blackholed but will also not use the mesh. In many clusters, this means they cannot connect to any services, which require mTLS, which will no longer happen
Neither of these are addressed by the KEP. A third option is potentially running the iptables in the actual sidecar, but this is not feasible due to permissions.
Note - I would be very (very!) happy with the changes in the KEP. But I don't want to confuse that this solves all service mesh startup ordering issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, yes, none of these are addressed (sorry, I wasn't aware that was an important problem). It is a non-goal currently to run initContainer with sidecars containers.
The third option you mention, besides capabilities needed for the redirect, will not run concurrent with initContainers, so traffic is either blackholed or not, but initContainers traffic is not going through the istio sidecar either.
The question is, then, do we want to have initContainers with sidecar containers in-scope or not? What do you think @howardjohn ?
The trade-off, from my POV, is:
- Having it in-scope for this will solve/improve that problem (making this KEP way more complicated to move forward)
vs - Having it out of scope will not solve that problem but will make advances with the rest
IMHO, it makes sense to move forward with this proposal (i.e. not solve the sidecars running concurrently with initContainers
) but worries me a little that, if we later want to solve the initContainers with sidecar problem, the API might end up messy.
If we implement this KEP and in the future want to extend this in some way to address it, we can technically do it using other container types
. For example, type: daemon
(probably a better name :-D) can be used to mean that (those containers starts before initContainers too). Or type: OrderedPhase
with an Order: <int>
and, let's say, using Order: -1
can mean "before initContainers". However, it starts to get messy if you have containers in the Containers
array that start before initContainers.
So, that seems like a messy thing to me. If we want to go the initContainer+sidecars route, we might want to have a new sidecarContainers
array that start before initContainers too. That was discussed in the past (only the array, I think, not starting before initContainers IIRC), it is mentioned in the alternatives section and why it was discarded, though.
At the same time, I'm not sure there are many sidecar types that might want to run concurrent with initContainers. Are there any other use case? Maybe if you want to collect metrics/logs from the the initContainers using a sidecar? Not sure there are many users for this, though. Although, maybe it doesn't matter/hurt to most of sidecars. What do others think?
Taking a step back, a question from someone out of the service mesh world: does the service mesh really wants to be part of the pod, if the life-cycle it needs is so different? I guess probably yes for latency and to easily transparently redirect traffic that might not be possible otherwise. But just making sure there are no other solutions and we really want to think about this problem when the service mesh lives in the pod :)
Trying to be pragmatic, we can move to alpha stage without "initContainers + sidecars" (i.e. as it is proposed now) and gather feedback from users. We can gather feedback to see if it is worth doing a sidecarContainers
array (or any other alternative that seems best to address the feedback)?
What do others think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we implemented the KEP as is I would be very happy, it would improve a lot of issues for a lot of people. I would be even more happy if we also fixed the init container issue, but not at the expense of not fixing the startup/shutdown. no need to let perfect be the enemy of good.
It would probably make sense to future proof this so it can be done, but not worry about it right now. Having a new type
seems feasible to me.
As far as other use cases - all service meshes require this, as far as I know, and logging seems like something you would want for init containers as well. Honestly most sidecars are feasible to run with the init containers, for all the same reasons you want them to run with the main container. For example, some people have SQL proxies - same is needed if you use SQL in init container.
Taking a step back, a question from someone out of the service mesh world: does the service mesh really wants to be part of the pod, if the life-cycle it needs is so different
I don't think we need it to run in the pod, but there are no other existing alternatives that would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commenting from other thread on
It was nice to have, but not many real use cases nor a big deal.
This claim is basically saying init containers that do networking is not a real use case, which isn't true in my opinion. There are very clear use cases for doing so, otherwise init containers would not likely exist 🙂. I think its more accurate to say that the people you talked to happen to not have users complain about them much, but I can't imagine that generalizes well to the broader Kubernetes community
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I agree that perfect is the enemy of good, I'm open to opinions whether sidecars make more sense before initContainers or after.
But yeah, IMHO, the best solution might be not so "theoretical": implement either one of them and gather user feedback :-)
@derekwaynecarr What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SergeyKanzhelev the biggest change is semantics and how we want to reflect that. But the whole KEP needs to be changed, IMHO, if we want that instead.
Furthermore, those are not sidecars as we know them today, so we need to define how these "new" kind of sidecars behave a little bit. For example, some things we may need to decide from the top of my head:
- Will we use the current
containers
array for that? In that case, theinitContainers
array will start before the containers not typesidecar
only (they are meant to start before any other container in the pod, the nameinitContainers
might become confugsing. We might be able to clarify with docs). That might be unexpected, so we might want to consider using asidecarContainers
array to avoid confusions. Not really sure, something to sleep on it. - Today sidecar containers (as a concept, not the type as it is not repesnt) start after
initContainers
have run. If someone is using theinitContainers
to initialize something for sidecar containers (totally legit), another way to provide that functionality might be needed. AnotherinitSidecarContainers
thingy or something else? This can become weird - What to do if a sidecar crashes while some initContainers are running? Just restarts, I guess? Then initContainers don't have the guarantee of a sidecar running either (as it can crash)
Thinking about this, I have mixed feelings, but I think I lean towards improving sidecars as we know them: sidecars that start after initContainers
and use the alpha phase to gather feedback and revisit that. If we see that, for example, both things make sense, we might want to add a different type
that starts before initContainers.
What do you think?
But I'm very open to thoughts :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for delayed response. Somehow all notifications broke for me. It's great to have a ping at a meeting.
I though about these concerns. I understand the main concern is that sidecar containers that has a dependency on init containers would not work as expected when one simply adds a new type to the container? The assumption that everything will just work once conatiner marked as sidecar, may not be true. What if existing sidecar container has some synchronization logic that checks for payload to start and keeps it in "not started" state. This will break the entire pod initialization.
The same time both important scenarios - logs and mesh would need an init containers support. If we will conclude that the way to support init containers is to introduce a similar type field on a container, why wouldn't we do it now. Sidecars which needs to work before init containers will just work. Sidecars which needs to work after init container may wait for the first readiness probe and treat it as an indication that initialization was complete.
Same as @howardjohn I believe that this KEP is great in addressing many issues. I just want to make sure there is no quick win by simply changing startup order for sidecars to run before init. And figuring out details like what to do when sidecars crashed during the initialization.
Perhaps there may be a section in the doc saying why we do not want to run sidecars before init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for delayed response. Somehow all notifications broke for me. It's great to have a ping at a meeting.
Ups, thanks! :)
I though about these concerns. I understand the main concern is that sidecar containers that has a dependency on init containers would not work as expected when one simply adds a new type to the container? The assumption that
Yes, exactly, that is one concern I raised.
everything will just work once conatiner marked as sidecar, may not be true. What if existing sidecar container has some synchronization logic that checks for payload to start and keeps it in "not started" state. This will break the entire pod initialization.
Not sure if we are saying the same here. That is exactly my point, if the sidecar has a dependency on an initContainer doing something before it starts, it will be completely broken.
Or am I misunderstanding you?
The same time both important scenarios - logs and mesh would need an init containers support. If we will conclude that the way to support init containers is to introduce a similar type field on a container, why wouldn't we do it now.
I think both options can work: we can do it now or we can have an alpha without initContainers, see if what we did effectively solves the problems that we think it solves or needs something else. After that, try to expand to initContainers
and just focus on the problems of doing that, as we know what we have mostly works and the problems of extending it to that are quite different.
I lean towards to keep the current scope (more on why later in this comment, this really doesn't seem like a quick win as you say). We can try in the next few days if we figure out a way of extending it so we have sidecars + initContianer running concurrently, but if we can't find something obvious that will work in the next few days I'd fall back to the current scope.
What do you think?
Sidecars which needs to work before init containers will just work.
No, they won't on all cases. This is why it is not trivial and I'd rather focus on one problem at a time, and gather feedback.
Istio is one example, they mention in this thread that the service mesh sidecar should be running before initContainers but that itself needs some initialization first. They said:
A third option is potentially running the iptables in the actual sidecar, but this is not feasible due to permissions.
(To add some more context: the problem they refer to is the permissions needed in the container to run the iptables rules. They don't want to have those permissions on the sidecar pod, and instead have them in the initContainer or CNI plugin.)
They either need the CNI plugin to work in this case (for the iptables thing) or an initContainer that runs before sidecars, to create the iptables rules, then start the sidecars and then start the regular initContainer
(as they want the sidecar service mesh to be running for other initContainers) and then continue with the initialization.
But yeah, Istio has a way out with the CNI plugin, as they only want some iptables rules. That is, of course, not the case of all sidecars AFAIK (like populating volumes, etc.). Any sidecar relying on an initContainer won't work if started before initContainers, if they also want to run during initContainer phase (as a service mesh does).
Some hacks can be done, like auto-inject the initContainer to be run first. But that has also several problems. Because then several applications want to be "the first" and there is a race injecting there. The analogous is alread happening to inject the last initContainers. Here is an example of such a bug report (it is also in some part of the text this PR adds :)): linkerd/linkerd2#4758 (comment).
So, those workarounds don't seem reliable and will create issues.
Maybe it can be a limitation: if your sidecar runs concurrently during other initContainers
run, then you can't have initContainers
. 🤔
Sidecars which needs to work after init container may wait for the first readiness probe and treat it as an indication that initialization was complete.
Not sure what you mean exactly with this, can you please elaborate? Wait for which readiness? How that would be explicit in the pod spec?
I just want to make sure there is no quick win by simply changing startup order for sidecars to run before init
IMHO, it doesn't seem like a quick win, but a path to explore that will take time. I think people will be opinionated, with very valid reasons, about having containers in the containers
array starting before initContainers
vs creating a new array (sidecarContainers
), etc.
The more we talk about it, the more I think it will be just easier if we keep the current scope for the alpha phase (sidecars don't run during initContainers), see if the mechanism we add for that are effective for those use cases and then focus on extending this to initContainers (we can make it a show-stopper for beta graduation if that makes anyone more comfortable).
If extending this for sidecar+initContainer is difficult, it might be easier once we gather feedback and benefit users that don't need that shortly. After all, all sidecar containers today just run after initContainers (it is not possible otherwise), they will benefit. And we can just focus on the problem of extending it afterwards and how it is best solved, after having solved the "easier" problem (that is not so easy either, IMHO) and have a better understanding and feedback.
If extending it is easy, then we either figure it out in the next days (and we add it into the scope) or we add it later, it is easy :)
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Istio is one example, they mention in this thread that the service mesh sidecar should be running before initContainers but that itself needs some initialization first.
It would actually work fine I think. The sidecar does not need iptables set up. Our only requirement is the iptables must be set up before the app container. So:
- init, then app/sidecar in parallel is ok (but not ideal - hence the kep)
- sidecar, then init, then app is ok
- sidecar, then app, then init is not ok
I can see hypothetical scenarios that do require init containers to run first though, like if we use init containers to provision certificates. It could be worked around though, for example have the sidecar get ready without the certs and have the init container poll to ensure it doesn't exit until the sidecar has picked up the certs. Note this is still a much better scenario because it involves the sidecar/init container to make a change (2 images, likely owned by Istio/Linkerd/) rather than every single application container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rata I think we are on the same page understanding pros and cons of init and sidecars running in parallel. I think we prioritize different things while weighting options. So my questions is mostly to understand your priorities and understand the complexity.
The way I see it - sidecar containers are something (as you mentioned earlier) that almost like "outside the pod" feature that represent an ambient state of a pod. Something that always exists and does not block termination. So developer should not think of synchronizing with it. And sidecar owner has guarantees that all app logic will be executed when sidecar is active. And I see init and regular containers as a developer's created app logic.
If developer wants to create it's own sidecar that is an app logic (like queue management or config synchronization), and it depends on some Init containers, this will be a little bit more involved as it will require a bit of synchronization with Init containers. I don't see the need for this synchronization implementation as a big deal. Today sidecars like this has are probably implementing even more logic.
As for semantic - we can call this KEP "daemon" or "ambient" containers instead of "sidecar" if this will make semantic clearer.
Finally for the implementation complexity - are you talking about kubelet logic complexity or complexity for end user?
I also pinged you on Slack in case you want to discuss it in-person before writing more text =).
This proposal aims to: | ||
* Allow Kubernetes Jobs to have sidecar containers that run continuously | ||
without any coupling between containers in the pod. | ||
* Allow pods to start a subset of containers first and, only when those are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is starting all of the contains at once right? Its not ordered like init containers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you think I should clarify that :)
the `containers` array. | ||
|
||
For most users we talked about (some big companies and linkerd, will try to | ||
contact istio soon) this doesn't seem like a problem. But wanted to make this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above on how this impacts Istio (and I suspect linkerd - I am surprised by this sentence).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had a call with some Linkerd devs and my understanding was that it wasn't very important to them. It was nice to have, but not many real use cases nor a big deal. For other companies using a Kubernetes fork with sidecars I talked to, they just don't need the sidecars to run with initContainers either. That functionality is not present on their fork.
But let's continue the initContainers + sidecars conversation there :-)
> ... | ||
> The Istio CNI plugin performs the Istio mesh pod traffic redirection in the Kubernetes pod lifecycle’s network setup phase, thereby removing the requirement for the NET_ADMIN and NET_RAW capabilities for users deploying pods into the Istio mesh. The Istio CNI plugin replaces the functionality provided by the istio-init container. | ||
|
||
In other words, when using the CNI plugin it seems that InitContainer don't use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its worse - the init container cannot do any networking at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, didn't know that drawback from the CNI plugin. Thanks!
1. Run preStop hooks for non-sidecars | ||
1. Send SIGTERM for non-sidecars | ||
1. Run preStop hooks for sidecars | ||
1. Send SIGTERM for sidecars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@howardjohn Another question, if it is not too much trouble: will this alternative work for you guys?
To make it short: the current KEP proposes to run preStop hooks for sidecars as the first step in the shutdown sequence, to let service mesh drain traffic. This PR proposes (the reasons are explained ^) to modify that behavior and have a TerminationHook
that is run first. It will be the same, just under a new field called TerminationHook
.
IIUC, this is needed for some service mesh (I'm working with a client with a in-house service mesh that needs this, for example) and I think Istio needs it, but wanted to confirm.
Does Istio needs this and would it work for you guys?
PS: Linkerd, for example, doesn't need this as they use kubernetes services and pods in terminating state are removed from the endpoints
objects (except for services with externalTrafficPolicy: Local
, but that is addressed in another KEP), so that alone drains the service mesh connections. Not 100% sure in which case Istio currently is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a need to run any prestop hooks. I am not sure why we would start draining traffic. Isn't that what the KEP is fixing? The application shuts down, at which point all connections are closed. Then we can just terminate the sidecar immediately.
We certainly need this draining today but I don't think we do if this is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to de-register the pod from service discovery or something, when pod is about to terminate, so the app can just finish in-flight request and stop, without receiving new incoming requests via the service mesh?
On shutdown just having:
- SIGTERM non-sidecars containers (using preStop hooks and all as expected)
- SIGTERM sidecars containers (using preStop hooks and all as expected)
will work for istio?
I mean, the current KEP says:
PreStop Hooks will be sent to sidecars before containers are terminated. This will be useful in scenarios such as when your sidecar is a proxy so that it knows to no longer accept inbound requests but can continue to allow outbound ones until the the primary containers have shut down.
Istio doesn't need that at all? (either being called preStop hooks or TerminationHook)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without Istio when pod termination is triggered:
- Pod starts to terminate
- Pod is removed from Endpoint
- Endpoint update is sent to kube-proxy, stops sending traffic to this pod
- Pod fully shuts down
Hopefully in that order, but not guaranteed!! There is no guarantee of zero downtime here. See https://blog.sebastian-daschner.com/entries/zero-downtime-updates-kubernetes for more detailed explaination.
With Istio when pod termination is triggered:
- Pod starts to terminate
- Pod is removed from Endpoint
- Endpoint update is sent to istio-proxy, stops sending traffic to this pod
- Pod fully shuts down
You'll notice this is really the same.
A valid solution to making the race conditions mentioned in the blog is adding a preStop hook - it doesn't have to be the sidecars preStop hook though. Any of:
- Telling users to add preStop hook if they want this behavior (this doesn't violate the transparency goal of service mesh, as the same issue is present without service mesh)
- Add the preStop hook to our sidecar
- Add the preStop hook to the users container (we control injection, we can modify whatever we want).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed answer!
@Joseph-Irving do you remember why that part (running preStop hook twice) was added to the KEP?
I mean, I know use cases where that is needed for the very same reason the KEP says, but thought Istio/Linkerd would need it too, but it doesn't seem to be the case. I'm okay keeping it, though, I know use cases for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rata to be clear, I'm not saying that TerminationHook
wouldn't be useful, I suspect it would be. It just isn't clear that it would be absolutely necessary for Istio, assuming the sidecar is guaranteed to be up and ready for the duration of the application container.
Effectively, I'm echoing @howardjohn's "valid solution", which just uses preStop
hooks. Specifically, the option where we inject a preStop
hook into the application container, which effectively puts the sidecar in lameduck mode. This would basically just be a curl command.
If TerminationHook
were available, however, it would be cleaner WRT injection since we could handle lameducking in the sidecar's TerminationHook
rather than mucking about with the application container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nmittler Cool, thanks a lot! A curl command seems like a trade-off worth considering. Thanks again for the input :-)
For other reviewers, here goes a tl; dr: The TerminationHook
field is a proposal to remove calling two times the preStop hooks, due to the problems that doing that creates (see this very same section the comment is on for additional context). Given that implementation-wise is quite similar to what we already have in the code, I think keeping it can be simple too.
Just let us know if you think it will be simpler to remove it for now (either from an implementation POV or from a KEP design POV) so we settle on one or the other.
The main reason to add this 2 years ago was istio, but there are today some other in-house service mesh apps I'm working with that must have this to work properly. Although Istio now is fine without it, having it seems better too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concept of a Termination hook may be broadly useful so perhaps it makes sense to remove it from this KEP and develop it independently? It has no dependency on this proposal. It would also simplify this KEP if we removed the pre-stop behaviour and just focus on startup/shutdown ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 ... that's really the thrust of much of my discussion above. I think a separate KEP makes a lot of sense here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with either of the choices (keep it or leave it in this KEP) as long as we move forward with this KEP :-D. I'd like to know what @derekwaynecarr or others from SIG-node think?
Removing this from the KEP effectively removes a step from the shutdown sequence and that changes how we might want to split the time in the shutdown sequence. For example, this affects other callouts like this one). So, if this needs to be added as a different KEP, it will change the time for the shutdown sequence for sidecar containers as described in this KEP. IMHO, we can remove it for now but if we want to add it later, we may want to add it to this KEP.
|
||
Rodrigo will reach out to Istio devs to see if the situation changed since 2018. | ||
|
||
[istio-bug-report]: https://github.com/kubernetes/kubernetes/issues/65502 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@howardjohn one last question to address this ^ section, if it is not too much to ask.
Do you still need to use, as the bug says, some scripts for other applications in the pod to wait for the service mesh to be ready? Or was the situation changed in these 2 years?
If it changed a lot and you can share your current pain points and which ones this KEP effectively address, it would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @nmittler that opened the bug, maybe? :)
##### Suggestion | ||
|
||
Confirm with users that is okay to not have sidecars during the initContainers | ||
phase and they don't foresee any reason to add them in the near future. | ||
|
||
It seems likely that is not a problem and this is a win for them, as they can | ||
remove most of the hacks. However, it seems worth investigating if the CNI | ||
plugin is a viable alternative for most service mesh and, in that case, how much | ||
they will benefit from this sidecar KEP. | ||
|
||
It seems likely that they will benefit for two reason: (a) this might | ||
be simpler or model better what service mesh need during startup, (b) they still | ||
need to solve the problem on shutdown, where the service mesh needs to drain | ||
connections first and be running until others terminate. These are needed for | ||
graceful shutdown and allowing other containers to use the network on shutdown, | ||
respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got a little confused by this section. The CNI plugin is only being used just for setting up the iptables forwarding. The main goal of this KEP (IMO), i.e. setting the proper startup and shutdown ordering among sidecars and regular containers, remains a critical need regardless of whether the forwarding rules are set through CNI or through an initContainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, it was a misunderstanding from my side. As it was explained in this comment, will update the text regarding this :)
##### Alternative 1: Add a per container fatalToPod field | ||
|
||
One option is to add a `fatalToPod` bool field _per container_. This will mean | ||
that if the given container crashes, that is fatal to the pod so the pod is | ||
killed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a useful feature, no just for sidecars, that can be tackled separately 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but if we don't add it now, you can create jobs that will be stalled there forever. Kubernetes won't have any built-in mechanism to prevent them from collecting those jobs.
I mentioned that alternative too.
Any reasons to prefer to have "stalled pods"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't prefer stalled pods; just thinking on keeping the KEP as focused as possible. I wouldn't mind if stalled pods were a known inconvenient edge case say during alpha. And if the fatalToPod
type has broader applicability than just sidecars IMHO it makes more sense to implement it separately.
|
||
Pods with sidecar containers only change the behaviour of the startup and | ||
shutdown sequence of a pod: sidecar containers are started before non-sidecars | ||
and stopped after non-sidecars. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd emphasize here that they forcefully stopped when all other containers finished. E.g.:
and stopped after non-sidecars. | |
and stopped after non-sidecars. Sidecar containers are also terminated when no non-sidecar containers finished. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks! Will amend to current patches, if that is okay to you :)
* Reimplement an init-system alike semantics for pod containers | ||
startup/shutdown | ||
* Allow sidecar containers to run concurrently with initContainers | ||
|
||
Allowing multiple containers to run at once during the init phase - this could be solved using the same principal but can be implemented separately. //TODO write up how we could solve the init problem with this proposal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a duplication of an item above Allow sidecar containers to run concurrently with initContainers
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just left it as it had a "TODO" note. I can remove it as part of this PR, if you think it is better. Let me know :)
Aim to use alternative 1, while in parallel we collect more feedback from the | ||
community. | ||
|
||
It will be nice to do nothing (alternative 2), but probably the first step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but probably the first step
It can always be added after the feedback. So we can go with do nothing and add exactly what is required later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. If all agree on that, LGTM :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added by request of istio devs, though. Let's see what they say here: #1913 (comment)
And then decide :)
can we move this under a sig-node directory? implementation of this feels largely oriented to sig-node domain. |
@derekwaynecarr done! :) |
nit: @rata good way to rename files is to use |
I did that: |
Reviewed-by: Joseph-Irving <[email protected]> Signed-off-by: Rodrigo Campos <[email protected]>
As suggested by derekwaynecarr here: kubernetes#1874 (comment) Reviewed-by: Joseph-Irving <[email protected]> Signed-off-by: Rodrigo Campos <[email protected]>
This was amended by Rodrigo Campos to use 80 columns and address Joseph suggestions, while also adding some minor tweaks.
bd6e981
to
354fad5
Compare
@SergeyKanzhelev @derekwaynecarr can you pelase lgtm and approve again? Merge was not possible due to a silly conflict (this PR #1939 was merged and somehow caused a conflict), so I rebased without changes. Thanks again for the reviews! :) |
/lgtm thank you @rata! |
This was requested by Sergey here: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
As requested by Sergey: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
One decision to revisit was to see if really wanted to change the pod phase. It was clearly agreed here that we won't: kubernetes#1913 (comment) This commit just removes the section to discuss that (it is already agreed) and updates the KEP to reflect that. Signed-off-by: Rodrigo Campos <[email protected]>
This was added due to a comment from Istio long ago[1], but they don't need this anymore[2]. Furthermore, our use cases at Kinvolk also work just fine without this. [1]: kubernetes/community#2148 (comment) [2]: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
This was requested by Sergey here: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
As requested by Sergey: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
One decision to revisit was to see if really wanted to change the pod phase. It was clearly agreed here that we won't: kubernetes#1913 (comment) This commit just removes the section to discuss that (it is already agreed) and updates the KEP to reflect that. Signed-off-by: Rodrigo Campos <[email protected]>
This was added due to a comment from Istio long ago[1], but they don't need this anymore[2]. Furthermore, our use cases at Kinvolk also work just fine without this. [1]: kubernetes/community#2148 (comment) [2]: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
This was suggested by Sergey here: kubernetes#1913 (comment) Sadly, that branch is merged and can't click on commit suggestion now. Credits goes to Sergey anyways :-) Signed-off-by: Rodrigo Campos <[email protected]>
In the previous PR, istio devs commented that some things were not accurate. This commit just updates the text to (hopefully) correctly reflect it now. Removed the paragraph about this removing the need for an initContainer due to comment here: kubernetes#1913 (comment) I thought it was an okay to insert the iptables rules within the sidecar proxy container, but it is not okay as that requires more permissions (capabilities) on the sidecar proxy container which is not considered accetable by Istio devs. Signed-off-by: Rodrigo Campos <[email protected]>
This was requested by Sergey here: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
As requested by Sergey: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
One decision to revisit was to see if really wanted to change the pod phase. It was clearly agreed here that we won't: kubernetes#1913 (comment) This commit just removes the section to discuss that (it is already agreed) and updates the KEP to reflect that. Signed-off-by: Rodrigo Campos <[email protected]>
This was added due to a comment from Istio long ago[1], but they don't need this anymore[2]. Furthermore, our use cases at Kinvolk also work just fine without this. [1]: kubernetes/community#2148 (comment) [2]: kubernetes#1913 (comment) Signed-off-by: Rodrigo Campos <[email protected]>
This was suggested by Sergey here: kubernetes#1913 (comment) Sadly, that branch is merged and can't click on commit suggestion now. Credits goes to Sergey anyways :-) Signed-off-by: Rodrigo Campos <[email protected]>
In the previous PR, istio devs commented that some things were not accurate. This commit just updates the text to (hopefully) correctly reflect it now. Removed the paragraph about this removing the need for an initContainer due to comment here: kubernetes#1913 (comment) I thought it was an okay to insert the iptables rules within the sidecar proxy container, but it is not okay as that requires more permissions (capabilities) on the sidecar proxy container which is not considered accetable by Istio devs. Signed-off-by: Rodrigo Campos <[email protected]>
Hi!
This PR adds a "Proposal decisions to discuss" section to discuss some design issues/callouts from the KEP, as discussed on SIG-node meetings. It also adds the "Alternatives" section with the alternatives considered in the past 2+ years this KEP is open.
The motivation, goals, etc. sections are also improved to clarify the KEP scope and motivation. As this PR is quite long, however, I didn't add more sections that will make it longer. Let me know if you think it is worth adding some specific section (as part of this PR).
My idea of the "Proposal decisions to discuss" section was to use it only to discuss and agree alternatives, and completely remove it afterwards (updating the KEP accordingly). We can capture what is relevant from the decision making process in the alternatives section also. This is why, in part, this section is long: I tried to raise all the concerns I could find, to make sure we are all at the same page.
The "Alternatives" section part discussing the other design alternatives was purely done by @Joseph-Irving and @mrbobbytables . The rest of the commits were kindly reviewed by @Joseph-Irving too :)
Please let me know if I can help in any way. Thanks for your time and help so far! :)
Tagging @derekwaynecarr as talked on SIG-node
/cc @sjenning @SergeyKanzhelev