Service-level configuration #196
Comments
/remove kind/feature
cc @bowei
Thanks @hbagdi, this is a great write-up.
I think I've missed a few things, so we will have to get feedback and add more to the list. I've intentionally not included TLS-level settings and AppProtocol in the list because I'm not sure if they are service-level or not. Some of those are, but not all.
cc @danehans
/assign
/cc @jpeach @youngnick
Discussion from office hours: a common use case is one where different teams are involved (e.g. cluster operator vs. app author) and the cluster operator wants to "fix up" application settings due to regulatory or resource issues:
It is likely that we can't do 100% of these use cases (use Open Policy Agent), but we should keep this use case in mind.
A couple of questions/comments:
@yiyangy, great question. I've seen this being done at both levels in different setups. I defined this at the service level because that's what I've seen more widely adopted. We can certainly revisit. @youngnick mentioned in one of the meetings how users use a Gateway or proxies to fix things up. Meaning, if services haven't implemented retries or timeouts correctly, or if all services have different behavior around this, the gateway is used to fix things up.
Re. Timeout and similar: https://istio.io/docs/reference/config/networking/destination-rule/
Composing by name seems like a very clean and simple solution: in the 'server' namespace they would match the Service name; in a client namespace they would be based on the qualified service name (svc.namespace.svc).
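To make the naming idea concrete, here is a minimal sketch, assuming a hypothetical ServiceConfig resource and placeholder names (payments, store, frontend are made up):

```yaml
# Hypothetical ServiceConfig resource (not an existing API), illustrating
# composition by name. In the Service's own ("server") namespace the object
# name matches the Service name:
apiVersion: example.dev/v1alpha1
kind: ServiceConfig
metadata:
  name: payments            # matches the Service named "payments"
  namespace: store          # same namespace as the Service
spec:
  timeoutSeconds: 5
---
# In a client namespace, the object name is based on the qualified
# service name instead:
apiVersion: example.dev/v1alpha1
kind: ServiceConfig
metadata:
  name: payments.store.svc  # qualified name of the target Service
  namespace: frontend       # the client's namespace
spec:
  timeoutSeconds: 2         # client-side override
```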
My notes from the 5/28/20 meeting:
One thing to note: I think there are two distinct but related topics here. All of these examples are based on the client and server having an Envoy sidecar, since I am familiar with this model, but they should generally apply. The first is where the config is actually applied. This can be on the client side, on the server side, or both.
The second is where the config comes from. For "server-side only", the only reasonable location for the config is alongside the server (in the Service or a related resource, certainly in the same namespace). However, for client settings it's reasonable to be able to configure defaults in the server namespace but overrides in the client namespace.
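A sketch of that split, again using a hypothetical resource (the kind and field names are assumptions, not a proposal):

```yaml
# Hypothetical sketch (not an existing API): "server" settings live only
# alongside the Service, while "clientDefaults" are defaults that a client
# namespace could override locally.
apiVersion: example.dev/v1alpha1
kind: ServiceConfig
metadata:
  name: payments
  namespace: store            # the Service's ("server") namespace
spec:
  server:                     # applied at the server side only
    maxConnections: 1024
  clientDefaults:             # defaults for all clients, overridable per client namespace
    requestTimeoutSeconds: 3
    retries: 2
```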
Hello all,
I threw together a couple of ideas here: https://docs.google.com/document/d/1Kz2X7zKfaSGW9YTlzqFeFuTCj5uDxdB9D363nMkB6xk/edit# |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
@hbagdi
@hbagdi This issue has become pretty large in scope; would it be possible to split it up into smaller issues so it is easier to track progress on each part? For a bit more context, I'm trying to build out a list of what we might want to accomplish in the next API release. Ideally we'd be able to link each list item to a unique GitHub issue, likely still under this umbrella issue. Still trying to figure out the best way to structure/organize all this.
Update:
Please let me know if something was missed here.
@hbagdi: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Current Status
We are collecting issues and use cases that should be associated with core.Service.
Once we have a good understanding of problems, we will start discussing potential solutions, ultimately ending in a KEP. Please feel free to add your comments to this thread and we will try to incorporate those here.
Summary
There are service-level configuration properties that cannot be specified anywhere in k8s or service-apis resources currently.
The community seems to have a consensus that core.Service is an overloaded resource with features concerning different areas, and that more fields should not be added to further bloat the Service abstraction.
While these features are laid out with Ingress or Gateway in mind, some of these configurations are also valid in other areas, such as clients of a Service running within or outside the cluster perimeter, Service Mesh deployments, etc.
Issues
Load balancing
Some examples of load balancing features are (not exhaustive):
core.Service has some aspects of load balancing through fields like SessionAffinity and ExternalTrafficPolicy. The existing fields are (intentionally?) not exhaustive but also don’t provide any extension mechanisms.
Related issues:
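For reference, the load-balancing-adjacent fields that exist on core.Service today look like this (values are illustrative); note there is no field for, say, choosing a balancing algorithm:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments
spec:
  type: LoadBalancer
  selector:
    app: payments
  ports:
  - port: 80
    targetPort: 8080
  sessionAffinity: ClientIP        # only None or ClientIP are supported
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  externalTrafficPolicy: Local     # only applies to NodePort/LoadBalancer Services
```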
Traffic management
Service upgrades are performed in a gradual way to avoid surprises. Canarying and mirroring traffic are some of the most common ways this is done. Ingress and mesh vendors provide such features using CRDs. The service-apis project has also seen a lot of interest in this area. While this spans multiple core.Service resources, it might be worth exploring how an end-user should think of Services and endpoints when it comes to upgrades.
Related issues:
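For illustration, a weighted (canary) split expressed in the HTTPRoute shape that the Gateway API (the successor to service-apis) eventually adopted; the exact fields are shown only to make the use case concrete:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments
spec:
  parentRefs:
  - name: example-gateway          # assumed Gateway name
  rules:
  - backendRefs:
    - name: payments-stable        # 90% of traffic to the stable Service
      port: 80
      weight: 90
    - name: payments-canary        # 10% of traffic to the canary Service
      port: 80
      weight: 10
```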
Health checking
While Kubernetes performs health checking of the pods that are associated with a Service, it is common for proxies and load balancers to perform health checking of their own as well.
There have been some implementations that derive the health-checking info from the liveness or readiness probes, but that seems incorrect, as the concerns of the kubelet and the proxy are different.
Feature requests:
Related issues:
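A minimal sketch of what proxy/load-balancer health checking attached to a Service might look like, using the same hypothetical ServiceConfig resource as above (all field names are assumptions):

```yaml
# Hypothetical (not an existing API): active health checking performed by the
# proxy/load balancer, deliberately separate from kubelet liveness/readiness.
apiVersion: example.dev/v1alpha1
kind: ServiceConfig
metadata:
  name: payments
  namespace: store
spec:
  healthCheck:
    path: /healthz
    intervalSeconds: 5
    timeoutSeconds: 2
    healthyThreshold: 2
    unhealthyThreshold: 3
```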
L4 details
Some examples of such properties (not exhaustive):
Most of these properties overlap with other sections, as these are generic networking properties.
Related issues/PRs:
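A sketch of the kind of L4 properties meant here, again hypothetical and illustrative only (the fields are my own assumptions):

```yaml
# Hypothetical (not an existing API): generic L4/connection-level settings
# that have no home on core.Service today.
apiVersion: example.dev/v1alpha1
kind: ServiceConfig
metadata:
  name: payments
  namespace: store
spec:
  l4:
    connectTimeoutSeconds: 10
    idleTimeoutSeconds: 300
    maxConnections: 1024
    tcpKeepalive:
      intervalSeconds: 75
```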
Existing workarounds
Currently, there is no good place to put these configurations.
This has led to various projects adding such properties into networking.Ingress-like resources.
Such duplication causes:
Misc notes
Global default
While per-service configuration is required, cluster operators would like to configure a sane global default, which should be used in the absence of any per-service configuration.
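One possible shape for this, sketched with hypothetical resources (a cluster-scoped default set by the cluster operator, overridden per Service):

```yaml
# Hypothetical (not an existing API): cluster-wide default plus a
# per-Service override.
apiVersion: example.dev/v1alpha1
kind: ClusterServiceConfigDefault   # cluster-scoped, owned by the cluster operator
metadata:
  name: default
spec:
  requestTimeoutSeconds: 15
---
apiVersion: example.dev/v1alpha1
kind: ServiceConfig                 # namespaced, overrides the default for one Service
metadata:
  name: payments
  namespace: store
spec:
  requestTimeoutSeconds: 3
```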
Client vs Server
Definitions:
Server: the core.Service abstraction in k8s. Server and Service are used interchangeably in this section.
Configurations such as timeouts are properties of both the server and the client. A client, when attempting to connect to a server, will specify a timeout. On the other hand, the server wants to protect itself from too many idle connections and will have a timeout on connections from its side.
In the context of this issue, we are referring to client configurations, not server ones.
That raises the question: why then associate such configuration with a Service rather than with a client-controlled resource such as HTTPRoute?
I (@hbagdi) think this is because we want to define the configuration in the context of a Service and want clients to use it whenever they communicate with the Service. There is no way to enforce that clients will use these configurations, but if a client is communicating with a Service, it should follow the above-defined properties if it expects certain SLAs to be met (thereby making such properties part of the contract between the client and the server/service). This could be thought of as analogous to how a client should use a published port if it wishes to communicate with a Service.
Note: When considering a proxy or gateway, there are connections on two sides, one between the user and the gateway and the other between the gateway and the Service. This issue discusses the latter.
Extensibility
While standardization of the above features would provide a good experience for end-users, extensibility is important. Areas like load balancing and L4 properties can vary widely between implementations.
If a solution lacks hooks for extensions, we risk devising a solution that only works for a small portion of our users, and #existing-workarounds will continue to exist. The goal should not be to eliminate such workarounds entirely but to minimize them as much as possible.
kube-proxy
We need to explore where kube-proxy fits into this. It seems to be another internal gateway, or a client of the Service. There are some aspects of the above that kube-proxy does implement. Should it continue to implement those? Should those be deprecated instead? Should the scope of kube-proxy be limited so as not to further complicate it?