Extensible Service policy and configuration #611
Comments
(ported from a Slack comment) We had a lot of similar problems with HTTPProxy, Contour's own CRD, which we tried solving in a similar way, but we found it very quickly grew unwieldy because of the interactions between different policy options.

The best example for HTTPProxy was client request timeout. In HTTPProxy's API, we allow setting this to the special value "infinity", which means that Envoy will disable this timeout. However, when running a central ingress proxy, some organisations don't want to allow people to do this. We experimented with a few ways to do this in the API, and eventually just recommended that, if you need that level of granularity, you use Gatekeeper as an admission controller, and we provide example PolicyTemplates and Policies. That lets an infra admin put strong bounds around settings, on a per-field level, that give early UX feedback (objects that violate the policy are never accepted). It also lets administrators who need even greater flexibility build that for themselves (if you want to allow a single namespace to set client request timeout to infinity, that's very possible with Gatekeeper).

That's not to say that we shouldn't do what you're saying in the doc (I've taken a quick read, will go over again momentarily), just that we should consider the UX and portability of the API as well.

I agree that the API will never succeed without a relatively standard way for implementations to do their own custom things in many places. But we need to make sure that we have a process for promoting features from ImplementationSpecific -> Extended -> Core, and what that would mean for various conformance profiles (once we have them). In particular, I think that settings like Minimum TLS version are great candidates for a relatively rapid advancement through the levels. Without a strong focus on promoting common things out of ImplementationSpecific CRDs, it will be very difficult to get the portability gains we're shooting for with this API.
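To make the per-namespace exemption for "infinity" concrete, here is a rough sketch of the kind of check an admission policy would perform. It is written in Python purely for illustration; a real Gatekeeper policy would be written in Rego, and `TimeoutPolicy`, `allow_infinity_namespaces`, and `admit` are hypothetical names, not part of Contour or Gatekeeper.

```python
from dataclasses import dataclass

# The special value mentioned above that disables the timeout.
INFINITY = "infinity"


@dataclass(frozen=True)
class TimeoutPolicy:
    """Hypothetical admin policy; not a Contour or Gatekeeper type."""

    # Namespaces that the admin explicitly allows to disable the timeout.
    allow_infinity_namespaces: frozenset = frozenset()


def admit(policy: TimeoutPolicy, namespace: str, request_timeout: str):
    """Return (accepted, reason) for an object submitted in `namespace`.

    A rejected object is never stored, which is what gives the author
    immediate feedback at apply time.
    """
    if request_timeout == INFINITY and namespace not in policy.allow_infinity_namespaces:
        return False, f"request timeout 'infinity' is not allowed in namespace {namespace!r}"
    return True, ""
```

With a policy that exempts a single namespace, say `TimeoutPolicy(frozenset({"batch"}))`, an object in `batch` may set the timeout to "infinity" while the same setting in any other namespace is rejected before it is ever accepted.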
In the early days of I'm wondering if the policies should be first applied to GatewayClasses, giving you that set of "defaults" that can be defined?
Thanks @stevesloka and @youngnick - on one hand I'm glad that we independently came to similar solutions, on the other hand, I'm bummed to hear that you also found no "slam dunk" proposal. I agree that some imperfect way of extending the API with custom fields is likely better than none, or else implementers will always continue to support their proprietary APIs for the advanced functionality.
I agree. If there isn't forward motion of these fields into extended and core then it shows that Gateway API is not evolving. I think there are a couple of things that we can do to promote this conveyor belt:
Are there any proposals that you can share that we can learn from?
Technically ServicePolicies don't prevent Gatekeeper from being used in this manner. For example, ServicePolicies may be freely configurable at the GatewayClass and Gateway levels, but at the HTTPRoute and Service levels Gatekeeper could be used to totally disallow certain fields or certain values for immediate feedback whenever a ServicePolicy is referenced.
Good question, I originally didn't think of ServicePolicies for GatewayClasses, but I do think it makes perfect sense. A ServicePolicy referenced by a GatewayClass is effectively a default that appears intrinsic to the GatewayClass, but ServicePolicies would give us a nice API for making that more formalized and configurable. I think it would be a good way of making the defaults clear without even requiring documentation to know what the defaults are (just look at your GatewayClass). Check out #583 (comment) for an example. One of the reasons that I proposed
Sorry to take a while to come back to this one @mark-church.
I like those rules, but I think there are some interactions with what parts of the API an implementation will support (which may come under conformance profiles), that will make them more complicated than we have here, I suspect.
The best one is actually the request timeout. We allowed the setting of the default to whatever the cluster admin wanted, using the global config file. However, we also allowed a per-Route override, which put this back in the hands of the application developers. In the case of the request timeout, there can be conflicting use cases, where the cluster admin doesn't want to allow So, we needed to give the cluster admin a set of global configuration options to set the maximum and minimum allowed request timeouts. Meaning that we now had to keep track of the following request timeout properties:
Whenever we needed to do anything with request timeout, we would need to think about all four values. And that's for one timeout! Envoy has > 10 timeouts, many of which would need similar treatment. On top of all that, the code that was required to validate the timeouts ends up looking very similar to the Rego code that Gatekeeper uses anyway, and the UX is bad because you don't get immediate feedback about whether the timeout value you supplied is okay (as an app developer, you have to apply your object, and then check if it's been accepted with a followup Once we moved to Gatekeeper instead, we got the following:
The cost is, of course, that advanced users now have to run another thing to get bounds checking for Contour values. We believed that the overhead of Gatekeeper should be less because, if you have the need for bounds checking on individual Ingress objects, you probably have the need for per-field checking on other objects, so you should be running something like it anyway.

Overall though, I can see how the idea hangs together, and it's quite neat. But I worry about how we would be able to keep things portable here, and how we would define the API itself. For example, you have the It looks like the entries under there are some sort of I guess what I'm saying is that it seems like a huge risk to the portability goals that are at the heart of the Gateway APIs. I understand that having a way to specify common properties without needing to create fields in the upstream API is great, but at the same time we need to ensure there is a little pressure to make your fields standard too.
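For readers trying to picture the bookkeeping for a single timeout described earlier in this comment, here is a minimal sketch. It assumes the four values are the admin's default, the admin's minimum and maximum, and the optional per-Route override. All names (`TimeoutConfig`, `effective_timeout`) are hypothetical and this is not Contour's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeoutConfig:
    """Hypothetical cluster-wide settings for one timeout, in seconds."""

    default_s: float  # used when a Route does not override the timeout
    min_s: float      # smallest override the admin will accept
    max_s: float      # largest override the admin will accept


def effective_timeout(cfg: TimeoutConfig, route_override_s: Optional[float]) -> float:
    """Resolve the timeout for one Route, enforcing the admin's bounds.

    "infinity" is modelled here as float("inf"), so it is rejected
    unless the admin sets max_s to float("inf") as well.
    """
    if route_override_s is None:
        return cfg.default_s
    if not (cfg.min_s <= route_override_s <= cfg.max_s):
        raise ValueError(
            f"timeout {route_override_s}s is outside the allowed range "
            f"[{cfg.min_s}s, {cfg.max_s}s]"
        )
    return route_override_s
```

Even in this reduced form, every read of the timeout has to consult all four values, and the check only runs when the controller processes the object, so the author sees a rejection after the fact rather than at apply time.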
Thanks for all the great discussion in this thread @youngnick, @stevesloka, and @mark-church! I've got a few follow ups + a new doc for further discussion.
Agree that this is a very real concern. As we've shown with things like timeout though, there are at least some concepts that likely can't be portable. As much as I'd like to pull as much as possible into the API spec, I don't think we should if <50% of implementations will be able to support a concept. That means we need some kind of standard way to extend the API. With Ingress, that was unspecified and resulted in confusing and inconsistent extension mechanisms. In my mind we need to:
I hope if there are any concepts that are only subtly different we'll be able to pull them into a shared policy resource that is included in this API spec (or include the concepts directly on Route or Gateway resources). With that said, I think it's reasonable to expect that each implementation will have some unique config that others will not understand.
I agree that we need to be very careful here, but I'm also concerned about the lack of guidance/standards here. I think implementation-specific policy extensions are going to happen regardless. If we don't provide a clear pattern for how that could work, we'll end up with the kind of fragmented/inconsistent extensions that Ingress has.

With all that said, I've been working on an updated proposal for how we can attach policy to resources. I've been trying to find the right balance between all of the concepts/hesitation above and the need to have some kind of consistent pattern here. So here's another take on how we could approach policy attachment. Although I tried out a few different approaches (also covered in the doc), I ended up with something that was largely similar to merging how BackendPolicy is attached with the Service Policy proposal @mark-church linked to start this issue.
This proposal draws inspiration from our current TLS design and offers a uniform method of attaching arbitrary and implementation-specific configuration to the Gateway API core resources.
It aims to achieve three different goals that I think are very important for a standard API spec. This is based on the assumptions that 1) every implementation will have proprietary functionality that's also unique and 2) we can't predict what this functionality is or where in the load balancing hierarchy it should be configured. As a result, to make Gateway a standard that can actually replace proprietary APIs, it must have a robust way of accomplishing the following three things:
This is just a proposal so I'm happy to take any suggestions on what should change or how we can make it better. cc @bowei @thockin @robscott @hbagdi @youngnick @danehans @howardjohn