Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEP-1058 Add support for Route Inclusion and Delegation #1085

Closed
wants to merge 16 commits into from

Conversation

Jeff-Apple
Copy link
Contributor

What type of PR is this?
/kind gep

What this PR does / why we need it:
This is the formal GEP-10058 document

NONE

@k8s-ci-robot k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 4, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @Jeff-Apple. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 4, 2022
@robscott
Copy link
Member

robscott commented Apr 4, 2022

Thanks @Jeff-Apple!

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 4, 2022
@robscott
Copy link
Member

robscott commented Apr 4, 2022

Looks like this test failure was an issue caused by my webhook integration, hopefully it's just a flake.

/retest

@youngnick
Copy link
Contributor

I haven't got to it today, but I will open a PR back to your branch @Jeff-Apple, with my changes to the status section tomorrow.

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jeff-Apple! This is a well written and massive GEP. I took a first pass, but still need to think about some of the details more here. At a high level, I think we should be trying to start with a lot of guardrails around route inclusion to simplify the implementation wherever possible. For example, maybe Child Routes can only have Parent Routes as Parents, not Gateways. Maybe Child Routes can't specify Hostnames? This is a very complex topic, so wherever we have an opportunity to simplify the scope, I think we should consider it.

The second use case is allowing the owner of a route to delegate responsibility
for portions of the route configuration.

This proposal is specifically focused on httpRoutes but creates a design
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny Nit: We usually use "HTTPRoute" throughout docs.

throughput this document and the example configurations are based on it.

**Example Scenario 1:**
![Example Scenario 1](images/1058-example-scenario.png "Example Scenario 1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great diagram!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/s/throughput/throughout

site-src/geps/gep-1058.md Outdated Show resolved Hide resolved
site-src/geps/gep-1058.md Outdated Show resolved Hide resolved
site-src/geps/gep-1058.md Outdated Show resolved Hide resolved
site-src/geps/gep-1058.md Outdated Show resolved Hide resolved
that model is extended to allow `hostname` to be specified at the listener,
parent route **and** child route levels. Specifying `hostname` at any
combination of these levels is allowed but the `hostnames` are always combined
in the sequence: listener, parent route, child route.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an important use case? If not, would it simplify implementation if we said that child routes could not specify hostnames?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would simplify things, yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, 3 way wildcard match is too much

Comment on lines +423 to +425
6. Path Matching: If `HTTPPathMatch` is configured in both the parent and child
`Routes`, allow **only** substring prefix matching on path, without any
wildcard or RegEx matching,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think path matching will always be configured because our default path match is prefix + "/". A lot of these limitations seem like things we could enforce with webhook validation. Maybe we should have a section covering what additions we'll need to webhook validation.

site-src/geps/gep-1058.md Outdated Show resolved Hide resolved
- [Request Filtering Between Gateways and Namespaced Routes · Issue #634 · kubernetes-sigs/gateway-api (github.com)](https://github.com/kubernetes-sigs/gateway-api/issues/634)
- [Request Filtering Between Gateways and Namespaced Routes - Google Docs](https://docs.google.com/document/d/1-0mgRRAY784OgGQ1_LCOshpLLbeAtIr4eXd0YVYK4RY/edit#heading=h.8cfxzle5tmqb)
- [Clarify how RouteGateways would work if we supported Route->Route delegation · Issue #411 · kubernetes-sigs/gateway-api (github.com)](https://github.com/kubernetes-sigs/gateway-api/issues/411)
- [Route inclusion/delegation security and policy implications · Issue #1042 · kubernetes-sigs/gateway-api (github.com)](https://github.com/kubernetes-sigs/gateway-api/issues/1042)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we likely need some more coverage in this GEP about how policy attachment interacts with route inclusion. Unfortunately I'm not sure what the answer is yet, but it does add another dimension of complexity here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that there's absolutely an action here to do some definition about how policy attachment interacts with route inclusion.

Copy link
Contributor

@mikemorris mikemorris Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be relevant for GAMMA too, e.g. thinking about how a service owner defining timeout policy on a mesh HTTPRoute for a service interacts with timeout defaults or override set by a gateway administrator. I think it might actually be relatively straightforward, just an additional level deep but with the same default/override pattern as already established?

@bowei
Copy link
Contributor

bowei commented Apr 26, 2022

/assign bowei

@robscott robscott added this to the v0.6.0 milestone May 23, 2022

The remainder of this document covers the proposed enhancements to the Gateway API.

## Patent/Child Route Relationships
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parent/Child

Copy link
Contributor

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ways this technique can be used. In the first case, traffic needs to be routed
to "Core Services" (`core-svces-ns`) from two different starting points. Using
route inclusion, the routing configuration for the Core Services are put in a
child route and then that child route is included in two sections of a parent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches is already a list that can easily list /store and /api with minimal duplication. This seems like a complex solution to a non-problem

Another way route inclusion is used in Scenario 1 is by configuring a
HTTPRouteFilter, that inserts a header, in the parent route. This results in the
filter being added to each RouteRule in the child route `store-route`. The child
route is both shorter and easier to read than it would have been without this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would disagree that having logic spread amongst multiple resources is easier. Shorter is not better if we just spread it throughout the cluster. Especially putting two resources in different namespaces just so each one is a bit shorter.

All of these types of issues can be solved by higher level config tools like helm templates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I realize that the proposal could stand alone even without this section since this is only part of the use case. However, this is pretty important I think because I feel this is pushing anti-patterns and complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important part of having the resources be in separate namespaces is not that they're shorter, it's that they can be owned by different people, and those people can coordinate without having to coordinate edits of a single resource.

In Contour's implementation of inclusion, we had the requirement that the owner of the root HTTPProxy (equivalent in this proposal to the Parent HTTPRoute) had to specify the included resources by name, and "the owner of the parent resource has to make edits for each new thing" was explicitly called out by users as a pain point: see projectcontour/contour#2206 for a lot more detail.

The point I'm trying to make here is that the people who want to break config up in this way need it desperately, so remove manual human action from the config management loop.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important part of having the resources be in separate namespaces is not that they're shorter, it's that they can be owned by different people, and those people can coordinate without having to coordinate edits of a single resource.

I agree. There are different personas that delegation is used for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youngnick I 100% agree with that use case and I think is a good motivation for this GEP. I think that is "Partial Administrative Delegation of Routing" though - you are delegating to different roles. This section ("Compose routes from reusable config fragments") seems to be suggesting it should be used as just a way to DRY config within the same ownership which I don't agree is a good idea.

The design may make this usage possible (I think we would have to go out of our way to stop it, if we wanted, by not allowing same-namespace delegation, which seems odd), but I don't think we should list it as a use case or encourage it.

Totally possible I am missing more context on why this is useful though.

Copy link

@costinm costinm Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very important - 'different personas' also means different roles and privileges. The Gateway admin has control over the host DNS and certificates, and can decide which namespaces to trust and for what hostname and path prefix, and can set policies for the host.

Maintaining security and separation of roles should be a clear requirement for any delegation proposal. My concern with the complexity of this proposal is in large part because it makes the security properties hard to understand - which will lead to CVEs. The 'functionally equivalent with 2 separate gateways' model preserves isolation and can be
implemented as 2 separate deployments - the gateway API allows an implementation to either combine or separate.

`RouteRules` in the parent route, the binding between the two routes is still a
single transaction.

The binding transaction has two stages; reconciliation and activation. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused by this.. it sounds like its describing an implementation. Can we rephrase it as a spec?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is describe a way of thinking about how the reconciliation and binding process works, but I can see how this reads like implementation guidance. I think we should update this to make that clearer.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An implementation may choose to use a controller to combine the routes and generated 'flat' routes - instead of implementing this at 'runtime'. Or may use tools in CI/CD to combine the routes and deploy flat routes.

I agree a spec should not require or describe a particular implementation.

### Failed Binding

The binding transaction MUST fail if there is A non-recoverable error during the
binding process. A failed binding should result in no changes to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So say I have parent route v1 and child route v1. This is all valid config ("active Ephemeral").

Now, I change the child route to 'v2'. Its now invalid. Per the spec, I am supposed to use the old "active Ephemeral".

Do I understand right? if so, this seems un-implementable without a stateful implementation storing prior state outside of k8s?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we've done elsewhere in the spec is probably relevant here - we should probably suggest that in the event of a conflict we detach/break immediately. We can still leave rooms for stateful implementations to provide a more graceful fallback if possible, but I don't think that's broadly implementable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the necessity of storing state makes being nice to users who make mistakes extremely tricky, sadly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another thread on 'roles' and permissions: I don't think a bad 'child' route in a delegated namespace should ever break or invalidate a 'parent' route, no matter when the parent is added.

The first level of routes should be validated independently and be applied first, even if it conflicts with any child route.

I would also require that child route validation and application is isolated at namespace boundary - I still don't know how delegation a /prefix to multiple namespaces works, but an error in one namespace should not impact a different one or prevent it from adding routes.


**A key principle guiding the design of how parent and child routes are
combined, is that the data plane can apply the resulting route settings in a
single processing cycle (per direction) of the network traffic.**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a requirement that the parent and child routes can statically (in control plane) be reduced into a single route?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the ephemeral route stuff is talking about, I think. So yes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Might be nice to list it explicitly. It was a bit unclear to me since 'ephemeral route' is describing an implementation. This may be overly pedantic though

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extend this to: the requirement should not conflict with an implementation that reduces routes statically in an external CI/CD system, allowing the feature to work with Gateway implementations that opt out from implementing this in the control plane.

That means the entire 'ephemeral' stuff should not be part of the spec, and the rules for resolving conflict should
be deterministic and stateless.

This idea was to allow `backendRef` to, optionally, specify a route instead of a
service. Some of the reasons for not taking this approach include:

- It would require changes to `backendRefs`. No changes to `allowedRoutes` are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. backendRef can refer to any type, including route. Quite the opposite - this proposal requires changes to all route types to add allowedRoutes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to look at this would be that route delegation may only make sense for a subset of routes, maybe just HTTPRoute to start. I personally don't like using backendRef for this concept because another route is not really a backend, it's just additional routing configuration.

Maybe more importantly, the separation of concerns here seems like a critical part of the design. I want to be able to say that I trust Route admins in the foo namespace to handle routing for anything under /foo. I shouldn't have to care about what the specific Route(s) are called in that namespace.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the last point - and we should match whatever GAMMA does.

For example, if we go with 'attach routes to Services' - then the backend remains a Service ( no change ), and we can delegate to the HttpRoute attached to the service in the child namespace.

If we add a new CRD to represent a virtual service - this can be the backendRef. If mesh would treat this virtual service with attached routes as a backend - so should the gateway spec.

Service and whatever resource GAMMA defines will be named entities - you shouldn't care how the HttpRoute is named, but the name of the Service or virtual Service is reflected in DNS and exposed to users.

A nice side-effect is that the 'child' routes could be directly used from the mesh - don't know if we want to allow this, see my comment on interactions between gateways and mesh.


- It would require changes to `backendRefs`. No changes to `allowedRoutes` are
needed to use it for this.
- To minimize the changes to `backendRef`, it would require the owner of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily - we could allow abstractions such as just selecting the namespace and leaving name blank, which would be functionally equivalent to allowNamespaces.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, or we can have the backendRef match whatever we define in GAMMA.

Leaving it blank is a bit problematic because in a namespace you may have children of multiple gateways. For example /admin may be delegated by foo.com and bar.com. I don't think requiring hostname in child routes is ideal - and also not sufficient, I think it is possible for 2 different gateways in 2 namespaces to define 'foo.com' with different IPs - and still delegate.

needed to use it for this.
- To minimize the changes to `backendRef`, it would require the owner of the
parent route to know the exact name of the child route. If the two routes are
in different namespaces, that might require cross-organization coordination
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already needed though, just in one direction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's much more reasonable to require a child to know details of the parent resource than the other way around. As an owner of a parent route, I don't want to have to worry about what Route(s) are named in namespace foo, just that I trust Route owners in that namespace to handle requests under /foo.

Copy link

@costinm costinm Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an important use case is to allow helm charts to be written and independent of cluster config.

For example in servlets web.xml routes are independent of hostname ( and even context prefix ). This is a very useful property and allows a .war to be deployed in any server without changes.

Not sure if 'parent' knowing the naming details in the child namespace is true - the BackendRef has a name and namespace, and the name is quite significant and required and must be the exact Service name. Why would delegating to a route be different ? A user may already implement delegation by creating a Gateway in the child namespace, with an associated Service - and have the parent use a route to the gateway service. While I understand
some disagree with this model - it is possible and useful in many cases, and unlikely to go away ( since it allows
strong isolation and policies like WASM or envoy filters in the child). In this model the parent must use the exact name of the service in the child.

parent route to know the exact name of the child route. If the two routes are
in different namespaces, that might require cross-organization coordination
between administrators.
- It would result in there being one way to attach routes to listeners and a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe from an implementation perspective. From a user perspective a route can be seen as a backend -- its logically going through both of the routes, they just happen to be merged together as part of the implementation details (of some implementations! some may not actually do this at all).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing I don't think we mentioned here is that I think that the backendRef implies an extra routing hop, which I don't think will be common. This is a way to break up the config, not a way to represent extra routing hops.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would backendRef 'imply' an extra routing hop ? It allows implementations to use an extra hop, if strong isolation between parent/child is required - or to flatten the route.

Technically Service is not a real backend and does require CNI to route requests to a Pod, and the spec allows any CRD
to be used as backend - there is no mention of 'hops' in the spec.

...
- name: store
...
allowedRoutes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems extremely complex for what we are doing. First, I would assume the majority of use cases are to a single namespace -- maybe even go as far as to say we should only allow a single namespace, as allowing multiple is problematic from an ownership perspective. We also don't need namespaces nesting, since kinds is not relevant here since we only allow same-kind.

What that leaves us with is the need to refer to a single namespace. This means everything else is boilerplate, really. We could consider simplifying this entire to something like routeNamespace: store-ns even.

I do prefer using backendRef anyways ,but if we are going with this approach I think it could be dramatically simplified

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah as I've been thinking about this more it seems simpler and safer to limit this capability to a single namespace and replace this with a single string instead of the struct. I think I'd originally argued for consistency with Gateway -> Route, but that's very much overkill here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with only allowing a single namespace is that it screws over migrations. If an app is moving to a new namespace, there will need to be a tightly coordinated sequence of edits to specific objects to ensure that config isn't lost.

In order to address that, we need more than one namespace available, and once we have more than one, it's easier to include a label selector than use a list of named namespaces (we've talked about this before).

In the interest of avoiding boilerplate, we could conceivably add a singular AllowedNamespace field that's just a string name, and a plural AllowedNamespaces field that has the label selector?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For migration, could you do this:

Pre: routeNamespace: before, route defined in before
Transition: routeNamespace: before, route defined in before and the same route defined in after
Switch over: routeNamespace: after, route defined in before and the same route defined in after
Complete: routeNamespace: after, route defined in after

?

It seems mostly the same flow, except with multiple namespaces you have both child routes 'active' at one time and are relying on conflict resolution to ensure only one is picked at time?

I am not confident in ^, LMK if that makes sense...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why we have a list of BackendRefs and weights.

To migrate you add the second namespace with weight 0, and gradually shift. The parent has full control, and you can't accidentally have a random namespace using same labels mess up, or complicated conflict resolution between namespaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even a single namespace allowed, there will be cases, multiple delegate rules referenced, how can we process it?

In istio, we allow specifying the delegate by namespace/name, there can be only one delegate for a root rule.

Copy link
Contributor

@mikemorris mikemorris Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original intent in proposing AllowedRoutes was to offer config parity between how Gateways allow routes to bind and how HTTPRoute could similarly allow child routes (as @robscott mentioned above).

With the added same-kind constraint, we can definitely skip at least the first level. (Even if we wanted to consider extensibility for e.g. TCPRoute -> HTTPRoute delegation in a future iteration, I don't think additionally adding AllowedRoutes as a peer at that time would be a breaking change or backwards-incompatible, and this does reduce boilerplate for the simpler case.)

Using the RouteNamespaces struct directly (which we may want to rename to AllowedNamespaces for clarity in the standalone use case?) could support both the single-namespace case, the selector for granular delegation to teams managing multiple namespaces or migrations, as well as global free-for-all for small teams managing everything.

For the simplest case, could we add an additional enum variant "Namespace" to FromNamespaces? Following that, we could add an additional optional field name to RouteNamespaces, with similar constraints as selector:

name string (Optional)
Name must be specified when From is set to “Namespace”. In that case, only Routes in Namespaces matching this Name will be selected by this Gateway (or Route). This field is ignored for other values of “From”.

This would have the benefit of keeping our struct types limited and additionally improving the config UX for simple cases of regular Gateway -> Route ownership delegation.

Even a single namespace allowed, there will be cases, multiple delegate rules referenced, how can we process it?

In istio, we allow specifying the delegate by namespace/name, there can be only one delegate for a root rule.

I think @hzxuzhonghu's point is valid, even for the single-namespace-only case, and we should offer prescriptive guidance/conformance tests for how this should be handled. I think specifying name in a delegation slot leans too much toward requiring coordination across the delegation boundary though, and doesn't offer any meaningful additional constraint (team in delegated namespace could rename any route to match, it's just annoying).

@costinm
Copy link

costinm commented Jul 26, 2022

Not sure policy attachment is well defined - if a policy is attached to a parent, does it apply to child ? Or the other way ? What happens if policies conflict as result of merging ?

And more important - what are the security implications, child seems to have a lot of power influencing parent routing.

and is listed as a parentRef object in a different Route. Technically, a parent
route is any route that has a non-nil AllowedRoutes block
- **Child Route:** A Route that lists a Route as its parentRef object.
- **Ephemeral Route:** The route configuration that results from combining (i.e.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend not using the "ephemeral" to describe this, as it has a temporal meaning. Maybe "resolved" or "materialized"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that the way we've used language here makes this feel more like an implementation direction than a useful abstraction. I think we intended this more to be the latter.

I'll make a separate suggestion.

managed by an app dev team)
- **Inclusion:** When a child Route’s configuration is combined with a parent
Route’s config during runtime. The two routes may or may not be owned by the same
persona or organization. All delegations are inclusions, but not all inclusions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend making language more direct:

All delegations are inclusions, but not all inclusions are delegations =>

Inclusion refers to the mechanism for combining a parent and child route. This may be separate from the notion of delegation, which is specific to crossing ownership boundaries. All delegations of will require using inclusion, but there may be cases where inclusion is useful under single ownership (e.g. user uses the inclusion mechanism it to manage segment their configuration)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that putting the definition of inclusion first makes sense.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also defining requirements for both inclusion and delegation, and maybe having separate docs since I don't think one solution can satisfy both.

I don't think it is safe for inclusion to cross namespace boundaries - while delegation main purpose is for crossing.
The biggest problem in this spec is mixing the 2 and failing to address either problem in a simple way.

are delegations.

## Introduction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to put front and center one requirement that the resolution of delegation can be done as a rewrite by the controller -- e.g. there will not be a need for specific proxy implementation features to support delegation.

The litmus tests for this is whether or not the delegation can be implemented generically, e.g. as a library in gateway-api vs something that must be implementation-specific.

(Note: you might have this later in the doc, but I think it deserves to go up front somewhere so it's not lost).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend that we can work on this as a library that can be included in every implementation (if they desire). It will resolve a lot of ability to implement issues wrt to the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should be clear that this feature is all about Gateway API resources and how they are combined, not about what happens in the data plane.


The remainder of this document covers the proposed enhancements to the Gateway API.

## Patent/Child Route Relationships
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling Parent

child `Routes` by specifying `AllowedRoutes` where `BackendRef` would
be used otherwise.
5. Multiple child routes may attach to a single `RouteRule`, as long as they
are distinct.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does as "long as they are distinct" mean

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should say "as long as they don't conflict"? I believe we're using "distinct" to mean "different and don't conflict".

1. A child route “attaches" to the parent route’s `RouteRule` that matches
that `SectionName`. The child route must also match the criteria
specified in the `RouteRule’s AllowedRoutes`.
4. If the `ParentRefs` of a child route doesn’t include a `SectionName`, the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very clear to me what would motivate this use case.

We could also simply disallow attachment without SectionName?

This feels like it could result in some unintended consequences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just about avoiding boilerplate in "simple" cases, I think. But I think it's equally fair to say that for this to work, route.Name must be specified, and the child route must reference it.

Copy link
Contributor

@mikemorris mikemorris Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the same logic as how HTTPRoutes bind to Gateway Listeners without a SectionName - keeping this behavior consistent feels reasonable to me for simple cases? Not opposed to adding a restriction though if it helps encourage clarity though.

5. A parent `Route` and any related child `Routes` must be the same type of
`Routes`.
1. Example: A child httpRoute can not point to a parent tcpRoute.
6. A `Route` MUST NOT be a parent to a `Route` AND a child of another `Route`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add:

Specifically

  1. A ParentRef refers to a Route Kind, no rules can have a non-nil AllowedRoutes
  2. Similarly, a Route with non-nil AllowRoutes cannot have ParentRef to another Route.

it is either a parent or child (or misconfigured as both) without
requiring the graph traversal that a design like ForwardTo may need to
avoid cycles.
2. If an implementation chooses to not enforce this limitation, that must be
Copy link
Contributor

@bowei bowei Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have this carve out for the time being?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we can leave this out because the whole feature is Extended conformance.

## Binding Parent and Child Routes

Binding a child route to a parent route refers to the process of reconciling
(i.e. combining) the two routes. Binding is treated as a single transaction for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should refer to the Gateway/ Route binding.

We should also (can be separate subsection) comparing where these are same/different.

`RouteRules` in the parent route, the binding between the two routes is still a
single transaction.

The binding transaction has two stages; reconciliation and activation. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a declarative API, we need to make sure that states we are talking about is represented explicitly as transitions in the stored state/status of the resource. I think this section will be the most challenging to get right.

When we talk about a sequential ordering, we need to note:

It doesn't sound like we are saying there is a specific ordering in which children are presented to be bound, but there is some serialized order which the control performs the evaluation (e.g. like lock acquisition between competing threads).

To make sure I understand this, is this something like:

  • P: parent route
  • BC: set of bound children to parent (i.e. children that have been successfully bound)
  • UC: set of yet-unbound children (i.e. children that the controller is looking at that are not bound).

Binding is the event that takes a unbound child and successfully binds it to the parent. This is an atomic decision -- looking at the current configuration generated (ephemeral route) by looking at P + BC, adding the incremental route r from UC -- if it the config is valid, then r is bound (i.e. a member of BC).

As part of the transition, the status of r and parent will be updated to reflect the successful binding.

Note: we may want to make the notion of ephemeral route concrete -- especially if we require that all route inclusion is a rewrite to something that is representable as a single route. In this case, it's actually an entirely valid resource.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity of getting this right is why I'm of the opinion that we should err on the side of "what you ask for is what you get", and that if you ask for something that doesn't work, you get something that doesn't work, at least to begin with.

The alternative is that we need to carefully design the status to make clear that deadlocks have occurred and human intervention is required, since reconciliation cannot progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case, do we need to update the proposal? The current language (as I understand it), talks about a conceptually sequential algorithm for resolving conflicts. Which is not necessarily a bad approach, just pointing out we need to make sure it is well-defined.

Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big review from me, sorry.

I'd summarize this as:

  • we need another pass over the status design, especially keeping in mind some of the changes we've made to what counts as "Attached".
  • we need to define the interaction between this and Policy Attachment.

Before we do all of that though, I think that there seems to be some reasonably strong objections to this whole concept - it seems like the value proposition here is not clear, and I think that we may need to do more to explain:

  • the circumstances under which an implementation would support this
  • that it's completely valid for an implementation that already has some of these capabilities to not support it (in particular, I think that Service Mesh implementations have many similar functions as part of their mesh).

and is listed as a parentRef object in a different Route. Technically, a parent
route is any route that has a non-nil AllowedRoutes block
- **Child Route:** A Route that lists a Route as its parentRef object.
- **Ephemeral Route:** The route configuration that results from combining (i.e.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that the way we've used language here makes this feel more like an implementation direction than a useful abstraction. I think we intended this more to be the latter.

I'll make a separate suggestion.

Comment on lines +35 to +38
- **Ephemeral Route:** The route configuration that results from combining (i.e.
reconciling) a parent route and the child route(s) that attaches to it. The
ephemeral route represents the configuration that is actually applied to the
data plane.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **Ephemeral Route:** The route configuration that results from combining (i.e.
reconciling) a parent route and the child route(s) that attaches to it. The
ephemeral route represents the configuration that is actually applied to the
data plane.
- **Ephemeral Route:** A logical construct used in this document to explain the
route configuration that results from combining (i.e. reconciling) a parent route
and the child route(s) that attaches to it. This ephemeral route is a way of thinking
about the configuration that is actually applied to the data plane, and should not
be considered an implementation requirement.

Comment on lines +14 to +15
- Provide a 2-tier hierarchy of route configuration to allow partial delegation
of route settings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Provide a 2-tier hierarchy of route configuration to allow partial delegation
of route settings.
- Provide a 2-tier hierarchy of route configuration to allow partial delegation
of route settings at Extended conformance.

I think we need to make very clear that this functionality is not and probably should never be core functionality. It's pretty advanced and useful for specific use cases; outside of those use cases it's a lot of extra complexity for minimal gain. Inside those use cases, it's absolutely vital.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and will add a statement to make it clear.

managed by an app dev team)
- **Inclusion:** When a child Route’s configuration is combined with a parent
Route’s config during runtime. The two routes may or may not be owned by the same
persona or organization. All delegations are inclusions, but not all inclusions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that putting the definition of inclusion first makes sense.

are delegations.

## Introduction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should be clear that this feature is all about Gateway API resources and how they are combined, not about what happens in the data plane.

Message: Route successfully attached to parent and gateway.
```

This is pretty verbose, but the extra information will be very important once we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the idea here is to avoid the owner of the child Route needing to do a tree traversal to know the state of their child route. I can see the fanout concerns, but was hoping that by having the status refer back up the tree, rather than down, we should avoid some of those problems.

Comment on lines +731 to +732
kinds:
- kind: HTTPRoute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from an earlier version that allowed other Kinds, and hasn't been updated.

conditions:
- type: Accepted
status: true
reason: GrandparentConfigError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree this is confusing, and needs to be updated.

parent route to know the exact name of the child route. If the two routes are
in different namespaces, that might require cross-organization coordination
between administrators.
- It would result in there being one way to attach routes to listeners and a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing I don't think we mentioned here is that I think that the backendRef implies an extra routing hop, which I don't think will be common. This is a way to break up the config, not a way to represent extra routing hops.

- [Request Filtering Between Gateways and Namespaced Routes · Issue #634 · kubernetes-sigs/gateway-api (github.com)](https://github.com/kubernetes-sigs/gateway-api/issues/634)
- [Request Filtering Between Gateways and Namespaced Routes - Google Docs](https://docs.google.com/document/d/1-0mgRRAY784OgGQ1_LCOshpLLbeAtIr4eXd0YVYK4RY/edit#heading=h.8cfxzle5tmqb)
- [Clarify how RouteGateways would work if we supported Route->Route delegation · Issue #411 · kubernetes-sigs/gateway-api (github.com)](https://github.com/kubernetes-sigs/gateway-api/issues/411)
- [Route inclusion/delegation security and policy implications · Issue #1042 · kubernetes-sigs/gateway-api (github.com)](https://github.com/kubernetes-sigs/gateway-api/issues/1042)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that there's absolutely an action here to do some definition about how policy attachment interacts with route inclusion.

@costinm
Copy link

costinm commented Jul 27, 2022 via email

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Jeff-Apple
Once this PR has been reviewed and has the lgtm label, please ask for approval from bowei by writing /assign @bowei in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@caboteria
Copy link
Contributor

Looked at this over the weekend and the biggest problem that I had was from overloading the HTTPRoute object to be two different things depending on the context. This will likely be an ongoing problem, e.g. "is this HTTPRoute a parent route or a child route?", "should this HTTPRoute have done something or is it waiting for a child?" etc. The ephemeral route idea is an attempt to solve this problem but for me it just adds complexity.

I'd like to suggest an experiment: create a new object called "HTTPRouteTemplate" which would be what we're now calling a "parent route." It would attach to Gateways and would be attached to by HTTPRoutes. It would be a template for attaching HTTPRoutes and it would constrain its attached HTTPRoutes, just like parent routes do today. In other words, this is mostly the same functionality, just with clearer nomenclature.

This would eliminate a lot of confusion about what HTTPRoute is supposed to do, when, and in which context. Users would no longer need to try to figure out whether a given HTTPRoute was a parent or a child based on context. It no longer needs the ephemeral route idea: HTTPRouteTemplate does what it does, and HTTPRoute does what it does, and they're two different things. To me this is basically the same functionality as the current proposal but I find it much easier to understand if we give up the "grandparent-parent-child" idea.

@robscott
Copy link
Member

robscott commented Aug 1, 2022

I think I like @caboteria's idea of a new resource focused entirely on HTTP delegation. If we kept it very limited in scope, it could end up simplifying this approach and resolving a number of the comments above. Each new resource we add comes with a significant cost, so we need to be careful here, but if it looks like delegation would only support a tiny subset of the overall HTTPRoute functionality it could be quite compelling.

I've been thinking about how we could move forward with this GEP as a whole, and I think one of the most important steps will be to outline how each capability of HTTPRoute today would or would not be supported by delegation. This seems like it could also help determine if a new delegation focused resource makes sense or if we really need most of the capabilities of an HTTPRoute for both parent and child resources.

Beyond that, here are some ideas that could potentially help get this GEP closer to merging. Some of these may not be relevant if we end up going further with @caboteria's suggestion above for a new resource to model this.

  1. Move expansion of status to later GEP, keep initial GEP focused on new reasons and conditions within existing structure
  2. Clearly state in the GEP that route delegation must be statically computable
  3. Simplify API by replacing "AllowedRoutes" struct with "DelegateToNamespace" string (name is not important here)
  4. Merge GEP as "Provisional" with the requirement that an OSS library that can support the requirements above can be developed within Gateway API before GEP graduates to "Implementable".

...
- name: store
...
allowedRoutes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if a delegation Route rule created later than the root rule?

This seems to be the prevailing style among k8s projects.

This commit should be a no-op in terms of the meaning of the YAML
docs, i.e., it should be style only.
matchExpressions should be contained by selector, not a sibling of it.
@costinm
Copy link

costinm commented Aug 7, 2022

I also like the idea of having a separate CRD for the parent route. In the spirit of bike shedding, I would call it HttpRouteDelegate or RouteDelegation - 'template' has a slightly different use.

That will make the use extremely clear and simple - and fits the 'policy attachment' model, it is a policy set on the gateway defining what routes can be attached by which namespace. We can start with what is likely the most common use cases - gateway admin delegating by URL prefix, etc. Would be great to have a list of the use cases and evaluate which are common enough.

Since you suggested Template - I can see the benefit of allowing an actual template - like /${namespace}/ - in the delegation policy.

As mentioned, keeping the HttpRoute ( which will be the most frequently used CR ) clean and simple and having the delegation
as a separate policy is better for users.

Route 'inclusion' or some actual templating may also be a useful - but it would be ideal to separate delegation (which has a clear security and role separation purpose) from templating and inclusion.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Aug 8, 2022

CLA Missing ID CLA Not Signed

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 8, 2022
@shaneutt shaneutt removed this from the v0.6.0 milestone Sep 19, 2022
@shaneutt
Copy link
Member

Talked about this in the sync: not considering a blocker for v0.6.0

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 17, 2023
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.