Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support routing to Service Cluster IP #1900

Closed
Tracked by #1256
arkodg opened this issue Sep 19, 2023 · 26 comments · Fixed by #3543
Closed
Tracked by #1256

Support routing to Service Cluster IP #1900

arkodg opened this issue Sep 19, 2023 · 26 comments · Fixed by #3543
Assignees
Labels
area/api API-related issues area/policy
Milestone

Comments

@arkodg
Copy link
Contributor

arkodg commented Sep 19, 2023

No description provided.

@arkodg arkodg mentioned this issue Sep 19, 2023
3 tasks
@arkodg
Copy link
Contributor Author

arkodg commented Sep 19, 2023

two ways to solve this

  • add a field within EnvoyProxy called disableEndpointRouting / enableClusterIPRouting which allows us to fallback to routing using Service Cluster IP
  • incorporate this into the load balancing API

@arkodg
Copy link
Contributor Author

arkodg commented Sep 19, 2023

ptal @envoyproxy/gateway-maintainers

@arkodg
Copy link
Contributor Author

arkodg commented Sep 22, 2023

chatted with @kflynn the other day about this and we both feel adding it in the load balancing API is more suitable

  • if we added it another API, userX may enable serviceIP at the EnvoyProxy level, user Y may set the LB Policy and then wonder why the LB policy isn't working. Adding it inside the LB Policy forces the user to be aware of the lb & routing decision to endpoints

@Xunzhuo
Copy link
Member

Xunzhuo commented Sep 22, 2023

I am +1 on load balancing api, if we set this in EnvoyProxy, IMHO, this is on GC level and working for all managed Gateways, but if people just want this to work in specific Gateways, a Policy is more suitable.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@github-actions github-actions bot added the stale label Oct 22, 2023
@arkodg arkodg added area/api API-related issues area/policy and removed stale labels Feb 16, 2024
@arkodg arkodg added this to the Backlog milestone Feb 16, 2024
@arkodg arkodg added the help wanted Extra attention is needed label Feb 16, 2024
@arkodg
Copy link
Contributor Author

arkodg commented Feb 16, 2024

this new ClusterIP load balancing type should live in https://github.com/envoyproxy/gateway/blob/main/api/v1alpha1/loadbalancer_types.go

@deszhou
Copy link
Contributor

deszhou commented Mar 5, 2024

Hi @arkodg, i think this use case can be achieved by adding a fallbackToClusterIP field

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  namespace: default
  name: policy-for-route1
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: httproute-1
    namespace: default
  loadBalancer:
    type: LeastRequest
    slowStart:
      window: 300s
    fallbackToClusterIP: true # Enables fallback to service's cluster IP

@arkodg
Copy link
Contributor Author

arkodg commented Mar 5, 2024

@yeedove, I personally prefer

  loadBalancer:
    type: ClusterIP

@kflynn
Copy link
Contributor

kflynn commented Mar 7, 2024

Turns out that to get Envoy Gateway working well with Linkerd, we either need this or we need an l5d-dst-override header to be added to the request, so I would definitely like this. 🙂

I'd personally vote for routingType: service or routingType: endpoint, but I'm good with @arkodg's type suggestion.

@deszhou
Copy link
Contributor

deszhou commented Mar 12, 2024

please assign to me

@arkodg
Copy link
Contributor Author

arkodg commented Mar 16, 2024

I like @kflynn's suggestion of a dedicated field routingType, if we take this approach, we'll have to use CEL to disable loadBalancing if routingType: Service

@arkodg
Copy link
Contributor Author

arkodg commented Mar 28, 2024

  • we discussed this in the community meeting today, most were in favor of this field living in the BackendTrafficPolicy
    thanks for volunteering to pick this up @kflynn, temporarily assigning this to you for now
    here's a link to help you get started
    func (t *Translator) processDestination(backendRefContext BackendRefContext,
    (you'll need to override the global EndpointRoutingDisabled field)

@arkodg arkodg assigned kflynn and unassigned deszhou Mar 28, 2024
@arkodg arkodg changed the title Allow fallback to Service Cluster IP Support routing to Service Cluster IP Mar 28, 2024
@arkodg
Copy link
Contributor Author

arkodg commented Apr 3, 2024

rethinking this one, if the field lives in BackendTrafficPolicy, then it won't apply to other backendRefs in other places, e.g. ext_auth in SecurityPolicy, or ext_proc in EnvoyExtensionPolicy or openTelemetry sinks in EnvoyProxy .
we may want to move this field into EnvoyProxy
cc @zhaohuabing @guydc

@zhaohuabing
Copy link
Member

I guess this depends on do we need per-service configuration for the choice of cluster IP vs pod IP?

@arkodg
Copy link
Contributor Author

arkodg commented Apr 3, 2024

one reason to select Service routing is for mesh case (sidecar beside gateway) to work seamlessly, so it will work for the xRoute backends (with sidecars ) but not if the ext auth / ext proc / Otel (have sidecars ) because it won't get intercepted properly in the gateway sidecar

Copy link

github-actions bot commented May 3, 2024

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@github-actions github-actions bot added the stale label May 3, 2024
@jasonmoore2k
Copy link

Any progress on this? I’m looking to deploy EG alongside an Istio mesh and from what I’m hearing here and in the related ticket, I would need this feature to make it work?

@github-actions github-actions bot removed the stale label May 13, 2024
@arkodg
Copy link
Contributor Author

arkodg commented May 13, 2024

@kflynn still planning on working on this one ?

@arkodg arkodg modified the milestones: Backlog, v1.1.0-rc1 May 13, 2024
@ryanhristovski
Copy link
Contributor

@kflynn by setting the routingType: service would we lose the observability into each endpoint in the /clusters config & metrics and only display the ClusterIP?

@arkodg discussed the possibility of running an idle linkerd proxy sidecar on the gateway which would handle mTLS, but then it seems like we'd lose Linkerd multi-cluster service discovery.

@lnattrass
Copy link

@kflynn by setting the routingType: service would we lose the observability into each endpoint in the /clusters config & metrics and only display the ClusterIP?

I think if service-mesh is being used, the metrics are likely going to be more useful from the mesh than from the ingress, because the mesh will handle service -> service as well as envoy -> service in the same format and namespace..

@kflynn
Copy link
Contributor

kflynn commented May 23, 2024

I think @lnattrass has the right of it – part of the point of the mesh is that it should be handling both metrics and routing for you in this case.

@kflynn
Copy link
Contributor

kflynn commented May 23, 2024

@arkodg I'd still like to, yes! so here's a question for you. There already is an EndpointRoutingDisabled field in the Translator; do we think that works as a global switch? or is a leftover untested vestigial thing? 🙂

@arkodg
Copy link
Contributor Author

arkodg commented May 23, 2024

@arkodg I'd still like to, yes! so here's a question for you. There already is an EndpointRoutingDisabled field in the Translator; do we think that works as a global switch? or is a leftover untested vestigial thing? 🙂

@kflynn this should work as expected, the gatewayapi test should prove this easily, if the flag is set, the clusterIP will get set it in the IR

@ryanhristovski
Copy link
Contributor

@lnattrass we're not necessarily interested in Linkerd for routing on our ingress, but more so for cross-cluster service discovery and mTLS. So we'd prefer to keep the load balancing capabilities and metrics as close to our proxy as possible. In the past the CPU requirements of running 4k RPS+ on an ingress pod with linkerd over doubles our usage and since we've always used headless services we haven't relied on Linkerd's load balancing.

I suppose this is more of a niche use-case, but it does limit us on implementing multi-cluster Linkerd.

@kflynn would Linkerd be open to mirroring EndpointSlices as well or is it out of scope? I could open an issue (and potentially work on it) if you think it's worth considering

@evacchi
Copy link
Contributor

evacchi commented Jun 5, 2024

Hi @kflynn I'm sorry to take this over from you, but we internally needed a fix earlier :) feel free to check out and comment my PR, though!

@arkodg arkodg assigned evacchi and unassigned kflynn Jun 5, 2024
@kflynn
Copy link
Contributor

kflynn commented Jun 27, 2024

@evacchi No apology needed! as you can see, I've been pulled onto other things, thank you for making this happen!! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api API-related issues area/policy
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants