Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SecurityPolicy crd not working correctly with multiple gateway controllers #2520

Closed
zetaab opened this issue Jan 28, 2024 · 9 comments
Closed
Assignees
Labels
area/api API-related issues area/policy kind/decision A record of a decision made by the community. kind/feature new feature road-to-ga
Milestone

Comments

@zetaab
Copy link
Contributor

zetaab commented Jan 28, 2024

Description:
I have currently two different gateways installed (called internal and external). I am trying to create oidc and jwt SecurityPolicy to internal one. However, some configurations gets applied to external one and the auth itself does not work at all. If I try to create these policies to external one, it will somehow work. After I will apply cognito configurations to internal gateway I can see cognito configurations to be applied to external as well.

Repro steps:
install two different gateway controllers and then https://gist.github.com/zetaab/e70547adb70a8de61765387f36e8c23f

I have currently that configuration applied and I can see following in external envoyproxy (which should be only under internal gateway components):
Screenshot 2024-01-28 at 23 07 08

it does not have any configuration that should be against cognito at all.

Environment:
kube 1.29.1

Logs:

@zetaab zetaab added the triage label Jan 28, 2024
@zetaab
Copy link
Contributor Author

zetaab commented Jan 28, 2024

the problem might be that both controllers are listening SecurityPolicy and its changes. However, it should filter according targetRef "is this my task" or not. If I shutdown that another controller, things starts to work better.

@arkodg
Copy link
Contributor

arkodg commented Jan 29, 2024

thanks for raising this issue, as of today there is no parent for any of the Policies, so every controller can reconcile it, which is the issue you are facing
Recommendation ( as well as a workaround ) is to limit each controller to reconcile a limited number of namespaces, https://gateway.envoyproxy.io/latest/user/deployment-mode/

The better solution is to add an optional parent for a Policy, thinking out loud, it should be the controller name (versus GatewayClass name since some policies can attach to a Gatewayclass) similar to what exists in GatewayClass https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.GatewayController
cc @envoyproxy/gateway-maintainers

@arkodg arkodg added kind/feature new feature kind/decision A record of a decision made by the community. road-to-ga and removed triage labels Jan 29, 2024
@arkodg arkodg added area/policy area/api API-related issues labels Jan 29, 2024
@arkodg arkodg added this to the v1.0.0-rc1 milestone Jan 29, 2024
@zetaab
Copy link
Contributor Author

zetaab commented Feb 7, 2024

@arkodg could you help me little bit. I was investigating this and change zetaab@f83a3e9 makes it already work like should. However, there might be race condition if the controllers will reconcile the object in "wrong order".

Example:

2 controllers: lets say called internal and external. We have securitypolicy that should be applied only to external controller.

If the reconciling order is 1) external 2) internal following will happen:

  1. external will apply configuration fine and update securitypolicy status Accepted
  2. internal tries to apply configuration. However, it does not have route object so it will not add the configuration but it will update the securitypolicy status to TargetNotFound.

Anyways it works already better than before. Is it worth of making PR of this, or should I add that new field to securitypolicy and make it possible to configure gatewayClass for securitypolicy object?

@arkodg
Copy link
Contributor

arkodg commented Feb 7, 2024

@zetaab
we can't skip appending the policy into the result, since we need to have to update the status for all policies

the solution here is some sort of partitioning, since we cannot control reconcile order (during restarts etc)

  1. ether partition by limiting namespaces being reconciled (supported today)
  2. or add a parent field to limit reconciliation to a controller or a gatewayclass (controller child)

Id recommend trying out 1 for now (which is also better for performance, lesser amount of reconciliation), until we have some consensus on 2

@arkodg
Copy link
Contributor

arkodg commented Feb 8, 2024

looks like there is a better solution upstream
https://gateway-api.sigs.k8s.io/geps/gep-713/#standard-status-struct
we'll need to replace the current status object

Status SecurityPolicyStatus `json:"status,omitempty"`
with the upstream type https://github.com/kubernetes-sigs/gateway-api/blob/2ac95e4577ab9a0cedf9cc302f279f36aeb4d6db/apis/v1alpha2/policy_types.go#L187

this status is unique per controller (per ancestor which has a controller field) so reconciliation by multiple controllers should be able to update their individual ancestor status without affecting each other

@zetaab
Copy link
Contributor Author

zetaab commented Feb 8, 2024

but that also says:

// Note also that implementations MUST ONLY populate ancestor status for
// the Ancestor resources they are responsible for.

so basically the another controller is not responsible of that policy at all. So it should not be added as status

@arkodg
Copy link
Contributor

arkodg commented Feb 9, 2024

so basically the another controller is not responsible of that policy at all. So it should not be added as status

we cannot determine why a controller reconciling a policy targeting a gateway, cannot find the gateway

  • the user maybe have introduced a typo
  • the gateway may not exist in the cluster
  • the gateway may exist in the cluster but may not belong to the controller (part of another gatewayclass linked to another controller)

@shawnh2
Copy link
Contributor

shawnh2 commented Feb 28, 2024

facing the same issue with BTP as in e2e test https://github.com/envoyproxy/gateway/actions/runs/8073622448/job/22057769156?pr=2665#step:6:2130

#2665 introduce merge gateways e2e test, that requires multiple-gatewayclass per controller feature. but this test will affects all the other test cases that have policies attached. because the new gatewayclass in controller will also update the policy status, even the policy is not belong to new GC's.

so by fixing #2631, I think this issue can be resolved. assigning myself to these two issues.

@arkodg
Copy link
Contributor

arkodg commented Mar 8, 2024

should be fixed by #2802

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api API-related issues area/policy kind/decision A record of a decision made by the community. kind/feature new feature road-to-ga
Projects
No open projects
Development

No branches or pull requests

3 participants