fix: translator reports errors for existing clusters and secretes #4707

Merged
zhaohuabing merged 19 commits into envoyproxy:main from fix-existing-cluster
Nov 20, 2024

Conversation

Member

@zhaohuabing zhaohuabing commented Nov 12, 2024

Fixes #4706: xDS translation fails when the OIDC tokenEndpoint and the JWT remoteJWKS are specified within the same SecurityPolicy and use the same hostname.

Refactor: when a cluster or secret already exists, the translator now skips adding it and returns nil, which makes the code cleaner and easier to maintain. It is safe to remove ErrXdsClusterExists and ErrXdsSecretsExists because no caller needs to handle them.
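For readers skimming the thread, the "skip and return nil" behaviour can be sketched roughly as follows. This is a minimal illustration, not the translator's actual code: `cluster`, `resourceTable`, and `addCluster` are hypothetical names invented for the example.

```go
package main

import "fmt"

// cluster is a hypothetical stand-in for an xDS cluster resource.
type cluster struct {
	Name string
}

// resourceTable is a hypothetical stand-in for the translator's resource collection.
type resourceTable struct {
	clusters map[string]*cluster
}

func newResourceTable() *resourceTable {
	return &resourceTable{clusters: make(map[string]*cluster)}
}

// addCluster adds c unless a cluster with the same name is already present.
// A duplicate is treated as a no-op, so callers have no "already exists"
// error to handle.
func (r *resourceTable) addCluster(c *cluster) error {
	if c == nil || c.Name == "" {
		return fmt.Errorf("cluster must have a name")
	}
	if _, ok := r.clusters[c.Name]; ok {
		return nil // already present: skip
	}
	r.clusters[c.Name] = c
	return nil
}

func main() {
	tbl := newResourceTable()
	// The JWT remoteJWKS and the OIDC tokenEndpoint resolve to the same name.
	for _, name := range []string{"oidc_example_com_443", "oidc_example_com_443"} {
		if err := tbl.addCluster(&cluster{Name: name}); err != nil {
			fmt.Println("unexpected error:", err)
		}
	}
	fmt.Println(len(tbl.clusters)) // prints 1
}
```

With this shape, the second add for the same name succeeds silently, which is why sentinel errors for the "exists" case are no longer needed.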

Release Notes: Yes

Test before the fix:

--- FAIL: TestTranslateXds (0.43s)
    --- FAIL: TestTranslateXds/securitypolicy-with-oidc-jwt-authz (0.00s)
        translator_test.go:142: securitypolicy-with-oidc-jwt-authz
        translator_test.go:143: 
                Error Trace:    /home/ubuntu/gateway/internal/xds/translator/translator_test.go:143
                Error:          Received unexpected error:
                                xds cluster exists
                Test:           TestTranslateXds/securitypolicy-with-oidc-jwt-authz
FAIL
FAIL    github.com/envoyproxy/gateway/internal/xds/translator   0.673s
FAIL

After:

ok      github.com/envoyproxy/gateway/internal/xds/translator   0.251s

codecov bot commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 72.22222% with 10 lines in your changes missing coverage. Please review.

Project coverage is 65.65%. Comparing base (f99c36c) to head (2759831).
Report is 1 commit behind head on main.

Files with missing lines                 Patch %   Lines
internal/xds/translator/translator.go    72.72%    1 Missing and 2 partials ⚠️
internal/gatewayapi/securitypolicy.go    66.66%    0 Missing and 2 partials ⚠️
internal/xds/translator/extauth.go        0.00%    0 Missing and 2 partials ⚠️
internal/xds/translator/accesslog.go     50.00%    0 Missing and 1 partial ⚠️
internal/xds/translator/extproc.go        0.00%    0 Missing and 1 partial ⚠️
internal/xds/translator/oidc.go          50.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4707      +/-   ##
==========================================
+ Coverage   65.63%   65.65%   +0.02%     
==========================================
  Files         211      211              
  Lines       32017    31996      -21     
==========================================
- Hits        21014    21008       -6     
+ Misses       9761     9751      -10     
+ Partials     1242     1237       -5     

@zhaohuabing zhaohuabing marked this pull request as ready for review November 12, 2024 07:10
@zhaohuabing zhaohuabing marked this pull request as draft November 12, 2024 07:15
@zhaohuabing zhaohuabing changed the title fix: existing clusters and secretes fix: translator reports errors for existing clusters and secretes Nov 12, 2024
@zhaohuabing zhaohuabing force-pushed the fix-existing-cluster branch 6 times, most recently from 21c1901 to a7d7e6b Compare November 13, 2024 00:01
@zhaohuabing zhaohuabing marked this pull request as ready for review November 13, 2024 00:07
Contributor

arkodg commented Nov 13, 2024

Hey @zhaohuabing, is the issue that we are generating the same name for the JWKS and OIDC clusters if both are set in the policy? Shouldn't we be using an additional string value to differentiate them?

@zhaohuabing
Member Author

For a cluster generated from a URL, EG uses the host and port as the cluster name to avoid creating duplicate clusters for the same host+port combination.

jwt: 
 providers:
   - remoteJWKS:
       uri: https://oidc.example.com/auth/realms/example/protocol/openid-connect/cert

oidc:
 provider:
   tokenEndpoint: https://oidc.example.com/oauth/token

The name of the generated cluster: oidc_example_com_443.

We could change this logic to generate a unique name for each OIDC or JWT configuration. However, we should also ensure that the translator doesn't return an error if the cluster already exists.
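To make the host+port naming concrete, here is a small sketch of how both URLs above can map to oidc_example_com_443. It is an illustration only: `clusterNameFromURL` is a hypothetical helper, and the default ports and the dot-to-underscore replacement are assumptions inferred from the example name, not taken from the EG source.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// clusterNameFromURL is a hypothetical helper that derives a cluster name
// from the host and port of a URL. The path is ignored, so URLs that share
// a host and port map to the same name.
func clusterNameFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	host := u.Hostname()
	if host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	port := u.Port()
	if port == "" {
		// Assumption: fall back to the scheme's well-known port.
		switch u.Scheme {
		case "https":
			port = "443"
		case "http":
			port = "80"
		default:
			return "", fmt.Errorf("no port and unknown scheme in %q", raw)
		}
	}
	return strings.ReplaceAll(host, ".", "_") + "_" + port, nil
}

func main() {
	for _, raw := range []string{
		"https://oidc.example.com/auth/realms/example/protocol/openid-connect/cert",
		"https://oidc.example.com/oauth/token",
	} {
		name, err := clusterNameFromURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name) // prints oidc_example_com_443 for both URLs
	}
}
```

Because both URLs collapse to one name, the translator sees the second cluster as already existing, which is the case this PR stops treating as an error.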

Contributor

arkodg commented Nov 14, 2024

Is it safe to reuse the same cluster configuration? Is the naming different when using the backendRefs field?

@zhaohuabing
Member Author

It's safe to reuse the same cluster for URL-generated clusters, as the cluster configuration is identical for the same host+port combination.

For an OIDC provider with backendRefs, EG generates a unique cluster name like securitypolicy/envoy-gateway/policy-for-gateway/0.

Member Author

zhaohuabing commented Nov 14, 2024

Ha, there is also a bug here: the index is always 0, which generates duplicate names for different clusters if there are both ext auth and OIDC within a SecurityPolicy. I will fix it in this PR as well.

Contributor

arkodg commented Nov 14, 2024

nice catch, prob needs another prefix like jwt, oidc after policy name
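The suggestion of an extra segment after the policy name might look like the sketch below. This is purely illustrative: `policyRef`, `backendClusterName`, and the exact name layout are hypothetical and do not reflect what was eventually merged.

```go
package main

import "fmt"

// policyRef is a hypothetical identifier for a SecurityPolicy.
type policyRef struct {
	Namespace string
	Name      string
}

// backendClusterName is a hypothetical naming helper. It inserts a component
// segment (for example "extauth" or "oidc") after the policy name, so two
// features of the same policy no longer collide when the index is equal.
func backendClusterName(p policyRef, component string, index int) string {
	return fmt.Sprintf("securitypolicy/%s/%s/%s/%d", p.Namespace, p.Name, component, index)
}

func main() {
	p := policyRef{Namespace: "envoy-gateway", Name: "policy-for-gateway"}
	fmt.Println(backendClusterName(p, "extauth", 0))
	fmt.Println(backendClusterName(p, "oidc", 0))
}
```

Even with the index stuck at 0, the two names differ because the component segment distinguishes them.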

Signed-off-by: Huabing Zhao <[email protected]>
@zhaohuabing zhaohuabing marked this pull request as draft November 14, 2024 16:57
@zhaohuabing zhaohuabing marked this pull request as ready for review November 14, 2024 17:26
@zhaohuabing zhaohuabing requested a review from arkodg November 14, 2024 17:31
@@ -361,6 +361,10 @@ func (t *Translator) translateSecurityPolicyForRoute(
// Apply IR to all relevant routes
prefix := irRoutePrefix(route)
parentRefs := GetParentReferences(route)
var (
extAutClusterIndex = 0
Contributor

Why is the index needed?

Member Author

@zhaohuabing zhaohuabing Nov 19, 2024

A cluster is created for each parent, so an index is added to the cluster name to avoid name conflicts.

	var (
		extAutClusterIndex = 0
		oidcClusterIndex   = 0
	)
	for _, p := range parentRefs {
		parentRefCtx := GetRouteParentContext(route, p)
		gtwCtx := parentRefCtx.GetGateway()
		if gtwCtx == nil {
			continue
		}

		var extAuth *ir.ExtAuth
		if policy.Spec.ExtAuth != nil {
			if extAuth, err = t.buildExtAuth(
				policy,
				resources,
				gtwCtx.envoyProxy,
				extAutClusterIndex,
			); err != nil {
				err = perr.WithMessage(err, "ExtAuth")
				errs = errors.Join(errs, err)
			}
			extAutClusterIndex++
		}

		var oidc *ir.OIDC
		if policy.Spec.OIDC != nil {
			if oidc, err = t.buildOIDC(
				policy,
				resources,
				gtwCtx.envoyProxy, // TODO zhaohuabing: Only the last EnvoyProxy will be used as the OIDC name doesn't include the cluster index
				oidcClusterIndex,
			); err != nil {
				err = perr.WithMessage(err, "OIDC")
				errs = errors.Join(errs, err)
			}
			oidcClusterIndex++
		}
		...
	}
	return errs
}

Contributor

It's the same policy though, can't the clusters be reused?

Member Author

@zhaohuabing zhaohuabing Nov 19, 2024

It's because the EnvoyProxy configuration is different for each parent.

There's still an issue here: only the last one will be applied, because the OIDC name is identical. I've added a TODO and will address this later in a follow-up PR.

#4707 (comment)

Contributor

If it's the EnvoyProxy resource attributes that make the cluster unique, then let's append /<ns>-<name>/ of the EP instead to maximize reuse?

Contributor

Or the listener name. The index here is not useful for debugging or for reuse.

Member Author

@zhaohuabing zhaohuabing Nov 19, 2024

Rethinking this: since EnvoyProxy is an optional parameter, its name may not be suitable as part of the generated cluster name. We can always use EndpointRoutingType or ServiceRoutingType for OIDC, ExtProc, and other non-route backend clusters, so we don't need to use the RoutingType in the EnvoyProxy.

@arkodg Could we address this in a follow-up PR for #4720? This is a minor issue and won't break OIDC. Addressing it in a separate PR will make cherry-picking to v1.2.2 easier.

Contributor

@zhaohuabing I'd prefer if we removed the index altogether and raised another GH issue for dealing with policies targeting routes linked to multiple envoy proxies that are impacting the upstream TLS.

@zhaohuabing zhaohuabing requested a review from arkodg November 19, 2024 14:19
guydc previously approved these changes Nov 20, 2024
arkodg previously approved these changes Nov 20, 2024
Contributor

@arkodg arkodg left a comment

LGTM thanks !

@zhaohuabing zhaohuabing dismissed stale reviews from arkodg and guydc via 2759831 November 20, 2024 01:45
@zhaohuabing zhaohuabing requested review from guydc and arkodg November 20, 2024 01:45
@zhaohuabing zhaohuabing merged commit 86d750a into envoyproxy:main Nov 20, 2024
22 of 23 checks passed
@zhaohuabing zhaohuabing deleted the fix-existing-cluster branch November 20, 2024 02:24
zhaohuabing added a commit to zhaohuabing/gateway that referenced this pull request Nov 22, 2024
…voyproxy#4707)

* fix: existing clusters and secretes

Signed-off-by: Huabing Zhao <[email protected]>

* fix cluster index for SP

Signed-off-by: Huabing Zhao <[email protected]>

* minor change

Signed-off-by: Huabing Zhao <[email protected]>

* minor change

Signed-off-by: Huabing Zhao <[email protected]>

* minor change

Signed-off-by: Huabing Zhao <[email protected]>

* minor change

Signed-off-by: Huabing Zhao <[email protected]>

* fix lint

Signed-off-by: Huabing Zhao <[email protected]>

* add comment

Signed-off-by: Huabing Zhao <[email protected]>

* remove index

Signed-off-by: Huabing Zhao <[email protected]>

* fix lint

Signed-off-by: Huabing Zhao <[email protected]>

---------

Signed-off-by: Huabing Zhao <[email protected]>
(cherry picked from commit 86d750a)
Signed-off-by: Huabing Zhao <[email protected]>
zhaohuabing added a commit that referenced this pull request Nov 27, 2024
* fix: tcp listener is rejected when no route attached (#4681)

* fix: tcp listener is rejected when no route attached

Signed-off-by: Huabing Zhao <[email protected]>

* change cluter name

Signed-off-by: Huabing Zhao <[email protected]>

* fix listener connection limit test

Signed-off-by: Huabing Zhao <[email protected]>

* fix listener connetcp keepalive  test

Signed-off-by: Huabing Zhao <[email protected]>

* fix tcp endpoint stats test

Signed-off-by: Huabing Zhao <[email protected]>

* fix tcp-route-enable-req-resp-sizes-stats

Signed-off-by: Huabing Zhao <[email protected]>

* fix extensionpolicy-tcp-udp-http test

Signed-off-by: Huabing Zhao <[email protected]>

* fix lint

Signed-off-by: Huabing Zhao <[email protected]>

---------

Signed-off-by: Huabing Zhao <[email protected]>
(cherry picked from commit f99c36c)
Signed-off-by: Huabing Zhao <[email protected]>

* fix: remove backendrefs validation (#4705)

* remove backendrefs validation

Signed-off-by: Huabing Zhao <[email protected]>

* add tests

Signed-off-by: Huabing Zhao <[email protected]>

* add tests

Signed-off-by: Huabing Zhao <[email protected]>

---------

Signed-off-by: Huabing Zhao <[email protected]>
Co-authored-by: zirain <[email protected]>
(cherry picked from commit 5068698)
Signed-off-by: Huabing Zhao <[email protected]>

* fix: translator reports errors for existing clusters and secretes (#4707)

* fix: existing clusters and secretes

Signed-off-by: Huabing Zhao <[email protected]>

* fix cluster index for SP

Signed-off-by: Huabing Zhao <[email protected]>

* minor change

Signed-off-by: Huabing Zhao <[email protected]>

* minor change

Signed-off-by: Huabing Zhao <[email protected]>

* minor change

Signed-off-by: Huabing Zhao <[email protected]>

* minor change

Signed-off-by: Huabing Zhao <[email protected]>

* fix lint

Signed-off-by: Huabing Zhao <[email protected]>

* add comment

Signed-off-by: Huabing Zhao <[email protected]>

* remove index

Signed-off-by: Huabing Zhao <[email protected]>

* fix lint

Signed-off-by: Huabing Zhao <[email protected]>

---------

Signed-off-by: Huabing Zhao <[email protected]>

* xds: always use `::` and `IPv4Compact` for dynamic listener (#4743)

* enable IPv4Compact

Signed-off-by: zirain <[email protected]>

* fix xds test

Signed-off-by: zirain <[email protected]>

* release-notes

Signed-off-by: zirain <[email protected]>

* nit

Signed-off-by: zirain <[email protected]>

* gen

Signed-off-by: zirain <[email protected]>

---------

Signed-off-by: zirain <[email protected]>
(cherry picked from commit 78da42c)
Signed-off-by: Huabing Zhao <[email protected]>

* Fix: frequent 503 errors when connecting to a Service experiencing high Pod churn (#4754)

* Revert "fix: some status updates are discarded by the status updater (#4337)"

This reverts commit 14830c7.

Signed-off-by: Huabing Zhao <[email protected]>

* store update events and process it later

Signed-off-by: Huabing Zhao <[email protected]>

* rename method

Signed-off-by: Huabing Zhao <[email protected]>

* add release note

Signed-off-by: Huabing Zhao <[email protected]>

---------

Signed-off-by: Huabing Zhao <[email protected]>

* xds: use V4_PREFERRED dnsLookupFamily by default (#4745)

* use Cluster_V4_PREFERRED

Signed-off-by: zirain <[email protected]>

* release notes

Signed-off-by: zirain <[email protected]>

---------

Signed-off-by: zirain <[email protected]>

---------

Signed-off-by: Huabing Zhao <[email protected]>
Signed-off-by: zirain <[email protected]>
Co-authored-by: zirain <[email protected]>
Successfully merging this pull request may close these issues.

OIDC authentication and JWT authorization is unstable
4 participants