Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conformance: enable HTTPRouteRedirectPortAndScheme #1601

Merged
merged 8 commits into from
Aug 11, 2023

Conversation

shawnh2
Copy link
Contributor

@shawnh2 shawnh2 commented Jun 28, 2023

What type of PR is this?

What this PR does / why we need it:

HTTPRouteRedirectPortAndScheme conformance test is related to kubernetes-sigs/gateway-api#1880

Which issue(s) this PR fixes:

Fixes #1441

@codecov
Copy link

codecov bot commented Jun 28, 2023

Codecov Report

Merging #1601 (5401e2a) into main (8019ae4) will increase coverage by 0.05%.
Report is 3 commits behind head on main.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##             main    #1601      +/-   ##
==========================================
+ Coverage   64.88%   64.94%   +0.05%     
==========================================
  Files          84       84              
  Lines       12192    12207      +15     
==========================================
+ Hits         7911     7928      +17     
+ Misses       3774     3773       -1     
+ Partials      507      506       -1     
Files Changed Coverage Δ
internal/gatewayapi/route.go 88.28% <83.33%> (-0.08%) ⬇️
internal/provider/kubernetes/helpers.go 83.22% <100.00%> (+3.35%) ⬆️

... and 2 files with indirect coverage changes

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label Jul 28, 2023
@shawnh2
Copy link
Contributor Author

shawnh2 commented Aug 9, 2023

/retest

@github-actions github-actions bot removed the stale label Aug 9, 2023
@shawnh2
Copy link
Contributor Author

shawnh2 commented Aug 10, 2023

TL;DR the failed conformance test is all about `HTTPRouteRedirectPortAndScheme/http-listener-on-8080`, i suspect there is some wrong with this conformance test, causing routes to conflict somehow

--- FAIL: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme (1.17s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/https-listener-on-443/4_request_to_'example.org/scheme-http-and-port-80'_should_receive_a_302 (0.01s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/https-listener-on-443/5_request_to_'example.org/scheme-http-and-port-8080'_should_receive_a_302 (0.02s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/https-listener-on-443/3_request_to_'example.org/scheme-http-and-port-nil'_should_receive_a_302 (0.01s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/https-listener-on-443/2_request_to_'example.org/scheme-nil-and-port-8443'_should_receive_a_302 (0.01s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/https-listener-on-443/1_request_to_'example.org/scheme-nil-and-port-443'_should_receive_a_302 (0.01s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/https-listener-on-443/0_request_to_'example.org/scheme-nil-and-port-nil'_should_receive_a_302 (0.01s)
        *** FAIL: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-8080/1_request_to_'/scheme-nil-and-port-80'_should_receive_a_302 (31.52s)
        *** FAIL: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-8080/2_request_to_'/scheme-https-and-port-nil'_should_receive_a_302 (31.52s)
        *** FAIL: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-8080/0_request_to_'/scheme-nil-and-port-nil'_should_receive_a_302 (31.52s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-80/5_request_to_'/scheme-https-and-port-8443'_should_receive_a_302 (0.00s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-80/4_request_to_'/scheme-https-and-port-443'_should_receive_a_302 (0.00s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-80/3_request_to_'/scheme-https-and-port-nil'_should_receive_a_302 (0.00s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-80/2_request_to_'/scheme-nil-and-port-8080'_should_receive_a_302 (0.00s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-80/1_request_to_'/scheme-nil-and-port-80'_should_receive_a_302 (0.00s)
        --- PASS: TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme/http-listener-on-80/0_request_to_'/scheme-nil-and-port-nil'_should_receive_a_302 (0.00s)

the error shows:

http.go:217: Request failed, not ready yet: Get "http://172.18.255.203:8080/scheme-nil-and-port-nil": dial tcp 172.18.255.203:8080: connect: no route to host (after 20.332037396s)
Sending Request:
< GET /scheme-nil-and-port-nil HTTP/1.1
< Host: 172.18.255.203:8080
< User-Agent: Go-http-client/1.1
< X-Echo-Set-Header: 
< Accept-Encoding: gzip
< 
< 

which is wired. if i deply the gateway, deployment and svc myself in my local cluster, it works and does not have any errors:

# running in my own cluster
~ curl -v 172.16.255.200:8080/scheme-nil-and-port-nil

*   Trying 172.16.255.200:8080...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 172.16.255.200 (172.16.255.200) port 8080 (#0)
> GET /scheme-nil-and-port-nil HTTP/1.1
> Host: 172.16.255.200:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< location: http://example.org:8080/scheme-nil-and-port-nil
< date: Thu, 10 Aug 2023 03:07:56 GMT
< server: envoy
< content-length: 0
<
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host 172.16.255.200 left intact

so i suspect there is some wrong with this conformance test, causing routes to conflict somehow.

@shawnh2
Copy link
Contributor Author

shawnh2 commented Aug 10, 2023

cc @arkodg

i think i found the reason, and it will be fixed in this PR.


TL;DR two different gateway may end up with the same one IP address, causing route to conflict

reproduce steps:

make create-cluster && make kube-deploy

kubectl apply -f test/config/gatewayclass.yaml

go test -v -tags conformance ./test/conformance --gateway-class=envoy-gateway --debug=true --run TestGatewayAPIConformance/HTTPRouteRedirectPortAndScheme
~ kubectl get gtw -n gateway-conformance-infra
NAME                                        CLASS           ADDRESS          PROGRAMMED   AGE
all-namespaces                              envoy-gateway   172.16.255.201   True         80s
backend-namespaces                          envoy-gateway   172.16.255.202   True         80s
same-namespace                              envoy-gateway   172.16.255.200   True         80s
same-namespace-with-http-listener-on-8080   envoy-gateway   172.16.255.203   True         7s  <==
same-namespace-with-https-listener          envoy-gateway   172.16.255.203   True         80s <==

so my request to 172.16.255.202:8080/scheme-nil-and-port-nil will hit same-namespace-with-https-listener instead of same-namespace-with-http-listener-on-8080, causing error connect: no route to host

this is caused by envoy-gateway, it trims the resource name if it is longer than 48.

if len(resourceName) > 48 {
return fmt.Sprintf("%s-%s", resourceName[0:48], hashedName[0:8])
}

so same-namespace-with-http-listener-on-8080 ends up have the same name as same-namespace-with-https-listener do, which is envoy-gateway-conformance-infra-same-namespace-with-ht, so they end up with same IP.

kubectl get svc -n envoy-gateway-system
NAME                                                              TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)               AGE
envoy-gateway                                                     ClusterIP      10.96.227.12    <none>           18000/TCP,18001/TCP   21m
envoy-gateway-conformance-infra-all-namespaces-67617465           LoadBalancer   10.96.129.24    172.16.255.201   80:31265/TCP          19m
envoy-gateway-conformance-infra-backend-namespaces-67617465       LoadBalancer   10.96.63.212    172.16.255.202   80:30822/TCP          19m
envoy-gateway-conformance-infra-same-namespace-67617465           LoadBalancer   10.96.107.229   172.16.255.200   80:30797/TCP          19m
envoy-gateway-conformance-infra-same-namespace-with-ht-67617465   LoadBalancer   10.96.21.67     172.16.255.203   443:31186/TCP         19m
envoy-gateway-metrics-service                                     ClusterIP      10.96.168.31    <none>           8443/TCP              21m

@shawnh2
Copy link
Contributor Author

shawnh2 commented Aug 10, 2023

the k8s can only accept object's name that shorter than 63, and the name of envoy gateway resources is formatted as:

envoy-[48char:name]-[8char:hash]   ===> total: 63char

there is no way to increase the char number of [48char:name], so i change the way to generate hash: use h.Write(nsName); h.Sum(nil) instead of the h.Sum(nsName), the later could easily produce the same (first 8 digits) hash even with different nsName.

hSum := h.Sum([]byte(nsName))
hashedName := strings.ToLower(fmt.Sprintf("%x", hSum))

and use namespace/resource-name as the format of nsName uniformly.

@shawnh2 shawnh2 marked this pull request as ready for review August 10, 2023 13:21
@shawnh2 shawnh2 requested a review from a team as a code owner August 10, 2023 13:21
@shawnh2 shawnh2 requested a review from arkodg August 10, 2023 13:21
@shawnh2
Copy link
Contributor Author

shawnh2 commented Aug 10, 2023

/retest

@@ -197,12 +197,12 @@ func refsSecret(ref *gwapiv1b1.SecretObjectReference) bool {
}

func infraServiceName(gateway *gwapiv1b1.Gateway) string {
infraName := utils.GetHashedName(fmt.Sprintf("%s-%s", gateway.Namespace, gateway.Name))
infraName := utils.GetHashedName(fmt.Sprintf("%s/%s", gateway.Namespace, gateway.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find :)
I understand this issue, which stems from

func irStringKey(gatewayNs, gatewayName string) string {
&
func ExpectedResourceHashedName(name string) string {

can we add a comment in here explaining why its this way or should we change

or instead of finding the deployment by its name which is done here

key := types.NamespacedName{

should we find the deployment based on the gateway labels

func GatewayOwnerLabels(namespace, name string) map[string]string {
which makes more sense imo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the way we had is great.

and i have added some comments about func GetHashedName in internal/provider/utils/utils.go#L26: this function takes input which should be formatted as {Namespace}/{ResourceName}

func GetHashedName(nsName string) string {

h := sha256.New() // Using sha256 instead of sha1 due to Blocklisted import crypto/sha1: weak cryptographic primitive (gosec)
hSum := h.Sum([]byte(nsName))
hashedName := strings.ToLower(fmt.Sprintf("%x", hSum))
h.Write([]byte(nsName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you help elaborate why this is needed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is a go playground example, may help you see why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool thanks for highlighting this !

@arkodg
Copy link
Contributor

arkodg commented Aug 10, 2023

hey @shawnh2 thanks for digging deep and finding the culprit, reg your comment

so same-namespace-with-http-listener-on-8080 ends up have the same name as same-namespace-with-https-listener

trying to better understand how can the hash be the same for this case

@shawnh2
Copy link
Contributor Author

shawnh2 commented Aug 11, 2023

hey @shawnh2 thanks for digging deep and finding the culprit, reg your comment

so same-namespace-with-http-listener-on-8080 ends up have the same name as same-namespace-with-https-listener

trying to better understand how can the hash be the same for this case

we originally use sha256.Sum() to compute hash directly, but method Sum() is not the right way to generate hash. the go/sha256 package recommended this way to generate hash. (can refer to the go playground example here)

here is the effects
before:

kubectl get svc -n envoy-gateway-system
NAME                                                              TYPE           
envoy-gateway                                                     ClusterIP 
envoy-gateway-conformance-infra-all-namespaces-67617465           LoadBalancer
envoy-gateway-conformance-infra-backend-namespaces-67617465       LoadBalancer
envoy-gateway-conformance-infra-same-namespace-67617465           LoadBalancer
envoy-gateway-conformance-infra-same-namespace-with-ht-67617465   LoadBalancer  # resources like `envoy-gateway-conformance-infra-same-namespace-with-http-listener-on-8080` and
                                                                                  `envoy-gateway-conformance-infra-same-namespace-with-https-listener` collapse into one

after:

NAME                                                              TYPE
envoy-gateway                                                     ClusterIP
envoy-gateway-conformance-infra-all-namespaces-cbdhjs62           LoadBalancer
envoy-gateway-conformance-infra-backend-namespaces-c5ds6v8k       LoadBalancer
envoy-gateway-conformance-infra-same-namespace-ed64jgb9           LoadBalancer
envoy-gateway-conformance-infra-same-namespace-with-ht-8eu1w9v3   LoadBalancer  # this is for resource `envoy-gateway-conformance-infra-same-namespace-with-http-listener-on-8080`
envoy-gateway-conformance-infra-same-namespace-with-ht-7f38wfg4   LoadBalancer  # this is for resource `envoy-gateway-conformance-infra-same-namespace-with-https-listener`

Copy link
Member

@Xunzhuo Xunzhuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks

Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks !

@arkodg arkodg merged commit f466430 into envoyproxy:main Aug 11, 2023
@shawnh2 shawnh2 deleted the conf-test-redirect branch August 11, 2023 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable HTTPRouteRedirectPortAndScheme
3 participants