conformance: flaky test GatewayModifyListeners #2269

Closed
shawnh2 opened this issue Dec 6, 2023 · 12 comments
Labels
area/conformance Gateway API Conformance Related Issues area/testing help wanted Extra attention is needed kind/bug Something isn't working

@shawnh2
Contributor

shawnh2 commented Dec 6, 2023

https://github.com/envoyproxy/gateway/actions/runs/7111705161/job/19360397127#step:6:1111

@shawnh2 shawnh2 added kind/bug Something isn't working area/testing help wanted Extra attention is needed labels Dec 6, 2023
@pravinpushkar

pravinpushkar commented Jan 12, 2024

I am also seeing the same behaviour, mainly in the following tests:

  1. GatewayModifyListeners/should_be_able_to_add_a_listener_that_then_becomes_available_for_routing_traffic
  2. GatewaySecretReferenceGrantAllInNamespace/Gateway_listener_should_have_a_true_ResolvedRefs_condition_and_a_true_Programmed_condition
  3. GatewayWithAttachedRoutesWithPort8080/Gateway_listener_should_have_attached_route_by_specifying_the_sectionName (this is less frequent)

All of these failures are due to a context deadline exceeded error. I tried configuring the timeouts like this, with no luck:

// Raise the two Gateway status timeouts to 120s.
timeout := timeoutConfig.TimeoutConfig{
	GatewayStatusMustHaveListeners:    120 * time.Second,
	GatewayListenersMustHaveCondition: 120 * time.Second,
}

cSuite := suite.New(suite.Options{
	Client:               client,
	GatewayClassName:     *flags.GatewayClassName,
	Debug:                *flags.ShowDebug,
	Clientset:            clientset,
	CleanupBaseResources: *flags.CleanupBaseResources,
	SupportedFeatures:    suite.AllFeatures,
	SkipTests: []string{
		tests.GatewayStaticAddresses.ShortName,
	},
	ExemptFeatures: suite.MeshCoreFeatures,
	TimeoutConfig:  timeout,
})

Any ideas or suggestions on how to fix them? I looked here, and they seem fairly stable.

@shawnh2
Contributor Author

shawnh2 commented Feb 15, 2024

There are many different failure reasons, but most of them are: timeout: context deadline exceeded.

Some tests fail only occasionally and are very difficult to debug and reproduce, so the code is probably fine. What we may need is a retry mechanism for failed test cases.

Also, it seems the upstream conformance suite does not support retries yet. Maybe we can implement retries for EG first to see how it works.

cc @envoyproxy/gateway-maintainers

@arkodg
Contributor

arkodg commented Feb 15, 2024

@shawnh2 imo we need to figure out why it's taking longer than 120s to compute and publish status

@shawnh2 shawnh2 changed the title conformance: flaky test GatewayModifyListeners conformance: flaky test GatewayModifyListeners & HTTPRouteInvalidCrossNamespaceParentRef Feb 24, 2024
@shawnh2
Contributor Author

shawnh2 commented Feb 24, 2024

The GatewayModifyListeners and HTTPRouteInvalidCrossNamespaceParentRef conformance tests seem to be very unstable recently; let me investigate this.


This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@github-actions github-actions bot added the stale label Mar 25, 2024
@arkodg
Contributor

arkodg commented Apr 17, 2024

Output from another run:

2024-04-16T07:30:22.7515182Z === RUN   TestGatewayAPIConformance/GatewayModifyListeners/should_be_able_to_remove_listeners,_which_would_then_stop_routing_the_relevant_traffic
2024-04-16T07:30:22.7589888Z     helpers.go:248: Gateways and Pods in gateway-conformance-infra namespaces ready
2024-04-16T07:30:22.7784581Z     helpers.go:217: Gateway gateway-conformance-infra/gateway-remove-listener expected observedGeneration to be updated to 2 for all conditions, only 0/2 were updated. stale conditions are: Accepted (generation 1), Programmed (generation 1)
2024-04-16T07:30:23.7784810Z     helpers.go:248: Gateways and Pods in gateway-conformance-infra namespaces ready
2024-04-16T07:30:23.7803677Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:24.7816499Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:25.7816031Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:26.7817410Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:27.7811108Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:28.7810117Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:29.7816314Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:30.7820952Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:31.7815261Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:32.7815842Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:33.7819704Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:34.7814444Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:35.7811404Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:36.7812707Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:37.7809131Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:38.7811003Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:39.7816384Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:40.7817144Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:41.7811348Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:42.7815970Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:43.7815505Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:44.7814868Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:45.7826414Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:46.7811063Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:47.7809669Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:48.7812275Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:49.7818144Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:50.7813862Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:51.7820819Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:52.7819000Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:53.7812285Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:54.7820181Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:55.7816157Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:56.7815871Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:57.7818397Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:58.7813144Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:30:59.7812120Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:00.7816908Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:01.7819883Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:02.7814860Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:03.7818306Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:04.7811612Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:05.7816259Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:06.7818326Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:07.7811554Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:08.7809345Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:09.7814957Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:10.7810451Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:11.7818248Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:12.7812777Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:13.7818887Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:14.7810245Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:15.7815620Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:16.7817708Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:17.7812662Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:18.7815552Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:19.7819050Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:20.7809590Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:21.7813606Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:22.7812635Z     helpers.go:655: Expected 1 Gateway status listeners, got 2
2024-04-16T07:31:23.7793577Z     gateway-modify-listeners.go:194: 
2024-04-16T07:31:23.7796763Z         	Error Trace:	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/conformance/utils/kubernetes/helpers.go:657
2024-04-16T07:31:23.7803218Z         	            				/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/conformance/tests/gateway-modify-listeners.go:194
2024-04-16T07:31:23.7804857Z         	Error:      	Received unexpected error:
2024-04-16T07:31:23.7816799Z         	            	error fetching Gateway: client rate limiter Wait returned an error: context deadline exceeded
2024-04-16T07:31:23.7868065Z         	Test:       	TestGatewayAPIConformance/GatewayModifyListeners/should_be_able_to_remove_listeners,_which_would_then_stop_routing_the_relevant_traffic
2024-04-16T07:31:23.7870591Z         	Messages:   	error waiting for Gateway status to have listeners matching expectations


This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@github-actions github-actions bot added the stale label May 17, 2024
@arkodg arkodg removed the stale label May 22, 2024
@arkodg arkodg added this to the v1.1.0-rc1 milestone May 22, 2024
@shawnh2 shawnh2 changed the title conformance: flaky test GatewayModifyListeners & HTTPRouteInvalidCrossNamespaceParentRef conformance: flaky test GatewayModifyListeners May 29, 2024
@arkodg arkodg modified the milestones: v1.1.0-rc1, v1.1.0 May 31, 2024

github-actions bot commented Jul 1, 2024

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@github-actions github-actions bot added the stale label Jul 1, 2024
@arkodg arkodg removed this from the v1.1.0 milestone Jul 31, 2024
@arkodg arkodg added this to the v1.2.0 milestone Jul 31, 2024
@github-actions github-actions bot removed the stale label Jul 31, 2024
@arkodg
Contributor

arkodg commented Aug 16, 2024

Is this test still flaky, @shawnh2?

@shawnh2
Contributor Author

shawnh2 commented Aug 16, 2024

Yep, still flaky. Unassigning myself since I lack the bandwidth to investigate.

@shawnh2 shawnh2 removed their assignment Aug 16, 2024
@shawnh2 shawnh2 added the help wanted Extra attention is needed label Aug 16, 2024
@arkodg
Contributor

arkodg commented Oct 31, 2024

I haven't seen this flake in a long time, so I'm closing this. Let's reopen it if it happens again.

@arkodg arkodg closed this as completed Oct 31, 2024

3 participants