🐛 Use a different object name for each ClusterResultSet test case #4081

fabriziopandini · 2021-01-15T14:04:05Z

What this PR does / why we need it:
Fix ClusterResultSet test flakes by ensuring that each test case uses a different CRS object name (instead of reusing always the same name).

Which issue(s) this PR fixes:
Fixes #4075

fabriziopandini · 2021-01-15T14:12:40Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-15T14:16:55Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-15T14:28:57Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-15T15:22:44Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-15T16:21:18Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-15T16:26:52Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-15T16:39:45Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-15T16:48:24Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-15T17:04:30Z

@vincepri @jsturtevant @furkatgofurov7 PTAL

With this change, the ClusterResultSet test seems to be consistently passing

jsturtevant · 2021-01-15T17:16:16Z

I pulled your change into my branch and ran it locally via in goland and still see errors:

Summarizing 1 Failure:

[Fail] ClusterResourceSet Reconciler [It] Should reconcile a ClusterResourceSet when a resource is created that is part of ClusterResourceSet resources 
/home/jstur/projects/cluster-api/exp/addons/controllers/clusterresourceset_controller_test.go:312

Ran 5 of 5 Specs in 16.096 seconds
FAIL! -- 4 Passed | 1 Failed | 0 Pending | 0 Skipped

From my investigation what seems to be happening is the test client doesn't see the configMap right away (my custom logging):

I0115 09:11:02.883895 1738907 clusterresourceset_controller.go:292] controller-runtime/manager/controller/clusterresourceset "msg"="############## can't find the object!" "cluster"="cluster-84jeuv" "name"="test-clusterresourceset" "namespace"="default" "reconciler group"="addons.cluster.x-k8s.io" "reconciler kind"="ClusterResourceSet" "ns"="default" "resource"={"name":"test-configmap3","kind":"ConfigMap"}

and it quietly returns ctrl.Result{} so the controller doesn't reconcile again. As a test, I updated to return and re-run when the ConfigMap isn't found and it passes consistently:

cluster-api/exp/addons/controllers/clusterresourceset_controller.go

Lines 280 to 283 in 1ac7896

    
           // Continue without adding the error to the aggregate if we can't find the resource. 
        
           if apierrors.IsNotFound(err) { 
        
           	continue 
        
           }

to

if apierrors.IsNotFound(err) {
					log.Info("############## can't find the object!", "resource", resource, "ns", cluster.GetNamespace())
					return err
					//continue
				}

Seems like there are two things happening... The test controller doesn't get the value correctly (initially) and second if the the resource isn't there (maybe because of delay or caching) the reconciliation might not happen.

jsturtevant · 2021-01-15T17:24:57Z

looks like the test failed in prow as well :-(

I saw this comment: // It applies resources best effort and continue on scenarios like: unsupported resource types, failure during creation, missing resources.

Currently if the resource is not found it doesn't re-trigger reconciliation unless something else re-triggers it later. Should we consider adding this as an error and retrying reconciliation with a back-off in cases where the resource is missing?

fabriziopandini · 2021-01-15T18:49:33Z

@jsturtevant the prow failure is in the machinepool test, and it is tracked in #4068
I will investigate your note above, thanks for the detailed repo

jsturtevant · 2021-01-15T19:22:08Z

@jsturtevant the prow failure is in the machinepool test, and it is tracked in #4068

I see that now. looked too quick.

These are the rough changes with the logging for debugging and retry: https://github.com/kubernetes-sigs/cluster-api/compare/master...jsturtevant:crs-objects-not-found-retry?expand=1. Seems that this is mainly caused by something in the test client caching (since retries fix it) but it might make the CRS more resilient in the cases that something wasn't on the cluster when the reconciliation happened for what ever reason.

I also found that in this case where the ConfigMap object isn't found the conditions.MarkTrue(clusterResourceSet, addonsv1.ResourcesAppliedCondition) is applied to the cluster which is a bit strange to me.

fabriziopandini · 2021-01-18T11:18:39Z

/retest

fabriziopandini · 2021-01-18T11:24:49Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-18T11:30:57Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-18T12:06:35Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-18T13:04:10Z

@jsturtevant thanks!
I have incorporated also the wait for the config map object to be created, and now this flake does not happen anymore both in prow and locally for me. If everyone agrees, I suggest to merge these changes and then make another round of optimization after keeping an eye on flakes for a few days

fabriziopandini · 2021-01-19T11:15:31Z

/milestone v0.4.0

vincepri · 2021-01-19T15:57:16Z

exp/addons/controllers/clusterresourceset_controller_test.go

-				Name:      "test-clusterresourceset",
+				Name:      clusterResourceSetName,


Could we use generate name instead?

vincepri

/milestone v0.4.0

vincepri

/approve
/lgtm

k8s-ci-robot · 2021-01-22T16:10:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [vincepri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 15, 2021

k8s-ci-robot requested review from detiber and JoelSpeed January 15, 2021 14:04

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 15, 2021

fabriziopandini force-pushed the investigate-clusterresultset-flakes branch from 72a5865 to 4ae0733 Compare January 15, 2021 16:55

fabriziopandini changed the title ~~[WIP] 🐛 investigate ClusterResultSet test flakes~~ 🐛 investigate ClusterResultSet test flakes Jan 15, 2021

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2021

fabriziopandini changed the title ~~🐛 investigate ClusterResultSet test flakes~~ 🐛 Use different object names for each ClusterResultSet test cases Jan 15, 2021

fabriziopandini changed the title ~~🐛 Use different object names for each ClusterResultSet test cases~~ 🐛 Use a different object name for each ClusterResultSet test case Jan 15, 2021

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 18, 2021

fabriziopandini force-pushed the investigate-clusterresultset-flakes branch from aa7a57d to c69f3e8 Compare January 18, 2021 12:31

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 18, 2021

k8s-ci-robot added this to the v0.4.0 milestone Jan 19, 2021

vincepri reviewed Jan 19, 2021

View reviewed changes

use different CRS for each test

90dfe19

fabriziopandini force-pushed the investigate-clusterresultset-flakes branch from c69f3e8 to 90dfe19 Compare January 19, 2021 15:59

vincepri approved these changes Jan 22, 2021

View reviewed changes

k8s-ci-robot assigned vincepri Jan 22, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 22, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2021

k8s-ci-robot merged commit e851972 into kubernetes-sigs:master Jan 22, 2021

fabriziopandini deleted the investigate-clusterresultset-flakes branch February 10, 2021 13:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐛 Use a different object name for each ClusterResultSet test case #4081

🐛 Use a different object name for each ClusterResultSet test case #4081

fabriziopandini commented Jan 15, 2021 •

edited

Loading

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

jsturtevant commented Jan 15, 2021 •

edited

Loading

jsturtevant commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

jsturtevant commented Jan 15, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 19, 2021

vincepri Jan 19, 2021

vincepri left a comment

vincepri left a comment

k8s-ci-robot commented Jan 22, 2021

		Name: "test-clusterresourceset",
		Name: clusterResourceSetName,

🐛 Use a different object name for each ClusterResultSet test case #4081

🐛 Use a different object name for each ClusterResultSet test case #4081

Conversation

fabriziopandini commented Jan 15, 2021 • edited Loading

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

jsturtevant commented Jan 15, 2021 • edited Loading

jsturtevant commented Jan 15, 2021

fabriziopandini commented Jan 15, 2021

jsturtevant commented Jan 15, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 18, 2021

fabriziopandini commented Jan 19, 2021

vincepri Jan 19, 2021

Choose a reason for hiding this comment

vincepri left a comment

Choose a reason for hiding this comment

vincepri left a comment

Choose a reason for hiding this comment

k8s-ci-robot commented Jan 22, 2021

fabriziopandini commented Jan 15, 2021 •

edited

Loading

jsturtevant commented Jan 15, 2021 •

edited

Loading