🌱 ClusterCacheTracker: fix accessor deletion on health check failure #9025

Merged

sbueringer
Member

@sbueringer sbueringer commented Jul 21, 2023

Signed-off-by: Stefan Büringer [email protected]

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #8948 (should fix the issue for main and release-1.5)

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 21, 2023
@sbueringer
Member Author

/assign @fabriziopandini @chrischdi @killianmuldoon

I think we should have a follow-up to improve the test coverage of this critical component for cases such as workload clusters becoming unreachable.

@sbueringer
Member Author

sbueringer commented Jul 21, 2023

/cherry-pick release-1.5

This issue was introduced in the 1.5 cycle during the controller-runtime (CR) bump, so we shouldn't need it in v1.4.
I'll verify that this behavior works in v1.4.

EDIT: PR to fix the other half of the issue #9028

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.5 in a new PR and assign it to you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@chrischdi chrischdi left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 21, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: de607ad529c83b17301f7729c470877bedc92a45

@sbueringer
Member Author

/assign @vincepri

Member

@fabriziopandini fabriziopandini left a comment

/lgtm
nice find!

@fabriziopandini
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2023
@killianmuldoon
Contributor

/cherry-pick release-1.5

@k8s-infra-cherrypick-robot

@killianmuldoon: once the present PR merges, I will cherry-pick it on top of release-1.5 in a new PR and assign it to you.

@k8s-infra-cherrypick-robot

@sbueringer: new pull request created: #9031

@sbueringer
Member Author

sbueringer commented Jul 21, 2023

Just more context for future us. We don't need this on release-1.4 or older because there we could still check precisely for the error returned by the wait func. This changed, and with >= release-1.5 we can't differentiate between a wait func timeout and a client-go timeout.

1.4 code:

	err := wait.PollImmediateUntil(in.interval, runHealthCheckWithThreshold, ctx.Done())
	// An error returned implies the health check has failed a sufficient number of
	// times for the cluster to be considered unhealthy
	// NB. we are ignoring ErrWaitTimeout because this error happens when the channel is closed, which in this case
	// happens when the cache is explicitly stopped.
	if err != nil && err != wait.ErrWaitTimeout {
		t.log.Error(err, "Error health checking cluster", "Cluster", klog.KRef(in.cluster.Namespace, in.cluster.Name))
		t.deleteAccessor(ctx, in.cluster)
	}

(but the old code was a bit brittle anyway so I'm happy with the new one in any case)

@sbueringer sbueringer deleted the pr-fix-cct-healthcheck branch July 21, 2023 12:15
@killianmuldoon
Contributor

/area clustercachetracker

@k8s-ci-robot k8s-ci-robot added the area/clustercachetracker Issues or PRs related to the clustercachetracker label Jul 24, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/clustercachetracker Issues or PRs related to the clustercachetracker cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
7 participants