Don't retry health check when Unauthorized is returned

This fixes unnecessary delays in self-repair when the underlying provider (e.g. the AWS provider) periodically refreshes the kubeconfig. Without this change, the health check is retried 10 times at a 10s interval, so the controller cannot act for at least 100 seconds, even though it could repair itself immediately.
kubernetes-sigs · Oct 6, 2022 · b1e45cb
1 parent a46c990
commit b1e45cb
Showing 1 changed file with 6 additions and 0 deletions.
diff --git a/controllers/remote/cluster_cache_tracker.go b/controllers/remote/cluster_cache_tracker.go
@@ -484,6 +484,12 @@ func (t *ClusterCacheTracker) healthCheckCluster(ctx context.Context, in *health
 		// If no error occurs, reset the unhealthy counter.
 		_, err := restClient.Get().AbsPath(in.path).Timeout(in.requestTimeout).DoRaw(ctx)
 		if err != nil {
+			if apierrors.IsUnauthorized(err) {
+				// Unauthorized means that the underlying kubeconfig is not authorizing properly anymore, which
+				// usually is the result of automatic kubeconfig refreshes, meaning that we have to throw away the
+				// clusterAccessor and rely on the creation of a new one (with a refreshed kubeconfig)
+				return false, err
+			}
 			unhealthyCount++
 		} else {
 			unhealthyCount = 0