fix: remediator missing custom resource events #1441
Conversation
/retest
@@ -416,6 +418,14 @@ func (r *RootSyncReconciler) deleteManagedObjects(ctx context.Context, reconcile

// Register RootSync controller with reconciler-manager.
func (r *RootSyncReconciler) Register(mgr controllerruntime.Manager, watchFleetMembership bool) error {
	r.lock.Lock()
	defer r.lock.Unlock()
Do we need the lock? The Register function is only invoked once and won't be re-registered.
The lock protects the controller from a read/write race condition. But the way we're calling Register today, it probably won't ever be called in parallel.
We don't technically need it today; it's just good defensive practice to protect variables that could be used in parallel.
Locking is cheap. Lock contention is what's expensive.
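As a rough illustration of the pattern being discussed (the struct and field names below are hypothetical stand-ins, not the actual reconciler-manager code), the same mutex guards both the write in Register and any concurrent read elsewhere:

```go
package reconciler

import "sync"

// reconcilerStub is a hypothetical stand-in for RootSyncReconciler, reduced to
// the pieces relevant to the locking discussion above.
type reconcilerStub struct {
	lock       sync.Mutex
	controller interface{} // written in Register, potentially read elsewhere
}

// Register writes the controller field under the lock. Even if Register is
// only called once today, taking the lock keeps the write safe if another
// goroutine ever reads the field concurrently.
func (r *reconcilerStub) Register(c interface{}) {
	r.lock.Lock()
	defer r.lock.Unlock()
	r.controller = c
}

// getController reads the same field under the same lock, which is the
// read/write race the lock defends against.
func (r *reconcilerStub) getController() interface{} {
	r.lock.Lock()
	defer r.lock.Unlock()
	return r.controller
}
```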
Prior to this change, the remediator watches were only being started for new custom resources after the apply attempt had fully completed. This left some time after the object was applied that the remediator could miss events made by third-parties. Normally, this would be fine, because the remediator would revert any change after the watch was started. But if a DELETE event was missed, the object wouldn't be recreated until the next apply attempt.

This change adds a CRD Controller to the remediator that watches CRDs and executes any registered handlers when the CRD is established, unestablished, or deleted. The remediator now registers CRD handlers for each resource type it watches, starting watchers as soon as possible, without waiting for the next apply attempt.
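A minimal sketch of what such a handler registry could look like, assuming a hypothetical crdController type; none of these names are the actual Config Sync identifiers:

```go
package remediator

import "sync"

// crdEventHandler is a hypothetical callback invoked when a CRD's state changes.
type crdEventHandler func(established bool)

// crdController sketches a controller that fans CRD lifecycle events out to
// registered handlers, keyed by CRD name.
type crdController struct {
	mu       sync.Mutex
	handlers map[string][]crdEventHandler
}

func newCRDController() *crdController {
	return &crdController{handlers: make(map[string][]crdEventHandler)}
}

// registerHandler records a callback for the named CRD, e.g. the remediator
// registering "start my watcher as soon as this CRD is established".
func (c *crdController) registerHandler(crdName string, h crdEventHandler) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.handlers[crdName] = append(c.handlers[crdName], h)
}

// notify would be called from the controller's reconcile loop whenever a
// watched CRD is established, unestablished, or deleted.
func (c *crdController) notify(crdName string, established bool) {
	c.mu.Lock()
	hs := append([]crdEventHandler(nil), c.handlers[crdName]...)
	c.mu.Unlock()
	for _, h := range hs {
		h(established)
	}
}
```

The point of the mechanism described above is that the remediator's watch can be started from the handler as soon as the CRD reports Established, rather than after the next apply attempt.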
For posterity, since inline comment threads can get lost... Unfortunately, neither the DynamicRESTMapper in controller-runtime nor the DeferredDiscoveryRESTMapper/CachedDiscoveryClient in client-go implements auto-invalidation of resources, unless aggregated discovery is enabled on the server. While aggregated discovery is beta in 1.27+ and thus enabled by default, that doesn't guarantee that it will be enabled on non-GKE clusters.

So I've added a new ReplaceOnResetRESTMapper that can be used to replace the RESTMapper when Reset is called. Then I added some code to the watch.Manager to handle calling Reset on the mapper when a CRD is established or unestablished, if the mapper still knows about the deleted resource or doesn't know about a new resource. This handles auto-discovery and auto-invalidation of resources in the RESTMapper, but it's relatively inefficient. Hopefully at some point in the future we can make aggregated discovery a requirement and use a simplified DiscoveryClient to handle both auto-discovery and auto-invalidation.
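As a hedged sketch of the reset-on-demand idea (the real ReplaceOnResetRESTMapper in this PR may be structured differently; the type, field, and constructor names below are illustrative), a wrapper can swap out its delegate meta.RESTMapper whenever Reset is called:

```go
package mapper

import (
	"sync"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// resettableRESTMapper wraps a delegate RESTMapper and rebuilds it on Reset,
// so stale mappings for deleted CRDs can be dropped and new CRDs discovered.
type resettableRESTMapper struct {
	mu       sync.RWMutex
	delegate meta.RESTMapper
	rebuild  func() (meta.RESTMapper, error) // builds a fresh mapper from discovery
}

// Reset replaces the delegate with a freshly built mapper.
func (m *resettableRESTMapper) Reset() error {
	fresh, err := m.rebuild()
	if err != nil {
		return err
	}
	m.mu.Lock()
	m.delegate = fresh
	m.mu.Unlock()
	return nil
}

// RESTMapping forwards to the current delegate under a read lock; the other
// meta.RESTMapper methods would be forwarded the same way.
func (m *resettableRESTMapper) RESTMapping(gk schema.GroupKind, versions ...string) (*meta.RESTMapping, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.delegate.RESTMapping(gk, versions...)
}
```

Per the description above, the watch.Manager only calls Reset when the current mapper disagrees with the observed CRD event (it still knows about a deleted resource, or doesn't yet know about a new one), which limits how often the full rebuild runs.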
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nan-yu

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Merged 89f6973 into GoogleContainerTools:main
Prior to this change, the remediator watches were only being started
for new custom resources after the apply attempt had fully completed.
This left some time after the object was applied that the remediator
could miss events made by third-parties. Normally, this would be fine,
because the remediator would revert any change after the watch was
started. But if a DELETE event was missed, the object wouldn't be
recreated until the next apply attempt.
This change adds a CRD Controller to the remediator that watches CRDs
and executes any registered handlers when the CRD is established,
unestablished, or deleted. The remediator now registers CRD handlers
for each resource type it watches, starting watchers as soon as
possible, without waiting for the next apply attempt.
This change also adds a ClusterRole and ClusterRoleBinding specifically
for RepoSync reconcilers, to allow watching of CRDs. RootSync
reconcilers now also watch CRDs, but they already have a CR & CRD.
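For illustration only, a sketch (in Go, using the upstream RBAC types) of the kind of permission such a ClusterRole would need to grant so a reconciler can watch CRDs; the name and exact rule set in the actual manifests may differ:

```go
package rbac

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// crdWatcherClusterRole sketches a ClusterRole granting read/watch access to
// CRDs, roughly what a RepoSync reconciler needs to run the CRD controller.
// The name is illustrative, not the one used by this PR.
func crdWatcherClusterRole() *rbacv1.ClusterRole {
	return &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "example-crd-watcher"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"apiextensions.k8s.io"},
			Resources: []string{"customresourcedefinitions"},
			Verbs:     []string{"get", "list", "watch"},
		}},
	}
}
```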
Fixes: b/355532135
Extracted: