-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🌱 Proposal for dynamic informer cache #2285
🌱 Proposal for dynamic informer cache #2285
Conversation
Welcome @maxsmythe! |
Hi @maxsmythe. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@vincepri can you please help take a look at this? |
We have also a use case where a feature like this one could be really helpful: we are going to generate at runtime CRDs and we would like to watch all the CRs based on the created CRDs. |
In the queue, should be done reviewing by EOW tomorrow |
/ok-to-test |
@maxsmythe We should probably add some tests before merging as well |
Thanks for the review! Regarding tests, I was holding off until I got signal that there was interest in the idea. It sounds like that hurdle is cleared? |
@vincepri Any feedback on the correct thing to do vis-a-vis licensing, since this code is copied from the Gatekeeper project (which is Apache licensed and CNCF-owned)? Not sure if this requires a NOTICE file, or a citation in the comment or similar. |
Where is the code copied from? In general, I have no idea how that works, but I'd expect code for this specific feature to go be net-new in controller runtime, especially with the changes proposed above |
Code currently lives here: That directory is essentially a fork of controller runtime, where the code changes look pretty much like what I've done in this PR. |
@maxsmythe @vincepri any updates on the proposed PR? This looks so useful for many use cases! |
d9ecde9
to
55353dc
Compare
@braghettos Sorry, I got distracted. Just pushed a first pass responding to feedback now. @vincepri I took a stab at addressing your comments. LMK what you think. Also let me know what, if any, additional tests you'd like to see. Tests pass locally, not sure why there is a failure currently. |
/retest |
Unit tests passed on retest. Still haven't been able to get them to fail locally. Rare flake? |
Any updates on this PR? |
Kubernetes upstream added support for dynamic informer lifecycle handling for the shared informer factory in 1.26. Any chances that we could lean on that? |
@maxsmythe given that we do not have anything to dynamically manage watches on controllers, would you be ok if we change this to |
Thanks for merging!! @vincepri @alvaroaleman Definitely interested in making this more user-friendly. Lemme know how I can help. @alvaroaleman The one concern I would have is memory leaks. The main reason to remove the informer is to also remove the cache data, cleaning stale state. Is there some way to remove the cache but keep a record of interested sources (this could still be a memory leak, but a smaller one)? Is there a way to have two methods (stop informer and remove informer)? Maybe figuring out the edge cases is part of the larger "dynamic watcher" story, that this is more a stopgap on the way to? |
@maxsmythe memory leak because there is still data in the informer (the state of the world before it was stopped) or because of the informer itself? |
Its less about this particular change but about having an overall "Dynamic management of event sources" story which would include things like removing/adding them to a controller post-start. Right now it seems ppl are submitting code for their particular itch but I'd really like someone to step up and have the overall user stories and public interface changes described in a design doc. This would not mean that that person also has to implement any of it but it would ensure that what we merge is aligned with the overall north star. |
Mostly concerned about the state of the world, as that's the larger dataset (there can be a LOT of very large config maps in a cluster, for instance). Having a persistent record of an informer could break down if there is a lot of creation/destruction of CRDs with novel kinds. This seems a harder edge case to get into, but IMO always worth architecting defensively, or at least giving users the ability to mitigate the issue if they're subject to it (though it may require extra effort on their part). WRT "dynamic management of event sources" happy to take a stab at that. AFAIK the first attempt is #2159, which appears to be focused on starting/stopping controllers ad-hoc. Is there any other state I should know about? I can sketch some rough thoughts on a Google doc to start with, unless you have a more preferred way of handling these things? |
Yeah, maybe there is a way to clean up the store in the informer?
Only the linked issues to my knowledge. And starting to sketch something would be really helpful :) The PRs make it too easy to miss the forest from all the trees because these changes are big. |
Still working on the stories. @alvaroaleman Is there a timeline yet for when a release may be cut that has this? |
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Ensures all dynamic controllers use clients backed by the same cache used to power watches (i.e. trigger reconciles). * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) Notably when realtime compositions are enabled, XR controllers will get XRs and composed resources from cache. Before this commit, their client wasn't backed by a cache. They'd get resources directly from the API server. Similarly, the claim controller will read claims from cache. Finally, I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Ensures all dynamic controllers use clients backed by the same cache used to power watches (i.e. trigger reconciles). * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) Notably when realtime compositions are enabled, XR controllers will get XRs and composed resources from cache. Before this commit, their client wasn't backed by a cache. They'd get resources directly from the API server. Similarly, the claim controller will read claims from cache. Finally, I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Ensures all dynamic controllers use clients backed by the same cache used to power watches (i.e. trigger reconciles). * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) Notably when realtime compositions are enabled, XR controllers will get XRs and composed resources from cache. Before this commit, their client wasn't backed by a cache. They'd get resources directly from the API server. Similarly, the claim controller will read claims from cache. Finally, I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Ensures all dynamic controllers use clients backed by the same cache used to power watches (i.e. trigger reconciles). * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) Notably when realtime compositions are enabled, XR controllers will get XRs and composed resources from cache. Before this commit, their client wasn't backed by a cache. They'd get resources directly from the API server. Similarly, the claim controller will read claims from cache. Finally, I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Ensures all dynamic controllers use clients backed by the same cache used to power watches (i.e. trigger reconciles). * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) Notably when realtime compositions are enabled, XR controllers will get XRs and composed resources from cache. Before this commit, their client wasn't backed by a cache. They'd get resources directly from the API server. Similarly, the claim controller will read claims from cache. Finally, I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <[email protected]> Signed-off-by: Chuan-Yen Chiang <[email protected]>
This PR shows how Gatekeeper has forked controller runtime to support the dynamic addition/removal of informers.
Happy to flesh this out if people are interested. Not sure what the correct licensing actions are for moving code across CNCF projects.
This is related to #2159 and #1884
Basically, something like this PR will be necessary to clean up informers once no more controllers are using them.
In the interim, having this code would be helpful to Gatekeeper by eliminating our need to maintain a fork, which has been fairly labor intensive. It also may give other members of the community a way to meet their needs for dynamic watches while waiting for the dynamic controller/reference counting approach to be implemented.