-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore events that are outside of namespaces that the controller manages #5058
Ignore events that are outside of namespaces that the controller manages #5058
Conversation
This PR takes the most minimal approach to solving the issue referenced in #4168, which references the issue where our custom CRDs themselves, from namespace |
Yeah, my testing is showing that these will need to be ignored as well. That change is a WIP. |
Ok, changes complete. I'm not seeing these error messages any longer when running 2 operators within the same cluster, managing 2x separate namespaces: operator 1
operator 2
With the following resources
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we could not assume that there is only one predicate, and also that this predicate is immutable. Based on this assumption we would have the predicate initialized in only one place, and shared as a public variable.
For example, the predicate can be managed in a predicate
package:
package predicate
import (
"sigs.k8s.io/controller-runtime/pkg/event"
"sigs.k8s.io/controller-runtime/pkg/predicate"
)
var ManagedNamespacePredicate predicate.Predicate
// NewManagedNamespacesPredicate will return a predicate that will ignore events
// that exist outside of the given managed namespaces,
func NewManagedNamespacesPredicate(managedNamespaces []string) predicate.Predicate {
/* logic to create the predicate*/
}
The predicate would be then initialized in main.go
:
// configure the manager cache based on the number of managed namespaces
managedNamespaces := viper.GetStringSlice(operator.NamespacesFlag)
predicate.ManagedNamespacePredicate = predicate.NewManagedNamespacesPredicate(managedNamespaces)
switch {
case len(managedNamespaces) == 0:
....
The predicate can now be used from any controller, without the need of adding a new function parameter, or a new member in a struct.
For example:
// addWatches adds watches for all resources this controller cares about
func addWatches(c controller.Controller, r *ReconcileBeat) error {
// Watch for changes to Beat
if err := c.Watch(&source.Kind{Type: &beatv1beta1.Beat{}}, &handler.EnqueueRequestForObject{}, predicate.ManagedNamespacePredicate); err != nil {
return err
}
// Watch DaemonSets
if err := c.Watch(&source.Kind{Type: &appsv1.DaemonSet{}}, &handler.EnqueueRequestForOwner{
IsController: true,
OwnerType: &beatv1beta1.Beat{},
}, predicate.ManagedNamespacePredicate); err != nil {
return err
}
// Watch Deployments
if err := c.Watch(&source.Kind{Type: &appsv1.Deployment{}}, &handler.EnqueueRequestForOwner{
IsController: true,
OwnerType: &beatv1beta1.Beat{},
}, predicate.ManagedNamespacePredicate); err != nil {
return err
}
// Watch Pods, to ensure `status.version` is correctly reconciled on any change.
// Watching Deployments or DaemonSets only may lead to missing some events.
if err := watches.WatchPods(c, beatcommon.NameLabelName, predicate.ManagedNamespacePredicate); err != nil {
return err
}
// Watch owned and soft-owned Secrets
if err := c.Watch(&source.Kind{Type: &corev1.Secret{}}, &handler.EnqueueRequestForOwner{
IsController: true,
OwnerType: &beatv1beta1.Beat{},
}, predicate.ManagedNamespacePredicate); err != nil {
return err
}
if err := watches.WatchSoftOwnedSecrets(c, beatv1beta1.Kind, predicate.ManagedNamespacePredicate); err != nil {
return err
}
// Watch dynamically referenced Secrets
return c.Watch(&source.Kind{Type: &corev1.Secret{}}, r.dynamicWatches.Secrets, predicate.ManagedNamespacePredicate)
}
@barkbay Yeah, those suggested changes would make things cleaner. Thanks for the suggestion. I'll get those changes taken care of. |
777a727
to
8c32098
Compare
I'm investigating why integration tests are failing in |
Resolved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While doing a new review I realized that the predicate is missing in the trial controller.
This made me think that we should maybe try another approach, in which we don't have to pass the predicate as a parameter in all the Watch(...)
functions. I remembered that most of the controllers are created using a custom function, common.NewController
. For example in the case of Elasticsearch:
// Add creates a new Elasticsearch Controller and adds it to the Manager with default RBAC. The Manager will set fields
// on the Controller and Start it when the Manager is Started.
// this is also called by cmd/main.go
func Add(mgr manager.Manager, params operator.Parameters) error {
reconciler := newReconciler(mgr, params)
c, err := common.NewController(mgr, name, reconciler, params)
if err != nil {
return err
}
return addWatches(c, reconciler)
}
(There is the exception of the webhook certificate controller , but maybe we can also use common.NewController(...)
here ?)
I think an alternative would be to inject the predicate in a new implementation of the controller.Controller
interface, something along those lines in pkg/controller/common/controller.go
:
// NewController creates a new controller with the given name, reconciler and parameters and registers it with the manager.
func NewController(mgr manager.Manager, name string, r reconcile.Reconciler, p operator.Parameters) (controller.Controller, error) {
c, err := controller.New(name, mgr, controller.Options{Reconciler: r, MaxConcurrentReconciles: p.MaxConcurrentReconciles})
if err != nil {
return nil, err
}
return newNamespaceAwareWatchersController(c, p.ManagedNamespaces, p.OperatorNamespace), nil
}
var _ controller.Controller = &namespaceAwareController{}
// namespaceAwareController implements the controller.Controller interface and automatically include a predicate to filter events
// which are not in a managed namespace.
type namespaceAwareController struct {
controller.Controller
namespacePredicate predicate.Predicate
}
func newNamespaceAwareWatchersController(c controller.Controller, managedNamespaces []string, operatorNamespace string) controller.Controller {
watchedNamespaces := managedNamespaces
if !slices.Contains(managedNamespaces, operatorNamespace) {
watchedNamespaces = append(watchedNamespaces, operatorNamespace)
}
return &namespaceAwareController{
Controller: c,
namespacePredicate: predicate.Funcs{
CreateFunc: func(e event.CreateEvent) bool {
return IsNamespaceManaged(e.Object.GetNamespace(), managedNamespaces)
},
UpdateFunc: func(e event.UpdateEvent) bool {
return IsNamespaceManaged(e.ObjectNew.GetNamespace(), managedNamespaces)
},
DeleteFunc: func(e event.DeleteEvent) bool {
return IsNamespaceManaged(e.Object.GetNamespace(), managedNamespaces)
},
GenericFunc: func(e event.GenericEvent) bool {
return IsNamespaceManaged(e.Object.GetNamespace(), managedNamespaces)
},
},
}
}
func (n *namespaceAwareController) Watch(src source.Source, eventhandler handler.EventHandler, predicates ...predicate.Predicate) error {
return n.Controller.Watch(src, eventhandler, append(predicates, n.namespacePredicate)...)
}
Sorry for all those back and forth, I think that if this approach works it'll make this PR a bit smaller and reduce the chance of a missing a controller.
cmd/manager/main.go
Outdated
@@ -466,6 +467,9 @@ func startOperator(ctx context.Context) error { | |||
|
|||
// configure the manager cache based on the number of managed namespaces | |||
managedNamespaces := viper.GetStringSlice(operator.NamespacesFlag) | |||
// initialize the managed namespace predicate to ignore events outside of the namespaces the operator is concerned with | |||
predicates.ManagedNamespacePredicate = predicates.NewManagedNamespacesPredicate(managedNamespaces) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also watch resources in the operatorNamespace
, to reconcile on license events for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @barkbay for all of the suggestions in this PR. Don't worry about the back and forth at all. The last suggestion significantly cleaned up the changes. 👍 As for this comment, the operator namespace is considered in the common controller newNamespaceAwareWatchersController
func now, and this code is simply gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After doing some additional tests it turns out that we also need to update telemetry.go
. First error is raised when reading Kibana secrets here:
cloud-on-k8s/pkg/telemetry/telemetry.go
Lines 150 to 153 in 7122d8d
if err := r.client.Get(context.Background(), nsName, &secret); err != nil { | |
log.Error(err, "failed to get Kibana secret") | |
continue | |
} |
{
"log.level": "error",
"@timestamp": "2021-12-09T08:49:49.727Z",
"log.logger": "usage",
"message": "failed to get Kibana secret",
"service.version": "2.0.0-SNAPSHOT+a25633ec",
"service.type": "eck",
"ecs.version": "1.4.0",
"error": "unable to get: namespace-a/kb-apm-sample-kb-config because of unknown namespace for the cache",
"error.stack_trace": "github.com/elastic/cloud-on-k8s/cmd/manager.asyncTasks.func2\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:655"
}
An option might be to use once again the predicate as we do for GarbageCollectAllSoftOwnedOrphanSecrets
. However I fear we'll have the same problem in all the getStatsFn
functions (esStats
, kbStats
...). I'm then wondering if a better solution would not be to reuse the original flag, as provided by the user, when the reporter is created here:
if !disableTelemetry {
// Start the telemetry reporter
go func() {
tr := telemetry.NewReporter(operatorInfo, mgr.GetClient(), operatorNamespace, viper.GetStringSlice(operator.NamespacesFlag), telemetryInterval)
tr.Start()
}()
}
Overall I think we have a problem with the managedNamespaces
, it's confusing to have this empty string. Maybe we should just make it immutable:
// Add the empty namespace to allow watching cluster-scoped resources if storage class validation is enabled.
if viper.GetBool(operator.ValidateStorageClassFlag) {
opts.NewCache = cache.MultiNamespacedCacheBuilder(append(managedNamespaces, ""))
} else {
opts.NewCache = cache.MultiNamespacedCacheBuilder(managedNamespaces)
}
@barkbay may I ask how you generated this error?
cloud-on-k8s/pkg/controller/common/reconciler/secret.go Lines 181 to 184 in 7122d8d
but telemetry is doing a namespace scoped list operation, and then iterating over it, which should only get Kibanas in it's set of managed namespaces, right? cloud-on-k8s/pkg/telemetry/telemetry.go Line 143 in 7122d8d
To which flag are you referring? Surely not
I agree that it's confusing, but isn't that what the apimachinery uses to designate all namespaces:
Do you see a path to making this less confusing in this code? |
I have a "restricted" operator running with
No because
Yes, I was referring to
I don't think we want to watch all the namespaces. We want to watch
I was wondering if adding // Add the empty namespace to allow watching cluster-scoped resources if storage class validation is enabled.
if viper.GetBool(operator.ValidateStorageClassFlag) {
opts.NewCache = cache.MultiNamespacedCacheBuilder(append(managedNamespaces, ""))
} else {
opts.NewCache = cache.MultiNamespacedCacheBuilder(managedNamespaces)
} |
My apologies. I didn't realize all your initial comments were coming from the fact that we do this:
now it all makes sense. I completely agree that it's quite confusing to have the "all namespaces" namespace in the list of managed namespaces. I'll get these changes in. |
…ument to the 'addWatches' functions in most of the packages.
…g into account managed namespaces
…ror message when storage class validation is enabled during secrets garbage collection.
Co-authored-by: Michael Morello <[email protected]>
always add managedNamespacePredicate in addWatches in association package
Adjust Webhooks controller to use the common path for creating a controller.
…ed atm. remove WithPredicates as it's not used anymore either. remove predicates in remoteca controller create as it's using common controller, so unneeded.
… of controller options on create. Use NewControllerWithOptions in webhook controller to avoid the default setting of MaxConcurrentReconciles.
…empty (managing all namespaces)
7699ec5
to
9f83fff
Compare
Fixed the integration test, which caught an issue where we weren't managing the empty slice of managed namespaces properly when initializing the controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spotted another place where some resources are listed and processed without predicates/options:
cloud-on-k8s/pkg/controller/license/list.go
Lines 19 to 23 in 3016db4
// list all clusters | |
err := c.List(context.Background(), &clusters) | |
if err != nil { | |
return nil, err | |
} |
We might have the same issue in the remote ca controller:
cloud-on-k8s/pkg/controller/remoteca/controller.go
Lines 222 to 225 in 3016db4
var list esv1.ElasticsearchList | |
if err := c.List(context.Background(), &list, &client.ListOptions{}); err != nil { | |
return nil, err | |
} |
I'm afraid fixing all these errors / events and maintaining the code is tricky 😕
func newNamespaceAwareWatchersController(c controller.Controller, managedNamespaces []string, operatorNamespace string) controller.Controller { | ||
watchedNamespaces := managedNamespaces | ||
// if the length of watchedNamespaces is 0, then we're watching all namespaces, and shouldn't append anything to the slice, as | ||
// it will just cause issues wth the managed namespaces predicate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// it will just cause issues wth the managed namespaces predicate. | |
// it will just cause issues with the managed namespaces predicate. |
// if the length of watchedNamespaces is 0, then we're watching all namespaces, and shouldn't append anything to the slice, as | ||
// it will just cause issues wth the managed namespaces predicate. | ||
if len(watchedNamespaces) > 0 && !slices.Contains(watchedNamespaces, operatorNamespace) { | ||
watchedNamespaces = append(watchedNamespaces, operatorNamespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really required ? The operatorNamespace
is supposed to be already added to the managedNamespaces
: https://github.com/elastic/cloud-on-k8s/pull/5058/files#diff-435eecb1d2af40ee96747f88ab71eb2758da989c92f76dcb1f714fc1b300c633R479-R480
While having a look at the multicache implementation I noticed that cluster scoped resources are now handled: // If the object is clusterscoped, get the informer from clusterCache,
// if not use the namespaced caches.
isNamespaced, err := objectutil.IsAPINamespaced(obj, c.Scheme, c.RESTMapper)
if err != nil {
return nil, err
}
if !isNamespaced {
clusterCacheInf, err := c.clusterCache.GetInformer(ctx, obj)
if err != nil {
return nil, err
}
informers[globalCache] = clusterCacheInf
return &multiNamespaceInformer{namespaceToInformer: informers}, nil
} (see this PR) I think we can just stop adding the empty namespace: we should still be able to read |
@barkbay great catch. I'm testing to see if this simple fix is sufficient now in a separate branch. Will report back. |
closing this per #5187 |
This will ignore any CRDs laid down within a Kubernetes cluster that are outside of the namespaces that ECK is set to manage (
managedNamespaces
option).resolves #4168