Enable filtered list watches as watches #244

nkvoll · 2018-12-12T12:08:02Z

When setting up watches during initialization it's currently not possible to filter by any selectors (which is possible using list watches).

For example, it is not possible to watch only pods with specific labels (e.g. having the label pod-type: my-controller-type). The current behavior results in very broad caching, which might not be desirable for large Kubernetes deployments.

In some scenarios an operator could contain multiple controllers that all share caches, so keying caches on GVKs alone might be problematic if they want to watch the same resource type with different filters.

When doing List/Get, how would one decide which of the caches to use? It seems that perhaps this needs to be an explicit choice by the operator developers?
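To make the keying concern concrete, here is a minimal stdlib-only sketch. It is a toy model, not controller-runtime code: the `GVK`, `cacheKey`, and `registry` types are hypothetical stand-ins. It shows that once the selector is part of the key, two controllers watching the same kind with different filters no longer collide, and also why a List/Get caller would then have to say which cache it means.

```go
package main

import "fmt"

// GVK is a simplified, hypothetical stand-in for schema.GroupVersionKind.
type GVK struct {
	Group, Version, Kind string
}

// cacheKey identifies one informer cache. Keying on GVK alone would force
// two controllers with different filters to share a single cache entry.
type cacheKey struct {
	GVK      GVK
	Selector string // canonical label selector; "" means unfiltered
}

// registry hands out one cache per distinct (GVK, selector) pair.
type registry struct {
	caches map[cacheKey]int // value is a hypothetical cache ID
}

func newRegistry() *registry {
	return &registry{caches: make(map[cacheKey]int)}
}

// cacheFor returns the ID of the cache for the given kind and selector,
// creating a new entry on first use.
func (r *registry) cacheFor(gvk GVK, selector string) int {
	key := cacheKey{GVK: gvk, Selector: selector}
	if id, ok := r.caches[key]; ok {
		return id
	}
	id := len(r.caches) + 1
	r.caches[key] = id
	return id
}

func main() {
	pods := GVK{Version: "v1", Kind: "Pod"}
	r := newRegistry()
	a := r.cacheFor(pods, "pod-type=my-controller-type")
	b := r.cacheFor(pods, "pod-type=other-controller-type")
	c := r.cacheFor(pods, "pod-type=my-controller-type")
	fmt.Println(a, b, c) // prints "1 2 1"
}
```

In this model a reader has to pass the selector to find its cache, which is one form of the "explicit choice by the operator developers" raised above.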


jeefy · 2018-12-28T14:53:19Z

+1

I've run into a similar issue with a controller I'm writing. I wound up starting a goroutine when the controller gets added to watch/act on certain objects that are created and labeled by another controller outside of mine.

DirectXMan12 · 2019-01-04T00:01:09Z

we'd have to specify this at the manager level, but it should be possible to pass restricting items.

DirectXMan12 · 2019-01-04T00:03:15Z

(since caches are shared)

DirectXMan12 · 2019-01-04T00:03:49Z

we could also let users manually specify auxiliary caches

DirectXMan12 · 2019-01-04T00:04:00Z

if anyone has ideas for a design doc, I'd love to see them.

DirectXMan12 · 2019-01-04T01:50:40Z

/kind feature
/priority important-soon

this has been coming up a lot lately, and seems like a good thing to tackle in the list of related issues.

DirectXMan12 · 2019-01-04T02:09:55Z

i.e. one sketch:

Have a list-options watch predicate, smartly detect the presence of that (in some generic way) and pass it down to GetInformer. If we already have a global informer, just use that. If we don't have a global informer, create a new filtered informer. We delay creation of informers to figure out if one person needs a global informer, and just use that, to allow splitting and combining seamlessly. The question is how to deal with the cached client:

❌ cached client always tries for a global informer (seems like a performance footgun)
❌ only one filtered informer max (breaks multi-controller managers)
❌ cache client refuses to use global informer if filtered informers exist and a global informer does not (breaks multi-controller informers)
🆗 keep track of which controller a given client belongs to and which watches that client set up, and try to scope those gets to the informer it belongs to, in the case of filtered informers (complicated, serious internal refactors)
🆗 allow users to turn off auto-caching, make them explicitly get filtered caching clients if they want (doable, more work for users). This potentially lets us allow users to specify that they don't want the client to ever cache a particular object (could encourage wrong patterns).

That last one might solve multiple problems in one fell swoop, but I'd need to see a design sketch to make sure that it's not too arcane.
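The "delay creation, then split or combine" part of the sketch above could look roughly like this. It is a stdlib-only toy under stated assumptions: `watchRequest` and `planInformers` are hypothetical names, kinds and selectors are plain strings, and nothing here touches the real GetInformer path.

```go
package main

import (
	"fmt"
	"sort"
)

// watchRequest is a hypothetical record of one controller's Watch call.
type watchRequest struct {
	Kind     string
	Selector string // "" means the controller wants every object of Kind
}

// planInformers decides, per kind, which informers to create once all
// watch requests are known. If any request for a kind is unfiltered, a
// single global informer ("") serves everyone; otherwise one filtered
// informer is planned per distinct selector.
func planInformers(reqs []watchRequest) map[string][]string {
	selectors := make(map[string]map[string]bool)
	for _, r := range reqs {
		if selectors[r.Kind] == nil {
			selectors[r.Kind] = make(map[string]bool)
		}
		selectors[r.Kind][r.Selector] = true
	}

	plan := make(map[string][]string)
	for kind, set := range selectors {
		if set[""] {
			// Someone needs everything, so filtered informers are redundant.
			plan[kind] = []string{""}
			continue
		}
		for sel := range set {
			plan[kind] = append(plan[kind], sel)
		}
		sort.Strings(plan[kind]) // deterministic order for readability
	}
	return plan
}

func main() {
	plan := planInformers([]watchRequest{
		{Kind: "Pod", Selector: "app=a"},
		{Kind: "Pod", Selector: "app=b"},
		{Kind: "Secret", Selector: "managed=true"},
		{Kind: "Secret", Selector: ""},
	})
	fmt.Printf("Pod: %q\nSecret: %q\n", plan["Pod"], plan["Secret"])
}
```

The toy only covers informer creation; it does not answer the cached-client question listed above, which is where the real design difficulty sits.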

poidag-zz · 2019-01-30T21:46:04Z

I will follow this to understand how this is properly implemented and any questions from a user or testing perspective please ask and I'll do my best to assist. In the interim I've implemented this for my own use case.

// Usage, inside the function that sets up the controller:
kvp := &keyValuePair{
	key:   "app",
	value: "test",
}

filter := newFilter(keyValueFilter(kvp))

err = c.Watch(&source.Kind{Type: &v1.Deployment{}}, &handler.EnqueueRequestForObject{}, filter)
if err != nil {
	return err
}

type keyValuePair struct {
	key   string
	value string
}

var kvp keyValuePair

type filterFuncs struct {
	// Create returns true if the Create event should be processed
	CreateFunc func(event.CreateEvent) bool

	// Delete returns true if the Delete event should be processed
	DeleteFunc func(event.DeleteEvent) bool

	// Update returns true if the Update event should be processed
	UpdateFunc func(event.UpdateEvent) bool

	// Generic returns true if the Generic event should be processed
	GenericFunc func(event.GenericEvent) bool
	Filter      keyValuePair
}

func keyValueFilter(kvp *keyValuePair) func(*filterFuncs) {
	return func(f *filterFuncs) {
		f.Filter.key = kvp.key
		f.Filter.value = kvp.value
	}
}

func newFilter(opts ...func(*filterFuncs)) *filterFuncs {
	f := &filterFuncs{}
	for _, opt := range opts {
		opt(f)
	}
	return f
}


func (p filterFuncs) Create(e event.CreateEvent) bool {
	labels := e.Meta.GetLabels()
	if val, ok := labels[p.Filter.key]; ok {
		if val == p.Filter.value {
			return true
		}
	}
	return false
}

DirectXMan12 · 2019-01-31T19:45:29Z

That's just like a predicate, though -- it doesn't actually get you most of the benefits of the filtering that people are asking for here, since it doesn't ask for server-side filtering.
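For readers less familiar with the distinction: a predicate runs in the controller after the object has already been received and cached, while server-side filtering changes the list/watch request itself. A small stdlib-only sketch of what that request looks like for core/v1 pods follows; `watchPath` is a hypothetical helper written for illustration, not something controller-runtime or client-go exposes.

```go
package main

import (
	"fmt"
	"net/url"
)

// watchPath builds the request path and query for a watch on core/v1 pods.
// A non-empty labelSelector is sent to the API server, which then only
// streams matching objects. A predicate never changes this request.
func watchPath(namespace, labelSelector string) string {
	path := "/api/v1/pods"
	if namespace != "" {
		path = "/api/v1/namespaces/" + url.PathEscape(namespace) + "/pods"
	}
	q := url.Values{}
	q.Set("watch", "true")
	if labelSelector != "" {
		q.Set("labelSelector", labelSelector)
	}
	// Encode sorts parameters by key, so the output is deterministic.
	return path + "?" + q.Encode()
}

func main() {
	// Unfiltered: every pod in the cluster is streamed and cached.
	fmt.Println(watchPath("", ""))
	// Filtered: only pods carrying the label reach the client at all.
	fmt.Println(watchPath("default", "pod-type=my-controller-type"))
}
```

With a predicate alone, the first form of the request is what goes over the wire, so memory use and API traffic stay the same.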

poidag-zz · 2019-01-31T20:06:01Z

I know... As I said above the code block

I will follow this to understand how this is properly implemented and any questions from a user or testing perspective please ask and I'll do my best to assist. In the interim I've implemented this for my own use case.

DirectXMan12 · 2019-01-31T21:27:42Z

Ack, just wanted to make sure we were on the same page :-)

fejta-bot · 2019-05-01T21:35:28Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-05-31T22:20:11Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

DirectXMan12 · 2019-06-04T00:36:22Z

/remove-lifecycle stale

fejta-bot · 2019-07-04T01:20:07Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2019-07-04T01:20:14Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

alvaroaleman · 2019-07-04T07:18:18Z

/reopen
/remove-lifecycle rotten

k8s-ci-robot · 2019-07-04T07:18:19Z

@alvaroaleman: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten


shawn-hurley · 2019-08-12T15:20:34Z

bumping this feature request:

allow users to turn off auto-caching, make them explicitly get filtered caching clients if they want (doable, more work for users). This potentially lets us allow users to specify that they don't want the client to ever cache a particular object (could encourage wrong patterns).

I think this is a viable option, but I wonder if we could have a more straightforward pattern where a user specifies a watch with a label. If they do, then we use the label-filtered cache for that specific GVK. If another watch is created without that label filter, then we error out and do not establish the watch. The most significant caveat that I see is that the client will only use the label-filtered cache. We would need to make that clear in the documentation, or is there some other mechanism to alert the user to this?
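A rough sketch of that pattern, as a stdlib-only toy: `watchRegistry` and `register` are hypothetical names, and one detail is my own assumption rather than part of the proposal, namely that an unfiltered first watch followed by a filtered one is rejected as well.

```go
package main

import (
	"errors"
	"fmt"
)

// errSelectorConflict is returned when a watch asks for a different label
// filter than the one already bound to the kind's cache.
var errSelectorConflict = errors.New("watch selector conflicts with existing cache")

// watchRegistry remembers the label selector of the first watch per kind.
type watchRegistry struct {
	selectors map[string]string
}

func newWatchRegistry() *watchRegistry {
	return &watchRegistry{selectors: make(map[string]string)}
}

// register binds kind to selector on first use and rejects later watches
// whose selector differs, including unfiltered ("") ones.
func (w *watchRegistry) register(kind, selector string) error {
	existing, ok := w.selectors[kind]
	if !ok {
		w.selectors[kind] = selector
		return nil
	}
	if existing != selector {
		return fmt.Errorf("%w: kind %s uses %q, requested %q",
			errSelectorConflict, kind, existing, selector)
	}
	return nil
}

func main() {
	reg := newWatchRegistry()
	fmt.Println(reg.register("Pod", "app=test")) // first watch binds the filter
	fmt.Println(reg.register("Pod", "app=test")) // same filter is accepted
	fmt.Println(reg.register("Pod", ""))         // unfiltered watch is rejected
}
```

Failing at watch setup makes the restriction visible early, but it does not by itself warn a developer whose Get or List silently misses objects outside the filter.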

The controller-runtime is missing filtering its cache by labels or fields [1]. This means that all the kubernetes-nmstate-handlers will read all the nodes and nodenetworkstates every period. Clearly this does not scale, since kubernetes-nmstate-handler runs as a daemonset, meaning that there is one handler running on every node, so the bigger the cluster, the bigger the problem. This change replaces the default controller-runtime cache with an implementation that can be configured to use field selectors depending on the resource. This way we can filter by "metadata.name" using the node name for "node" and "nodenetworkstate", so only one instance of each is fetched. [1] kubernetes-sigs/controller-runtime#244 Signed-off-by: Quique Llorente <[email protected]>

pagarwal-tibco · 2021-03-01T04:16:18Z

Any updates on this?

qinqon · 2021-03-01T09:45:04Z

I have opened a PR to specify a field selector at manager build time: #1404.


timebertt · 2021-04-16T07:07:15Z

New PR: #1435

invidian · 2021-04-29T14:14:00Z

#1435 is now merged, so is this resolved?

taylormgeorge91 · 2021-04-29T14:18:00Z

It is being worked into a release currently @invidian

Looking at the tags there is a v0.9.0-beta available if you want to test.

invidian · 2021-04-29T14:26:44Z

Looking at the tags there is a v0.9.0-beta available if you want to test.

Thanks, will do!

estroz · 2021-04-29T15:51:41Z

#1435 is now merged, so is this resolved?

@invidian yes I believe this is resolved.

/close

k8s-ci-robot · 2021-04-29T15:51:47Z

@estroz: Closing this issue.

In response to this:

#1435 is now merged, so is this resolved?

@invidian yes I believe this is resolved.

/close


roee88 · 2021-07-19T19:40:32Z

It seems like the current solution doesn't address the use case mentioned in
#244 (comment).

It should be possible to create a namespace scoped client in ListFunc and WatchFunc if there is a field selector for metadata.namespace (vs the current logic of just looking at the global namespace field). Is that reasonable?

alvaroaleman · 2021-07-19T19:44:34Z

@roee88 what was merged allows setting a FieldSelector, so yes, this is possible

roee88 · 2021-07-19T20:07:45Z

To clarify, I suspect that without affecting the parameters to NamespaceIfScoped, the selector opts have no effect on the required rbac (cluster role bindings vs role bindings). For example here:

controller-runtime/pkg/cache/internal/informers_map.go

Lines 282 to 288 in 64b1c72

    
WatchFunc: func(opts metav1.ListOptions) (watch.Interface, error) {
	ip.selectors[gvk].ApplyToList(&opts)
	// Watch needs to be set to true separately
	opts.Watch = true
	isNamespaceScoped := ip.namespace != "" && mapping.Scope.Name() != meta.RESTScopeNameRoot
	return client.Get().NamespaceIfScoped(ip.namespace, isNamespaceScoped).Resource(mapping.Resource.Resource).VersionedParams(&opts, ip.paramCodec).Watch(ctx)
},

My question is whether changing the code here makes sense. Specifically for the case where ip.namespace is empty, the value from a metadata.namespace field selector should be used (if exists).

cc @shlomitk1

alvaroaleman · 2021-07-19T20:14:27Z

My question is whether changing the code here makes sense. Specifically for the case where ip.namespace is empty, the value from a metadata.namespace field selector should be used (if exists).

That sounds reasonable
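The fallback being discussed can be sketched as follows. This is a stdlib-only toy that parses the selector as a string; `namespaceFromFieldSelector` and `effectiveNamespace` are hypothetical helpers, and real code would more likely ask the parsed `fields.Selector` for an exact-match requirement instead of splitting strings.

```go
package main

import (
	"fmt"
	"strings"
)

// namespaceFromFieldSelector returns the value of an exact-match
// metadata.namespace requirement in a comma-separated field selector.
func namespaceFromFieldSelector(selector string) (string, bool) {
	for _, term := range strings.Split(selector, ",") {
		term = strings.TrimSpace(term)
		// Inequality terms such as metadata.namespace!=kube-system do not
		// pin the request to a single namespace, so they are skipped.
		if strings.Contains(term, "!=") {
			continue
		}
		key, value, found := strings.Cut(term, "=")
		if !found {
			continue
		}
		value = strings.TrimPrefix(value, "=") // accept "==" as well as "="
		if strings.TrimSpace(key) == "metadata.namespace" && value != "" {
			return value, true
		}
	}
	return "", false
}

// effectiveNamespace prefers the cache-wide namespace and falls back to the
// field selector only when no cache-wide namespace is set.
func effectiveNamespace(cacheNamespace, fieldSelector string) string {
	if cacheNamespace != "" {
		return cacheNamespace
	}
	ns, _ := namespaceFromFieldSelector(fieldSelector)
	return ns
}

func main() {
	fmt.Println(effectiveNamespace("", "metadata.namespace=team-a,metadata.name=x"))
	fmt.Println(effectiveNamespace("ops", "metadata.namespace=team-a"))
}
```

Passing the resulting namespace to NamespaceIfScoped would turn the request into a namespaced list/watch, which is what lets a Role and RoleBinding suffice instead of cluster-wide RBAC.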

thesuperzapper · 2024-11-10T03:19:04Z

My recommendation is to do something similar to this PR in external-secrets:

feat: significantly reduce api calls and introduce partial secret cache external-secrets/external-secrets#4086

Disable Caching Manager's Default Client

You can disable the cache for specific resource kinds in your manager.Manager's default client using DisableFor in client.CacheOptions.

For example, your outer main() function might look like this:

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	"log/slog"
	"os"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
)

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stdout, nil))
	restConfig := ctrl.GetConfigOrDie()
	scheme := runtime.NewScheme()

	// add default Kubernetes types to the scheme
	if err := clientgoscheme.AddToScheme(scheme); err != nil {
		logger.Error("failed to add Kubernetes types to scheme", "error", err)
		os.Exit(1)
	}
	// add your custom types to the scheme
	//if err := XXXXv1beta1.AddToScheme(scheme); err != nil {
	//	logger.Error("failed to add XXXX types to scheme", "error", err)
	//	os.Exit(1)
	//}

	ctrlOpts := ctrl.Options{
		Scheme: scheme,
		Client: client.Options{
			Cache: &client.CacheOptions{
				// this disables caching for all Secrets and ConfigMaps
				// as caching all secrets can take a LOT of memory in a large cluster
				DisableFor: []client.Object{
					&corev1.Secret{},
					&corev1.ConfigMap{},
				},
			},
		},
		Metrics: metricsserver.Options{
			BindAddress: "0", // not required, just for example
		},
		HealthProbeBindAddress: "0",   // not required, just for example
		LeaderElection:         false, // not required, just for example
	}

	// create a new manager
	mgr, err := ctrl.NewManager(restConfig, ctrlOpts)
	if err != nil {
		logger.Error("unable to create manager", "error", err)
		os.Exit(1)
	}

	//
	// ADD THE BuildManagedSecretClient() FUNCTION HERE
	// SEE THE NEXT SNIPPET FOR AN EXAMPLE
	//

	// start the manager
	logger.Info("starting manager")
	err = mgr.Start(ctrl.SetupSignalHandler())
	if err != nil {
		logger.Error("unable to start manager", "error", err)
		os.Exit(1)
	}
}

Example of Filtered Cached Client

You can construct a new filtered client using ByObject from cache.Options.

(WARNING: a client which uses a filtered cache will be completely unable to see resources in the cluster which don't match the cache, even if they exist)

For example, here is the one from external-secrets which only caches secrets with the reconcile.external-secrets.io/managed=true label:

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func BuildManagedSecretClient(mgr ctrl.Manager) (client.Client, error) {
	// secrets we manage will have the `reconcile.external-secrets.io/managed=true` label
	managedLabelReq, err := labels.NewRequirement("reconcile.external-secrets.io/managed", selection.Equals, []string{"true"})
	if err != nil {
		return nil, err
	}
	managedLabelSelector := labels.NewSelector().Add(*managedLabelReq)

	// create a new cache with a label selector for managed secrets
	// NOTE: this means that the cache/client will be unable to see secrets without the "managed" label
	secretCacheOpts := cache.Options{
		HTTPClient: mgr.GetHTTPClient(),
		Scheme:     mgr.GetScheme(),
		Mapper:     mgr.GetRESTMapper(),
		ByObject: map[client.Object]cache.ByObject{
			&corev1.Secret{}: {
				Label: managedLabelSelector,
			},
		},
		// this requires us to explicitly start an informer for each object type
		// and helps avoid people mistakenly using the secret client for other resources
		ReaderFailOnMissingInformer: true,
	}
	secretCache, err := cache.New(mgr.GetConfig(), secretCacheOpts)
	if err != nil {
		return nil, err
	}

	// start an informer for secrets
	// this is required because we set ReaderFailOnMissingInformer to true
	_, err = secretCache.GetInformer(context.Background(), &corev1.Secret{})
	if err != nil {
		return nil, err
	}

	// add the secret cache to the manager, so that it starts at the same time
	err = mgr.Add(secretCache)
	if err != nil {
		return nil, err
	}

	// create a new client that uses the secret cache
	secretClient, err := client.New(mgr.GetConfig(), client.Options{
		HTTPClient: mgr.GetHTTPClient(),
		Scheme:     mgr.GetScheme(),
		Mapper:     mgr.GetRESTMapper(),
		Cache: &client.CacheOptions{
			Reader: secretCache,
		},
	})
	if err != nil {
		return nil, err
	}

	return secretClient, nil
}

This was referenced Dec 12, 2018

Consider making cache.internal.CacheReader accessible for users #247

Closed

Why DelegatingClient is needed? #180

Closed

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jan 4, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 1, 2019

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 31, 2019

k8s-ci-robot closed this as completed Jul 4, 2019

k8s-ci-robot reopened this Jul 4, 2019

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 4, 2019

hasbro17 mentioned this issue Aug 9, 2019

RFE: Watch labels operator-framework/operator-sdk#1808

Closed

jalseth mentioned this issue Feb 18, 2021

Allow namespace inclusion for replicating data open-policy-agent/gatekeeper#1044

Closed

qinqon mentioned this issue Mar 1, 2021

✨ Add cache option to filter by field #1404

Closed

estroz removed their assignment Mar 1, 2021

invidian mentioned this issue Apr 29, 2021

Without the list and watch permissions on secrets a log error is shown newrelic/newrelic-infra-operator#14

Closed

k8s-ci-robot closed this as completed Apr 29, 2021

This was referenced Jul 27, 2021

Restrict namespace for list/watch based on field selectors shlomitk1/controller-runtime#1

Closed

✨Restrict namespace for list/watch based on field selectors #1602

Merged

pintohutch mentioned this issue Oct 12, 2021

OperatorConfig CRD supports more of alertmanager_config GoogleCloudPlatform/prometheus-engine#47

Merged

thbkrkr mentioned this issue Dec 1, 2021

unable to get: x/y because of unknown namespace for the cache elastic/cloud-on-k8s#4168

Closed

moolen mentioned this issue Feb 15, 2022

Resource Usage on large cluster external-secrets/external-secrets#721

Closed

akhilac1 mentioned this issue Jan 4, 2023

High memory consumption for Operator when there are lots of pods dapr/dapr#5693

Closed

AshfaqMH mentioned this issue Jan 18, 2023

reconcile for externally managed secret is not triggered for update event kubernetes-sigs/kubebuilder#3145

Closed

janetkuo mentioned this issue Nov 13, 2023

feat: implement roleRefs API for RootSyncs GoogleContainerTools/kpt-config-sync#991

Merged
