Informer extremely slow to learn about new events #2121

Closed
lalithsuresh opened this issue Apr 8, 2020 · 9 comments

Comments

@lalithsuresh
Contributor

lalithsuresh commented Apr 8, 2020

I'm using a SharedIndexInformer to build a scheduler. I have a 500-node cluster, where I deploy 1000 pods. I find that it takes roughly a minute for the informer to learn about every pod that was created and, worse, the information trickles in one pod at a time over the entire minute.

Is this normal? Am I missing some configuration? This is how I'm instantiating the informers:

final SharedInformerFactory sharedInformerFactory = client.informers();
final SharedIndexInformer<Node> nodeSharedIndexInformer = sharedInformerFactory
                .sharedIndexInformerFor(Node.class, NodeList.class, 30000);
nodeSharedIndexInformer.addEventHandler(new NodeResourceEventHandler(dbConnectionPool, service));

final SharedIndexInformer<Pod> podInformer = sharedInformerFactory
                .sharedIndexInformerFor(Pod.class, PodList.class, 30000);
@rohanKanojia
Member

rohanKanojia commented Apr 9, 2020

@lalithsuresh : to be honest, I didn't test SharedInformer support on such a huge infrastructure with 500 nodes. Does performance degrade only at scale, or do you face a similar issue on small clusters too?

@kolorful : Did you face any issue like this while trying out SharedInformer support?

@kolorful
Contributor

kolorful commented Apr 9, 2020

@rohanKanojia I'm using v4.9.0 in a smaller cluster (100-ish nodes) watching a similar number of Pods, and I didn't observe such slowness.
@lalithsuresh How are you measuring the initialization time exactly?

@lalithsuresh
Contributor Author

Hi @rohanKanojia , @kolorful,

To measure this, I've enabled logging in the ReflectorWatcher class, and I'm also observing the rate at which the registered PodResourceEventHandler is invoked.

I've tried two approaches:

  1. using the informer.addEventHandler() API
  2. using a watch directly.

In both cases, the invoked handler immediately passes off the "Pod" instance to a threadpool with 10 threads to start my processing pipeline. I observe the number of currently active threads in that thread pool.

With 1), events trickle in slowly as I mentioned, with an occupancy of 1 active thread almost all the time. With 2), I find that all 10 threads are constantly being exercised.
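The measurement described above (hand each event off to a 10-thread pool, then watch how many threads are busy) might look roughly like the sketch below. It is a stand-alone illustration, not the code used in this issue: `HandoffProbe`, `onEvent`, and `simulateBurst` are hypothetical names, and a plain `Runnable` stands in for the per-Pod processing pipeline so that no Kubernetes client is needed to run it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical probe: hands each event to a fixed-size pool and records pool occupancy. */
final class HandoffProbe {
    private final ThreadPoolExecutor pool;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    HandoffProbe(int threads) {
        // Fixed-size pool with an unbounded queue, so the caller never blocks on hand-off.
        pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    /** Meant to be called from an event handler callback; returns immediately. */
    void onEvent(Runnable work) {
        pool.execute(() -> {
            // Track how many tasks run at once and remember the highest value seen.
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                work.run();
            } finally {
                inFlight.decrementAndGet();
            }
        });
    }

    /** Approximate number of busy threads right now, as reported by the pool itself. */
    int activeThreads() {
        return pool.getActiveCount();
    }

    int peakInFlight() {
        return peak.get();
    }

    /** Delivers a burst of events at once and reports the peak occupancy that resulted. */
    static int simulateBurst(int threads, int events) {
        HandoffProbe probe = new HandoffProbe(threads);
        CountDownLatch started = new CountDownLatch(Math.min(threads, events));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < events; i++) {
                probe.onEvent(() -> {
                    started.countDown();
                    awaitQuietly(release); // hold the thread until the burst is fully running
                });
            }
            awaitQuietly(started);
            release.countDown();
            probe.pool.shutdown();
            probe.pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return probe.peakInFlight();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If events really arrive in a burst, the peak should reach the pool size; a peak that stays near 1 suggests that events are being delivered (or produced) one at a time rather than that the pool is the bottleneck.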

@kolorful
Contributor

kolorful commented Apr 10, 2020

Correction: I'm using v4.7.1 instead of v4.9.0. When using v4.9.0, sometimes it results in deleted objects lingering in the sharedinformer cache for an extended period of time, even hours. I'm not sure if it's correlated. Didn't have this issue with v4.7.1.

This graph shows that right after upgrading to v4.9.0, the number of phantom pods the application finds spikes.
[Image: graph of the phantom pod count, spiking after the upgrade to v4.9.0]

@lalithsuresh are you using v4.9.0? Have you tried v4.7.1?
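One way to think about the "phantom pod" count mentioned above is as a set difference: keys still present in the informer cache but missing from a fresh listing of the cluster. The sketch below assumes both sides have already been reduced to `namespace/name` strings; `PhantomCheck` and `phantoms` are hypothetical names, and how the two key sets are obtained is left out on purpose.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds cache entries that no longer exist in a fresh listing. */
final class PhantomCheck {
    /**
     * Returns the keys that appear in the cache snapshot but not in the live listing.
     * A sorted set is returned so the result is stable when logged or graphed.
     */
    static Set<String> phantoms(Set<String> cachedKeys, Set<String> liveKeys) {
        Set<String> result = new TreeSet<>(cachedKeys);
        result.removeAll(liveKeys);
        return result;
    }
}
```

Because the cache snapshot and the listing are taken at slightly different times, a pod deleted in between can show up as a false positive, so a key probably only counts as a phantom if it persists across several checks.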

@lalithsuresh
Contributor Author

@kolorful Yes, I'm using 4.9.0. I'm also experiencing a bunch of issues with pod bindings not going through (Pod stays in pending), delayed calls etc. I'll downgrade to 4.7.1 and get back to you. Thanks!

@rohanKanojia
Member

Ah, I wonder if #2022 introduced some kind of regression 😟

@stale

stale bot commented Jul 11, 2020

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

@kolorful
Contributor

Do we know if this is still an issue?

@lalithsuresh
Contributor Author

lalithsuresh commented Jul 13, 2020

I suspect not. The slow trickle of events in my deployment may have been caused by the Deployment controller creating pods slowly. When I directly created a large number of pod objects, the informer worked just fine. I'll close this for now.
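A rough way to tell a slow producer from a slow informer is to compare when each object was created with when the handler first saw it. The sketch below is only an illustration under stated assumptions: `DeliveryLag` and both method names are hypothetical, and the timestamps are plain epoch milliseconds that the caller would have to collect (for example from each pod's creation timestamp and a clock reading taken inside the handler).

```java
/** Hypothetical helper: separates producer slowness from delivery slowness. */
final class DeliveryLag {
    /** Largest gap between an object's creation time and the time the handler saw it. */
    static long maxLagMillis(long[] createdMillis, long[] observedMillis) {
        if (createdMillis.length != observedMillis.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        long max = 0;
        for (int i = 0; i < createdMillis.length; i++) {
            max = Math.max(max, observedMillis[i] - createdMillis[i]);
        }
        return max;
    }

    /** Time between the earliest and latest creation, i.e. how long the producer took. */
    static long creationSpreadMillis(long[] createdMillis) {
        if (createdMillis.length == 0) {
            return 0;
        }
        long min = createdMillis[0];
        long max = createdMillis[0];
        for (long t : createdMillis) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return max - min;
    }
}
```

A creation spread of about a minute combined with a small per-object lag would point at the producer rather than the informer. Clock skew between the API server and the client, and coarse timestamp resolution, both limit how precisely the lag can be read.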
