Informer extremely slow to learn about new events #2121

lalithsuresh · 2020-04-08T23:50:59Z

I'm using a SharedIndexInformer to build a scheduler. I have a 500-node cluster, where I deploy 1000 pods. I find that it takes roughly a minute for the informer to learn about every pod that was created, and worse, the information trickles in one pod at a time over the entire minute.

Is this normal? Am I missing some configuration? This is how I'm instantiating the informers:

final SharedInformerFactory sharedInformerFactory = client.informers();
final SharedIndexInformer<Node> nodeSharedIndexInformer = sharedInformerFactory
                .sharedIndexInformerFor(Node.class, NodeList.class, 30000);
nodeSharedIndexInformer.addEventHandler(new NodeResourceEventHandler(dbConnectionPool, service));

final SharedIndexInformer<Pod> podInformer = sharedInformerFactory
                .sharedIndexInformerFor(Pod.class, PodList.class, 30000);


rohanKanojia · 2020-04-09T06:13:37Z

@lalithsuresh : To be honest, I didn't test SharedInformer support on such a huge infrastructure with 500 nodes. Does performance degrade only at scale, or do you face a similar issue on small clusters too?

@kolorful : Did you face any issue like this while trying out SharedInformer support?

kolorful · 2020-04-09T12:09:50Z

~~@rohanKanojia I'm using v4.9.0 in a smaller cluster (100-ish node) watching similar amount of Pods, I didn't observe such slowness?~~
@lalithsuresh How are you measuring the initialization time exactly?

lalithsuresh · 2020-04-09T15:26:19Z

Hi @rohanKanojia , @kolorful,

To measure this, I've enabled logging in the ReflectorWatcher class, and I'm also observing the rate at which the registered PodResourceEventHandler is invoked.

I've tried two approaches:

1. using the informer.addEventHandler() API
2. using a watch directly.

In both cases, the invoked handler immediately passes off the "Pod" instance to a threadpool with 10 threads to start my processing pipeline. I observe the number of currently active threads in that thread pool.

With 1), events trickle in slowly as I mentioned, with an occupancy of 1 active thread almost all the time. With 2), I find that all 10 threads are constantly being exercised.
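For anyone who wants to reproduce the occupancy measurement, here is a minimal sketch that uses only the JDK. `HandoffOccupancy` and `occupancyAfterBurst` are hypothetical names, not part of the client library, and the blocking task stands in for the real processing pipeline. It illustrates what the numbers mean: if events arrive faster than they are processed, a 10-thread pool saturates; if only one event is in flight at a time, occupancy stays at 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of the Kubernetes client.
class HandoffOccupancy {

    // Submits `events` tasks at once, each blocking until released, and
    // returns how many pool threads are busy once the pool has filled up.
    static int occupancyAfterBurst(int poolSize, int events) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, events));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < events; i++) {
                // Same hand-off an event handler would do for each Pod.
                pool.execute(() -> {
                    started.countDown();
                    awaitQuietly(release);
                });
            }
            // Wait until every thread that can be busy is busy, then sample.
            started.await(5, TimeUnit.SECONDS);
            return pool.getActiveCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real handler, the equivalent would be to log `pool.getActiveCount()` periodically while events are being delivered.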

kolorful · 2020-04-10T14:10:26Z

Correction: I'm using v4.7.1, not v4.9.0. With v4.9.0, deleted objects sometimes linger in the SharedInformer cache for an extended period of time, even hours. I'm not sure if it's related. I didn't have this issue with v4.7.1.

This graph shows that right after upgrading to v4.9.0, the number of phantom pods the application finds spikes.

@lalithsuresh Are you using v4.9.0? Have you tried v4.7.1?

lalithsuresh · 2020-04-10T15:35:48Z

@kolorful Yes, I'm using 4.9.0. I'm also experiencing a bunch of issues with pod bindings not going through (Pod stays in pending), delayed calls etc. I'll downgrade to 4.7.1 and get back to you. Thanks!

rohanKanojia · 2020-04-11T06:04:07Z

Ah, I wonder if #2022 introduced some kind of regression 😟

stale · 2020-07-11T01:20:47Z

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

kolorful · 2020-07-13T12:23:46Z

Do we know if this is still an issue?

lalithsuresh · 2020-07-13T15:57:33Z

I suspect not. The slow trickle of events in my deployment may have been caused by the Deployment controller creating pods slowly. When I created a large number of pod objects directly, the informer worked just fine. I'll close this for now.
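One way to tell a slow producer from a slow informer is to compare each pod's `metadata.creationTimestamp` with the time the handler first saw it. The sketch below is only an illustration under that assumption: `ArrivalStats` and its methods are hypothetical names, and the two lists are assumed to hold the creation and receipt times of the same pods in the same order. If the creation timestamps themselves span the whole minute while the per-pod lag stays small, the pods were created slowly and the informer is keeping up.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the Kubernetes client.
class ArrivalStats {

    // Time between the earliest and latest timestamp in the list.
    static Duration spread(List<Instant> times) {
        if (times.isEmpty()) {
            return Duration.ZERO;
        }
        return Duration.between(Collections.min(times), Collections.max(times));
    }

    // Largest delay between a pod's creation and the handler receiving it.
    // created.get(i) and received.get(i) must refer to the same pod.
    static Duration maxLag(List<Instant> created, List<Instant> received) {
        if (created.size() != received.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        Duration worst = Duration.ZERO;
        for (int i = 0; i < created.size(); i++) {
            Duration lag = Duration.between(created.get(i), received.get(i));
            if (lag.compareTo(worst) > 0) {
                worst = lag;
            }
        }
        return worst;
    }
}
```

Note that `creationTimestamp` has one-second granularity, so lags below a second cannot be resolved this way.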

stale bot added the status/stale label Jul 11, 2020

rohanKanojia added the status/never-stale label Jul 11, 2020

stale bot removed the status/stale label Jul 11, 2020

lalithsuresh closed this as completed Jul 13, 2020
