NETOBSERV-578: decrease informers' memory consumption #317

Merged
merged 8 commits into netobserv:main on Sep 30, 2022

Conversation

mariomac

This PR adds a Transformer for each Kubernetes informer that converts the watched Pods, Services, and Nodes into Info instances, which are stored in the informers' cache and contain only the data required for the decoration.

This also alleviates the load on the garbage collector: before this patch, a new Info instance was created for each flow decoration; now the same object is reused across GetInfo invocations for the same IP.
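
For context, here is a minimal sketch of the technique, not the exact code of this PR: client-go's SetTransform lets an informer replace each watched object with a trimmed-down value before it reaches the cache. The package name, the Info fields, and the newPodInformer helper below are hypothetical, and the sketch assumes a client-go version where SharedIndexInformer exposes SetTransform.

    // Illustrative only: an informer transform that stores a reduced Info
    // struct in the cache instead of the full *v1.Pod.
    package enrich

    import (
        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
    )

    // Info keeps only what flow decoration needs. Embedding ObjectMeta keeps
    // the informer store's default namespace/name key function working.
    type Info struct {
        metav1.ObjectMeta
        HostIP string
        IPs    []string
    }

    func newPodInformer(client kubernetes.Interface) (cache.SharedIndexInformer, error) {
        factory := informers.NewSharedInformerFactory(client, 0)
        podInformer := factory.Core().V1().Pods().Informer()

        // The transform runs before an object is stored, so the cache holds
        // small Info values rather than full Pod objects.
        err := podInformer.SetTransform(func(obj interface{}) (interface{}, error) {
            pod, ok := obj.(*v1.Pod)
            if !ok {
                // e.g. DeletedFinalStateUnknown tombstones: store unchanged.
                return obj, nil
            }
            info := &Info{
                ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
                HostIP:     pod.Status.HostIP,
            }
            for _, ip := range pod.Status.PodIPs {
                info.IPs = append(info.IPs, ip.IP)
            }
            return info, nil
        })
        return podInformer, err
    }

Because the indexer then hands back the same stored *Info pointer on every lookup, a decoration step that queries by IP can reuse that object instead of building a new one per flow.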

In Flowlogs-Pipeline deployments with low-to-mid load, no improvement has been observed in CPU usage and a slight improvement in memory usage. In the image below, the middle part corresponds to the new version of FLP.

image

@mariomac added the ok-to-test label (To set manually when a PR is safe to test. Triggers image build on PR.) on Sep 29, 2022
@github-actions

New image: ["quay.io/netobserv/flowlogs-pipeline:7314a97"]. It will expire after two weeks.

The github-actions bot removed the ok-to-test label (To set manually when a PR is safe to test. Triggers image build on PR.) on Sep 29, 2022
@jotak (Member) commented on Sep 29, 2022

I did some tests at higher load with hey-ho; I also see the memory usage improving, by a slightly larger amount than what you show:

  • Spiking at 1.16GB (versus 1.39GB before) => -16%
  • Stabilizing at 423MB (versus 642MB before) => -34%

Screenshot from 2022-09-29 11-15-13

@jotak (Member) commented on Sep 29, 2022

@mariomac I haven't reviewed the code in depth, but remember the discussion we had about potential blocking calls and the fact that the cache seemed pre-loaded? Does this PR change anything in that regard / should we try to parallelize transformer processing?

@mariomac (Author)

@jotak after digging deeply into the informers' code, I can confirm that there are no blocking calls (neither before nor after this change).

The initial idea behind parallelizing workers was to share the large amount of memory that the informers consume, so we expected to minimize overall memory usage by scaling vertically and downscaling horizontally.

A few reasons changed my mind about parallelizing (I'll also note them in the related JIRA to justify it):

  • The memory used by the informers' cache does not look as critical as we initially thought, and after this PR the memory usage will be noticeably lower.

  • Since the flows arrive in waves and are processed internally in batches, I'm afraid the effect of parallelism would be limited.

Given the internal complexity of FLP, I'd say the effort of reworking some internals to allow this parallelization wouldn't be worth the limited improvements we could achieve.

    if ownerReference.Kind == "ReplicaSet" {
        item, ok, err := k.replicaSetInformer.GetIndexer().GetByKey(info.Namespace + "/" + ownerReference.Name)
        if err != nil {
            panic(err)
Member

never noticed this panic .. that was abrupt!

@jotak (Member) left a comment

Code lgtm, smart rewrite! thanks @mariomac
I'll do a couple more tests on my cluster before approving

@jotak (Member) left a comment

lgtm
I smoke-tested the enrichment end to end; everything seems fine

@mariomac merged commit 8cc4bc7 into netobserv:main on Sep 30, 2022
@mariomac deleted the informer branch on December 7, 2022 14:39