azure-npm daemonset pods request 250m CPU instead of 10m. Is this configurable? #2792
Comments
Hi DaveOHenry, AKS bot here 👋 I might be just a bot, but I'm told my suggestions are normally quite good, as such:
|
Triage required from @Azure/aks-pm |
Action required from @Azure/aks-pm |
I would also like an answer on this. The last posting on #2033 was by @paulgmiller:
But the issue described is the baseline reservation of CPU in the manifest for In my cluster, Both @miwithro and @juan-lee were tagged on the closed, but unresolved issue. |
Because azure-npm is a daemonset, we apply default memory and CPU limits irrespective of the cluster size. Can you share your AKS cluster FQDN? I can check whether it is advisable to reduce those limits. Even though NPM does not use much CPU in steady state, when a flood of events comes in or NPM restarts for some reason, and the CPU limit has been reduced on a fairly large cluster, there is a high chance of NPM ending up in an OOM kill loop. |
Lowering the requested CPU should not cause an out-of-memory kill loop, as it has nothing to do with memory. It also doesn't limit the maximum allowed CPU usage. By lowering the |
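The request/limit distinction made in the comment above can be illustrated with a toy version of the Kubernetes scheduler's fit check: a pod is placed only if the sum of CPU *requests* on the node fits within the node's allocatable CPU, while observed usage is ignored and a request never caps usage (only a limit does). This is a minimal sketch; the 2000m node size and the 1800m workload pod are hypothetical figures, not actual AKS allocatable values.

```python
# Toy model of the scheduler's resource-fit check.
# Node size and workload request are hypothetical illustrations.


def fits(allocatable_millicores: int, existing_requests: list[int], new_request: int) -> bool:
    """Return True if a pod requesting `new_request` millicores can be scheduled.

    Only requests are summed; actual CPU usage is ignored, and limits
    play no part in this check.
    """
    return sum(existing_requests) + new_request <= allocatable_millicores


def reserved_fraction(request_millicores: int, allocatable_millicores: int) -> float:
    """Fraction of a node's allocatable CPU reserved by a single request."""
    return request_millicores / allocatable_millicores


if __name__ == "__main__":
    allocatable = 2000  # hypothetical 2-vCPU node, ignoring system reservations
    # With one azure-npm pod requesting 250m, an 1800m workload pod no longer fits...
    print(fits(allocatable, [250], 1800))  # False
    # ...but it would fit if the daemonset requested 10m instead.
    print(fits(allocatable, [10], 1800))  # True
    print(f"{reserved_fraction(250, allocatable):.1%}")  # 12.5%
```

The point of the sketch is that the 250m is subtracted from schedulable capacity on every node whether or not azure-npm ever uses it.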
NPM pods watch pod, namespace and netpol related events and, due to some inefficiencies (which we are actively working to solve), can only work on one event at a time. When an NPM pod starts, this causes incoming events to pile up, increasing memory usage. So reducing the CPU limit will result in fewer events being processed, which in turn further aggravates memory usage. We are working on an improved design, and until then we want to review a cluster's size before we reduce the CPU limits. I agree that this limit can be costly for smaller node sizes. We have been exploring options for making these limits dynamic based on node size, but I am afraid we do not have a solution in sight. Until then we need to rely on ad hoc requests to either increase or decrease the limits based on cluster size. Sorry for the inconvenience. |
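The backlog argument in the comment above can be sketched as a toy discrete-time model: events arrive at a fixed rate, a single worker handles them strictly one at a time, and the number it gets through per tick stands in for the CPU it receives. The queue length is a stand-in for memory held by pending events. The rates are made-up numbers and the function `simulate_backlog` is hypothetical; this is not NPM's actual code.

```python
from collections import deque


def simulate_backlog(arrivals_per_tick: int, processed_per_tick: int, ticks: int) -> list[int]:
    """Return the queue length after each tick for a single-worker event queue.

    Hypothetical model: events are handled one at a time, and the number
    handled per tick stands in for the CPU the worker receives.
    """
    queue = deque()
    history = []
    next_event = 0
    for _ in range(ticks):
        # A burst of pod/namespace/netpol events arrives.
        for _ in range(arrivals_per_tick):
            queue.append(next_event)
            next_event += 1
        # The single worker drains as many events as its CPU share allows.
        for _ in range(min(processed_per_tick, len(queue))):
            queue.popleft()
        # Pending events stand in for memory held while waiting.
        history.append(len(queue))
    return history


if __name__ == "__main__":
    print(simulate_backlog(100, 120, 5))  # [0, 0, 0, 0, 0]
    print(simulate_backlog(100, 60, 5))   # [40, 80, 120, 160, 200]
```

In this model the processing rate represents whatever CPU the worker actually gets; whether that is constrained by the request (under node contention) or by the limit is outside the scope of the sketch.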
Triage required from @Azure/aks-pm @vakalapa |
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment. |
This should still be implemented in AKS. #not-stale |
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment. |
ping
…________________________________
From: msftbot[bot] ***@***.***>
Sent: Monday, June 27, 2022 4:01:17 AM
To: Azure/AKS ***@***.***>
Cc: David Heinrich ***@***.***>; Author ***@***.***>
Subject: Re: [Azure/AKS] azure-npm daemonset pods request 250m CPU instead of 10m. Is this configurable? (Issue #2792)
|
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment. |
The advice I've gotten from Microsoft employees is that our cluster's |
This should be addressed on the MS side... 250m is way too much for a CPU request, as it uses basically nothing... |
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment. |
Not stale, bump. |
I'd like to join and also ping this issue. Requesting a quarter of a whole CPU is not justified for such usage; the limits can naturally stay as they are. This is a heavily contributing factor in why a k8s system over-reserves resources and allocates more nodes than are actually necessary. |
We also encountered a case where trying to apply a NetworkPolicy to multiple namespaces at once had our azure-npm CPU usage go through the roof (related to #2823). |
Bump, currently experiencing this problem as well, needs a resolution. |
+1, we are also experiencing this issue. It is very disruptive. 250m cores is a lot. |
Needs resolving. We have 3 instances of this thing using up 0.75 cores and we are now at the limit of resources for our cluster. There is barely any CPU activity so it obviously does not need it. |
Needs resolving. Issue still happening for azure-npm. |
I'd like to get this solved ... we are running 400 aks nodes and this |
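Because a DaemonSet runs one pod per node, the CPU it reserves scales linearly with node count, as in the 3 instances reserving 0.75 cores and the 400-node cluster mentioned in the comments. A small sketch comparing the current 250m request with the 10m from the issue title; the helper `total_cpu_request_cores` is hypothetical and only does the arithmetic.

```python
def total_cpu_request_cores(nodes: int, request_millicores: int) -> float:
    """Total CPU (in cores) reserved by a DaemonSet that runs one pod per node."""
    return nodes * request_millicores / 1000


if __name__ == "__main__":
    for nodes in (3, 400):
        current = total_cpu_request_cores(nodes, 250)
        proposed = total_cpu_request_cores(nodes, 10)
        print(f"{nodes} nodes: {current} cores at 250m vs {proposed} cores at 10m")
```

For 3 nodes this gives 0.75 cores versus 0.03 cores; for 400 nodes, 100.0 cores versus 4.0 cores.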
I would also like to see this resolved. At least give users the ability to tune this. |
Need to resolve this; it is costing us too much money at the moment. |
Would also like to see this resolved; it seems ridiculous to request this many resources for what it is doing. This is the primary cost of our AKS cluster. |
Needs resolving. Issue is still happening for azure-npm. |
Same issue. azure-npm pods have high request:
While usage over time in a running cluster is usually quite low, 1-3% of request ... |
The team is discussing a few improvements here. We will evaluate automatic scaling in the future, but possibly can release a reduction in the short term. @vakalapa will update this thread once the plan is solidified. |
Hi @vakalapa ! Any updates on this? |
Duplicate of #2033, which was closed without an answer.