
High vmcompute.exe CPU due to frequent HcsGetComputeSystemProperties calls. #989

Open
zoucheng2018 opened this issue Mar 31, 2021 · 6 comments


@zoucheng2018

vmcompute.exe is taking up to 25% of a core on our AKS managed nodes. CPU profiling shows the time is mostly spent in vmcompute!HcsRpc_GetSystemProperties. The image below contains the detailed stack:

[image: CPU profile call stack showing vmcompute!HcsRpc_GetSystemProperties]

RPC ETW traces indicate the calls are made from kubelet.exe; one example trace shows a call every 500 ms. This API is quite expensive inside vmcompute.exe. Can we tune the frequency or use an alternative API?

[image: RPC ETW trace showing HcsGetComputeSystemProperties calls from kubelet.exe every 500 ms]
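For reference, here is a minimal sketch of the kind of stats polling loop that produces this RPC pattern, using the legacy github.com/Microsoft/hcsshim Go API. This is not kubelet's actual code; the container ID and the 500 ms interval are placeholders chosen to match the trace above.

```go
// Illustrative sketch only: polls container statistics the way a stats
// collector might, assuming the legacy hcsshim API. Each Statistics() call
// is serviced by vmcompute.exe (HcsGetComputeSystemProperties), so a short
// interval multiplied across containers adds up to measurable vmcompute CPU.
package main

import (
	"log"
	"time"

	"github.com/Microsoft/hcsshim"
)

func main() {
	// "example-container" is a placeholder ID.
	container, err := hcsshim.OpenContainer("example-container")
	if err != nil {
		log.Fatalf("open container: %v", err)
	}
	defer container.Close()

	// 500 ms mirrors the interval observed in the RPC ETW trace.
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()

	for range ticker.C {
		stats, err := container.Statistics()
		if err != nil {
			log.Printf("query statistics: %v", err)
			continue
		}
		log.Printf("working set: %d bytes, commit: %d bytes",
			stats.Memory.UsagePrivateWorkingSetBytes,
			stats.Memory.UsageCommitBytes)
	}
}
```

The cost of this pattern scales with both the polling interval and the number of containers on the node, which is why the per-node container count and the monitoring add-ons matter in the questions below.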

@marosset
Member

marosset commented Apr 1, 2021

@zoucheng2018 Can you answer a few questions here?

Do you have the Container Insights add-on enabled for your cluster (or any other monitor solution)?
What node size are you seeing this on?
How many containers are running on the node?
Are limits/requests configured for the pods / how densely packed is the node?

Thanks!

@jsturtevant
Contributor

Also: what version of Kubernetes? And what was the ETW command you ran?

@zoucheng2018
Author

Do you have the Container Insights add-on enabled for your cluster (or any other monitor solution)?
I believe so; there are also Geneva agents running, but they don’t appear to be the source of the RPC calls to vmcompute.

What node size are you seeing this on?
The node SKU is Standard_D32_v3.

How many containers are running on the node?
The high vmcompute CPU issue is quite pervasive across all our clusters. The clusters are generally not busy, so I’m not sure it’s related to the workload. The machines where I ran traces had about 5-10 containers.

Are limits/requests configured for the pods / how densely packed is the node?
Yes, they are. Most clusters are not very dense, but some are.

Also: what version of Kubernetes? and What was the etw command you ran?
We’re running 1.18.

PerfView command to collect the CPU trace:

```
PerfView Collect PerfView-Manual /BufferSize:3072 /Circular:3072 /MaxCollectSec:120 /KernelEvents=Process+Thread+Profile+ImageLoad /ClrEvents:GC+Loader+Exception+Stack /Zip /AcceptEULA /NoView /NoNGENRundown /NoGui
```

RPC trace:

```
PerfView Collect PerfView-RPC /KernelEvents=Process+Thread+ImageLoad /providers:Microsoft-Windows-RPC:Microsoft-Windows-RPC/Debug::stack /ClrEvents:Loader /BufferSize:2048 /Circular:2048 /MaxCollectSec:120 /Zip /AcceptEULA /nogui /NoView
```

@dcantah
Contributor

dcantah commented May 2, 2022

So, an update here (and sorry for the delay): we've found that the OS isn't as optimized as it could be when returning some memory statistics. I have a change that speeds things up a bit in #1362, although I'll let @marosset or @jsturtevant speak to the container-insights extension, as I don't know how much extra it adds in terms of query volume. I'm hoping we can get that change out into AKS in the next month, but that's the optimist in me haha.
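Until an OS-side change like that ships, one client-side way to bound query volume is to cache the last result for a short TTL so multiple consumers don't each trigger a fresh HCS RPC. The sketch below is purely illustrative: it is not the change in #1362, nor what kubelet or container insights actually do; it just shows the general "reduce query frequency" mitigation using the legacy hcsshim types.

```go
// Illustrative TTL cache around hcsshim statistics queries. Assumption-level
// sketch only; not code from kubelet, container insights, or hcsshim itself.
package stats

import (
	"sync"
	"time"

	"github.com/Microsoft/hcsshim"
)

type cachedStats struct {
	mu       sync.Mutex
	ttl      time.Duration
	lastAt   time.Time
	last     hcsshim.Statistics
	hasValue bool
}

// Get returns cached statistics while they are fresher than the TTL and only
// falls through to the vmcompute RPC once the cache has expired.
func (c *cachedStats) Get(container hcsshim.Container) (hcsshim.Statistics, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.hasValue && time.Since(c.lastAt) < c.ttl {
		return c.last, nil
	}

	stats, err := container.Statistics()
	if err != nil {
		return hcsshim.Statistics{}, err
	}
	c.last = stats
	c.lastAt = time.Now()
	c.hasValue = true
	return stats, nil
}
```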

@jsturtevant
Contributor

We've helped fix the readiness probe in container insights over the last several months, which should help the performance of the container that runs on Windows. We also made two perf improvements to kubelet in 1.23 that reduce overall CPU usage: kubernetes/kubernetes#105744 and kubernetes/kubernetes#104287

@yanrez
Member

yanrez commented May 5, 2022

For my repro, disabling container insights seems to have helped, although we had other changes in the cluster and still need more time to confirm whether it's indeed container insights specifically. Looking forward to the fix so we can re-enable container insights.
