Microsoft.Extensions.Diagnostics.ResourceMonitoring versions >= 8.5.0 have a bug in utilization calculation #5387

Closed
Weikai1997 opened this issue Aug 28, 2024 · 3 comments · Fixed by #5388
Labels
bug, work in progress 🚧

Comments

@Weikai1997
Contributor

Description

We found that the reported CPU utilization was abnormally low when using ResourceMonitoring (we used IUtilizationPublisher to publish CPU metrics).
After going through the .NET extensions source code, I found the root cause.

Let's split the affected versions into two ranges: 8.5.0 ~ 8.7.0 and 8.8.0.
For 8.5.0 ~ 8.7.0, the problem is that the CPU request is divided out twice. In LinuxUtilizationProvider, we calculate _scaleForTrackerApi = hostCpus / availableCpus, and GetSnapshot multiplies the cgroup time by it. Then Calculator divides the result by GuaranteedCpu again. Note that GuaranteedCpu is 1 in R9 and in package versions before 8.4.0, but since 8.5.0 it is the container's CPU request, so this extra division by the CPU request makes the calculated CPU value smaller than it should be, and therefore incorrect.
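To make the double division concrete, here is a simplified Python model of the 8.5.0 ~ 8.7.0 pipeline. It is not the library's code: the function name, its parameters, and the assumption that the tracker normalizes CPU time by elapsed time multiplied by the host CPU count are illustrative guesses based on the description above.

```python
def cpu_percent_8_5_to_8_7(cgroup_cpu_seconds: float, elapsed_seconds: float,
                           host_cpus: int, cpu_request: float) -> float:
    """Hypothetical model of the 8.5.0 ~ 8.7.0 calculation (not library code)."""
    # LinuxUtilizationProvider: _scaleForTrackerApi = hostCpus / availableCpus,
    # assuming availableCpus equals the container CPU request.
    scale = host_cpus / cpu_request
    # GetSnapshot: multiply the cgroup CPU time by the scale.
    scaled_cpu_seconds = cgroup_cpu_seconds * scale
    # Assumed tracker normalization: share of all host CPU time, as a percentage.
    # After the scale above this is already "percent of the request".
    utilization = scaled_cpu_seconds / (elapsed_seconds * host_cpus) * 100.0
    # Calculator: divide by GuaranteedCpu, which is the CPU request since 8.5.0.
    # This is the second division by the request.
    return utilization / cpu_request


# A container with a 2-CPU request on an 8-CPU host, fully busy for 10 s
# (20 CPU-seconds), reports 50 instead of 100.
print(cpu_percent_8_5_to_8_7(20.0, 10.0, 8, 2.0))  # 50.0
```

With a request of exactly 1 the two divisions are harmless, which is why the reproduction steps ask for a request larger than 1.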

Then there is the newest version, 8.8.0.
It almost fixes the problem, but introduces a new one. In 8.8.0, LinuxUtilizationProvider changed to _scaleRelativeToCpuRequestForTrackerApi = hostCpus; that is, availableCpus is no longer divided out when calculating the scale. With no other changes, the CPU request would now be divided only once and the value would be correct.
However, the division by GuaranteedCpu that used to happen in Calculator has been moved to the ResourceUtilization constructor. As a result, utilization in Calculator now ranges from 0 to (100 * request), yet it is still clamped with var cpuUtilization = Math.Min(Hundred, utilization). After the constructor divides by the request, the CPU value in ResourceUtilization ends up in the range 0 ~ (100 / request).
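The same simplified model applied to 8.8.0 shows why the value saturates at 100 / request: once the container is using one full CPU, any further load is clamped away before the division by the request. As before, the function name and the tracker normalization are assumptions for illustration, not the library's code.

```python
HUNDRED = 100.0


def cpu_percent_8_8(cgroup_cpu_seconds: float, elapsed_seconds: float,
                    host_cpus: int, cpu_request: float) -> float:
    """Hypothetical model of the 8.8.0 calculation (not library code)."""
    # _scaleRelativeToCpuRequestForTrackerApi = hostCpus (request no longer divided).
    scale = host_cpus
    scaled_cpu_seconds = cgroup_cpu_seconds * scale
    # Assumed tracker normalization, as before: now ranges 0 ~ 100 * request.
    utilization = scaled_cpu_seconds / (elapsed_seconds * host_cpus) * 100.0
    # Calculator clamps first: var cpuUtilization = Math.Min(Hundred, utilization)
    cpu_utilization = min(HUNDRED, utilization)
    # ResourceUtilization constructor then divides by GuaranteedCpu (the request).
    return cpu_utilization / cpu_request


# 2-CPU request on an 8-CPU host over 10 s, at increasing load.
for busy in (5.0, 10.0, 20.0):
    print(busy, cpu_percent_8_8(busy, 10.0, 8, 2.0))
# 5.0 25.0
# 10.0 50.0
# 20.0 50.0
```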

So in versions 8.5.0 through 8.8.0, the CPU value reported by ResourceUtilization is incorrect and needs to be fixed.

Reproduction Steps

  1. Use ResourceMonitoring version 8.5.0 ~ 8.8.0 with IUtilizationPublisher to log CpuUsedPercentage.
  2. Make your container use its full CPU.
  3. Set your Linux container's CPU request and limit to a value larger than 1 (don't use 1).
  4. Run it in a Kubernetes (k8s) environment.
  5. The logged value will not exceed 100 / CPU request.

Expected behavior

The CPU value should range from 0 to 100.
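For comparison, here is one way to get a 0 ~ 100 value under the same simplified model: divide by the CPU request exactly once, then clamp. This only illustrates the expected range; the function is hypothetical and is not necessarily how #5388 implements the fix.

```python
def expected_cpu_percent(cgroup_cpu_seconds: float, elapsed_seconds: float,
                         host_cpus: int, cpu_request: float) -> float:
    """Hypothetical: divide by the request once, then clamp into 0 ~ 100."""
    # The only division by the CPU request happens here, in the scale.
    scale = host_cpus / cpu_request
    scaled_cpu_seconds = cgroup_cpu_seconds * scale
    # Assumed tracker normalization: percent of the CPU request.
    utilization = scaled_cpu_seconds / (elapsed_seconds * host_cpus) * 100.0
    # No further division by the request; usage above the request
    # (allowed when limit > request) is clamped to 100.
    return max(0.0, min(100.0, utilization))


print(expected_cpu_percent(10.0, 10.0, 8, 2.0))  # 50.0 (1 of 2 requested CPUs)
print(expected_cpu_percent(20.0, 10.0, 8, 2.0))  # 100.0 (full request)
```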

Actual behavior

The value ranges from 0 to 100 / request.

Regression?

Versions 8.5.0 through 8.8.0 have the issue.

Known Workarounds

No response

Configuration

No response

Other information

No response

Weikai1997 added the bug and untriaged labels Aug 28, 2024
RussKie removed the untriaged label Aug 28, 2024
dotnet deleted a comment from ViniciusSCG Aug 28, 2024
dotnet deleted a comment Aug 28, 2024
@RussKie
Member

RussKie commented Aug 28, 2024

Is this something that could be replicated locally in a Docker container?

@Weikai1997
Contributor Author

@RussKie A local WSL container started with docker run --cpus may not reproduce it. I tried, but the cpu.shares value still corresponded to 1, so the CPU request was not calculated correctly.
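A likely explanation, assuming ResourceMonitoring derives the CPU request from cgroup v1 cpu.shares by dividing by 1024: docker run --cpus sets a CFS quota (a limit), not cpu.shares, so the shares stay at the default 1024, which reads as a request of 1. Kubernetes, by contrast, writes cpu.shares from the pod's CPU request, so passing docker run --cpu-shares (for example 2048) may be a closer local approximation. The sketch below models that conversion; the function names are hypothetical and the 1024-per-CPU mapping is the conventional cgroup v1 one, not taken from the library source.

```python
SHARES_PER_CPU = 1024  # conventional cgroup v1 cpu.shares weight for one CPU
MIN_SHARES = 2         # kubelet's lower bound for cpu.shares


def k8s_millicpu_to_shares(millicpu: int) -> int:
    """Roughly how kubelet maps a CPU request (in millicores) to cpu.shares."""
    return max(MIN_SHARES, millicpu * SHARES_PER_CPU // 1000)


def shares_to_cpu_request(shares: int) -> float:
    """Hypothetical: the CPU request a monitor would infer from cpu.shares."""
    return shares / SHARES_PER_CPU


print(shares_to_cpu_request(k8s_millicpu_to_shares(2000)))  # 2.0 (k8s request "2")
print(shares_to_cpu_request(1024))  # 1.0 (default shares, unaffected by --cpus)
```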

@Weikai1997
Contributor Author

@RussKie I created PR #5388 to fix the issue in 8.8.0.


3 participants