runtime/pprof: GOOS=darwin over-reports CPU usage in mutex-heavy workload #68477
Comments
My OS version is macOS Sonoma 14.4.1. The issue is also present with go1.17.13.
Wasn't there a bug in Darwin that CPU sampling is biased toward syscalls? That might explain this. #57722 maybe?
Yes, #57722 was on my mind too. What I see here is overcounting, which is more extreme than "bias". Maybe in small doses, the overcounting appears as if it were (only) bias?
This seems like potentially an OS bug. Does it reproduce with a 4 thread C program? If not trivially, I wonder if it is related to threads sleeping and waking (in lock2 presumably). Maybe each thread wake-up is erroneously receiving a SIGPROF?
OS bug seems likely, yes. I agree that a C reproducer would be a good next step. I have the next few weeks planned out with other work (thanks in advance if someone beats me to it); otherwise I'll take a stab at that later.
Yes, syscalls are related: In lock_sema.go,
I've updated to macOS Sonoma 14.5 and the issue is still present.
Go version
Output of go env in your module/workspace:
$ ~/sdk/go1.23/bin/go env -changed
$
What did you do?
On an M1 MacBook Air, which I understand to have 8 cores of 1 thread each (4 performance, 4 efficiency), I ran the runtime's ChanContended benchmark while varying GOMAXPROCS, collecting a CPU profile for each.
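For anyone who wants to try something similar outside the runtime's own benchmark, here is a minimal sketch of collecting one CPU profile per GOMAXPROCS value. The workload is only a stand-in, not the ChanContended benchmark itself, and the name profileContended and the 200ms duration are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// profileContended (hypothetical helper) runs a channel-contended workload
// for roughly d with GOMAXPROCS set to procs, and returns the raw CPU
// profile bytes. The workload is a stand-in for the runtime benchmark.
func profileContended(procs int, d time.Duration) ([]byte, error) {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)

	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}

	ch := make(chan int, 1) // small buffer so the goroutines contend
	stop := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < procs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-stop:
					return
				default:
				}
				ch <- 1 // send, then take a value back out
				<-ch
			}
		}()
	}

	time.Sleep(d)
	close(stop)
	wg.Wait()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	for _, procs := range []int{1, 2, 4, 8} {
		prof, err := profileContended(procs, 200*time.Millisecond)
		if err != nil {
			fmt.Println("profile failed:", err)
			return
		}
		fmt.Printf("GOMAXPROCS=%d: %d bytes of profile data\n", procs, len(prof))
	}
}
```

Writing each returned byte slice to a file and opening it with go tool pprof shows the total sample time next to the profile's duration, which is the comparison this report is about.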
What did you see happen?
The profile reports using an average of 0.62 threads of on-CPU time when GOMAXPROCS is 1, and 1.21 when GOMAXPROCS is 2.
However, when GOMAXPROCS is 4 and 8, it reports using 125.54 and 403.14 threads of on-CPU time respectively. It claims that the program consumed 930 seconds of on-CPU time over a period of 2.31 seconds of wall-clock time.
That's more than 50 times as many hardware threads as the machine really has.
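To make the arithmetic explicit, here is a small sketch that turns the rounded figures above (930 seconds of on-CPU time, 2.31 seconds of wall-clock time, 8 hardware threads) into an average thread count and an over-report factor. The function names are illustrative only.

```go
package main

import "fmt"

// avgThreads converts the total on-CPU seconds claimed by a profile into the
// average number of threads that would have had to run over the wall time.
func avgThreads(cpuSeconds, wallSeconds float64) float64 {
	return cpuSeconds / wallSeconds
}

// overReport is how many times the claimed thread count exceeds the number
// of hardware threads on the machine.
func overReport(threads float64, hwThreads int) float64 {
	return threads / float64(hwThreads)
}

func main() {
	threads := avgThreads(930, 2.31) // rounded figures from the report
	fmt.Printf("average threads: %.1f\n", threads)
	fmt.Printf("times the hardware: %.1f\n", overReport(threads, 8))
}
```

Because the inputs are rounded, the result lands slightly below the 403.14 that the profile itself reports, but it is the same order of magnitude.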
What did you expect to see?
I expected the profile to report no more than 100 CPU profile samples per thread-second.
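As a rough sketch of that expectation: at 100 samples per thread-second, the sample budget is bounded by threads times wall time. The code below compares that bound with the number of samples implied by the reported on-CPU time, assuming each sample stands for 1/100 of a second. The helper names are hypothetical, and durations are passed in milliseconds to keep the arithmetic in integers.

```go
package main

import "fmt"

// sampleBudget returns the most samples a profile should contain when each
// of `threads` threads is sampled at most `hz` times per second of wall time.
func sampleBudget(hz, threads, wallMillis int64) int64 {
	return hz * threads * wallMillis / 1000
}

// impliedSamples returns how many samples a claimed amount of on-CPU time
// corresponds to, assuming every sample represents 1/hz seconds.
func impliedSamples(hz, cpuMillis int64) int64 {
	return hz * cpuMillis / 1000
}

func main() {
	budget := sampleBudget(100, 8, 2310)     // 8 threads for 2.31 s
	implied := impliedSamples(100, 930_000)  // 930 s of claimed on-CPU time
	fmt.Println("expected at most:", budget)
	fmt.Println("implied by profile:", implied)
	fmt.Println("over budget:", implied > budget)
}
```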
This may be related to #57722. In that issue, GOOS=darwin CPU profiles included the right number of samples but skewed them towards a particular portion of the workload. But here, the workload itself is experiencing extreme over-sampling.
A combined execution trace and CPU profile show a change in the density of the magenta "CPU profile sample" annotations at the time when the test transitions from GOMAXPROCS=2 to GOMAXPROCS=3. (It's like it's the opposite of #35057, where GOOS=linux profiles under-sampled when the program's CPU usage crossed the 250% boundary.)