-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime/pprof: multithreaded CPU profiles incorrect on NetBSD #6047
Comments
# Can't we do something like what windows does for profiling -- a dedicated thread that periodically queries state of all other threads? Not easily. I do hope that removing the workaround will prompt people to think about working solutions, just like what happened with "stack unavailable" in the goroutine dumps. I believe a dedicated thread can be made to work, but I also believe it requires using the Mach API to query the thread status, and we don't have an easy way from within the runtime to do that. I started looking into that (libmach/darwin.c has some code; look for thread_suspend and so on) but I am not sure that it can be used easily from within the running process, and I am not sure whether all the calls we need are available as direct system calls or whether we'd have to wrap more of the Mach message passing layer. We can't use task_suspend because it would suspend the thread doing the profiling. My kernel fix needs some cleanup and testing before others should try it, but it is a significantly smaller change and seems to be reliable. |
This issue was closed by revision d3066e4. Status changed to Fixed. |
Reopening so that it will appear in search results and to acknowledge that the bug is not fixed. NetBSD and OpenBSD are broken more or less the same way OS X is. This is not super surprising given their shared code, but it is interesting nonetheless. Apparently all of them added threads to the kernel without updating the profiling signal delivery to be thread-aware. (FreeBSD is fine at least, but they were one of the earliest to support threads.) The primary focus of the bug is still OS X. NetBSD and OpenBSD can and should fix their kernels. I have not submitted any reports to them, but if anyone wants to do so, the test program you need is probably in the Apple Bug Report (http://golang.org/change/35b716c94225). Labels changed: removed go1.2. Status changed to Accepted. |
If you want to get accurate profiles on a Mac, see the first comment. http://godoc.org/code.google.com/p/rsc/cmd/pprof_mac_fix http://research.swtch.com/macpprof |
Comment 14 by [email protected]: I cannot reproduce the test case on the OSX report on NetBSD 5-6 or indeed on OpenBSD 5.4 - does anyone have a test that eg fails on Openbsd 5.4 and succeeds on 5.5 so I can look at fixing NetBSD? |
OpenBSD now has a regress test for this - this fails on OpenBSD 5.4 and probably fails on NetBSD (although I've not tried): http://www.openbsd.org/cgi-bin/cvsweb/src/regress/sys/kern/sigprof/sigprof.c?rev=1.1;content-type=text%2Fplain |
Comment 16 by [email protected]: That test does fail on NetBSD 6, but it also fails on Linux (Ubuntu 13.10), although the distribution is much better than on NetBSD (which is not as bad as OpenBSD 5.4). I need to install OpenBSD 5.5 to compare. So not sure its a great test, but indicative... |
Until this is fixed, can we make runtime.StartCPUProfiling print a message to stderr on Mac OS X to warn users that profiling is broken? |
@adonovan, profiling works fine on OS X if you patch your kernel, as I think most Go programmers on Macs do, out of necessity. Certainly people running non-buggy kernels don't need to see forced output on standard error every time they profile a program. If we had good way to tell whether the kernel patch has been applied, we could print a warning in that case. But I don't have a good way to do that. I have thought about changing the kernel version string but the only thing I am confident about changing is the date, and there's not much room there to signal that the fix is applied. See rsc.io/pprof_mac_fix for the patch. |
@rsc thanks for the patch. Pray for me while I run it! |
I have a report from an OS X 10.11 El Capitan beta user that pprof works out of the box on that system, without the need for a kernel patch. I have also inspected the machine code for the relevant kernel function, and they did make changes roughly along the lines of what the patch has always done. So I believe it was intentionally fixed. I am hopeful that the fix will last into the final public release of OS X 10.11 El Capitan. And then maybe years from now we can look back and laugh at how ridiculous it was that we had to apply a binary patch to our kernels to profile our programs. Does anyone know: is NetBSD still broken? |
And there was much rejoicing and dancing in the streets. On Fri, 28 Aug 2015 10:30 Russ Cox [email protected] wrote:
|
Judging by reading NetBSD kernel source code, I think the
system still traverse the thread list and picks the first thread
that accepts the signal to send the SIGPROF signal to.
The sigtest.c seems to confirm this finding with the majority
of the signal received by the last created thread.
$ ./sigtest
500 signals received.
0 on main
4 on looper 0
19 on looper 1
73 on looper 2
404 on looper 3
To verify my hypothesis, I added another sleeping thread to
sigtest.c created after all the other threads, and sure enough,
all the SIGPROFs are delivered to the last thread.
$ ./sigtest
0 on main
0 on looper 0
0 on looper 1
0 on looper 2
0 on looper 3
500 on sleep thread
This test is very reproducible.
In conclusion, multi-threaded profiling is still broken on NetBSD.
|
Confirming that profiling does indeed work out-of-the-box on El Capitan. |
Obsoleting in favor of #13841. |
Yay! |
CL https://golang.org/cl/19161 mentions this issue. |
macOS tests have been disabled since CL 12429045 (Aug 2013). At the time, macOS required a kernel patch to get a working profiler (https://research.swtch.com/macpprof), which we didn't want to require, of course. macOS has improved - it no longer requires the kernel patch - but we never updated the list of exceptions. As far as I can tell, the builders have no problem passing the pprof test now. (It is possible that the iOS builders have trouble, but that is now a different GOOS.) Remove the exception for macOS. The test should now pass. Fixes #6047. Change-Id: Iab49036cacc1025e56f515bd19d084390c2f5357 Reviewed-on: https://go-review.googlesource.com/c/go/+/292229 Trust: Russ Cox <[email protected]> Reviewed-by: Cherry Zhang <[email protected]>
The text was updated successfully, but these errors were encountered: