runtime, cgo: programs using Cocoa/OpenGL/Metal APIs on macOS exhibit problems at tip not seen in 1.19.4 #57263
Can you bisect to when it happened?
If this helps: I started seeing the first error from an existing binary (of a Go app) when I upgraded my Mac to "Ventura" (macOS 13.0). It was nonfatal, and the app still worked, so I didn't pay much attention. Rebuilding the app with Go 1.20 RC 1 had no effect. Today I upgraded to macOS 13.1, and with both the old and new app binaries the initial error is followed by … I guess the main thing I'm suggesting is that it might be a change in macOS, rather than a change in the Go code, that provoked the problem.
@rsc Sure, I'll run a bisect and see if I can spot what Go commit introduces this, thanks. (It should be easy as long as this stays reproducible for me.)

@gtownsend Thanks. In your case, does compiling the Go program with Go 1.19.4 make any difference? For me, when I tried it last, the problem would immediately go away whenever the program was built with 1.19.4. I'll also see if today's macOS 13.1 release brings any new differences.
My app now works perfectly after reinstalling Go 1.19.4 and rebuilding.
After re-reading #56784, I realized this likely has to do with fork/new process being involved during shader compilation. In case it ends up being helpful, here's a more self-contained repro that uses the Metal API:

```go
package main

/*
#cgo CFLAGS: -x objective-c
#cgo LDFLAGS: -framework Metal -framework CoreGraphics -framework Foundation
#include <stdio.h>
#import <Metal/Metal.h>

// CompileMetalShader asks the default Metal device to compile a trivial
// shader from source and reports the result on stdout.
void CompileMetalShader() {
	id<MTLDevice> device = MTLCreateSystemDefaultDevice();
	if (!device) {
		printf("no Metal device\n");
		return;
	}
	NSError *err = nil; // must start as nil; it is only set by the call below
	id<MTLLibrary> library = [device newLibraryWithSource:@"// Empty shader 123."
	                                              options:nil
	                                                error:&err];
	if (!library) {
		printf("newLibraryWithSource error: %s\n", err.localizedDescription.UTF8String);
		return;
	}
	printf("ok\n");
}
*/
import "C"

func main() {
	C.CompileMetalShader()
	// Output (with CL 451735, with shader source that hasn't been previously compiled):
	// newLibraryWithSource error: Compiler encountered XPC_ERROR_CONNECTION_INVALID (is the OS shutting down?)
	// Output (without CL 451735):
	// ok
}
```

Edit: Added
I can confirm that Dmitri's test exhibits the problem on my system with Go 1.20 RC 1 and not with Go 1.19.4.
(M1 Mac mini, macOS 13.1, latest Xcode command-line tools, updated along with the 13.1 upgrade.)
Tentatively adding release-blocker since this might be preventing some cgo programs from running on a first class port.
#57419 is another issue that appears to be caused by go.dev/cl/451735. With its reproducer, Go 1.19 and earlier (or Go without CL 451735) print 0x00000000 (SCARD_S_SUCCESS), while Go 1.20rc1 prints 0x8010001d (SCARD_E_NO_SERVICE).
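To read the reproducer's output, the two PC/SC return codes quoted in #57419 can be decoded with a small lookup. A minimal sketch; describe and scardNames are hypothetical names, and only the two codes from this report are covered:

```go
package main

import "fmt"

// scardNames maps the two PC/SC return codes reported in #57419 to their
// symbolic names. Any other code is reported as unknown.
var scardNames = map[uint32]string{
	0x00000000: "SCARD_S_SUCCESS",
	0x8010001D: "SCARD_E_NO_SERVICE",
}

// describe formats a PC/SC return code the way it is quoted in the thread,
// followed by its symbolic name when known.
func describe(rv uint32) string {
	if name, ok := scardNames[rv]; ok {
		return fmt.Sprintf("0x%08x (%s)", rv, name)
	}
	return fmt.Sprintf("0x%08x (unknown)", rv)
}

func main() {
	fmt.Println(describe(0x00000000)) // Go 1.19 result
	fmt.Println(describe(0x8010001D)) // Go 1.20rc1 result
}
```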
Change https://go.dev/cl/459175 mentions this issue:
Change https://go.dev/cl/459176 mentions this issue:
Revert CL 451735 (1f4394a), which fixed #33565 and #56784 but also introduced #57263.

I have a different fix to apply instead. Since the first fix was never backported, it will be easiest to backport the new fix if the new fix is done in a separate CL from the revert.

Change-Id: I6c8ea3a46e542ee4702675bbc058e29ccd2723e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/459175
Reviewed-by: Cherry Mui
TryBot-Result: Gopher Robot
Run-TryBot: Russ Cox
Change https://go.dev/cl/459178 mentions this issue:
Change https://go.dev/cl/459179 mentions this issue:
Issues #33565 and #56784 were caused by hangs in the child process after fork, while it ran atfork handlers that ran into slow paths that didn't work in the child.

CL 451735 worked around those two issues by calling a couple functions at startup to try to warm up those child paths. That mostly worked, but it broke programs using cgo with certain macOS frameworks (#57263).

CL 459175 reverted CL 451735.

This CL introduces a different fix: bypass the atfork child handlers entirely. For a general fork call where the child and parent are both meant to keep executing the original program, atfork handlers can be necessary to fix any state that would otherwise be tied to the parent process. But Go only uses fork as preparation for exec, and it takes care to limit what it attempts to do in the child between the fork and exec. In particular it doesn't use any of the things that the macOS atfork handlers are trying to fix up (malloc, xpc, others). So we can use the low-level fork system call (__fork) instead of the atfork-wrapped one.

The full list of functions that can be called in a child after fork in exec_libc2.go is:

- ptrace
- setsid
- setpgid
- getpid
- ioctl
- chroot
- setgroups
- setgid
- setuid
- chdir
- dup2
- fcntl
- close
- execve
- write
- exit

I disassembled all of these while attached to a hung exec.test binary and confirmed that nearly all of them are making direct kernel calls, not using anything that the atfork handler needs to fix up.

The exceptions are ioctl, fcntl, and exit. The ioctl and fcntl implementations do some extra work around the kernel call but don't call any other functions, so they should still be OK. (If not, we could use __ioctl and __fcntl instead, but without a good reason, we should keep using the standard entry points.)

The exit implementation calls atexit handlers. That is almost certainly inappropriate in a failed fork child, so this CL changes that call to __exit on darwin. To avoid making unnecessary changes at this point in the release cycle, this CL leaves OpenBSD calling plain exit, even though that is probably a bug in the OpenBSD port (filed #57446).

Fixes #33565.
Fixes #56784.
Fixes #57263.
Fixes #56837.

Change-Id: I26812c26a72bdd7fcf72ec41899ba11cf6b9c4ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/459176
Reviewed-by: David Chase
Reviewed-by: Cherry Mui
TryBot-Result: Gopher Robot
Run-TryBot: Russ Cox
Reviewed-on: https://go-review.googlesource.com/c/go/+/459178
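From the Go side, several entries in the child-side list correspond to fields a caller sets on exec.Cmd and syscall.SysProcAttr: Dir is applied with chdir, Setpgid with setpgid, Setsid with setsid, Chroot with chroot, and Credential with setgroups/setgid/setuid, all in the child between fork and exec. A minimal sketch, assuming a Unix system with /bin/sh; runInDir is a hypothetical helper, and the command and directory are arbitrary. (On Linux the child issues raw system calls from the Linux implementation rather than the darwin libc entry points listed in the commit message.)

```go
//go:build darwin || linux

package main

import (
	"fmt"
	"os/exec"
	"strings"
	"syscall"
)

// runInDir runs "pwd" in a child process configured with two settings that
// the child applies between fork and exec.
func runInDir(dir string) (string, error) {
	cmd := exec.Command("/bin/sh", "-c", "pwd")
	cmd.Dir = dir // applied with chdir in the child
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Setpgid: true, // applied with setpgid in the child
	}
	out, err := cmd.Output()
	return strings.TrimSpace(string(out)), err
}

func main() {
	got, err := runInDir("/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Each of those settings adds one of the listed calls to the code that runs in the child, which is why the commit message enumerates exactly what that code may call.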
Issues #33565 and #56784 were caused by hangs in the child process after fork, while it ran atfork handlers that ran into slow paths that didn't work in the child.

CL 451735 worked around those two issues by calling a couple functions at startup to try to warm up those child paths. That mostly worked, but it broke programs using cgo with certain macOS frameworks (#57263).

CL 459175 reverted CL 451735.

This CL introduces a different fix: bypass the atfork child handlers entirely. For a general fork call where the child and parent are both meant to keep executing the original program, atfork handlers can be necessary to fix any state that would otherwise be tied to the parent process. But Go only uses fork as preparation for exec, and it takes care to limit what it attempts to do in the child between the fork and exec. In particular it doesn't use any of the things that the macOS atfork handlers are trying to fix up (malloc, xpc, others). So we can use the low-level fork system call (__fork) instead of the atfork-wrapped one.

The full list of functions that can be called in a child after fork in exec_libc2.go is:

- ptrace
- setsid
- setpgid
- getpid
- ioctl
- chroot
- setgroups
- setgid
- setuid
- chdir
- dup2
- fcntl
- close
- execve
- write
- exit

I disassembled all of these while attached to a hung exec.test binary and confirmed that nearly all of them are making direct kernel calls, not using anything that the atfork handler needs to fix up.

The exceptions are ioctl, fcntl, and exit. The ioctl and fcntl implementations do some extra work around the kernel call but don't call any other functions, so they should still be OK. (If not, we could use __ioctl and __fcntl instead, but without a good reason, we should keep using the standard entry points.)

The exit implementation calls atexit handlers. That is almost certainly inappropriate in a failed fork child, so this CL changes that call to __exit on darwin. To avoid making unnecessary changes at this point in the release cycle, this CL leaves OpenBSD calling plain exit, even though that is probably a bug in the OpenBSD port (filed #57446).

Fixes #33565.
Fixes #56784.
Fixes #57263.
Fixes #56836.

Change-Id: I26812c26a72bdd7fcf72ec41899ba11cf6b9c4ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/459176
Reviewed-by: David Chase
Reviewed-by: Cherry Mui
TryBot-Result: Gopher Robot
Run-TryBot: Russ Cox
Reviewed-on: https://go-review.googlesource.com/c/go/+/459179
Change https://go.dev/cl/460476 mentions this issue:
We're tracking some customer crashes on M1 macOS 13.1 which seems to be related to the
We are using cgo to populate an icon in the macOS menu bar:

```go
/*
#cgo CFLAGS: -x objective-c -mmacosx-version-min=10.10
#cgo LDFLAGS: -framework Cocoa -framework Security
#import
void SetupApp();
void StartApp();
void StopApp();
void add_menu_item(int, const char *);
void add_separator_item();
*/
import "C"
```

```objc
void SetupApp(void) {
	[NSAutoreleasePool new];
	[NSApplication sharedApplication];
	[NSApp setActivationPolicy:NSApplicationActivationPolicyAccessory];
```
Which version of Go are you using?
We are tracking … The customer confirmed that reverting the … I also got more details about his setup, which is different from what we were trying ourselves:
Yes, it does use os/exec before calling StartMenu.
Thank you for the quick confirmation.
Change https://go.dev/cl/461115 mentions this issue:
Change https://go.dev/cl/461116 mentions this issue:
…on darwin"

A recent comment on #57263 reports an unexplained crash in a cgo program that is fixed by reverting the __fork fix. We don't have any viable fix for the os/exec bug at this point, so give up on a fix for the January point releases.

This reverts CL 459179 (commit 07b6ffb).

Fixes #57689.

Change-Id: I3b81de6bded399f47862325129e86a65c83d8e3b
Reviewed-on: https://go-review.googlesource.com/c/go/+/461116
Reviewed-by: Ian Lance Taylor
Run-TryBot: Russ Cox
TryBot-Result: Gopher Robot
Reviewed-by: Cherry Mui
…on darwin"

A recent comment on #57263 reports an unexplained crash in a cgo program that is fixed by reverting the __fork fix. We don't have any viable fix for the os/exec bug at this point, so give up on a fix for the January point releases.

This reverts CL 459178 (commit 91bc4cd).

Fixes #57690.

Change-Id: Ieb38d9bc7f967e9a726429eab2ea515d5ca0847f
Reviewed-on: https://go-review.googlesource.com/c/go/+/461115
Run-TryBot: Russ Cox
Reviewed-by: Ian Lance Taylor
TryBot-Result: Gopher Robot
Reviewed-by: Cherry Mui
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Not with the latest stable release.
This appears to be a regression that happens only with Go 1.20 RC 1 and at tip (as of yesterday), but doesn't happen at all with Go 1.19.4 or any older stable Go release.
What operating system and processor architecture are you using (go env)?
Generally a close-to-default install of Go 1.20 RC 1 on an M1-based Mac running the latest macOS (13.0.1/22A400) and Xcode (14.1/14B47b).
What did you do?
I tried running various Go programs that use Cocoa and either OpenGL or Metal APIs via cgo to open a window and render graphics. This problem affects all of them in the same way. The smallest reproducers I can share at this time are the simple example programs in the go-gl org:
There's no problem running simple cgo programs that don't use the Cocoa/OpenGL/Metal APIs.
What did you expect to see?
Normal program execution, no warnings or errors, just like with Go 1.19.4 or older.
What did you see instead?
Almost always, there are warnings/log messages printed including:
(Those log messages may show up only after the Cocoa window is selected.)
Sometimes it also prints:
Also observed:
(The OS was not shutting down. Restarting had no effect.)
It's not completely deterministic: sometimes the program exits due to an error, or, if it keeps running, fails to render graphics properly. However, running the same program with Go 1.19.4 works, and re-running it afterward with Go 1.20 RC 1 also works (still with warnings, but graphics render normally). From a quick look, it seems that modifying a shader source makes it start failing again, so it's possible that running the program with Go 1.19.4 causes those shaders to be built successfully, cached, and reused.
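Since only shader source that hasn't been compiled before seems to fail, a repro can vary its source on every run so that a shader cached by an earlier Go 1.19.4 run cannot mask the problem. A minimal sketch, assuming the compiler cache is keyed on the source text (consistent with the observation that modifying a shader makes it fail again); uniqueShaderSource is a hypothetical helper, and wiring it in would require changing the repro's CompileMetalShader to accept a C string (for example one produced by C.CString):

```go
package main

import (
	"fmt"
	"time"
)

// uniqueShaderSource is a hypothetical helper: it returns an empty Metal
// shader whose comment embeds tag, so each distinct tag yields source text
// the shader compiler has not seen before.
func uniqueShaderSource(tag string) string {
	return fmt.Sprintf("// Empty shader %s.", tag)
}

func main() {
	// A nanosecond timestamp gives a fresh tag on every run.
	src := uniqueShaderSource(fmt.Sprint(time.Now().UnixNano()))
	fmt.Println(src)
}
```

With tag "123" this produces exactly the shader string used in the Metal repro earlier in the thread.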
This might very well be a problem with the Go program accessing those macOS APIs in an unsafe way, or a problem in macOS itself, but it only started happening at tip and doesn't happen when reverting to an older stable Go version.
CC @golang/runtime.