cmd/link: Incorrect symbol linked in darwin/arm64 #58935
Comments
Thanks for the report. It would be helpful to know the exact linker invocation, and also whether you are using internal or external linking (the latter is what you would typically get if your program uses cgo).
Also, how big is the binary? (Very large binaries need trampolines, which could be where the problem is.)
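Why binary size matters here: an arm64 BL instruction encodes a signed 26-bit word offset, so a direct call can only reach about ±128 MiB from the call site. Beyond that, the linker has to route the call through a trampoline. A minimal sketch of that reachability check, assuming only the BL encoding (the helper name canReachBL and the __text start address are hypothetical, not taken from cmd/link):

```go
package main

import "fmt"

// Bounds on the byte displacement an arm64 BL can encode: a signed 26-bit
// word offset, i.e. roughly ±128 MiB.
const (
	blMaxForward  = (1<<25 - 1) * 4 // 0x7FFFFFC bytes
	blMaxBackward = -(1 << 25) * 4  // -0x8000000 bytes
)

// canReachBL reports whether a BL at address pc can branch directly to target.
// If it cannot, the linker must insert a trampoline (veneer) for the call.
func canReachBL(pc, target uint64) bool {
	off := int64(target - pc) // unsigned wraparound yields the signed distance
	return off%4 == 0 && off >= blMaxBackward && off <= blMaxForward
}

func main() {
	const text = 0x100000000 // hypothetical start of __text
	fmt.Println(canReachBL(text, text+100<<20)) // 100 MiB away: true
	fmt.Println(canReachBL(text, text+300<<20)) // 300 MiB away: false
}
```

With 300 to 800 MB binaries, many calls fall outside this window, which is why trampolines come into play at all.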
Does it not fail with Go 1.19.x? That information would also be helpful. Thanks.
We are using external linking mode, with these flags:
The binary is also fairly large. I haven't collected the sizes of all the failing targets, but the ones I investigated were all upwards of 300 MB, and some of the failing targets are even larger, up to ~800 MB. And yes, this does not happen on any of the 1.19.x releases.
The use of external linking, plus the fact that runtime.duff* routines are involved, suggests that this might be a problem similar to the one referred to in CL 469275 (where the external linker gets confused by the non-zero addend on the call relocation). With that in mind, could you let us know which flavor of linker you are using (ld.lld, ld.bfd, etc.) and its version?
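For context on why runtime.duff* calls carry an addend: these routines are Duff's devices, fully unrolled loops that callers enter partway through, so the call targets symbol+offset rather than the symbol itself. A minimal sketch of that offset arithmetic, with made-up routine dimensions (the constants and duffEntryAddend are hypothetical and do not describe the real layout of runtime.duffzero):

```go
package main

import "fmt"

// Illustrative dimensions of a Duff's-device zeroing routine (assumptions).
const (
	blocks        = 64 // unrolled store instructions in the routine
	instrBytes    = 4  // every arm64 instruction is 4 bytes
	bytesPerBlock = 16 // each store clears 16 bytes (a register pair)
)

// duffEntryAddend returns the byte offset at which a caller must enter the
// routine to clear n bytes: skipping the first blocks-n/bytesPerBlock stores
// leaves exactly the stores needed. The offset becomes the addend on the CALL
// relocation (branch target = symbol + addend).
func duffEntryAddend(n int) (int, error) {
	if n <= 0 || n%bytesPerBlock != 0 || n/bytesPerBlock > blocks {
		return 0, fmt.Errorf("size %d not supported", n)
	}
	return (blocks - n/bytesPerBlock) * instrBytes, nil
}

func main() {
	for _, n := range []int{1024, 256, 32} {
		a, _ := duffEntryAddend(n)
		fmt.Printf("zero %4d bytes: CALL duffzero+%d\n", n, a)
	}
}
```

Only the full-size case enters at offset 0; every smaller size produces a CALL relocation with a non-zero addend.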
We're using ld.lld.
Thanks. For a mis-compiled binary, could you show the incorrect instruction, the address of the incorrect target function from the symbol table, and all duff symbols (i.e. symbols containing duff)?
Could you try whether CL https://golang.org/cl/474620 makes any difference? (You'll need to rebuild the Go toolchain with the CL patched in.) I initially thought it might be related, but now I'm not really sure. Maybe it is something else...
Sure. I pulled this from one of the failing binaries:
The incorrect instruction is at 0x107c82630.
The incorrect target function (reflect.Value.Uint) is at 0x1000adc60.
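Assuming the incorrect instruction is a BL, its target can be checked by hand from the 26-bit immediate. A minimal sketch using the standard arm64 BL encoding (the helper names encodeBL and decodeBL are hypothetical); round-tripping the two addresses above only confirms the arithmetic and that the wrong target is within direct-branch range, not why the linker picked it:

```go
package main

import "fmt"

// blOpcode is the fixed top-6-bit pattern (0b100101) of an arm64 BL.
const blOpcode = 0x94000000

// encodeBL returns the BL instruction word at address pc that calls target.
// It assumes the displacement is 4-byte aligned and within ±128 MiB.
func encodeBL(pc, target uint64) uint32 {
	off := int64(target-pc) >> 2 // signed word offset
	return blOpcode | uint32(off)&0x03FFFFFF
}

// decodeBL returns the call target of the instruction word insn located at
// pc, or false if insn is not a BL.
func decodeBL(pc uint64, insn uint32) (uint64, bool) {
	if insn&0xFC000000 != blOpcode {
		return 0, false
	}
	// Move imm26 to the top of the word, then shift back arithmetically:
	// this sign-extends it and multiplies by 4 in one step.
	imm := int64(insn&0x03FFFFFF) << 38 >> 36
	return pc + uint64(imm), true
}

func main() {
	const pc, target = 0x107c82630, 0x1000adc60 // addresses from the report
	insn := encodeBL(pc, target)
	got, ok := decodeBL(pc, insn)
	fmt.Printf("bl word %#08x -> %#x (ok=%v)\n", insn, got, ok)
}
```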
Here are some duff symbols:
I actually found this code in the linker and tried making the same change you made. It doesn't make any difference.
Thanks. I think I understand it now. I'll send a CL tomorrow.
I updated CL https://golang.org/cl/474620 to what I think may work. Could you try that? Thanks!
Change https://go.dev/cl/474620 mentions this issue:
@cherrymui Thanks! I confirmed that this fixes the issue in the known failing targets. We're still waiting to run the entire test suite in our Go Monorepo; that will take some time, but I'll update once it's done.
@gopherbot please backport this to the Go 1.20 release. This could cause binaries to be built incorrectly. Thanks.
Backport issue(s) opened: #58954 (for 1.20). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.
Change https://go.dev/cl/475175 mentions this issue:
We've verified that the fix has not broken any further builds. The full suite has yet to finish, but we've covered enough tests to call this fix good on our end. Thank you for the quick turnaround.
@sywhang thanks for confirming! |
…s on darwin/arm64

On darwin, the external linker generally supports CALL relocations with addend. One exception is that for a very large binary, when it decides to insert a trampoline, instead of applying the addend to the call target (in the trampoline), it applies the addend to the CALL instruction in the caller, i.e. generating a call to trampoline+addend, which is not the correct address and usually points to unrelated functions.

To work around this, we use label symbols so the CALL is targeting a label symbol without addend. To make things simple we always use label symbols for CALLs with addend (in external linking mode on darwin/arm64), even for small binaries.

Updates #58935.
Fixes #58954.

Change-Id: I38aed6b62a0496c277c589b5accbbef6aace8dd5
Reviewed-on: https://go-review.googlesource.com/c/go/+/474620
TryBot-Result: Gopher Robot
Run-TryBot: Cherry Mui
Reviewed-by: Than McIntosh
(cherry picked from commit 7dbd6de)
Reviewed-on: https://go-review.googlesource.com/c/go/+/475175
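The failure mode described in the commit message can be modeled in a few lines. This is a toy sketch, not cmd/link or Apple linker code: the types, addresses, and label naming are all hypothetical. It only shows why a call to trampoline+addend lands in the wrong place and why calling a label defined at sym+addend, with a zero addend, sidesteps the problem:

```go
package main

import "fmt"

// call is a simplified CALL relocation: branch to sym's address plus addend.
type call struct {
	sym    string
	addend uint64
}

// linker holds hypothetical symbol and trampoline addresses.
type linker struct {
	addr  map[string]uint64 // symbol -> address
	tramp map[string]uint64 // symbol -> its trampoline, if the caller is too far
}

// resolveCorrect: the addend is applied to the trampoline's target, so
// execution ends up at sym+addend.
func (l linker) resolveCorrect(c call) uint64 {
	return l.addr[c.sym] + c.addend
}

// resolveBuggy mimics the behavior in the commit message: when a trampoline
// is used, the addend is applied to the caller's branch instead.
func (l linker) resolveBuggy(c call) uint64 {
	t, ok := l.tramp[c.sym]
	if !ok {
		return l.addr[c.sym] + c.addend // no trampoline: handled correctly
	}
	if c.addend == 0 {
		return l.addr[c.sym] // hits the trampoline entry, which jumps to sym
	}
	return t + c.addend // lands past the trampoline entry: wrong code
}

// withLabel rewrites a call with an addend into a call to a label symbol
// defined at sym+addend, so the relocation itself has a zero addend.
func (l linker) withLabel(c call) call {
	if c.addend == 0 {
		return c
	}
	label := fmt.Sprintf("%s+%d", c.sym, c.addend) // hypothetical label name
	l.addr[label] = l.addr[c.sym] + c.addend
	if t, ok := l.tramp[c.sym]; ok {
		l.tramp[label] = t + 16 // hypothetical slot for the label's own trampoline
	}
	return call{sym: label}
}

func main() {
	l := linker{
		addr:  map[string]uint64{"runtime.duffzero": 0x100070000},
		tramp: map[string]uint64{"runtime.duffzero": 0x108000000},
	}
	c := call{sym: "runtime.duffzero", addend: 0xc0}
	fmt.Printf("want  %#x\n", l.resolveCorrect(c))
	fmt.Printf("buggy %#x\n", l.resolveBuggy(c))
	fmt.Printf("label %#x\n", l.resolveBuggy(l.withLabel(c)))
}
```

Using the label unconditionally, as the fix does for small binaries too, means the workaround does not depend on predicting when the external linker will insert a trampoline.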
Hello,
I'm from the Go Platform team at Uber, and we've been running into what appears to be a linker bug on macOS/M1 while trying to upgrade to Go 1.20.
What version of Go are you using (go version)?
This reproduces on all 1.20 minor point releases, up to 1.20.2.
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
The issue only manifests on M1 Macs.
We are in a Bazel sandbox environment, using rules_go.
What did you do?
We received reports of some tests in our Go Monorepo that fail only on M1 after upgrading to Go 1.20.
The panic trace varies between the failing targets, but all of them panic in some form during init.
Invalid return address:
Panics from callee:
case 1:
case 2:
Looking through the disassembly, we see calls to runtime.duffzero being linked to arbitrary functions in the problematic targets. If the wrongly linked callee panics on invalid arguments, that produces the panic above. In other cases, the linked target expects a different stack frame size from the one the caller set up, and it panics on an invalid return PC.
Below is part of the disassembled init function of one of our monorepo dependencies (github.com/shopspring/decimal).
This is what we see in the intermediate archive file generated by compile, before it is linked.
But in the final binary, the linker has somehow linked the call to runtime.duffzero with reflect.Value in the same init function:
A similar issue happens with runtime.duffcopy in another target:
Pre-linking:
Post-linking:
This issue does not occur in every binary that uses these dependencies, only in some of them. Also worth noting: when we change the binary layout by disabling inlining (-l) or all optimizations (-N) via gcflags, the issue goes away for those targets, but it starts happening in other targets that were passing with optimizations enabled.
The issue does not occur in any of our other environments (linux/amd64 or darwin/amd64).
What did you expect to see?
The linker produces correctly linked binaries.
What did you see instead?
Panics as described above.