cgo: go link detects all linker flags as unsupported when using custom cc toolchain #3886

Closed
mikedanese opened this issue Mar 12, 2024 · 14 comments · Fixed by #4144

Comments

@mikedanese
Contributor

mikedanese commented Mar 12, 2024

What version of rules_go are you using?

master sync'd to latest

mike.danese@ip-10-110-30-195:~/rules_go$ git show
commit 36e04e9f1e9b56c865332edfb49e1620beaedf2c

What version of Bazel are you using?

mike.danese@ip-10-110-30-195:~/rules_go$ bazel version
Bazelisk version: development
Build label: 7.1.0
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Mar 11 17:55:31 2024 (1710179731)
Build timestamp: 1710179731
Build timestamp as int: 1710179731
mike.danese@ip-10-110-30-195:~/rules_go$ git show
commit 36e04e9f1e9b56c865332edfb49e1620beaedf2c

Does this issue reproduce with the latest releases of all the above?

What operating system and processor architecture are you using?

Linux

What did you see instead?

I have a custom cc toolchain based on LLVM. This toolchain sets a sysroot link option, e.g. --sysroot=external/sysroot-linux-x86_64-gnu7-llvm-1.

When a cgo library is generated, the cgo tool converts this LD_FLAG value into an absolute path here: https://cs.opensource.google/go/go/+/master:src/cmd/cgo/main.go;l=353-357;drc=bdccd923e914ab61d77a8f23a3329cf1d5aaa7c1. For example, my sysroot ldflag might be converted to /home/mike.danese/.cache/bazel/_bazel_mike.danese/71e455921cc87b7db5489570176e2cff/sandbox/linux-sandbox/400/execroot/universe/external/sysroot-linux-x86_64-gnu7-llvm-16. These LD_FLAGs get baked into the object.

Now when it's time to link, go tool link probes the linker (in linkerFlagSupported) to determine whether e.g. -no-pie should be configured: https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/lib.go;l=1833-1834;drc=2ab9218c86ed625362df5060f64fcd59398a76f3. It does this by compiling a trivial program. To construct the compile command, the ldflags from the cgo object are appended to the -extldflags: https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/lib.go;l=2079-2080;drc=2ab9218c86ed625362df5060f64fcd59398a76f3. This results in a command like this:

external/llvm-toolchain-16-0-4-linux-x86_64/bin/clang \
  -m64 -fuse-ld=lld -Wl,--build-id=md5 -Wl,--hash-style=gnu -Wl,-z,relro,-z,now -Wl,-z,separate-code \
  --sysroot=external/sysroot-linux-x86_64-gnu7-llvm-16 -pthread -fuse-ld=lld \
  -Wl,--build-id=md5 -Wl,--hash-style=gnu -Wl,-z,relro,-z,now -Wl,-z,separate-code \
  --sysroot=/home/mike.danese/.cache/bazel/_bazel_mike.danese/71e455921cc87b7db5489570176e2cff/sandbox/linux-sandbox/423/execroot/universe/external/sysroot-linux-x86_64-gnu7-llvm-16 \
  -pthread -o /tmp/go-link-606451175/a.out -no-pie /tmp/go-link-606451175/trivial.c

The sysroot argument is passed twice, once from the -extldflags and once from the go_ldflags attribute in the cgo object, so the second --sysroot wins. This is a problem because the second sysroot references a path in the no-longer-existent execution sandbox where the cgo object was compiled. Thus, the probe fails with a nonsensical error:

ld.lld: error: cannot open crt1.o: No such file or directory
ld.lld: error: cannot open crti.o: No such file or directory
ld.lld: error: cannot open crtbegin.o: No such file or directory
ld.lld: error: unable to find library -lgcc
ld.lld: error: unable to find library -lgcc_s
ld.lld: error: unable to find library -lpthread
ld.lld: error: unable to find library -lc
ld.lld: error: unable to find library -lgcc
ld.lld: error: unable to find library -lgcc_s
ld.lld: error: cannot open crtend.o: No such file or directory
ld.lld: error: cannot open crtn.o: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
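The "second --sysroot wins" behaviour described above can be sketched as a small helper that reports which sysroot a probe command would end up using. This is only an illustration: lastSysroot and the example paths are hypothetical, and the sketch handles just the --sysroot=dir spelling seen in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// lastSysroot returns the value of the final --sysroot=... argument,
// reflecting the observation in this issue that a later --sysroot
// overrides an earlier one. It returns "" if none is present.
// Only the "--sysroot=dir" form is handled.
func lastSysroot(args []string) string {
	const prefix = "--sysroot="
	sysroot := ""
	for _, a := range args {
		if v, ok := strings.CutPrefix(a, prefix); ok {
			sysroot = v
		}
	}
	return sysroot
}

func main() {
	// Hypothetical probe arguments: the first sysroot comes from
	// -extldflags, the second from ldflags recorded in a cgo object.
	args := []string{
		"--sysroot=external/sysroot",
		"-pthread",
		"--sysroot=/stale/sandbox/execroot/external/sysroot",
		"-pthread",
	}
	fmt.Println("effective sysroot:", lastSysroot(args))
}
```

Running a check like this over the arguments of a failing probe makes it easier to see whether the effective sysroot points into a sandbox that no longer exists.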

This error is silently swallowed, and go tool link assumes that the linker flag is unsupported. In the case of -no-pie, this causes rules_go to attempt to link PIE cgo objects into Go binaries without relro (e.g. in the normal exe buildmode on Linux). This causes hard-to-debug runtime errors such as:

runtime: pcHeader: magic= 0xfffffff1 pad1= 0 pad2= 0 minLC= 1 ptrSize= 8 pcHeader.textStart= 0x1be100 text= 0x557164240100 pluginpath=
fatal error: invalid function symbol table
runtime: panic before malloc heap initialized

runtime stack:
runtime.throw({0x5571640f265a?, 0x0?})
        GOROOT/src/runtime/panic.go:1077 +0x5c fp=0x7fff932345a8 sp=0x7fff93234578 pc=0x55716427741c
runtime.moduledataverify1(0x8?)
        GOROOT/src/runtime/symtab.go:533 +0x816 fp=0x7fff932346c8 sp=0x7fff932345a8 pc=0x5571642958f6
runtime.moduledataverify(...)
        GOROOT/src/runtime/symtab.go:519
runtime.schedinit()
        GOROOT/src/runtime/proc.go:726 +0x4c fp=0x7fff93234710 sp=0x7fff932346c8 pc=0x55716427af6c
runtime.rt0_go()
        src/runtime/asm_amd64.s:349 +0x11c fp=0x7fff93234718 sp=0x7fff93234710 pc=0x5571642aa17c

There are ~10 other calls to linkerFlagSupported and none of them are correctly detecting flag support.

@mikedanese
Contributor Author

mikedanese commented Mar 12, 2024

@ianlancetaylor, was the ordering of extldflags before the LDFLAGS from cgo directives in linkerFlagSupported chosen intentionally? I can't tell whether it's a terrible solution, but swapping the order would give the go link -extldflags precedence over cgo directives for mutually exclusive linker options.

func linkerFlagSupported(arch *sys.Arch, linker, altLinker, flag string) bool {
	...
	moreFlags := trimLinkerArgv(append(flagExtldflags, ldflag...))

From here: https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/lib.go;l=2079;drc=1e433915ce684049a6a44fd506f691f448b56c76
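The effect of swapping the order can be sketched as follows. This is a standalone illustration, not a patch to cmd/link: mergeFlags and the flag values are hypothetical, and the only point is which --sysroot ends up last in each ordering.

```go
package main

import "fmt"

// mergeFlags returns a fresh slice holding first followed by second.
// Allocating a new slice avoids append writing into spare capacity of
// the backing array of first.
func mergeFlags(first, second []string) []string {
	out := make([]string, 0, len(first)+len(second))
	out = append(out, first...)
	return append(out, second...)
}

func main() {
	// Hypothetical values standing in for -extldflags and for the
	// LDFLAGS recorded in a cgo object.
	extld := []string{"--sysroot=external/sysroot"}
	cgo := []string{"--sysroot=/stale/sandbox/external/sysroot"}

	current := mergeFlags(extld, cgo) // order in the snippet above
	swapped := mergeFlags(cgo, extld) // order proposed in this comment

	fmt.Println("current order, last flag:", current[len(current)-1])
	fmt.Println("swapped order, last flag:", swapped[len(swapped)-1])
}
```

With the swapped order, the sysroot from -extldflags comes last and would therefore take precedence for options where the last occurrence wins.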

@fmeum
Member

fmeum commented Mar 13, 2024

I can't answer that question, but ideally we wouldn't include any absolute paths into the outputs of compilation. Maybe we could use a symlink instead of converting the sysroot to an absolute path?

That said, if flipping the order of flags is a simple way to fix this and no tests break, I'm happy to accept that change.

@ianlancetaylor

I don't understand what is creating the absolute path. The code you mention in cmd/cgo only applies to Go file names, not to options like --sysroot=dir.

@mikedanese
Contributor Author

Ah ya, misread that. Thanks.

So this is actually happening in rules_go. The flags are made absolute here:

https://github.com/bazelbuild/rules_go/blob/aeb83e878033ef357642c87122e193df44da03fe/go/tools/builders/stdlib.go#L161-L165

So we have a --sysroot from a custom cc toolchain. When building the stdlib with go install, we make all CGO_LDFLAGS absolute, presumably because go install changes directories. If we don't, then we get an error like:

_cgo_export.c:3:10: fatal error: 'stdlib.h' file not found
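As a rough sketch of what "making CGO_LDFLAGS absolute" means, the following rewrites relative path-valued flags against a base directory. It is not the rules_go implementation: absFlag, the prefix list and the /execroot base are assumptions chosen for illustration, and the example assumes Unix-style paths.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// pathPrefixes lists flag prefixes whose value is a path. The list is
// illustrative only and is not the set used by rules_go.
var pathPrefixes = []string{"--sysroot=", "-I", "-L"}

// absFlag rewrites a relative path-valued flag so it is rooted at base.
// Flags without a path value, or with an already absolute path, are
// returned unchanged.
func absFlag(flag, base string) string {
	for _, p := range pathPrefixes {
		rest, ok := strings.CutPrefix(flag, p)
		if !ok || rest == "" || filepath.IsAbs(rest) {
			continue
		}
		return p + filepath.Join(base, rest)
	}
	return flag
}

func main() {
	base := "/execroot" // hypothetical working directory
	for _, f := range []string{"--sysroot=external/sysroot", "-pthread", "-L/usr/lib"} {
		fmt.Println(absFlag(f, base))
	}
}
```

The sketch also shows where the trouble starts: once the base directory is a sandbox path, the rewritten flag stays valid only for as long as that sandbox exists.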

@adbmal

adbmal commented Apr 5, 2024

I encountered the same issue. We use a custom cc toolchain; the binaries build successfully but cannot run.

runtime: pcHeader: magic= 0xfffffff1 pad1= 0 pad2= 0 minLC= 1 ptrSize= 8 pcHeader.textStart= 0x1be100 text= 0x557164240100 pluginpath=
fatal error: invalid function symbol table
runtime: panic before malloc heap initialized

Following @mikedanese's comment, I tried removing the absEnv code mentioned above, but that did not work; the issue persists.
https://github.com/bazelbuild/rules_go/blob/aeb83e878033ef357642c87122e193df44da03fe/go/tools/builders/stdlib.go#L161-L165

I found that when rules_go links the binary, it doesn't fetch the C link flags through clinkopts; instead, it reads gc_linkopts and extracts the flags after -extldflags via _extract_extldflags.
https://github.com/bazelbuild/rules_go/blob/c0ef535977f9fd2d9a67243552cd04da285ab629/go/private/actions/link.bzl#L64-L106

https://github.com/bazelbuild/rules_go/blob/c0ef535977f9fd2d9a67243552cd04da285ab629/go/private/actions/link.bzl#L230-L240

So, after adding the following flags to my go_binary, it started running correctly.

  gc_linkopts = [
       "-extldflags",
       "-nopie",
   ],
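The idea of pulling the external linker flags out of gc_linkopts can be sketched in Go as below. This is a loose illustration of the behaviour described in this comment, not a translation of the Starlark in link.bzl: splitExtldflags is a hypothetical name, and splitting the value on whitespace is a simplification that ignores quoting.

```go
package main

import (
	"fmt"
	"strings"
)

// splitExtldflags separates the value that follows each "-extldflags"
// entry from the remaining Go linker options. A trailing "-extldflags"
// with no value is left among the Go linker options.
func splitExtldflags(linkopts []string) (goOpts, extld []string) {
	for i := 0; i < len(linkopts); i++ {
		if linkopts[i] == "-extldflags" && i+1 < len(linkopts) {
			// Simplification: split on whitespace, no quote handling.
			extld = append(extld, strings.Fields(linkopts[i+1])...)
			i++ // skip the value that was just consumed
			continue
		}
		goOpts = append(goOpts, linkopts[i])
	}
	return goOpts, extld
}

func main() {
	goOpts, extld := splitExtldflags([]string{"-s", "-extldflags", "-nopie"})
	fmt.Println("go linker options:", goOpts)
	fmt.Println("external linker flags:", extld)
}
```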

@AlessandroPatti
Contributor

Wonder if this is also fixed by #4009

@voxeljorge
Contributor

I believe I'm also running into this issue while trying to set up LLVM with a custom sysroot. I'm finding that cgo builds (in my case, a build with the race detector, which seems to turn cgo on by default) trigger linker errors that look like this:

ld.lld: error: relocation R_X86_64_64 cannot be used against symbol 'type:.eq.os/exec.Error'; recompile with -fPIC

After diving in fairly deep, I found that go tool link is calling my linker with __GO_BAZEL_CC_PLACEHOLDER__ all over the place, which seems to me like perhaps an environment variable leaked out somewhere without getting substituted. The link tool runs something that looks like this to determine if linker flags are supported:

external/toolchains_llvm~~llvm~llvm_toolchain/bin/cc_wrapper.sh "-m64" "-fuse-ld=lld" "-Wl,--build-id=md5" "-Wl,--hash-style=gnu" "-Wl,-z,relro,-z,now" "--sysroot=external/_main~_repo_rules~ubuntu_20_04_sysroot/" "-fuse-ld=lld" "-Wl,--build-id=md5" "-Wl,--hash-style=gnu" "-Wl,-z,relro,-z,now" "--sysroot=external/_main~_repo_rules~ubuntu_20_04_sysroot/" "-pthread" "-fuse-ld=lld" "-Wl,--build-id=md5" "-Wl,--hash-style=gnu" "-Wl,-z,relro,-z,now" "--sysroot=__GO_BAZEL_CC_PLACEHOLDER__external/_main~_repo_rules~ubuntu_20_04_sysroot/" "-pthread" "-fuse-ld=lld" "-Wl,--build-id=md5" "-Wl,--hash-style=gnu" "-Wl,-z,relro,-z,now" "--sysroot=__GO_BAZEL_CC_PLACEHOLDER__external/_main~_repo_rules~ubuntu_20_04_sysroot/" "-pthread" "-fuse-ld=lld" "-Wl,--build-id=md5" "-Wl,--hash-style=gnu" "-Wl,-z,relro,-z,now" "--sysroot=__GO_BAZEL_CC_PLACEHOLDER__external/_main~_repo_rules~ubuntu_20_04_sysroot/" "-pthread" "-fuse-ld=lld" "-Wl,--build-id=md5" "-Wl,--hash-style=gnu" "-Wl,-z,relro,-z,now" "--sysroot=__GO_BAZEL_CC_PLACEHOLDER__external/_main~_repo_rules~ubuntu_20_04_sysroot/" "-pthread" "-o" "/tmp/go-link-2282735968/a.out" "-Wl,--export-dynamic-symbol=main" "/tmp/trivial.c"

The above command fails because it can't actually find the sysroot.

@voxeljorge
Contributor

So after even more digging, I've found that this string seems to end up embedded in the race detector somewhere. Running strings on one of the race detector standard library .a files turns up this:

[["cgo_ldflag","--target=x86_64-unknown-linux-gnu"],["cgo_ldflag","-lm"],["cgo_ldflag","-no-canonical-prefixes"],["cgo_ldflag","-fuse-ld=lld"],["cgo_ldflag","-Wl,--build-id=md5"],["cgo_ldflag","-Wl,--hash-style=gnu"],["cgo_ldflag","-Wl,-z,relro,-z,now"],["cgo_ldflag","-l:libc++.a"],["cgo_ldflag","-l:libc++abi.a"],["cgo_ldflag","-l:libunwind.a"],["cgo_ldflag","-lpthread"],["cgo_ldflag","-ldl"],["cgo_ldflag","-rtlib=compiler-rt"],["cgo_ldflag","--sysroot=__GO_BAZEL_CC_PLACEHOLDER__external/_main~_repo_rules~ubuntu_20_04_sysroot/"],["cgo_ldflag","-pthread"],["cgo_ldflag","-ldl"],["cgo_import_static","_cgo_41342379732c_Cfunc_pluginLookup"],["cgo_import_static","_cgo_41342379732c_Cfunc_pluginOpen"],["cgo_import_static","_cgo_41342379732c_Cfunc_realpath"],["cgo_import_dynamic","__libc_start_main","__libc_start_main#GLIBC_2.2.5","libc.so.6"],["cgo_import_dynamic","__gmon_start__","__gmon_start__",""],["cgo_import_dynamic","__register_frame_info","__register_frame_info",""],["cgo_import_dynamic","__cxa_finalize","__cxa_finalize#GLIBC_2.2.5","libc.so.6"],["cgo_import_dynamic","__deregister_frame_info","__deregister_frame_info",""],["cgo_import_dynamic","dlsym","dlsym#GLIBC_2.2.5","libdl.so.2"],["cgo_import_dynamic","dlerror","dlerror#GLIBC_2.2.5","libdl.so.2"],["cgo_import_dynamic","dlopen","dlopen#GLIBC_2.2.5","libdl.so.2"],["cgo_import_dynamic","realpath","realpath#GLIBC_2.3","libc.so.6"],["cgo_import_dynamic","_","_","libm.so.6"],["cgo_import_dynamic","_","_","libpthread.so.0"],["cgo_import_dynamic","_","_","libdl.so.2"],["cgo_import_dynamic","_","_","libc.so.6"]]

I only seem to have issues building programs with the race detector turned on.
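A small diagnostic along these lines can help spot unsubstituted placeholders in such a directive list. It assumes the directives have already been extracted as a JSON array in the [["cgo_ldflag","..."], ...] shape shown above; unresolvedLdflags and the sample input are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// placeholder is the marker seen in the flags quoted in this thread.
const placeholder = "__GO_BAZEL_CC_PLACEHOLDER__"

// unresolvedLdflags parses a JSON array of cgo directives and returns
// every cgo_ldflag value that still contains the placeholder.
func unresolvedLdflags(directives []byte) ([]string, error) {
	var entries [][]string
	if err := json.Unmarshal(directives, &entries); err != nil {
		return nil, err
	}
	var bad []string
	for _, e := range entries {
		if len(e) == 2 && e[0] == "cgo_ldflag" && strings.Contains(e[1], placeholder) {
			bad = append(bad, e[1])
		}
	}
	return bad, nil
}

func main() {
	// Shortened, hypothetical sample in the same shape as the dump above.
	sample := []byte(`[["cgo_ldflag","-lm"],` +
		`["cgo_ldflag","--sysroot=__GO_BAZEL_CC_PLACEHOLDER__external/sysroot/"],` +
		`["cgo_import_dynamic","_","_","libc.so.6"]]`)
	bad, err := unresolvedLdflags(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, f := range bad {
		fmt.Println("unresolved:", f)
	}
}
```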

@voxeljorge
Contributor

Ah right, I had missed the earlier comments about the ordering of flags in linkerFlagSupported. I'm indeed running into the same issue: the valid sysroot is passed through, but only at the beginning of the arg list and not at the end.

@voxeljorge
Contributor

I have a patch that seems to work for my case at least, and I've opened a PR. I don't really know if this is the right solution, but at least part of the problem here appears to be that #4009 was only a partial fix: it handled compile operations but not link operations. Unfortunately, because of how go tool link works, I couldn't get the exact same method working and had to write a somewhat hacky script-based solution instead.

@voxeljorge
Contributor

Here's the upstream issue: golang/go#69954

I'll try to get to a test this week.

voxeljorge added a commit to voxeljorge/rules_go that referenced this issue Oct 23, 2024
@fmeum fmeum closed this as completed in 97721d0 Oct 25, 2024
@lromor

lromor commented Nov 14, 2024

@voxeljorge, I'm facing a similar issue:

ld.lld: error: relocation R_X86_64_64 cannot be used against symbol 'type:.eq.sync/atomic.Pointer[error]'; recompile with -fPIC
>>> defined in /tmp/go-link-3283695128/go.o
>>> referenced by go.go
>>>               /tmp/go-link-3283695128/go.o:(.rodata+0x1B78)

In my case I doubt it's a sysroot issue. I'm trying to make the "hermetic" llvm toolchain work under NixOS. Do you have any idea/tool to inspect better what's wrong?

@voxeljorge
Contributor

Hard to say. The issue I had really wasn't with relocation; it was just that the linker was getting the wrong flags because all the flags were reported as unsupported.

The simplest thing you could try is just adding -extldflags -v or something like that to make the linker more verbose. I wouldn't be able to tell you if your command is correct though, that would probably require a bit more spelunking than I have done.

@lromor

lromor commented Nov 14, 2024

Fixed, I missed a no-pie 🤗 . Adding verbosity helped!
