NativeAOT symbols are broken on linux when publishing Release #77407

Closed
agocke opened this issue Oct 24, 2022 · 8 comments

@agocke
Member

agocke commented Oct 24, 2022

I would expect symbols to be worse, but not broken in this configuration.

It looks like the change is caused by passing -O to clang when linking the binaries. Without -O, debugging symbols are present and functional. With -O, all symbols and stacks are broken.

It's something in the DWARF info emitted by ILC.

@agocke agocke added this to the 7.0.x milestone Oct 24, 2022
@MichalStrehovsky
Member

MichalStrehovsky commented Oct 27, 2022

> With -O, all symbols and stacks are broken.

Do you have repro steps? I just debugged a coredump in #77522 (comment) and stacks looked fine.

I can't find where we would be passing -O to clang in our build targets.

@agocke
Member Author

agocke commented Oct 27, 2022

Repro is in an Ubuntu 22.04 WSL instance with RC2, publishing helloworld using dotnet publish -r linux-x64 -c Release -bl.

Afterwards I open up the exe in GDB and I get the following:

GNU gdb (Ubuntu 12.0.90-0ubuntu1) 12.0.90
Reading symbols from ./bin/Release/net7.0/linux-x64/publish/aot1...
Dwarf Error: Could not find abbrev number 94 in CU at offset 0x0 [in module /home/andy/tmp/aot1/bin/Release/net7.0/linux-x64/publish/aot1]
(No debugging symbols found in ./bin/Release/net7.0/linux-x64/publish/aot1)

Function names are still around, but line numbers and file info are lost, as is the symbol table.
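For anyone scripting this check (for example, to compare builds with and without -O), the GDB banner above can be classified mechanically. The following is only a rough sketch: the helper name `summarize_gdb_output` is hypothetical, and the two patterns are taken from the session quoted above, so other GDB versions may word these messages differently.

```python
import re

# Both patterns mirror the GDB 12 output quoted in this thread (an assumption:
# other GDB versions may phrase these messages differently).
DWARF_ERROR = re.compile(
    r"^Dwarf Error: (?P<detail>.+?)(?: \[in module (?P<module>.+)\])?$",
    re.MULTILINE,
)
NO_SYMBOLS = re.compile(
    r"^\(No debugging symbols found in (?P<path>.+)\)$", re.MULTILINE
)

SAMPLE = """\
Reading symbols from ./bin/Release/net7.0/linux-x64/publish/aot1...
Dwarf Error: Could not find abbrev number 94 in CU at offset 0x0 [in module /home/andy/tmp/aot1/bin/Release/net7.0/linux-x64/publish/aot1]
(No debugging symbols found in ./bin/Release/net7.0/linux-x64/publish/aot1)
"""


def summarize_gdb_output(text):
    """Hypothetical helper: report whether GDB appears to have loaded debug info."""
    error = DWARF_ERROR.search(text)
    missing = NO_SYMBOLS.search(text)
    return {
        # Text of the DWARF parse error, if GDB printed one.
        "dwarf_error": error.group("detail") if error else None,
        # Module path GDB attributed the error to, if present.
        "module": error.group("module") if error else None,
        # True only when neither failure message appears.
        "symbols_loaded": error is None and missing is None,
    }


if __name__ == "__main__":
    print(summarize_gdb_output(SAMPLE))
```

The sketch only inspects text; it says nothing about why the DWARF data is malformed.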

You're right about -O though, I can't see that in the command line. I must have accidentally added that manually when altering the linking command to build against local bits. My clang invocation is

"clang" "obj/Release/net7.0/linux-x64/native/aot1.o" -o "bin/Release/net7.0/linux-x64/native/aot1" /home/andy/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/7.0.0-rc.2.22472.3/sdk/libbootstrapper.a /home/andy/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/7.0.0-rc.2.22472.3/sdk/libRuntime.WorkstationGC.a /home/andy/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/7.0.0-rc.2.22472.3/framework/libSystem.Native.a /home/andy/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/7.0.0-rc.2.22472.3/framework/libSystem.Globalization.Native.a /home/andy/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/7.0.0-rc.2.22472.3/framework/libSystem.IO.Compression.Native.a /home/andy/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/7.0.0-rc.2.22472.3/framework/libSystem.Net.Security.Native.a /home/andy/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/7.0.0-rc.2.22472.3/framework/libSystem.Security.Cryptography.Native.OpenSsl.a -g -Wl,-rpath,'$ORIGIN' -Wl,--build-id=sha1 -Wl,--as-needed -pthread -lstdc++ -ldl -lm -lz -lrt -pie -Wl,-z,relro -Wl,-z,now -Wl,--discard-all -Wl,--gc-sections

@agocke
Member Author

agocke commented Oct 27, 2022

Ah, my mistake: it's not the linking command that makes the difference, it's passing -O to ILC. Without it, the debugging info is fine.

agocke added a commit to dotnet/llvm-project that referenced this issue Nov 15, 2022
The DWARF specification states that the form of an exprloc consists
of an unsigned LEB128 length value, followed by the encoded location bytes
of the specified length. For some reason we were adding one to the length
value being emitted. This looks incorrect to me. The above calculation for
REG-REG (a variable stored in two registers) correctly calculates the length
of each register type tag, plus the size of the interpolating PIECE tags,
plus the size of notation for each register. The extra byte looks wrong.

I've tested this locally and it appears to resolve dotnet/runtime#77407.

Unfortunately, it also causes llvm-dwarfdump --verify to constantly
complain about missing base addresses. I can't confirm at the moment,
but my suspicion is that this is revealing an existing bug. Even if this
is somehow causing a new bug, I think the resulting symbols with this
change are better than the alternative (no working symbols at all).
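
The encoding described in this commit message can be sketched in a few lines. This is an illustration only, not the ObjWriter code: the function names, the register numbers, the 8-byte piece size, and the two trailing bytes are all made up for the example. The opcode values (`DW_OP_reg0` = 0x50, `DW_OP_regx` = 0x90, `DW_OP_piece` = 0x93) are the ones defined by the DWARF specification.

```python
DW_OP_REG0 = 0x50   # DW_OP_reg0..DW_OP_reg31 occupy 0x50..0x6f
DW_OP_REGX = 0x90   # register number follows as ULEB128
DW_OP_PIECE = 0x93  # piece size in bytes follows as ULEB128


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def read_uleb128(data, pos):
    """Decode a ULEB128 value starting at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def reg_op(regnum):
    """Encode 'the value lives in register regnum'."""
    if regnum < 32:
        return bytes([DW_OP_REG0 + regnum])
    return bytes([DW_OP_REGX]) + uleb128(regnum)


def reg_reg_expr(reg_a, reg_b, piece_size=8):
    """Value split across two registers: reg, piece, reg, piece."""
    piece = bytes([DW_OP_PIECE]) + uleb128(piece_size)
    return reg_op(reg_a) + piece + reg_op(reg_b) + piece


def exprloc(expr, extra=0):
    """ULEB128 length followed by exactly that many expression bytes.
    extra=1 mimics the kind of off-by-one length the commit describes."""
    return uleb128(len(expr) + extra) + expr


def next_byte_after_exprloc(stream):
    """Read one exprloc as a consumer would; return the byte right after it."""
    length, pos = read_uleb128(stream, 0)
    return stream[pos + length]


expr = reg_reg_expr(0, 3)           # arbitrary register numbers
trailer = bytes([0x05, 0x11])       # made-up bytes standing in for later data
good = exprloc(expr) + trailer
bad = exprloc(expr, extra=1) + trailer

if __name__ == "__main__":
    print(hex(next_byte_after_exprloc(good)), hex(next_byte_after_exprloc(bad)))
```

With the correct length, a reader resumes at the first trailing byte. With the length one too large, it swallows that byte and resumes one position late, so everything after the expression is misinterpreted. That kind of desynchronization is at least consistent with a consumer reporting an abbreviation number it cannot find, though the sketch does not prove that this is what GDB hit here.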
agocke added a commit to dotnet/llvm-project that referenced this issue Nov 16, 2022
(cherry picked from commit b85b64b)
filipnavara pushed a commit to filipnavara/llvm-project that referenced this issue Nov 16, 2022
@agocke
Member Author

agocke commented Nov 17, 2022

Re-opening to track servicing fix.

@agocke agocke reopened this Nov 17, 2022
agocke added a commit to dotnet/llvm-project that referenced this issue Nov 22, 2022
(cherry picked from commit b85b64b)
@MichalStrehovsky
Member

The fix is in the LLVM objwriter release branch and we picked up the change in release/7.0 of this repo. I think this can be closed.

@TheSpydog
Contributor

Did this fix make it into .NET 7.0.1? I'm still running into this problem in a non-trivial project with the 7.0.1 tooling.

(Interestingly, upgrading from 7.0.0 to 7.0.1 did fix this issue for a simple Hello World app, but I'm not sure if it's a direct result of this change or if it was a complete fluke.)

@MichalStrehovsky
Member

I think it will only be in 7.0.2.

7.0.1 shipped in mid-November (https://github.com/dotnet/runtime/milestone/107?closed=1), and this change only landed after that: dotnet/llvm-project#321

@ghost ghost locked as resolved and limited conversation to collaborators Feb 4, 2023
agocke added a commit to dotnet/llvm-project that referenced this issue Feb 7, 2023
* Apply llvm.patch

Taken from https://github.com/dotnet/runtime/blob/7ab969c84ef05ba948c0075392716ce335b47744/src/coreclr/tools/aot/ObjWriter/llvm.patch.

* Add objwriter library

* Taken from https://github.com/dotnet/runtime/tree/7ab969c84ef05ba948c0075392716ce335b47744/src/coreclr/tools/aot/ObjWriter.
* Updated README.md
* Updated CMakeLists.txt to remove reference to CORECLR_INCLUDE_DIR.
* Added cordebuginfo.h, cvconst.h, cfi.h from coreclr/inc at the above commit.

* Build the ObjWriter package

* Add ObjWriter API to set DWARF version (#161)

Contributes to https://github.com/dotnet/runtimelab/issues/1738.

* Add `.note.GNU-stack` section to produced executables (#162)

Do this unconditionally because there's no scenario where we would need executable stack for managed code.

* Remove Darwin workaround (#163)

This caught my attention as I was looking at the ObjWriter. LLVM no longer emits a `LC_VERSION_MIN_MACOSX` load command unless we explicitly set a version. I don't see a difference in `llvm-objdump -macho -x foo.o` with/without these lines (I didn't bother myself to boot into macOS to run `otool`).

* Fix llvm-dwarfdump warnings (#164)

Fixes https://github.com/dotnet/runtimelab/issues/1535. No warnings left with llvm-dwarfdump from LLVM 12.

* Revert "Fix llvm-dwarfdump warnings (#164)" (#218)

This reverts commit afc9070.

* Add new NuGet package, `Microsoft.NETCore.Runtime.JIT.Tools`, includes `FileCheck` and `llvm-mca` (#256)

https://github.com/dotnet/runtime is wanting to start writing assembly (x64/ARM64) verification tests. Instead of building our own tool to support writing those kinds of tests, we want to leverage LLVM's `FileCheck`.

We also want to include `llvm-mca` at the request of @EgorBo

This PR creates a new NuGet package for `dotnet/runtime` to consume which we named `Microsoft.NETCore.Runtime.JIT.Tools`. So far, this package only includes LLVM's `FileCheck` and `llvm-mca` tools.

* [ObjWriter] Enable DWARF debug information emitting for Mach-O (#269)

* Account for GOT VariantKind on osx-arm64 (#185)

* Add API for emitting compact unwind encoding, enforce DWARF encoding if not explicitly overridden

* Add comment

* Update ObjWriter to LLVM 14 API

* Add support for generating uninitialized sections (#306)

We support `.bss` but not custom sections that are bss-like. This adds such support.

* Do not indiscriminately create text section (#312)

If we ended up with nothing in the text section, this line would error LLVM out in:

https://github.com/dotnet/llvm-project/blob/3db8d68195c17386557f1a258312bbae4051dc05/llvm/lib/MC/ELFObjectWriter.cpp#L1458-L1459

Because we generate a reference to the empty text section in the `aranges` section.

I double checked and debugging on Linux still works fine without this. `SetCodeSectionAttribute` is an objwriter API and we have access to it from the managed side. We should be calling it from there if it's needed for something that I didn't realize (we do call it from the managed side for the `.managed` section, but that one actually has debug information generated, unlike `.text`).

* Fix off-by-one error in DWARF reg-reg location (#317)

The DWARF specification states that the form of an exprloc consists
of an unsigned LEB128 length value, followed by the encoded location bytes
of the specified length. For some reason we were adding one to the length
value being emitted. This looks incorrect to me. The above calculation for
REG-REG (a variable stored in two registers) correctly calculates the length
of each register type tag, plus the size of the interpolating PIECE tags,
plus the size of notation for each register. The extra byte looks wrong.

I've tested this locally and it appears to resolve dotnet/runtime#77407.

Unfortunately, it also causes llvm-dwarfdump --verify to constantly
complain about missing base addresses. I can't confirm at the moment,
but my suspicion is that this is revealing an existing bug. Even if this
is somehow causing a new bug, I think the resulting symbols with this
change are better than the alternative (no working symbols at all).

* Setting context object file info

* Add verbosity to linux x64 pipeline

In order to understand what is happening with std path error.

* Revert "Add verbosity to linux x64 pipeline"

This reverts commit 5c4636e.

* Upgrading linux build image

* [Temporary] Adding verbosity to get more pipeline error info

* Update image name for linux x64

* Fix Linux x64 build

* Revert "[Temporary] Adding verbosity to get more pipeline error info"

This reverts commit 9d76b36.

* Updating Build_Linux_musl timeout

* Update linux-musl Docker images

* Fix linux-musl-x64 build

* Setting clang/++ version 15 for linux musl

* Copying clang/clang++ vars to unix-like OS

* Fix cut & paste error

* Fix objcopy and strip path in cross-compilation

* Update azure-pipelines.yml

$(ClangVersion) $(ClangPlusVersion) weren't defined for OSX and should be defined for every Linux

* Bump timeout for Linux musl build

* Clean up .gitignore

* Consolidate Clang[Plus]Version into ClangVersionArg

* Move CLANG_TARGET from environment into build parameter
Always quote _BuildConfig on command line so empty value is not accidentally using next parameter as the value

* Update URL in cordebuginfo.h to point to dotnet/runtime

* Bump Windows build timeout to 210

* Fix a typo in compiler name

* Revert $(_BuildConfig) -> "$(_BuildConfig)" change

* Change ClangTarget to ClangTargetArg since apparently it gets propagated as environment variable into wrong steps

* Fix inadvertent change

* Bump timeout everywhere

---------

Co-authored-by: Michal Strehovský
Co-authored-by: Andy Gocke
Co-authored-by: Will Smith
Co-authored-by: Adeel Mujahid
Co-authored-by: Brian Bohe
Co-authored-by: Alexander Köplinger