Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ld64.lld: error: too many personalities (4) for compact unwind to encode #102754

Open
glandium opened this issue Oct 6, 2022 · 79 comments
Open
Assignees
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. O-ios Operating system: iOS O-macos Operating system: macOS P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. relnotes Marks issues that should be documented in the release notes of the next release. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Milestone

Comments

@glandium
Copy link
Contributor

glandium commented Oct 6, 2022

When mixing ObjC, C++ and Rust in the same binary on aarch64-apple-darwin targets (e.g. like in Firefox), lld from LLVM 15 complains about "too many personalities". This is a regression from #92845 and affects beta.

For some reason, since #92845, this code in lld is hit: https://github.com/llvm/llvm-project/blob/d30dccd21f2c3bc1ed6cd054c131436a1af548e1/lld/MachO/UnwindInfoSection.cpp#L300

Ultimately, this leads to https://github.com/llvm/llvm-project/blob/d30dccd21f2c3bc1ed6cd054c131436a1af548e1/lld/MachO/UnwindInfoSection.cpp#L378
where there are 4 personalities instead of 3:

  • ___objc_personality_v0,
  • ___gxx_personality_v0,
  • _rust_eh_personality
  • <internal>

With previous rust versions, only the first three ones would be there. It looks like the relocation that triggers the <internal> is related to the rust_eh_personality function in library/std/src/personality/gcc.rs itself.

STR:

  • Create a bar.cc file with content:
#include <stdexcept>

extern "C" void qux();

void bar() { qux(); throw std::runtime_error("foo"); }
  • Create a foo.m file with content:
void foo() { @try {} @finally {} }
  • Create a qux.rs file with content:
#![crate_type = "staticlib"]

extern "C" {
    fn foo();
}

#[no_mangle]
pub unsafe extern "C" fn qux() {
    println!("foo");
    foo();
}
  • Create a Makefile file with content:
TARGET = --target=aarch64-apple-darwin

lib.dylib: bar.o foo.o libqux.a
	clang++ -o $@ $^ -fuse-ld=lld -dynamiclib $(TARGET) -lobjc

bar.o: bar.cc
	clang++ -o $@ -c $^ $(TARGET)

foo.o: foo.m
	clang -o $@ -c $^ $(TARGET)

libqux.a: qux.rs
	rustc $^ $(TARGET)
  • Get clang 15 + lld and make sure it's in your $PATH.
  • Set SDKROOT to point to a macOS SDK
  • Run make

This does not happen on x86_64-apple-darwin, but interestingly the list of personalities there is:

  • ___objc_personality_v0,
  • ___gxx_personality_v0,
  • <internal>
    (no _rust_eh_personality).

With rust 1.64, the list of personalities was:

  • ___objc_personality_v0,
  • ___gxx_personality_v0,
  • _rust_eh_personality
@glandium glandium added the C-bug Category: This is a bug. label Oct 6, 2022
@nagisa
Copy link
Member

nagisa commented Oct 9, 2022

So, uh, wouldn’t this code basically mean that you cannot link a binary that consists of more than 3 languages with their own unwinding personalities? That seems like an unfortunate limitation, and I would argue that this is something ld.lld should be fixing, regardless.

As for why the <internal> gets constructed, I would imagine that with the move of the personality into libstd one of the codegen units now starts containing both the personality function and the functionality that depends on that personality (whereas before nothing in the standalone unwinding crate could unwind and thus did not reference the personality function.)

A workaround on our end could probably be to make sure these personality functions end up in a CU of their own.

@nagisa
Copy link
Member

nagisa commented Oct 9, 2022

Does this happen with other linkers (e.g. ld that ships with Xcode, if it is something other than lld?)

@nagisa nagisa added A-linkage Area: linking into static, shared libraries and binaries O-macos Operating system: macOS O-ios Operating system: iOS labels Oct 9, 2022
@glandium
Copy link
Contributor Author

It doesn't happen with ld64, but the resulting binary only lists 3 personalities as per llvm-object --unwind-info...

@glandium
Copy link
Contributor Author

Should this be labeled regression-stable-to-beta or something?

@Mark-Simulacrum Mark-Simulacrum added the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label Oct 11, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 11, 2022
@Mark-Simulacrum Mark-Simulacrum added this to the 1.65.0 milestone Oct 11, 2022
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-critical

@rustbot rustbot added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 11, 2022
@glandium
Copy link
Contributor Author

The LLVM bug has a fix for lld.

@bridiver
Copy link

The LLVM bug has a fix for lld.

@glandium do you have a link for the bug fix?

@glandium
Copy link
Contributor Author

@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 19, 2022
@bridiver
Copy link

btw - why is _rust_eh_personality exported at all when panic == "abort"? It only applies to unwind, right?

@rillian
Copy link
Contributor

rillian commented Oct 19, 2022

btw - why is _rust_eh_personality exported at all when panic == "abort"? It only applies to unwind, right?

This comment says it's because the distributed core/std libraries are compiled with panic=unwind and need an eh_personality symbol to link to even if it's not going the used.

Now, maybe it shouldn't be exported from a staticlib with that resolution having already happened?

@bridiver
Copy link

bridiver commented Oct 19, 2022

// To handle this the compiler only requires the `eh_personality` is defined if
// the panic runtime being linked in is the unwinding runtime, and otherwise
// it's not required to be defined (rightfully so). In this case, however, this
// library just defines this symbol so there's at least some personality
// somewhere.

It seems like this is actually having the reverse of the intended effect. The real problem is the lld bug, but adding this in when it isn't actually required doesn't seem great to me.

@wesleywiser
Copy link
Member

Does the lld fix need a backport to our LLVM fork?

@bridiver
Copy link

Does the lld fix need a backport to our LLVM fork?

I don't believe that would fix our issue because the error occurs during the final link of the rust static lib to chromium (which uses llvm 16). For us, we either need a fix in rust (don't export the unused and unnecessary symbol) or wait for the lld fix to land in chromium

@pnkfelix pnkfelix self-assigned this Oct 27, 2022
@bjorn3
Copy link
Member

bjorn3 commented Oct 27, 2022

rust_eh_personality will also be used in case of -Cpanic=abort for catching and aborting on exceptions crossing an extern "C-unwind" boundary in the future. This is in fact one of the reasons #92845 was opened.

@pnkfelix pnkfelix added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 27, 2022
@pnkfelix
Copy link
Member

pnkfelix commented Oct 27, 2022

@estebank suggested that we should make a note of this known bug issue in the release notes for 1.65. I think that's worth at least considering; @rust-lang/release any thoughts?

@bridiver
Copy link

bridiver commented Nov 2, 2022

@bridiver
Copy link

bridiver commented Nov 2, 2022

This reduces the number of distinct personalities we need in the final
binary, and helps us avoid hitting the "too many personalities" error.

this only handles the case of synthetic symbols

@bridiver
Copy link

also we still have the issue where rust stack traces are sometimes garbled (which seems likely to be related) and I'm not sure how to address that with no way to change the rust personality routine. We can test it (for now) by building std with abort, but even that option will apparently go away soon and we would have to patch rust if we wanted to change it

@Amanieu
Copy link
Member

Amanieu commented Jan 22, 2023

I think you should open a separate issue for that.

@bridiver
Copy link

I think you should open a separate issue for that.

I agree it's a separate issue, but I strongly suspect it's related.

@bridiver
Copy link

I don't think this is a very common issue. In practice it is extremely rare for custom personality functions (outside of those used by LLVM-generated code) to be used. In fact this chrome code is the first time I have ever seen one. Also this issue only affects apple targets which further restricts the set of applications that could be affected by this.

With enough people using rust, uncommon situations will happen with increasing frequency. I'm really struggling to understanding why you'd want to keep this known limitation that some people will encounter. There are a lot of things I've never personally seen and yet I know they exist, sometimes in large numbers.

@bridiver
Copy link

Also this issue only affects apple targets which further restricts the set of applications that could be affected by this.

I don't think that any cross platform app trivializes the set of applications that could be affected by this

@bridiver
Copy link

bridiver commented Jan 22, 2023

I'm saying that you can get the same effect as chrome's custom personality function by using __gxx_personality_v0 and modifying the assembly code a bit. This solves your problem which is that Apple's compact unwinding format only allows a maximum of 3 personality functions.

Can you explain a bit more here? I'm not seeing how modifying the assembly can produce the same effect without the custom personality function. There will be an exception handler which is the problem that the custom function addresses. From the issue:

When an exception occurs with CFRunLoop on the stack (as most of our code will have), 
then it will find the eh_frame from its @try/@catch block. 
The @catch(...)-all that CFRunLoopRunSpecific does then calls __cxa_rethrow, 
which begins again the proces of finding an exception handler. 
Since no other handler frame is on the stack, that will call std::terminate. 
But since this is a rethrow, the stack will only show up to where CFRunLoop caught the exception. 
This is what's producing our garbage stacks off the CFRunLoop. 
Basically, our code is no longer building as if -fno-exceptions were present, 
because we *do* enable ObjC exceptions (which are really the same thing).
This introduces base::mac::CallWithEHFrame(), which will run a block in a
stack frame that is set up to stop searching for an exception handler when it
is reached. This is used to prevent a top-level exception handler, installed
in CFRunLoopRunSpecific(), from catching and rethrowing C++ exceptions. When
it does this, the resulting stack traces are not useful for triage purposes. By
stoping the search for an exception handler, the runtime will call
std::terminate directly from the throwing stack trace.

@bjorn3
Copy link
Member

bjorn3 commented Jan 22, 2023

As I understand if __gxx_personality_v0 is used with the call site table empty, the personality function will call std::terminate during the search phase when all stack frames still exist as the search phase is way before the personality function for @catch in CFRunLoop gets a change to run and catch the exception to rethrow it.

@bridiver
Copy link

As I understand if __gxx_personality_v0 is used with the call site table empty, the personality function will call std::terminate during the search phase when all stack frames still exist as the search phase is way before the personality function for @catch in CFRunLoop gets a change to run and catch the exception to rethrow it.

but as it says in the issue there is an exception handler in CFRunLoopRunSpecific which causes the exception to be re-thrown and that's what causes the problem with the stack trace. My understanding is that the chromium personality function short circuits the exception handler search and then the default personality function works as you describe because it functions as if the call site table is empty when it's not.

@bridiver
Copy link

bridiver commented Jan 23, 2023

I think I found a hacky workaround by renaming CxxPersonalityRoutine -> rust_eh_personality and making it extern __attribute__((weak)) "C". I also modified the corresponding asm and ensured that the static library containing the chromium definition was listed before the rust static library. I then threw an exception outside of a CallWithEHFrame block and got *** Terminating app due to uncaught exception.... and inside the block I just got Received signal 4 <unknown>... so inside the block it's (correctly) not recognizing it as being caused by an uncaught exception.

@Amanieu
Copy link
Member

Amanieu commented Jan 31, 2023

@bridiver Have you tried my suggested solution?

  • Here delete everything between Lcall_site_table_start and Lcall_site_table_end.
  • Here change the personality to ___gxx_personality_v0.

This should result in exactly the same behavior as the custom personality function while using one less total personality function in the program.

(Once this is confirmed to work, the assembly code can be further simplified, such as by removing the unnecessary catch clauses.)

@bridiver
Copy link

bridiver commented Jan 31, 2023

@bridiver Have you tried my suggested solution?

  • Here delete everything between Lcall_site_table_start and Lcall_site_table_end.
  • Here change the personality to ___gxx_personality_v0.

This should result in exactly the same behavior as the custom personality function while using one less total personality function in the program.

(Once this is confirmed to work, the assembly code can be further simplified, such as by removing the unnecessary catch clauses.)

That doesn't work, but I should have followed up because my description of the correct behavior was actually incorrect. I created a small test exe that contains only libbase.a (which contains the chromium personality function) and libbrave_rust.a and this is the output of triggering an objc exception inside CallWithEHFrame that is handled by the chromium personality function (libbase.a listed first).

2023-01-27 07:39:37.333 a.out[55342:3250784] *** Terminating app due to uncaught exception 'FileNotFoundException', reason: 'File Not Found on System'
*** First throw call stack:
(
        0   CoreFoundation                      0x00007ff8136ab3eb __exceptionPreprocess + 242
        1   libobjc.A.dylib                     0x00007ff8131f7e25 objc_exception_throw + 48
        2   a.out                               0x000000010bdf7941 __main_block_invoke + 49
        3   a.out                               0x000000010bdf79c2 _ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE + 10
        4   a.out                               0x000000010bdf7900 main + 16
        5   dyld                                0x00007ff813224310 start + 2432
)
libc++abi: terminating with uncaught exception of type NSException

With your proposed change what I get in Brave is

Received signal 4 <unknown> 0001299fbe04
 [0x0001299ee122]
 [0x000129926023]
 [0x0001299ee071]
 [0x7ff813581c1d]
 [0x011000224900]
 [0x0001293fb63e]
 [0x000124e605f8]
 [0x000129394437]
 [0x00012939418f]
 [0x0001293929a2]
 [0x000129392bb0]
 [0x000124e5d66e]
 [0x00010bb5187e]
 [0x7ff813224310]
[end of stack trace]
[0131/090158.118930:WARNING:crash_report_exception_handler.cc(235)] UniversalExceptionRaise: (os/kern) failure (5)
zsh: illegal hardware instruction  

which is the same thing I am currently getting in Brave with the change I described because there is still some dependency issue that is causing it to search libbrave_rust.a before libbase.a and is also similar to what I get in my test exe if I list libbrave_rust.a before libbase.a.

Things get very complicated when building all of chromium + Brave so I'm starting from something simple and slowly adding dependencies to try to figure out what is causing the difference between my small test app and Brave.

@bridiver
Copy link

bridiver commented Feb 4, 2023

I am not having any luck getting this to work. None of the above suggestions work and in general the inability to build a rust static library without rust_eh_personality is becoming really problematic for us.

@BrendanEich
Copy link

Cc'ing @pcwalton for help -- we use Rust in Brave's Chromium fork. IIUC, using C, Objective-C, and Rust in the same exe uses up all three personalities for compact unwind, and naively we need a fourth. Thanks.

@bridiver
Copy link

bridiver commented Feb 7, 2023

The personality function is always needed if anything is compiled with -Cpanic=unwind (which the standard library is for most targets) or extern "C-unwind" is used1 as it is referenced by the unwind tables to support unwinding. Omitting it for static libraries will cause a linker error. In the past I believe LTO was allowed to strip out the personality function in case of -Cpanic=abort, but this is wrong when using extern "C-unwind" as we guarantee that unwinding into Rust code from extern "C-unwind" will abort when using -Cpanic=abort.

(By the way the specific function you are pointing too is only used for Emscripten. The personality function for most targets is at https://github.com/rust-lang/rust/blob/a6269dad38e6ede0013ba3688099544833933c63/library/std/src/personality/gcc.rs#LL244)

Footnotes

  1. In the future it will even be used for extern "C" to ensure unwinding into rust code will abort when using -Cpanic=abort. This is a mitigation against UB.

@bjorn3 I tried build-std with panic=abort, but the "in the future" change is not in the future, that is exactly the change that broke things for us and rust_eh_personality is still exported 5ff0876

Instead of the original stub implementation, could this be implemented as a weak_import and some option not to include the symbol in the output? Maybe something like panic=abort_undefined or some other option?

@bridiver
Copy link

bridiver commented Feb 7, 2023

I found a workaround for now which is to use build-std=panic_abort,std (panic_abort may not be necessary) and patch gcc.rs in the downloaded source. It's not ideal, but it seems to be the least bad option atm. My current fix just leaves out this entire section 5ff0876#diff-fd385628ee6c18612493cf4e3280d7546e26dee8ebedad3bc896f5b4b534b252R83 for macos, but not sure if that leaves me with undefined behavior or not for rust crashes. Alternatively I could keep rust_eh_personality and implement rust_eh_personality_impl on the chromium side to forward to the chromium function.

@pcwalton
Copy link
Contributor

pcwalton commented Feb 7, 2023

That sounds like the right solution for now. I don't think that Rust aborts would have UB—they would kill the process safely.

(I don't have much to add to this issue beyond what's already been discussed, fwiw)

@bridiver
Copy link

bridiver commented Feb 7, 2023

I don't think that Rust aborts would have UB—they would kill the process safely.

@pcwalton are you sure about that? It sounds like the entire reason for making this change to move it into std and always include it is because it did have UB at least in some cases? In any case I got it working by leaving rust_eh_personality in place and doing a weak import on rust_eh_personality_impl which I defined in chromium to call base::mac::CxxPersonalityRoutine and it's calling the correct handler for objc exceptions

@bridiver
Copy link

bridiver commented Feb 9, 2023

So based on what I have working, I think this could be very easily implemented by moving the implementation of rust_eh_personality_impl into a separate rlib that gets conditionally included based on some flag passed to cargo. If the rlib is excluded then it's up to the embedding app to provide an implementation or they'll get an undefined symbol

@XrXr
Copy link
Contributor

XrXr commented Feb 14, 2023

I have a guess for a solution. There is -C llvm-args=--emit-dwarf-unwind=always with newer LLVM 15 Rust. If you compile all Rust code with that option (-Z build-std, too) and then use strip --remove-section to yank all the compact unwind sections in the Rust output objects, I think that makes it so that rust_eh_personality is not a compact unwind personality. That should free up a spot.

I don't see an option to tell LLVM to not emit compact unwind in the first place. If it exists, though, the post-processing with strip can be avoided too.

The unwinder should be able to deal with DWARF unwind, because even when compact unwind is enabled, it can't encode all situations and needs DWARF as a fallback anyways.

@jyknight
Copy link

Ultimately, this is an Apple ABI issue, but I don't know whether that'll ever be fixed. They only allow 3 personalities to be used in compact-unwind in a single shared-object. And the system already has 3 canonical personalities already (__gxx_personality_v0, __gcc_personality_v0, and __objc_personality_v0). So, anyone using a custom personality is already doing something really fragile.

Because of this, I'd like to propose a change to LLVM to only emit compact unwind when using one of those 3 canonical Apple-shipped personality functions.

That should fix this issue "once and for all" (for non-rust uses as well), but does mean Rust code would use DWARF unwind for all functions -- even if the program being linked doesn't happen to require all 3 of the "official" personalities. Which is a bit unfortunate, but presumably not terrible.

@danakj
Copy link
Contributor

danakj commented Mar 14, 2023

This issue is now a blocker for Rust integration into Chromium mainline. We can try some of the workarounds here, and I am just starting to explore the space. Thanks for all the information here so far.

oontvoo added a commit to llvm/llvm-project that referenced this issue May 18, 2023
…onality symbols. For the rest, use DWARFs.

Details: rust-lang/rust#102754

The MachO format uses 2 bits to encode these personality funtions, with 0 reserved for "no-personality".
This means we can only have up to 3 personality. There are already three popular personalities:  __gxx_personality_v0, __gcc_personality_v0, and __objc_personality_v0.
As a result, any system that needs custom-personality will run into a problem.

This patch implemented jyknight's proposal to simply force DWARFs for all non-canonical personality functions.

Differential Revision: https://reviews.llvm.org/D144999
@oontvoo
Copy link

oontvoo commented Jun 7, 2023

The patch implementing jyknight's proposal (with slight modification*) has landed. We should be able to have more than 3 personalities symbols now.

(*) In stead of making __gxx_personality_v0, __gcc_personality_v0, and __objc_personality_v0 as "canonical", hence taking up all three compact-unwind encoding slots, the patch only considers __gxx_personality_v0 and __objc_personality_v0 to be canonical, leaving one "free" slot for custom personality symbol, if needed. The rationale for this is that __gcc_personality_v0 is kind of rare compared to C++ and ObjC.

From a user's perspective, this feature requires no tweaking of any new flag - should "just work".
However, if you have a custom personality function (eg., Rust's in this case) and you want it to use compact-unwind, you can request that via the -femit-compact-unwind-non-canonical=true flag.

@StarMerely
Copy link

@oontvoo Hello, when we introduce the rust library into the iOS project, there will be a compilation error. I see that the above problem may be the same, but I tried it and it didn't work.

@landaire
Copy link

landaire commented Sep 7, 2023

For anyone else coming to this issue when trying to build a project in Xcode, I believe it's fixed/mitigated in Xcode 15. Not sure if it's properly fixed in Xcode 15, but my project now builds when targeting aarch64-apple-ios-sim.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. O-ios Operating system: iOS O-macos Operating system: macOS P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. relnotes Marks issues that should be documented in the release notes of the next release. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests