ARM EABI: Exception handling broken #489

dnadlinger · 2013-10-06T19:13:50Z

LDC now compiles a Phobos "Hello World", but any attempts of throwing an exception lead to _Unwind_RaiseException failing with error code 9 (generic error). We need to fix this ASAP as it also makes running the test suite really hard.

The text was updated successfully, but these errors were encountered:

dnadlinger · 2013-10-23T18:49:54Z

Turns out the libunwind prototypes are different on ARM: https://gist.github.com/klickverbot/7124007 (that patch is literally a snapshot from during a hacking session, we might want to properly separate out the implementations in the end). Still crashes, though, this time in _Unwind_GetLanguageSpecificData. The next step would be to dump the offending EH tables and figure out what is going wrong, I suppose.

dnadlinger · 2013-10-23T19:07:47Z

Oh, and I'd highly recommend everybody intending to work on this to get a debug version of the GCC libraries (package might be called something along the lines of gcc-libs), being able to step into the libunwind calls saved my sanity a few times.

dnadlinger · 2013-10-29T21:24:13Z

After having had a look at the G++ personality function, the code now (updated Gist) gets to the unwinding phase. The exception struct is propagated along correctly, but the catch classinfo lookup is still broken.

dnadlinger · 2014-01-07T03:41:56Z

Updated Gist again – now with working catch classinfo lookup. Simple EH examples work now, although the test suite reveals that there is still at least one big problem with the implementation that needs to be tracked down.

redstar · 2014-01-07T07:53:24Z

Floating point handling is currently broken. There is an assertion error in PortInitializer::PortInitializer while initializing Port::ldbl_nan. ARM uses only double and the provided pattern is therefore wrong. This causes a lot of failures.
I also get a linker error if I use -g, which causes all tests with debug information to fail.
(I have my new ARM board now ready and do a nightly LLVM and LDC build :-) )

dnadlinger · 2014-01-07T07:55:57Z

Yep, seems reasonable – I just used a release build for testing, as I was only hacking on the runtime anyway.

Can you acutually build the Phobos test in a reasonable amount of time? std.algorithm and std.datetime cause my ARM system to swap, even if the system itself, apart from LDC, only takes ~50 MB of RAM. -.-

redstar · 2014-01-07T08:05:41Z

I can time it. I run a script executed by cron and simply mail the output. But first the floating point failure must be fixed - only 17% of the tests passes right now (merge-2.064 branch).

redstar · 2014-01-09T06:02:52Z

The test suite runs roughly 16h. (Compile time for LDC is much less then an hour.) Only 68% pass.

dnadlinger · 2014-01-09T09:39:28Z

Eww, that's awfully long. Is your system swapping like crazy or something like that? From my vague recollection of the specs of your system and the performance on my ODROID-X, I would have maybe guessed a few hours or so, but definitely not 16!

redstar · 2014-01-09T09:54:54Z

I have to check. Previously, only 17% of the tests passed. But runtime of the test suite was only 20min.

The system has 2GB memory and no swap space.

redstar · 2014-01-20T12:02:41Z

I applied the gist patch yesterday. The tests seem to look better. I have the result hopefully this evening. (I currently think that the long runtime is caused by power management. I have to investigate this closer...)

I also identified the problem why applications with debug info do not link. (http://llvm.org/bugs/show_bug.cgi?id=18554) I already committed the first part of fixing this.

dnadlinger · 2014-01-20T19:18:10Z

@redstar: Yes, without the patch pretty much every testcase containing EH at all (also e.g. assertThrown) fails, whereas the patched personality function at least lets us get the simpler cases right. The remaining problems are still responsible for a fair share of the test suite failures, but I don't really have time to look into that right now.

andralex · 2014-02-12T06:02:06Z

$250 bounty placed: https://www.bountysource.com/issues/1132204-arm-exception-handling-broken

Includes some of the EH changes for ARM from issue #489. Adds thread support for MIPS64.

Correct use of Linux version identifier in epoll interface

This is a copy of the easy part of issue #489.

See issue ldc-developers#489.

joakim-noah · 2015-07-16T16:21:11Z

Could the remaining EH test failures be related to function inlining? I turned off function inlining on Android/ARM and got a bunch more EH-related tests to pass. Could one of you verify this on linux/ARM?

joakim-noah · 2015-09-27T13:57:37Z

I looked into this recently, especially the ARM load/store optimization pass I mentioned in the above linked thread. That backend pass would often cause a segfault in eh_personality_common right before get_uleb128 is called, because of the assignments of block_size and landing_pad right before.

Since callsite_walker is a byte array, it returns addresses that are not necessarily word-aligned. That works fine when that ARM optimization pass isn't run and they're converted to LDR instructions, but the optimization pass tries to turn the two assignments into a single LDMIB, which will fail if the callsite_walker address happens not to be aligned to a 4-byte word. I'm guessing this information about word alignment is lost to the backend because of the *cast(uint*) employed for all the assignments.

Turning off that ARM optimization pass fixed the EH problem up till ldc commit e9e8e4654d5 from a couple months ago, but it appears the EH rewrites since then have broken something else on ARM also, which I haven't tracked down. As noted in the above thread, inlining is not really the issue: turning on the debug statements and turning off inlining merely tickled the IR enough that this optimization wasn't being done, ie it was a hack that happened to work.

joakim-noah · 2015-10-09T09:55:23Z

I looked into what else was causing problems with EH on ARM since ldc commit e9e8e4654d5, turns out it was this check skipCatchComparison() before comparing exception types, that was since added. If I comment out that check and compile libunwind.d with the above ARM optimization pass turned off, ARM EH starts working for me on ldc master. The problem appears to be that HANDLER_FRAME is never set in the ARM _d_eh_personality function because ucb.barrier_cache.sp always returns zero. Not sure why as I didn't look further, but I did notice that some of the types are off, ie _uw is aliased to ptrdiff_t but is uint in the C headers, could be something like that.

The new test for exception chaining across fiber switches asserts, but other than that everything works on Android/ARM. Let me know if it's the same on linux/ARM.

joakim-noah · 2016-01-07T02:10:28Z

I'm still seeing the other issue with the check before comparing exception types mentioned in my last comment, @smolt, that one not hitting you?

smolt · 2016-01-07T04:56:23Z

No, that one is not hitting, but watchOS does not use ARM_EABI_UNWINDER path either. And I am not familiar with that path. I have a Raspberry Pi laying around that I should try it on.

I do have one EH catch problem remaining that I wrote up in the "Watch Out" forum announcement: the stack pointer not corrected in landing pads after a function call with stack adjustments. It appears to be an LLVM problem and I have my workaround in forked LLVM to restore stack from frame pointer. I was going to eventually write that up as a separate issue once I was sure it wasn't watchOS specific as there is no watchOS code in LDC yet. It ain't a real issue if no one can exercise it.

dnadlinger · 2016-01-07T15:56:40Z

I didn't mean to close this issue, apparently the words "resolve: ldc-developers/druntime#51" in the PR text were enough for GitHub to auto-close this.

I originally opened this issue working on arm-linux-gnueabihf, and I don't think anybody has fixed (or tested) that lately.

smolt · 2016-01-07T18:02:15Z

I think the original Pi I have is arm-linux-gnueabihf. I'll haven't played with it in a year but might be fun to cross-compile to it. I don't think it has the horse power to build everything. Last time I tried building a recent version of GDC on an older 800Mhz computer it took over 24 hours.

dnadlinger · 2016-01-07T19:14:01Z

I've built LDC on an ODROID-X (4x1.4 GHz Cortex-A9) in a couple of minutes. The biggest issue tends to be the slow I/O if you are using SD cards. For an RPi 1, I'd probably use a cross-compiler for my own sanity though.

smolt · 2016-01-31T22:12:04Z

Glad to say I have Pi with LDC master going.

I'm still seeing the other issue with the check before comparing exception types mentioned in my last comment, @smolt, that one not hitting you?

Yes - Pi has this issue too - as well as a bunch of others! As you reported, just commenting out the line:

        if (!nativeContext.skipCatchComparison())

seems to fix it. Now to figure out why.

smolt · 2016-02-01T04:07:15Z

Back to the puzzle. @joakim-noah as you noticed:

The problem appears to be that HANDLER_FRAME is never set in the ARM _d_eh_personality function because ucb.barrier_cache.sp always returns zero.

It seems it is the personality's responsibility to set sp and something like the following is missing from ldc implementation.

https://github.com/gcc-mirror/gcc/blob/bd3f0a53c07086e978ea4ec00e10a20e4a969c40/libobjc/exception.c#L467

I am trying to run entire test suite on Pi, see if it can do it now, but it is very S L O W. I like it.

smolt · 2016-02-01T16:35:07Z

Test suite run from Pi last night didn't do very well. I have a collection of changes, plus now a list of failures to sift through. I'll make a PR later this week for EH once I confirm it isn't part of the failures, and will also start a separate issue for getting ldc into shape on Raspberry Pi.

redstar · 2016-02-01T19:54:30Z

I reactivated my IFC6410. It is a Krait-based ARM board, using triple armv7a-hardfloat-linux-gnueabi.
The test suite is badly broken, see the output here: https://gist.github.com/redstar/f4fbc682e03471064fbe
Runtime of test suite is 2h30min.

smolt · 2016-02-01T20:51:05Z

Yeah, looks familiar. All the debug failures probably are due to the extra ldr instruction at -O0. I wrote something on that in the dlang LDC forum.

Sent from my iPhone

On Feb 1, 2016, at 11:57 AM, Kai Nacke [email protected] wrote:

I reactivated my IFC6410. It is a Krait-based ARM board, using triple armv7a-hardfloat-linux-gnueabi.
The test suite is badly broken, see the output here: https://gist.github.com/redstar/f4fbc682e03471064fbe
Runtime of test suite is 2h30min.

—
Reply to this email directly or view it on GitHub.

redstar · 2016-02-01T20:52:53Z

Yes, I read that. Good analysis.

smolt · 2016-02-02T18:04:54Z

Exceptions seem ok now. Should we start a separate issue or issues to work through remaining ARM gnueabi problems? Next thing I will work on is the C ABI. I have a more exhaustive test I developed for iOS that found many problems on Pi when calling or being called from C functions. In the end we should have an abi-arm.cpp and I'll put my additional tests in the test suite.

edit - not saying exception handling is totally done. Some of the remaining test suite or unittest failures may be related, just haven't looked close yet.

dnadlinger · 2016-02-02T20:18:57Z

In the end we should have an abi-arm.cpp

Definitely! abi.cpp is just intended as a no-op default implementation to use on a new platform until you discover which parts are broken. ;)

and I'll put my additional tests in the test suite.

Please do. Improved C ABI coverage is definitely very helpful for bringing up LDC on a new platform.

smolt · 2016-02-04T04:33:22Z

Following the AAPCS made a huge difference! I already had something close for watchOS which is AAPCS-16. I am working on cleaning it up for a proper PR, hope to have it soon. The hacked up stuff I ran tests on is in my smolt:pi play branch.

Look at the Total Test time. Yes, it really takes that long to run tests on a Pi, partially because it only has 512 MB and it often needs to use swap which I have setup on a USB 1 TB HD. Just linking phobos test runner takes 1.5 hours.

99% tests passed, 9 tests failed out of 673

Total Test time (real) = 67992.27 sec

The following tests FAILED:
    234 - std.exception (OTHER_FAULT)
    242 - std.datetime (OTHER_FAULT)
    263 - std.random (Failed)
    307 - std.internal.math.errorfunction (Failed)
    572 - std.datetime-debug (OTHER_FAULT)
    593 - std.random-debug (Failed)
    670 - dmd-testsuite-debug (Failed)
    672 - dmd-testsuite (Failed)
    673 - llvm-ir-testsuite (Failed)

joakim-noah · 2016-02-04T04:59:24Z

Following the AAPCS made a huge difference! I already had something close for watchOS which is AAPCS-16. I am working on cleaning it up for a proper PR, hope to have it soon. The hacked up stuff I ran tests on is in my smolt:pi play branch.

Great. :)

Look at the Total Test time. Yes, it really takes that long to run tests on a Pi, partially because it only has 512 MB and it often needs to use swap which I have setup on a USB 1 TB HD. Just linking phobos test runner takes 1.5 hours.

Couldn't you cross-compile, as I did a couple summers ago, and cross-link, save some time that way? You'd lose some of the automation from the build scripts, but it's not hard to run the test runner yourself.

99% tests passed, 9 tests failed out of 673

Total Test time (real) = 67992.27 sec

The following tests FAILED:
    234 - std.exception (OTHER_FAULT)
    242 - std.datetime (OTHER_FAULT)
    263 - std.random (Failed)
    307 - std.internal.math.errorfunction (Failed)
    572 - std.datetime-debug (OTHER_FAULT)
    593 - std.random-debug (Failed)
    670 - dmd-testsuite-debug (Failed)
    672 - dmd-testsuite (Failed)
    673 - llvm-ir-testsuite (Failed)

Could the std.exception, std.random, and std.internal.math.errorfunction failures be related to the optimization issues I saw on Android too? Those flags worked around those issues for me.

smolt · 2016-02-04T05:25:43Z

Couldn't you cross-compile

Yeah, but in this case I get dmd test suite run too - which has 33 failures right now. It's not so bad, it only takes long to run and rebuild everything and then I am off doing other stuff. Compile/link/run for a single test program is on order of tens of seconds.

Could the std.exception, std.random, and std.internal.math.errorfunction failures be related to the optimization issues I saw on Android too? Those flags worked around those issues for me.

Might be. I'll try what you did first to see if problem is related. But then I'd like to find cause and fix that.

I am curious. Is Android following AAPCS? Did you use something beyond default abi.cpp? I don't recall you mentioning that, yet using the default abi.cpp and core.std.varags doesn't work well for anything I've done on ARM so far: iOS - APCS, watchOS - AAPCS-16, and now arm-linux-gnueabihf - AAPCS.

joakim-noah · 2016-02-04T05:56:20Z

Yeah, but in this case I get dmd test suite run too - which has 33 failures right now.

That's one advantage of natively compiling on Android/ARM too: I should be able to get the compiler testsuite running easily, though I haven't tried it yet.

It's not so bad, it only takes long to run and rebuild everything and then I am off doing other stuff. Compile/link/run for a single test program is on order of tens of seconds.

Heh, "tens of seconds." I'd have thought the USB HD would better that, but I guess the CPU in the RPi is pretty weak too.

I am curious. Is Android following AAPCS? Did you use something beyond default abi.cpp? I don't recall you mentioning that, yet using the default abi.cpp and core.std.varags doesn't work well for anything I've done on ARM so far: iOS - APCS, watchOS - AAPCS-16, and now arm-linux-gnueabihf - AAPCS

I don't know if it's AAPCS. :) I use the default abi.cpp with the struct passing change you suggested, that's it. There are definitely some issues with varargs, which I have noticed with small test programs I wrote, but not in the druntime/phobos tests. Maybe that's tested by the dmd test suite, which I haven't run on Android yet.

smolt · 2016-02-04T06:11:51Z

I don't know if it's AAPCS. :) I use the default abi.cpp with the struct passing change you suggested, that's it.

Oh, forgot about how using "byval" arg passing with ARM can produce bad instruction sequences. AAPCS does not use "byval", so that is probably why most of the tests passed. However, full AAPCS compatibility allows structs passing/returns to be compatible with C, and may fix Android vararg problem. There is some special treatment for 8-byte aligned types and a couple other things. I'll get it out in next few days so you can try before it is pulled.

joakim-noah · 2016-02-04T10:46:33Z

OK, I'll try it.

smolt · 2016-02-04T16:39:31Z

Although it is messy code, you could try this: 4e6778b together with ldc-developers/druntime#58 on Android. Or you could wait until I cleanup the code and make PR. It mixes iOS APCS and watchOS AAPCS-16 ABI code and has lots of vestigial parts. It does seem to work well with C ABI on ARM though.

dnadlinger · 2016-02-04T17:25:12Z

19 hours?! At that rate, wouldn't it even be faster to just run it in qemu?

smolt · 2016-02-04T19:56:47Z

On Feb 4, 2016, at 9:25 AM, David Nadlinger [email protected] wrote:

19 hours?! At that rate, wouldn't it even be faster to just run it in qemu?

Maybe, but long builds aren't a problem because I am away most of day. I spend a couple hours before sleep, kick off full build, then I am not back at until 22 hrs later.

joakim-noah · 2016-02-06T10:48:35Z

I tried out the commits you suggested. The good news is all druntime/phobos tests still pass on Android/ARM, the bad news is that I don't have my vararg test case anymore and can't seem to reproduce it. So hopefully you fixed the issue, but I won't know for a bit.

smolt · 2016-02-06T19:10:53Z

When I get new C ABI test in test suite you can try that. It has some of vararg tests.

smolt · 2016-02-27T23:13:29Z

Heads up that this may be closed soon. Just need ldc-developers/druntime#62 in AND someone besides me to verify it fixes all exception handling failures in the test-suite.

joakim-noah · 2016-03-04T15:53:04Z

Now that that druntime pull is in and partially verified, I think this can be closed.

There is a bounty on this issue: @smolt should get it, if it's still available on bountysource.

smolt · 2016-03-05T04:36:44Z

If there is a bounty, I only worked on the last little pieces. David wrote most of the code I think. If no wishes to claim it, I say donate it to the D Foundation.

joakim-noah · 2016-03-15T09:04:22Z

@klickverbot, you can claim the bounty and close this issue, since you wrote most of the EH code, or @andralex can transfer it to the D Foundation (or who knows? Maybe FB won't pay it out anymore and we can just close the issue 🎱).

smolt · 2016-03-25T15:17:30Z

I hope bounty discussion isn't keeping this open? I think we should advertise that exception handling is good by closing.

dnadlinger mentioned this issue Oct 6, 2013

Compiling for ARM #116

Closed

redstar added a commit that referenced this issue Jun 27, 2014

Add some ARM and MIPS64 druntime changes.

4bbeef4

Includes some of the EH changes for ARM from issue #489. Adds thread support for MIPS64.

redstar pushed a commit that referenced this issue Sep 27, 2014

Merge pull request #489 from goughy/master

887bd80

Correct use of Linux version identifier in epoll interface

redstar added a commit that referenced this issue Sep 27, 2014

Add some ARM eh related code.

dd03340

This is a copy of the easy part of issue #489.

redstar added a commit to redstar/ldc that referenced this issue Nov 5, 2014

Merge of the ARM EH work by David.

1f14957

See issue ldc-developers#489.

redstar mentioned this issue Nov 5, 2014

Merge of the ARM EH work by David. #777

Merged

kinke pushed a commit to kinke/ldc that referenced this issue Nov 9, 2014

Merge of the ARM EH work by David.

638de0e

See issue ldc-developers#489.

smolt mentioned this issue Jan 6, 2016

Handle unaligned callsite table in eh personality ldc-developers/druntime#51

Merged

dnadlinger closed this as completed in ldc-developers/druntime#51 Jan 6, 2016

dnadlinger reopened this Jan 7, 2016

dnadlinger changed the title ~~ARM: Exception handling broken~~ ARM EABI: Exception handling broken Jan 7, 2016

joakim-noah mentioned this issue Feb 1, 2016

Android tweaks ldc-developers/druntime#56

Merged

smolt mentioned this issue Feb 2, 2016

Cache ARM EABI exception data ldc-developers/druntime#57

Merged

smolt closed this as completed Mar 25, 2016

joakim-noah mentioned this issue Apr 18, 2017

LDC 1.3 (1.2) ARMv5 issues #2058

Closed

ARM EABI: Exception handling broken #489

ARM EABI: Exception handling broken #489

Comments

dnadlinger commented Oct 6, 2013

dnadlinger commented Oct 23, 2013

dnadlinger commented Oct 23, 2013

dnadlinger commented Oct 29, 2013

dnadlinger commented Jan 7, 2014

redstar commented Jan 7, 2014

dnadlinger commented Jan 7, 2014

redstar commented Jan 7, 2014

redstar commented Jan 9, 2014

dnadlinger commented Jan 9, 2014

redstar commented Jan 9, 2014

redstar commented Jan 20, 2014

dnadlinger commented Jan 20, 2014

andralex commented Feb 12, 2014

joakim-noah commented Jul 16, 2015

joakim-noah commented Sep 27, 2015

joakim-noah commented Oct 9, 2015

joakim-noah commented Jan 7, 2016

smolt commented Jan 7, 2016

dnadlinger commented Jan 7, 2016

smolt commented Jan 7, 2016

dnadlinger commented Jan 7, 2016

smolt commented Jan 31, 2016

smolt commented Feb 1, 2016

smolt commented Feb 1, 2016

redstar commented Feb 1, 2016

smolt commented Feb 1, 2016

redstar commented Feb 1, 2016

smolt commented Feb 2, 2016

dnadlinger commented Feb 2, 2016

smolt commented Feb 4, 2016

joakim-noah commented Feb 4, 2016

smolt commented Feb 4, 2016

joakim-noah commented Feb 4, 2016

smolt commented Feb 4, 2016

joakim-noah commented Feb 4, 2016

smolt commented Feb 4, 2016

dnadlinger commented Feb 4, 2016

smolt commented Feb 4, 2016

joakim-noah commented Feb 6, 2016

smolt commented Feb 6, 2016

smolt commented Feb 27, 2016

joakim-noah commented Mar 4, 2016

smolt commented Mar 5, 2016

joakim-noah commented Mar 15, 2016

smolt commented Mar 25, 2016