LDC 1.3 (1.2) ARMv5 issues #2058
Hmm, this may not be so bad if it's only an optimization issue. Can you supply a backtrace for the segfault? I suggest you try building and running the test runner at various optimization levels and see how many modules pass. |
The segfault backtrace looks like: Core was generated by `/usr/bin/test'.
I had some trouble getting the whole test app to run, so I decided to add tests one by one (or in groups). Another thorny issue is that I can't change the kernel or (easily) the libc version. For this reason core.atomic fails: I get an "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)" message when trying to run the atomic test. But a good chunk of the druntime tests are running, which is good news. I will investigate more as time allows and keep you posted. |
You should be able to disable the few relevant core.atomic tests by setting this enum to false and hope that there are no other parts requiring 64-bit atomic ops. ;) |
Oh and as the segfault seems to occur when constructing the stacktrace for the exception msg, you should be able to skip the tracing for now by setting the runtime traceHandler to null. |
@kinke Setting the atomic enum doesn't solve the problem. I did that the first time. I don't know why, so I just removed the atomic test altogether. I managed to make a full-blown tester app and observed this:
The segfault stack trace looks like:
Which is strange |
The exception I got in the array module was most likely there because I was using "-static" and that messed something up (different glibc version). Here are the results for druntime:
|
And this is the most I got from a full druntime and phobos test before running out of memory:
|
Looks pretty good. You can start the tester up with a list of just the remaining phobos tests and try those too, right? Update: Are these built without optimization, ie |
Yep. Using There are some issues on some modules, as the test shows, but they appear to be isolated. |
Here is the list of the tests that run in release mode:
|
As you can see I got the atomic test included, by patching core.atomic as follows:
|
Nice work, only 3-4 druntime modules failing is a good sign. Now the job is to figure out the remaining issues on ARMv5 by investigating the failing tests. |
Well, there is the original issue with exception handling that I kinda worked around to get this far. I did a test, and it looks like you only need to compile Any idea on how to attack the EH issue? |
We had a similar EH issue with ARMv7 failing at higher optimization levels a couple years ago, which Dan fixed last year, ldc-developers/druntime#51. Maybe it's still causing problems on ARMv5, or maybe there's an issue elsewhere in the EH code this time. Only way to find out is to carefully step through in a debugger and see what's going wrong. What makes it much easier for you is you can compile that module without optimizations, see what's happening in the debugger, then compare to the same module with optimizations, ie only re-compile and link the single module. That will give you an idea of how the two diverge. |
Thanks @joakim-noah for the links. Reading through them I suspected that something funky is still happening with I did a test with building with Here is a comparison dump with optimizations on and off:
It looks like alignment is still causing issues; check http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html section "Code Generation". |
LL-wise, all relevant stuff the optimized version does differently is
If case 1 is the issue, that's most likely an LLVM codegen bug. Case 2 would be invalid code I think (not-word-aligned storage for the pointer passed by ref?!). Checking the registers while debugging should show what the problem here is. |
Here is a register dump when debugging the optimized function
|
Alright, it's case 1, and I think it's an LLVM ARM codegen bug. It ignores the So either the previous LLVM optimization wrt. replacing alloca+memcpy by an unaligned load is problematic for ARM, or it's an ARM codegen bug which can't simply ignore an |
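For readers following the alignment discussion, here is a rough C sketch of the source-level pattern in question: copying four possibly unaligned bytes into a local and reading that local. An optimizer is allowed to collapse the copy into one load annotated with alignment 1, and the backend then must not emit an instruction that assumes 4-byte alignment. The function name `read_u32_unaligned` is made up for illustration; it is not taken from druntime or LLVM.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not be
 * 4-byte aligned. memcpy makes no alignment assumption about `p`, so this is
 * well-defined C even when `p` is odd. The result is in host byte order. */
static uint32_t read_u32_unaligned(const uint8_t *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value); /* byte copy, no aligned access required */
    return value;
}
```

Casting `p` to `const uint32_t *` and dereferencing it instead would be the unsafe variant on cores that do not support unaligned word loads.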
Awesome @kinke! On a positive note, I was able to run a pretty sophisticated vibe.d app on my controller (with the tweaks for EH and for some missing functions on my glibc version). So great job guys, looks like we can run D apps on industrial controllers! |
Good news, thanks! |
@rracariu: Great news – Many thanks for your perseverance! |
Yep, and IMO the best way to start is to group your changes to logical commits and choose an appropriate pull-request target. |
I can report an llvm bug, but are we sure what the problem is? If it's another optimization improperly replacing alloca+memcpy that could be the real issue, we should narrow that down first before reporting. |
I'm 99.9% sure emitting a load instruction requiring 4-bytes alignment for an IR load with explicit From Radu's link:
|
OK, I will report the bug. @rracariu, which LLVM version did you build ldc against? Update: reported here, let me know if I got any of the details wrong. |
Here is what I used:
Thanks @joakim-noah ! |
Thanks Joakim, looks really good. |
@rracariu, can you build ldc against llvm 4.0 or trunk and see if it has been fixed? 3.9.1 isn't the latest. |
There's been a reply regarding the LLVM issue. Apparently adding
_D7current11udata4_readFKPhZk:
.fnstart
.save {r11, lr}
push {r11, lr}
ldr r1, [r0]
ldrb r12, [r1]
ldrb lr, [r1, #1]
ldrb r2, [r1, #2]
ldrb r3, [r1, #3]
add r1, r1, #4
str r1, [r0]
orr r0, r2, r3, lsl #8
orr r1, r12, lr, lsl #8
orr r0, r1, r0, lsl #16
pop {r11, pc} |
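Read as C, the listing above appears to do the following: load the pointer that was passed by reference, fetch four individual bytes with `ldrb`, store the pointer back advanced by 4, and assemble the bytes into a little-endian 32-bit value with the `orr` instructions. The sketch below is a hand translation made for illustration only; the name `udata4_read` is borrowed from the mangled symbol, and the C signature is an assumption, not the druntime source.

```c
#include <stdint.h>

/* Hand-written C equivalent of the assembly listing (an illustrative sketch).
 * `addr` points at a cursor into a byte buffer; the cursor is advanced by 4. */
static uint32_t udata4_read(const uint8_t **addr)
{
    const uint8_t *p = *addr;            /* ldr  r1, [r0]          */
    uint32_t b0 = p[0];                  /* ldrb r12, [r1]         */
    uint32_t b1 = p[1];                  /* ldrb lr,  [r1, #1]     */
    uint32_t b2 = p[2];                  /* ldrb r2,  [r1, #2]     */
    uint32_t b3 = p[3];                  /* ldrb r3,  [r1, #3]     */
    *addr = p + 4;                       /* add r1, r1, #4; str r1, [r0] */
    uint32_t high = b2 | (b3 << 8);      /* orr r0, r2, r3, lsl #8  */
    uint32_t low  = b0 | (b1 << 8);      /* orr r1, r12, lr, lsl #8 */
    return low | (high << 16);           /* orr r0, r1, r0, lsl #16 */
}
```

Because only byte loads are used, the read works for any address, which is the behaviour needed when the data is not word-aligned.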
I can confirm that it works with Also, it looks like the segfault when throwing the exception is no longer there. I think the align attribute fixes a lot of other issues I encountered. I will make some time soon to rebuild the tests with the new option and see how it looks. |
Shouldn't we (for now) just simply always set the LLVM flag |
I haven't messed with the |
I think it is a fast solution to add |
OK, I was trying to understand where exactly the problem comes from. I'm not sure a fast solution does much, not like we have many ldc users clamoring to use ARMv5 chips. :) Better to just get it into llvm. Does anybody here have commit access to llvm? I think Amaury does, so we could always ask him, if nobody here does. |
I have LLVM commit access. Most important is to get the ok in the review. In general, you can ask one of the reviewers to commit, too. |
The LLVM mailing lists are usually pretty fast. I fixed a bug (a two-line, uncontroversial change that fixed a crash) and it was merged in less than a week. I think the main problem with relying on an LLVM fix is the time it will take to get into the versions of LLVM we support (let alone release with). We should probably submit a patch anyway though. |
I guess this issue can be closed, the original problem reported is fixable by using I moved to other HW configurations since then, so this particular configuration is not relevant for me, however, this particular processor Also, for reference, here's a Go thread on why they still support it golang/go#17082 |
OK, I don't think anybody else is using this anyway. |
This is a continuation of #2024 (comment)
The issues observed so far are related to exception handling when compiling with optimizations turned on.
For example, compiling a simple test:
ldc druntime and phobos compiled with -O2 or -O3
= Throwing new exception of type object.Exception: 0x402ff280 (struct at 0x4008c4ac, classinfo at 0x61980, 1 structs in flight)
Fatal error in EH code: _Unwind_RaiseException failed with reason code: 9
ldc druntime and phobos compiled with -O0 (No optimization) seems to work:
= Throwing new exception of type object.Exception: 0x403be280 (struct at 0x400704ac, classinfo at 0x87170, 1 structs in flight)
[email protected](23)
----------------
Segmentation fault (core dumped)
When compiling with optimizations turned on, the landing pads are not found by the unwind function.
The segmentation fault is reported as a possible stack corruption by gdb (could be alignment issues, see below).
One problem I noticed is that the ARM EHABI (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf) suggests that the _Unwind_Control_Block struct should be aligned to an 8-byte boundary. Playing with that didn't do any good.