Android NDK Clang produces 23% slower binaries than GCC #495
Comments
This isn't really that actionable without a repro case. 20%+ slower seems outside of the bounds of anything we have ever seen (in 7+ years of working on this), but it's hard to do more without further information. |
I understand. I can create an application that you can test; it will take me 1-2 days. But I would rather not make the sources public. I can provide them via email to someone from the team. Or is it enough to have APKs compiled with gcc and with clang? |
Isn't |
Unfortunately, I tried all these flags, but none of them brings the performance back. |
have you tried adding -Bsymbolic? that's one of the main differences between the default clang flags and the default [Android] GCC flags. |
Nope. Today is a public holiday in our country, but I will try it first thing tomorrow morning. Thank you very much for the idea. |
I created a benchmark application. Please check readme.txt in the root. |
A couple of quick suggestions. I am not sure whether APP_CPPFLAGS (where you set -O3) gets cleared by |
Stephen, thank you very much for the suggestions! I replaced APP_CPPFLAGS with LOCAL_CFLAGS. It was definitely a good idea; binaries made by both compilers seem to profit from this change and perform slightly better. Every performance gain is important to me. Unfortunately, the app compiled by gcc still performs much better :( |
You need to update every place after $(CLEAR_VARS) to add a LOCAL_CFLAGS with your optimization level to have it affect that compilation. Otherwise, I am pretty convinced at this point you are getting -O0 compilation for many of the source files with Clang. |
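The suggestion above can be sketched as an Android.mk fragment. This is a minimal illustration under stated assumptions, not the project's actual build file: the module names `emucore` and `emuaudio` and the source file names are hypothetical; only `$(CLEAR_VARS)`, `LOCAL_CFLAGS`, and `-O3` come from this thread.

```make
LOCAL_PATH := $(call my-dir)

# First module (hypothetical name and source files).
include $(CLEAR_VARS)
LOCAL_MODULE    := emucore
LOCAL_SRC_FILES := core_a.cpp core_b.cpp
# $(CLEAR_VARS) resets the LOCAL_* variables, so the optimization
# level has to be set again for every module that follows it.
LOCAL_CFLAGS    += -O3
include $(BUILD_STATIC_LIBRARY)

# Second module (hypothetical name and source file).
include $(CLEAR_VARS)
LOCAL_MODULE    := emuaudio
LOCAL_SRC_FILES := mixer.cpp
# Repeated here on purpose: the value set for emucore was cleared above.
LOCAL_CFLAGS    += -O3
include $(BUILD_STATIC_LIBRARY)
```

Running `ndk-build V=1` prints the full compiler command lines, which is one way to check whether `-O3` actually reaches each source file.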
Here is how I updated them |
Could you just upload that as a github project or at least a gist so we don't have to keep redownloading it? |
|
Which ABI is the one you're having issues with? |
armeabi-v7a. I don't own an x86-based device; I could test it in an emulator with HAXM enabled, but I haven't. I keep the old armeabi only as a possible fallback; it is not used much. |
Does anyone have an idea what the problem might be? I would like to switch to clang as soon as possible. |
I'll do what I can to minimize a test case for the compiler folks. |
Thank you very much. Do not hesitate to contact me if you need anything. |
How do I build your project? |
Never mind, I reopened it and Studio stopped complaining. |
Confirmed a 12% perf regression for clang vs gcc on Pixel 2. Can you use simpleperf to try to reduce this? I didn't realize the "benchmark" was a full game... If you can get us something more concrete to work with (and something that builds out of the box) then we can take a look, but this is beyond the scope of what we'll be able to look at any time soon I think. |
Hi Dan,
I had never heard of simpleperf, but I will try it next week and post
the results.
I have already simplified the complex emulator; it now automatically starts
this benchmark, a game that measures how fast the host PC is, which in our
case means how fast the emulator runs.
I am an ordinary programmer (C, C++, Java, ...); I don't understand low-level code.
I didn't make changes in the code, but the program compiled with clang runs slower.
I see this issue in competing programs too, in programs that already
use clang. I hoped that the NDK developers could compare the *.o files, or maybe
run some "magic" profiler. Sorry for my naivety. :)
On Friday, October 13, 2017, Dan Albert wrote:
Confirmed a 12% perf regression for clang vs gcc on Pixel 2.
Can you use simpleperf
<https://developer.android.com/ndk/guides/simpleperf.html> to try to
reduce this? I didn't realize the "benchmark" was a full game... If you can
get us something more concrete to work with (and something that builds out
of the box) then we can take a look, but this is beyond the scope of what
we'll be able to look at any time soon I think.
|
I was thinking about it. Maybe I can give some hints and narrow down the
problematic area. I believe it could be something in the dynamic recompiler
for ARM.
90% of the emulation workload is CPU emulation. CPU emulation is really
sensitive to the device CPU (the device on which I run the emulator).
The CPU sources are located in jni/dosbox/src/cpu.
For ARM, the important files are cpu.cpp and core_dynrec.cpp.
The dynamic recompiler for ARM is enabled and controlled with
DC_TARGETCPU=ARMV7LE.
If you search for this flag, you can find the appropriate plugin,
risc_armv4le-o3.h
|
Might this be related to #21? |
This is probably not related to #21. As I mentioned above, this project is a combination of many libraries, so it is very likely that there are other places where optimization flags are not being set or configured properly. I found one by quick inspection, and that already improved things. Diagnosing the rest of this will require someone to spend a lot more time with simpleperf and the entire build. I will note that this is not a high priority thing for my team to do, as our platform benchmarks don't show similar regressions (in performance or code size). |
2 things that should be changed in this project
|
Yes, armeabi is obsolete. The arm64 dynamic recompiler is finished and will be added. I have not tested it with clang and gcc yet. Maybe it will be better. However, my report is about 32-bit. Could using cmake make a difference in the resulting binaries? |
|
BTW, in 64-bit mode you also get Advanced SIMD instructions, but more importantly, integer division is done in hardware by default, while on armv7a integer division is done by a software implementation by default, because not all armv7a CPUs have the optional idiva/idivt extensions. |
Thank you. That's the reason why the arm64 dynamic recompiler was made. However, I must still compile for 32-bit devices... |
The hot spot is in a very inefficient method
every single write is done by
When I dug into what readb does, it looks like with clang E_exit does something CPU-hungry (logging or whatever) that is disabled in the gcc release build. |
Hmm, yes, I know there are some gcc-specific flags in the code. I played with them, but without much success. If I remember correctly, I removed the line with E_Exit, but again without luck. I'll try it again; it's been some time. Many thanks for your help. |
One note: ndk-build is not deprecated. It's every bit as supported as cmake is. For the most part, the two should behave the same, and where they don't you should feel free to report bugs. Work in NDK r19 to move logic out of the build systems and into the clang driver should also help here. |
@bruenor41: sorry, I'd missed this update:
I wonder if the flags aren't being added in the right order. You could add |
Hi people. After a long time, I was able to compile this benchmark for arm64-v8a, so I could compare its performance against armeabi-v7a. I used NDK 15 and compared the clang and gcc binaries. And I was surprised! While the armv7 binaries made by clang show such a big performance degradation against gcc, the armv8 binaries give almost the same result for both compilers. Performance is slightly better with the gcc binaries, by around 3%. The armv7 and armv8 comparison took so long because the benchmark uses a dynamic recompiler, which had not been written for armv8 before; now it has. Edit: tested NDK 18b. Clang binaries give the same results for armv7 and armv8 as with NDK 15: no regression, no improvement. |
Reading back through this thread, it sounds like the differences discussed here are a result of 1. differences in inlining behavior and 2. intentional differences in behavior between clang and gcc within the project. The first is a trade-off where Clang has made a different decision than GCC, and the latter needs to be fixed in the project itself, not Clang.
If I'm understanding this correctly, it sounds very plausible that the project's recompiler has just been optimized thoroughly for 32-bit but not for 64-bit. Given that, I don't think there's any action to be taken here. |
No, Dan. Previously I compared 32-bit binaries made by clang against 32-bit binaries made by gcc. Gcc performs much better in execution speed. I removed all gcc flags from the project, but it did not help. Gcc still optimizes it better. Now I compared 64-bit binaries made by gcc against 64-bit binaries made by clang. Performance is roughly the same. Why? In this case I restored the sources with the gcc flags, so they were included during the test. |
I've spent the last few hours just trying to get your project to build. I finally succeeded, but it crashes when I start it. I stand by what I said above. The fact that you do see the regression with GCC in newer NDKs makes me very confident that this is not a Clang issue. If you really want to dig in to what changed between r16's and r17's GCC, add
APP_CFLAGS := -v
APP_LDFLAGS := -v
to your Application.mk files and rebuild. gcc will show the flags that it is using to invoke the compiler, the assembler, the linker, etc. Look for changes in the flags used between your old GCC build and a new one. Odds are that will have your answer. This is what I was trying to do while trying to build your project, but apparently one successful build is all I get. After cleaning the project, the Java code doesn't even compile. |
Thank you very much for your digging, I really appreciate it. My problem is not that gcc has changed between NDK 16 and NDK 17 and keeps getting worse. My problem is that I need to switch to clang, and the project compiled by clang has produced much slower binaries since the beginning. In my case, the beginning means NDK r10e; I did not test lower versions. Edit: As I said, only the 32-bit version is a problem. I am not sure why you have problems compiling it, but I can look into this in the next hour. I will be very happy for any clue. Edit : |
Updating studio and Gradle was the first thing I had to do, but I also had to make changes to get things working with |
Ah, I know now, sorry. I was releasing a new version and did not realize that my app still uses NDK r10e. The same goes for the GitHub version of the benchmark. I had updated to the newer NDK only in my local repository; I will fix that. Edit: fixed. Tested with NDK 15c and 18b. With NDK 15c you can compare the differences between gcc and clang. Binaries made by clang in NDK 18b are slightly faster than those made by clang in NDK 15c, but still behind binaries made by gcc in NDK 15c and lower. |
Hi Dan, I updated the project; you shouldn't have problems now. I will be thankful for any kind of help. I spent a lot of time on debugging and on switching between clang and gcc flags or removing them. I do not know what I can focus on now, except debugging the assembly, which is beyond my knowledge... |
Thanks. It won't be taking priority over other work but I'll keep it on my list of Friday projects. |
Yes, I understand, no hurry. Many thanks |
Is there any conclusion on this topic? |
Hi, we also face the same issue after switching from gcc to clang. The old framerate of the game was 55 FPS. After switching to clang, the framerate dropped to 22 FPS on the same device. Over 50% performance loss! (We build against ndk-r19c using -Ofast3 and all other additional optimization flags.) Since clang and 64-bit builds are mandatory, this issue has become urgent for my team! It would be great to see any progress. |
I would like to help, but I did not fix it. I did what I could to solve this
issue, but I did not succeed, and there was not enough interest on the NDK
developers' side. Maybe the problem is not in clang; maybe it is something in my code, but I removed all the gcc-specific stuff and it did not help. I don't know.
On Thursday, August 22, 2019, Zaccur wrote:
Is there any conclusion on this topic?
We face the same issue after switching all our C++ libraries from gcc to
clang.
A performance decrease also occurred.
OpenCV and our source code, all compiled with NDK 17, were running much
slower.
Can anyone post an update if they succeed in resolving this issue? Thanks.
|
Hey, we found that the performance degradation seems to be introduced by OpenMP.
Makefile:
test_gcc_omp:main.cc
test_clang_omp:main.cc
test_gcc:main.cc
test_clang:main.cc
test script:
echo "---------------------------------"
echo "test clang version:"
echo "---------------------------------"
echo "test gcc omp version:"
echo "---------------------------------"
echo "test clang omp version:"
The result seems as below for several running of the test (test in linux docker without NDK)
real 0m0.236s
|
Hi, I have been following this thread for many years now in the hope of a solution; we have exactly the same experience as many here. Code compiled with Clang with OpenMP for Android results in really bad performance. On other platforms we do not see these issues. Has anyone had success in finding a solution, reason, or workaround? Would it be possible for the NDK devs to have another look? |
Android developers just don't care at all. Forget it; it is closed because it will never be fixed. Buy new devices. Old ones don't bring in money. |
First check the source code of the bottleneck that shows a significant difference, then check whether that code contains gcc-specific macro definitions that cause extraordinarily optimized code to be generated, for example for SIMD. |
This test doesn't make any sense because:
Here is an exact comparison that uses the result of the computation, with optimisation |
Is OpenMP used in the code? We figured out there are some
We finally found that it has something to do with the OpenMP configuration. We made the following change in code, and the clang- and gcc-compiled binaries now seem to perform at a similar level: KMP_BLOCKTIME=0 |
Thank you @joeshow79 for the hint! This actually improves the situation a lot! No more strange thread blocking. Just as a remark: on certain phones (e.g. Honor View 20) there seems to be an OpenMP version that listens to the GNU environment variables; I had to set GOMP_SPINCOUNT to 0 to get the same result. What we did: in Application onCreate() before loading any native libs:
What I noticed as well, now when giving the Clang-OpenMP issue a new try: on certain phones this was fixed after some Android update. The manufacturers were informed about this or noticed it themselves or whatever. Anyways, this is the solution for us, thanks again. |
Switching from android gcc to clang produces slower binaries.
Description
I am working on an application that emulates an old operating system from the '90s. I build the C++ binaries with Android NDK r10e and GCC 4.8. But because Google is dropping support for GCC, I want to switch to Clang. After updating the NDK to r15 and building successfully, I ran benchmarks (in r15 the default compiler is Clang, instead of GCC as in NDK r10e). The result is that the emulated system runs 23% slower; even without a benchmark it is very noticeable when playing more CPU-demanding games.
I made no changes to Android.mk and I use -O3 for optimization. I only updated the NDK. Simply put, I only switch between toolchains by building with:
NDK_TOOLCHAIN_VERSION=4.9 or without it
In NDK r15 it is still possible to switch to GCC 4.9. I did, and I got the lost performance back. Moreover, GCC 4.9 seems to optimize slightly better than GCC 4.8.
From what I read on the web, I expected Clang to produce faster binaries, or at least equally fast ones. And I didn't find any special Clang flags that I must enable in Android.mk.
As a test, I went back to NDK r10e and switched to Clang 3.8. The compiled application is again 20%+ slower. It seems this is not specific to NDK r15 but has been there for a long time.
Environment Details
In my Application.mk I use this setup:
APP_ABI := armeabi armeabi-v7a x86
APP_OPTIM := release
APP_STL := stlport_static
APP_PLATFORM := android-8
LOCAL_LDLIBS += -lz
APP_PLATFORM is set to 8, but I see in the log that the minimum was automatically set to 14.
Android.mk
APP_CPPFLAGS = -O3
LOCAL_ARM_MODE := arm
LOCAL_CFLAGS += -DHAVE_NEON=1
LOCAL_ARM_NEON := true
I tested on Windows 10 and Ubuntu 16.04; the results are the same.
Tested on devices with Android 6.0 and 7.0.