Add support for simde #7118
base: dev
Would this also be applicable to macOS? I tried compiling with Apple Clang but get about a dozen of the following:
I originally tested only with gcc on Linux, on both x86_64 and aarch64. Using clang 18 on aarch64 Linux, I'm getting the same errors as you. I will need to investigate further.
gcc vs clang: https://godbolt.org/z/E31zfMMvj
It looks like a fix for this on clang would be non-trivial. I'm not sure why gcc allows this. I will look at sleef for an alternate aarch64 native implementation of the required functions. If this is something that RawTherapee would be interested in using, I think the best approach would be to make this gcc-only (for now). That would keep this PR simple, and a future PR could add clang support. Thoughts?
Clang is following the NEON intrinsic definition, which requires an immediate, meaning a compile-time constant rather than a runtime integer. If I understand it correctly, each of the failing NEON calls would need the rejected integer values to be passed as compile-time constants. cf
The macOS system headers and gcc don't really get along any more. I tried all the SDKs I could find for Apple Silicon, but none of them builds. In the recent past RT was buildable with gcc on Mac, but the succession of major macOS versions has outpaced the fine-tuning required on Apple's part to keep the system headers compatible with gcc.
It only affects two functions. It turns out that both are already being called with compile-time constants, so as a quick hack I've redefined them as macros, and it now builds successfully for me on clang.
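To illustrate why turning the wrappers into macros helps, here is a minimal, portable sketch. It does not use the actual NEON intrinsics or the functions changed in this PR; `shift_left_imm` and `SHL_IMM` are hypothetical names, and a template parameter stands in for the "immediate" operand that clang insists on.

```cpp
#include <cstdint>

// Hypothetical stand-in for an intrinsic whose shift count must be an
// immediate: the count is a template parameter, so it has to be a
// compile-time constant.
template <int N>
inline std::int32_t shift_left_imm(std::int32_t v)
{
    static_assert(N >= 0 && N < 32, "shift count must be an immediate in [0, 31]");
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(v) << N);
}

// A function wrapper cannot forward its argument as an immediate: inside the
// body, `n` is a runtime value even if every caller passes a literal.
//
//   inline std::int32_t shl(std::int32_t v, int n)
//   {
//       return shift_left_imm<n>(v);   // error: `n` is not a constant expression
//   }
//
// A macro is expanded textually, so the literal written at the call site
// reaches the "intrinsic" unchanged and is still a compile-time constant.
#define SHL_IMM(v, n) shift_left_imm<(n)>(v)
```

With this definition, `SHL_IMM(3, 4)` compiles because `4` is a literal, while `SHL_IMM(3, some_runtime_int)` is rejected at compile time. The macro approach therefore only works when every call site already passes a constant, which is the situation described above.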
@Calandracas606 much appreciated. I will be resuming my Apple Silicon testing in mid-August. In the meantime it might be worthwhile to have the .github CI build an
PR generated to add arm64 to macOS CI action: A successful arm64 macOS clang build with this patch:
I noticed that the performance of RawTherapee on aarch64 was somewhat disappointing.
This is understandable, since it is highly optimized with SSE2 intrinsics.
I have attempted to improve the performance by using the simde library, which provides portable implementations of SIMD intrinsics and translates them to the native intrinsics of a different platform (in my case, Arm NEON). This will allow other platforms to make use of the tremendous effort RawTherapee has put into writing highly optimized algorithms.
I have observed > 2x performance improvements in some scenarios.
For example, on my M2 MacBook Air, running Linux, with a 50MP ILCE-1 raw image:
Admittedly, this patch is pretty hacky, and mostly just a proof of concept.
The replacement of `#ifdef __SSE2__` with `#if defined(__SSE2__) || defined(RT_SIMDE)` is quite ham-fisted, and should probably be replaced with a much more concise macro.
If this is something that you are interested in, please let me know what changes need to be made to the PR (I'm sure there will be many).
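As a rough sketch of what such a concise macro could look like, the repeated condition can be folded into a single feature macro defined in one header. This is only an illustration under assumptions: `RT_HAS_SSE2_API` and `add_arrays` are hypothetical names that are not part of this patch, and the SIMDe branch relies on SIMDe's `SIMDE_ENABLE_NATIVE_ALIASES` option so that the `_mm_*` names stay unchanged.

```cpp
// Hypothetical central header: decide once whether the SSE2-style API exists.
#if defined(RT_SIMDE)
    // SIMDe maps the _mm_* names onto native intrinsics (e.g. NEON).
    #define SIMDE_ENABLE_NATIVE_ALIASES
    #include <simde/x86/sse2.h>
    #define RT_HAS_SSE2_API 1
#elif defined(__SSE2__)
    #include <emmintrin.h>
    #define RT_HAS_SSE2_API 1
#else
    #define RT_HAS_SSE2_API 0
#endif

#include <cstddef>

// Example user code: one short check instead of the two-part condition.
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
#if RT_HAS_SSE2_API
    // Vector path: four floats per iteration, unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
#endif
    // Scalar tail, which is also the full fallback when no SIMD API is available.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Call sites would then test `#if RT_HAS_SSE2_API` everywhere, and any future backend would only need a change in the one header that defines the macro.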