Implement all ARM NEON intrinsics #148
Comments
Is there a blocker for these, or is it just finding time to do it? I'd like to help, but I'd need a more experienced compiler/SIMD person to point me in the right direction. |
I can mentor. Start by taking a look at some of the intrinsics in the |
Is there some upstream source that these all get copied from, or are they actually written by hand? |
I am not sure I understand the question ? The |
Sorry yeah that was unclear. The other thing I wanted to ask was, what's
the upstream Source Of Truth for defining these functions?
On Fri, Nov 16, 2018, 11:57 AM gnzlbg wrote:
I am not sure I understand the question ? The neon modules in this
repository are written by hand, although @Amanieu has expressed interest
into generating some parts of them automatically.
|
Ah, I see, that would be the ARM NEON spec: https://developer.arm.com/technologies/neon/intrinsics |
Now might be a great time to help make some more progress on this! We've got tons of intrinsics already implemented (thanks @gnzlbg!), and I've just implemented automatic verification of all added intrinsics, so we know if they're added they've got the correct signature at least! I've updated the OP of this issue with more detailed instructions about how to bind NEON intrinsics. Hopefully it's not too bad any more! We'll probably want to reorganize modules so they're a bit smaller and more manageable over time, but for now if anyone's interested to add more intrinsics and needs some help let me know! |
I have a proposal for this: using a macro to make definitions one-line, e.g.:
neon_op!(binary vadd_s8 : int8x8_t == simd_add, assert vadd / add, doc "Vector add");
neon_op!(binary vaddl_s8 : int8x8_t -> int16x8_t == simd_add, assert vaddl / saddl, doc "Vector long add");
neon_op!(unary vmovn_s16 : int16x8_t -> int8x8_t == simd_cast, assert vmovn / xtn, doc "Vector narrow integer");
This will make adding new ones easier (scrolling through a boilerplate-filled file just feels awful), and I'll add a lot more. The macro definition I currently have:
macro_rules! neon_op {
    (binary $name:ident : $type:ident == $op:ident, assert $instr32:ident / $instr64:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        #[target_feature(enable = "neon")]
        #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
        #[cfg_attr(all(test, target_arch = "arm"), assert_instr($instr32))]
        #[cfg_attr(all(test, target_arch = "aarch64"), assert_instr($instr64))]
        pub unsafe fn $name(a: $type, b: $type) -> $type {
            $op(a, b)
        }
    };
    (binary $name:ident : $type:ident -> $result_type:ident == $op:ident, assert $instr32:ident / $instr64:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        #[target_feature(enable = "neon")]
        #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
        #[cfg_attr(all(test, target_arch = "arm"), assert_instr($instr32))]
        #[cfg_attr(all(test, target_arch = "aarch64"), assert_instr($instr64))]
        pub unsafe fn $name(a: $type, b: $type) -> $result_type {
            let a: $result_type = simd_cast(a);
            let b: $result_type = simd_cast(b);
            $op(a, b)
        }
    };
    (unary $name:ident : $type:ident -> $result_type:ident == $op:ident, assert $instr32:ident / $instr64:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        #[target_feature(enable = "neon")]
        #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
        #[cfg_attr(all(test, target_arch = "arm"), assert_instr($instr32))]
        #[cfg_attr(all(test, target_arch = "aarch64"), assert_instr($instr64))]
        pub unsafe fn $name(a: $type) -> $result_type {
            $op(a)
        }
    };
} |
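The one-line-definition idea in the comment above can be tried outside stdarch. What follows is only a minimal sketch that models 8-lane vectors as plain arrays so it compiles on stable Rust. The names `lane_op!`, `I8x8`, `I16x8`, and the `*_model` functions are hypothetical and not part of stdarch; the proposal itself relies on `simd_add`, `simd_cast`, and `assert_instr`, which this model replaces with ordinary integer methods.

```rust
// Portable model of the proposed one-line definitions.
// Every name in this block is hypothetical; this is not stdarch code.
#[allow(dead_code)]
type I8x8 = [i8; 8];
#[allow(dead_code)]
type I16x8 = [i16; 8];

macro_rules! lane_op {
    // Same-width binary op: apply `$op` lane by lane.
    (binary $name:ident : $ty:ident == $op:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        pub fn $name(a: $ty, b: $ty) -> $ty {
            let mut out = a;
            for i in 0..out.len() {
                out[i] = a[i].$op(b[i]);
            }
            out
        }
    };
    // Widening binary op: widen both inputs first, then apply `$op`.
    (binary $name:ident : $ty:ident -> $res:ident == $op:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        pub fn $name(a: $ty, b: $ty) -> $res {
            let mut out: $res = Default::default();
            for i in 0..out.len() {
                out[i] = a[i].into(); // lossless widening of the lane from `a`
                out[i] = out[i].$op(b[i].into());
            }
            out
        }
    };
    // Narrowing unary op: truncate each lane to the element type `$elem`.
    (unary $name:ident : $ty:ident -> $res:ident as $elem:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        pub fn $name(a: $ty) -> $res {
            let mut out: $res = Default::default();
            for i in 0..out.len() {
                out[i] = a[i] as $elem;
            }
            out
        }
    };
}

lane_op!(binary vadd_s8_model : I8x8 == wrapping_add, doc "Vector add");
lane_op!(binary vaddl_s8_model : I8x8 -> I16x8 == wrapping_add, doc "Vector long add");
lane_op!(unary vmovn_s16_model : I16x8 -> I8x8 as i8, doc "Vector narrow integer");
```

In this model, `vadd_s8_model([100; 8], [100; 8])` wraps around within `i8`, while `vaddl_s8_model` on the same inputs widens to `i16` before adding and so does not wrap. Type parameters are matched as `ident` rather than `ty`, as in the original macro, because a `ty` fragment may not be followed by `==` or `->` in a `macro_rules!` pattern.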
For the definitions, I think that using macros is ok. I am not sure I follow how macros would generate the run-time tests for the intrinsics; that's usually the bulk of the work. |
What is the reasoning behind some intrinsics linking in the LLVM intrinsic directly while others are using the generic For example: stdarch/crates/core_arch/src/arm/neon/generated.rs Lines 1416 to 1430 in a371069
Versus: stdarch/crates/core_arch/src/aarch64/neon/generated.rs Lines 12 to 18 in a371069
Given the sheer volume of neon intrinsics, it seems rather daunting to implement them all by hand using the guide in the first post. I'm wondering if there's a deterministic data driven way to generate all of them using |
Not all intrinsics have a corresponding
Please don't. The |
@aloucks most of the intrinsics (AFAIK) have been added piecemeal over time, so it's sort of expected that they're not 100% consistent. Otherwise though I'd imagine that whatever works best would be fine to add to this repository. Auto-generation sounds pretty reasonable to me, and for an implementation we strive to match what Clang does in its implementation of these intrinsics. |
Also, to be clear, this library is not designed for ease of implementation in alternate codegen backends. The purpose of this crate is to get the LLVM backend up and running with SIMD. Discussions and design constraints for alternate backends should be discussed in a separate issue. |
Hey all, some friends and I have made a google sheet of all the Neon intrinsics, their inputs, output, and the ARM summary comment. There could easily have been errors when copying around and manipulating thousands of entries of text, but I think that it's got all the bugs sorted out. If you want to try some auto-generation, this is a good place to start. There's even a column where I've marked what we have in nightly so far, so if you just auto-gen all the functions that aren't checked you shouldn't hit any duplicate definitions. I hope to find the time to actually contribute some functions, but for now this will have to do. EDIT: also I just subscribed to the entire repo, so if there's any PRs that add more functions I'll try to check those boxes on the sheet and keep it up to date. |
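As a rough illustration of the auto-generation idea, here is a minimal sketch that turns sheet-like rows into stub declarations and skips rows already marked as present in nightly. The `Row` struct and `generate_stubs` function are hypothetical; the real sheet's column layout and export format are not described in this thread, so the fields below are only an assumed stand-in for "name, inputs, output, summary, checked".

```rust
/// One spreadsheet row. The field layout is an assumption made for this
/// sketch, not the actual format of the sheet.
#[allow(dead_code)]
pub struct Row<'a> {
    pub name: &'a str,
    pub inputs: &'a [(&'a str, &'a str)], // (argument name, argument type)
    pub output: &'a str,
    pub summary: &'a str,
    pub in_nightly: bool, // the "already implemented" checkbox column
}

/// Emit a stub declaration for every row that is not yet marked as
/// implemented, so no duplicate definitions are generated.
#[allow(dead_code)]
pub fn generate_stubs(rows: &[Row<'_>]) -> String {
    let mut out = String::new();
    for row in rows.iter().filter(|r| !r.in_nightly) {
        let args: Vec<String> = row
            .inputs
            .iter()
            .map(|(name, ty)| format!("{}: {}", name, ty))
            .collect();
        out.push_str(&format!(
            "/// {}\npub unsafe fn {}({}) -> {} {{\n    unimplemented!()\n}}\n",
            row.summary,
            row.name,
            args.join(", "),
            row.output
        ));
    }
    out
}
```

The generated bodies are deliberately left as `unimplemented!()`: choosing between a generic SIMD operation and a linked LLVM intrinsic still has to be decided per intrinsic, as discussed earlier in the thread.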
Working with @Shnatsel, I described the "godbolt process" and they were kind enough to make it a bash script that you can run locally:
#!/bin/bash
set -e
INTRINSIC_NAME="$1"
TEMP_DIR="$(mktemp -d)"
cleanup() {
rm -r "$TEMP_DIR"
}
trap cleanup EXIT
(
cd "$TEMP_DIR"
echo "#include <arm_neon.h>
int test() {
return (int) $INTRINSIC_NAME;
}" > ./in.c
clang -emit-llvm -O2 -S -target armv7-unknown-linux-gnueabihf -g0 in.c
ARM_NAME=$(grep --only-matching '@llvm.arm.neon.[A-Za-z0-9.]\+' ./*.ll | tr -d '@' | head -n 1)
clang -emit-llvm -O2 -S -target aarch64-unknown-linux-gnu -g0 in.c
AARCH64_NAME=$(grep --only-matching '@llvm.aarch64.neon.[A-Za-z0-9.]\+' ./*.ll | tr -d '@' | head -n 1)
echo "$INTRINSIC_NAME, $ARM_NAME, $AARCH64_NAME"
)
You will probably need the
Note that many functions don't have an associated LLVM intrinsic that can be as easily scraped out this way, but maybe 1/4th or so of them do. |
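For anyone who would rather post-process the emitted `.ll` files from Rust, the name-extraction step of the script (the `grep --only-matching ... | tr -d '@' | head -n 1` pipeline) might look roughly like the sketch below. The function name `first_neon_intrinsic` is hypothetical, and the sketch simplifies the script slightly: it only inspects the first occurrence of the prefix instead of scanning on to later matches.

```rust
/// Return the first `llvm.<arch>.neon.*` name found in LLVM IR text,
/// without the leading '@'. Hypothetical helper, not part of any tool.
#[allow(dead_code)]
pub fn first_neon_intrinsic<'a>(ir: &'a str, arch: &str) -> Option<&'a str> {
    let prefix = format!("@llvm.{}.neon.", arch);
    let start = ir.find(&prefix)?;
    let rest = &ir[start + 1..]; // skip the leading '@'
    // The name runs until the first character outside [A-Za-z0-9.].
    let end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '.'))
        .unwrap_or(rest.len());
    // Require at least one character after "neon.", like the `\+` in the grep.
    if end == prefix.len() - 1 {
        return None;
    }
    Some(&rest[..end])
}
```

Called with `arch` set to `"arm"` or `"aarch64"`, this corresponds to the script's `ARM_NAME` and `AARCH64_NAME` values respectively, and returns `None` where the script would print an empty field.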
@Lokathor Several instructions have been added recently: vaddhn, vbic, vorn, vceqz, vtst, vabd, vaba, though some of them are not fully supported (like vceqzd). If you don’t have time to maintain this google sheet, I think I can help. |
Awesome, looking forward to this! |
Any updates after a long time...? Thanks |
If you look at the pull request list you can see that there has been activity on this quite recently. For example #1224 was opened yesterday. |
@bjorn3 Thanks! Indeed, I mostly want to know when we can see it in a stable version. |
@SparrowLii You marked the following instructions as completed (same for min): It doesn't seem like those instructions are actually part of your recent PR (nor were they on the master branch before that) so I unmarked them again. |
@CryZe They can be found in the master branch now: |
Welp, I'll mark them again then. Somehow the GitHub Pull Request UI doesn't show them as diffs at all: https://i.imgur.com/BsHR5in.gif |
Github’s comparison tool will always have problems when changing a large amount of code XD |
As in #1230, except for the following instructions and those that use 16-bit floating-point, all other instructions have been implemented:
|
On LLVM's ARM backend,
Already discussed in rust-lang/rust#90079.
Use
You need to make your test function
These all seem to exist in LLVM at least for AArch64. For ARM we can just leave these out for now. |
Hope someone can help implement the remaining instructions. |
@Amanieu |
@SparrowLii Shouldn't that work with the |
Looks useful: https://rust.godbolt.org/z/894W8cndG |
LLVM only supports |
Steps for implementing an intrinsic:
coresimd/arm/neon.rs and coresimd/aarch64/neon.rs
coresimd/arm/neon.rs. If it's different place it in both with appropriate #[cfg] in coresimd/arm/neon.rs. If it's only AArch64 place it in coresimd/aarch64/neon.rs
rustup run nightly sh ci/run-docker.sh aarch64-unknown-linux-gnu.
All unimplemented NEON intrinsics