Implement all ARM NEON intrinsics #148

gnzlbg · 2017-10-24T11:15:51Z

Steps for implementing an intrinsic:

1. Select an intrinsic below.
2. Review coresimd/arm/neon.rs and coresimd/aarch64/neon.rs.
3. Consult the official ARM documentation for your intrinsic.
4. Check godbolt to see how the intrinsic should be codegen'd, using clang as a reference. Use the links below and replace the name of the intrinsic in the code with your intrinsic. Note that if the ARM link produces an error, your intrinsic may be AArch64-only.
   - ARM
   - AArch64
5. If the codegen is the same on ARM and AArch64, place the intrinsic in coresimd/arm/neon.rs. If it differs, place it in both files with the appropriate #[cfg] in coresimd/arm/neon.rs. If it is AArch64-only, place it in coresimd/aarch64/neon.rs.
6. Write a test for your intrinsic at the bottom of the same file.
7. Test! Probably use rustup run nightly sh ci/run-docker.sh aarch64-unknown-linux-gnu.
8. When ready, send a PR!
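
As a rough illustration of the "write a test" step, here is a minimal sketch that checks an intrinsic against a portable scalar model. It is not the repository's test harness: the names vadd_u8_model, vadd_u8_hw and check_vadd_u8 are hypothetical, and the hardware path assumes a toolchain where core::arch::aarch64 exposes vadd_u8, vld1_u8 and vst1_u8.

```rust
/// Portable scalar model of `vadd_u8`: lane-wise wrapping addition of eight u8 lanes.
/// (Hypothetical helper for illustration, not part of the crate.)
pub fn vadd_u8_model(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    std::array::from_fn(|i| a[i].wrapping_add(b[i]))
}

/// The same operation through the real intrinsic; only compiled on AArch64.
#[cfg(target_arch = "aarch64")]
pub fn vadd_u8_hw(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    use core::arch::aarch64::{vadd_u8, vld1_u8, vst1_u8};
    let mut out = [0u8; 8];
    // SAFETY: both input pointers and the output pointer are valid for 8 bytes.
    unsafe {
        let r = vadd_u8(vld1_u8(a.as_ptr()), vld1_u8(b.as_ptr()));
        vst1_u8(out.as_mut_ptr(), r);
    }
    out
}

/// Returns true when the model (and the hardware path, where available)
/// produce the expected lanes, including the wrap-around in the last lane.
pub fn check_vadd_u8() -> bool {
    let a = [0, 1, 2, 3, 4, 5, 6, 255];
    let b = [1, 1, 1, 1, 1, 1, 1, 1];
    let expected: [u8; 8] = [1, 2, 3, 4, 5, 6, 7, 0];
    let model_ok = vadd_u8_model(a, b) == expected;
    #[cfg(target_arch = "aarch64")]
    let hw_ok = vadd_u8_hw(a, b) == expected;
    #[cfg(not(target_arch = "aarch64"))]
    let hw_ok = true; // nothing to compare against on other targets
    model_ok && hw_ok
}
```

Calling check_vadd_u8() on any target exercises the model; on AArch64 it also compares against the hardware result.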


oconnor663 · 2018-11-15T21:51:47Z

Is there a blocker for these, or is it just finding time to do it? I'd like to help, but I'd need a more experienced compiler/SIMD person to point me in the right direction.

gnzlbg · 2018-11-15T22:25:21Z

I can mentor. Start by taking a look at some of the intrinsics in the coresimd/aarch64/neon.rs module :)

oconnor663 · 2018-11-16T16:48:24Z

Is there some upstream source that these all get copied from, or are they actually written by hand?

gnzlbg · 2018-11-16T16:57:16Z

I am not sure I understand the question. The neon modules in this repository are written by hand, although @Amanieu has expressed interest in generating some parts of them automatically.

oconnor663 · 2018-11-16T17:45:57Z

Sorry yeah that was unclear. The other thing I wanted to ask was, what's the upstream Source Of Truth for defining these functions?

gnzlbg · 2018-11-16T17:49:17Z

Ah, I see, that would be the ARM NEON spec: https://developer.arm.com/technologies/neon/intrinsics

alexcrichton · 2018-12-20T20:36:39Z

Now might be a great time to help make some more progress on this! We've got tons of intrinsics already implemented (thanks @gnzlbg!), and I've just implemented automatic verification of all added intrinsics, so we know if they're added they've got the correct signature at least!

I've updated the OP of this issue with more detailed instructions about how to bind NEON intrinsics. Hopefully it's not too bad any more!

We'll probably want to reorganize modules so they're a bit smaller and more manageable over time, but for now if anyone's interested to add more intrinsics and needs some help let me know!

valpackett · 2019-07-28T19:17:02Z

> more manageable

I have a proposal for this: using a macro to make definitions one-line e.g.:

neon_op!(binary vadd_s8 : int8x8_t == simd_add, assert vadd / add, doc "Vector add");
neon_op!(binary vaddl_s8 : int8x8_t -> int16x8_t == simd_add, assert vaddl / saddl, doc "Vector long add");
neon_op!(unary vmovn_s16 : int16x8_t -> int8x8_t == simd_cast, assert vmovn / xtn, doc "Vector narrow integer");

This will make adding new ones easier (scrolling through a boilerplate-filled file just feels awful), and I'll add many more of them (simd_sub, simd_mul, simd_lt, etc.). Would this be accepted?

macro definition I currently have

macro_rules! neon_op {
    (binary $name:ident : $type:ident == $op:ident, assert $instr32:ident / $instr64:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        #[target_feature(enable = "neon")]
        #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
        #[cfg_attr(all(test, target_arch = "arm"), assert_instr($instr32))]
        #[cfg_attr(all(test, target_arch = "aarch64"), assert_instr($instr64))]
        pub unsafe fn $name(a: $type, b: $type) -> $type {
            $op(a, b)
        }
    };
    (binary $name:ident : $type:ident -> $result_type:ident == $op:ident, assert $instr32:ident / $instr64:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        #[target_feature(enable = "neon")]
        #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
        #[cfg_attr(all(test, target_arch = "arm"), assert_instr($instr32))]
        #[cfg_attr(all(test, target_arch = "aarch64"), assert_instr($instr64))]
        pub unsafe fn $name(a: $type, b: $type) -> $result_type {
            let a: $result_type = simd_cast(a);
            let b: $result_type = simd_cast(b);
            $op(a, b)
        }
    };
    (unary $name:ident : $type:ident -> $result_type:ident == $op:ident, assert $instr32:ident / $instr64:ident, doc $doc:literal) => {
        #[inline]
        #[doc = $doc]
        #[target_feature(enable = "neon")]
        #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
        #[cfg_attr(all(test, target_arch = "arm"), assert_instr($instr32))]
        #[cfg_attr(all(test, target_arch = "aarch64"), assert_instr($instr64))]
        pub unsafe fn $name(a: $type) -> $result_type {
            $op(a)
        }
    };
}

gnzlbg · 2019-07-28T19:20:59Z

For the definitions, I think that using macros is ok.

I am not sure I follow how the macros would generate run-time tests for the intrinsics; that's usually the bulk of the work.
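
One way a macro could carry a run-time check alongside the definition is to accept a test vector as an extra argument. The sketch below is only an illustration over plain arrays, not the proposed neon_op! macro: lane_op!, the *_model functions and the check_* functions are hypothetical names, and the check function name is passed explicitly because macro_rules! cannot build new identifiers on its own.

```rust
// Hypothetical macro: one invocation emits a lane-wise function over plain
// arrays plus a check function that compares it against a given test vector.
macro_rules! lane_op {
    (binary $name:ident : $elem:ty, $lanes:literal == $op:ident,
     check $check:ident : $a:expr, $b:expr => $expected:expr) => {
        /// Applies the scalar method `$op` to each pair of lanes.
        pub fn $name(a: [$elem; $lanes], b: [$elem; $lanes]) -> [$elem; $lanes] {
            std::array::from_fn(|i| a[i].$op(b[i]))
        }

        /// Returns true when the generated function reproduces the test vector.
        pub fn $check() -> bool {
            let expected: [$elem; $lanes] = $expected;
            $name($a, $b) == expected
        }
    };
}

// Wrapping add: the first lane overflows from 127 to -128.
lane_op!(binary vadd_s8_model : i8, 8 == wrapping_add,
         check check_vadd_s8 : [127, 1, 2, 3, 4, 5, 6, 7], [1; 8]
         => [-128, 2, 3, 4, 5, 6, 7, 8]);

// Wrapping sub: the first lane underflows from -128 to 127.
lane_op!(binary vsub_s8_model : i8, 8 == wrapping_sub,
         check check_vsub_s8 : [-128, 0, 0, 0, 0, 0, 0, 0], [1; 8]
         => [127, -1, -1, -1, -1, -1, -1, -1]);
```

The cost is that each one-line definition grows by its test vector, so the test data still has to be written by hand.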

aloucks · 2020-07-08T01:36:19Z

What is the reasoning behind some intrinsics linking in the LLVM intrinsic directly while others are using the generic simd_XXX functions?

For example:

stdarch/crates/core_arch/src/arm/neon/generated.rs

Lines 1416 to 1430 in a371069

    
/// Halving add
#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
#[cfg_attr(all(test, target_arch = "arm"), assert_instr("vhadd.u16"))]
#[cfg_attr(all(test, target_arch = "aarch64"), assert_instr(uhadd))]
pub unsafe fn vhadd_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t {
    #[allow(improper_ctypes)]
    extern "C" {
        #[cfg_attr(target_arch = "arm", link_name = "llvm.arm.neon.vhaddu.v4i16")]
        #[cfg_attr(target_arch = "aarch64", link_name = "llvm.aarch64.neon.uhadd.v4i16")]
        fn vhadd_u16_(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t;
    }
    vhadd_u16_(a, b)
}

Versus:

stdarch/crates/core_arch/src/aarch64/neon/generated.rs

Lines 12 to 18 in a371069

    
/// Compare bitwise Equal (vector)
#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(cmeq))]
pub unsafe fn vceq_u64(a: uint64x1_t, b: uint64x1_t) -> uint64x1_t {
    simd_eq(a, b)
}

Given the sheer volume of neon intrinsics, it seems rather daunting to implement them all by hand using the guide in the first post. I'm wondering if there's a deterministic data driven way to generate all of them using #[link_name = "llvm.*"] as done in the first example. Maybe the llvm c headers could be useful?

bjorn3 · 2020-07-08T07:38:55Z

> What is the reasoning behind some intrinsics linking in the LLVM intrinsic directly while others are using the generic simd_XXX functions?

Not all intrinsics have a corresponding simd_* platform-intrinsic.

> I'm wondering if there's a deterministic data driven way to generate all of them using #[link_name = "llvm.*"] as done in the first example. Maybe the llvm c headers could be useful?

Please don't. The simd_* platform intrinsics are much easier to implement in alternative codegen backends than the llvm intrinsics, as they are generic over vector types and they are backend independent.
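
To make the difference between the two examples concrete, here is a scalar sketch of what each intrinsic computes per lane. The *_model functions are hypothetical helpers over plain arrays, not crate code. The compare maps directly onto a generic lane-wise operation, whereas the halving add needs the carry bit of the sum, so a plain lane-wise add followed by a shift at the lane width would give the wrong answer; that is one plausible reason it is bound to a dedicated LLVM intrinsic.

```rust
/// Scalar model of `vhadd_u16` (halving add): each lane is (a + b) >> 1,
/// with the sum formed in a wider type so the carry bit is not lost.
pub fn vhadd_u16_model(a: [u16; 4], b: [u16; 4]) -> [u16; 4] {
    std::array::from_fn(|i| ((a[i] as u32 + b[i] as u32) >> 1) as u16)
}

/// Scalar model of `vceq_u64` (compare bitwise equal): a lane is all ones
/// when the inputs are equal and all zeros otherwise.
pub fn vceq_u64_model(a: [u64; 1], b: [u64; 1]) -> [u64; 1] {
    [if a[0] == b[0] { u64::MAX } else { 0 }]
}
```

For example, halving 65535 + 65535 yields 65535 in the model, while a 16-bit wrapping add followed by a shift would yield 32767.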

alexcrichton · 2020-07-08T14:39:45Z

@aloucks most of the intrinsics (AFAIK) have been added piecemeal over time, so it's sort of expected that they're not 100% consistent. Otherwise though I'd imagine that whatever works best would be fine to add to this repository. Auto-generation sounds pretty reasonable to me, and for an implementation we strive to match what Clang does in its implementation of these intrinsics.

alexcrichton · 2020-07-08T14:40:31Z

Also, to be clear, this library is not designed for ease of implementation in alternate codegen backends. The purpose of this crate is to get the LLVM backend up and running with SIMD. Discussions and design constraints for alternate backends should be discussed in a separate issue.

Lokathor · 2020-07-17T06:35:43Z

Hey all, some friends and I have made a google sheet of all the Neon intrinsics, their inputs, output, and the ARM summary comment.

There could easily have been errors when copying around and manipulating thousands of entries of text, but I think that it's got all the bugs sorted out.

If you want to try some auto-generation, this is a good place to start. There's even a column where I've marked what we have in nightly so far, so if you just auto-gen all the functions that aren't checked you shouldn't hit any duplicate definitions.

I hope to find the time to actually contribute some functions, but for now this will have to do.

EDIT: also I just subscribed to the entire repo, so if there's any PRs that add more functions I'll try to check those boxes on the sheet and keep it up to date.

Lokathor · 2020-07-20T22:56:42Z

Working with @Shnatsel, I described the "godbolt process" and they were kind enough to make it a bash script that you can run locally

#!/bin/bash
set -e
INTRINSIC_NAME="$1"
TEMP_DIR="$(mktemp -d)"
cleanup() {
    rm -r "$TEMP_DIR"
}
trap cleanup EXIT
(
cd "$TEMP_DIR"
echo "#include <arm_neon.h>
int test() {
  return (int) $INTRINSIC_NAME;
}" > ./in.c

clang -emit-llvm -O2 -S -target armv7-unknown-linux-gnueabihf -g0 in.c
ARM_NAME=$(grep --only-matching '@llvm.arm.neon.[A-Za-z0-9.]\+' ./*.ll | tr -d '@' | head -n 1)

clang -emit-llvm -O2 -S -target aarch64-unknown-linux-gnu -g0 in.c
AARCH64_NAME=$(grep --only-matching '@llvm.aarch64.neon.[A-Za-z0-9.]\+' ./*.ll | tr -d '@' | head -n 1)

echo "$INTRINSIC_NAME, $ARM_NAME, $AARCH64_NAME"
)

You will probably need the gcc-multilib package or similar installed so that the correct headers are available.

Note that many functions don't have an associated LLVM intrinsic that can be scraped out this easily, but maybe a quarter or so of them do.
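
For anyone who would rather post-process the emitted .ll text in Rust, the grep step of the script can be approximated as below. This is a sketch under assumptions: first_llvm_name is a hypothetical helper, it expects the IR text to be already loaded into a string, and unlike the grep pattern it matches the dots in the prefix literally.

```rust
/// Returns the first name in `ir` that starts with `@` followed by `prefix`
/// and at least one more character from [A-Za-z0-9.], without the leading `@`.
/// (Hypothetical helper mirroring the `grep | tr | head` pipeline.)
pub fn first_llvm_name(ir: &str, prefix: &str) -> Option<String> {
    let needle = format!("@{}", prefix);
    let mut search = ir;
    while let Some(pos) = search.find(&needle) {
        let after = &search[pos + needle.len()..];
        // Length of the run of name characters that follows the prefix.
        let tail_len = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '.'))
            .unwrap_or(after.len());
        if tail_len > 0 {
            return Some(format!("{}{}", prefix, &after[..tail_len]));
        }
        // Bare prefix with nothing after it: keep scanning the rest.
        search = after;
    }
    None
}
```

Passing "llvm.arm.neon." or "llvm.aarch64.neon." as the prefix corresponds to the two grep calls in the script.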

SparrowLii · 2021-03-08T11:28:12Z

@Lokathor Several instructions have been added recently: vaddhn, vbic, vorn, vceqz, vtst, vabd, vaba, though some of them are not fully supported (like vceqzd). If you don't have time to maintain this google sheet, I think I can help.

nano-bot · 2021-03-09T13:51:49Z

> @Lokathor Several instructions have been added recently: vaddhn, vbic, vorn, vceqz, vtst, vabd, vaba, though some of them are not fully supported (like vceqzd). If you don't have time to maintain this google sheet, I think I can help.

Awesome, looking forward to this!

fzyzcjy · 2021-09-27T09:24:09Z

Any updates after a long time...? Thanks

bjorn3 · 2021-09-27T09:29:52Z

If you look at the pull request list you can see that there has been activity on this quite recently. For example #1224 was opened yesterday.

fzyzcjy · 2021-09-27T11:53:58Z

@bjorn3 Thanks! What I mostly want to know is when we can expect to see it in a stable version.
By the way, would you suggest using nightly in a production environment? If so, I can use it now.

CryZe · 2021-10-21T10:53:58Z

@SparrowLii You marked the following instructions as completed (same for min):

It doesn't seem like those instructions are actually part of your recent PR (nor were they on the master branch before that) so I unmarked them again.

SparrowLii · 2021-10-21T11:08:02Z

@CryZe They can be found in the master branch now:
https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/aarch64/neon/generated.rs#L8519-L8539
https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/aarch64/neon/generated.rs#L8545-L8565
Sorry, I marked them before #1230 was merged; that was to prevent others from submitting duplicate PRs.

CryZe · 2021-10-21T11:16:54Z

Welp, I'll mark them again then. Somehow the GitHub Pull Request UI doesn't show them as diffs at all: https://i.imgur.com/BsHR5in.gif

SparrowLii · 2021-10-21T11:19:48Z

Github’s comparison tool will always have problems when changing a large amount of code XD

SparrowLii · 2021-10-21T11:20:47Z

As in #1230, all instructions have been implemented except the following ones and those that use 16-bit floating-point:

1. The following instructions are only available on aarch64 for now, because the corresponding target_feature cannot be found among the available features for arm:
   vcadd_rot, vcmla, vdot
2. The feature i8mm is not valid:
   vmmla, vusmmla: https://rust.godbolt.org/z/8GbKW5ef4
3. LLVM ERROR (can be reproduced in godbolt):
   vsm4e: https://rust.godbolt.org/z/xhT1xvGTP
4. LLVM ERROR (fine in godbolt, but "LLVM ERROR: Cannot select: intrinsic" is raised at runtime):
   vsudot, vusdot: https://rust.godbolt.org/z/aMnEvab3n
   vqshlu: https://rust.godbolt.org/z/hvGhrhdMT
5. Not implemented in LLVM and cannot be implemented manually:
   vmull_p64 (for arm), vsm3, vrax1q_u64, vxarq_u64, vrnd32, vrnd64, vsha512

Amanieu · 2021-10-21T11:41:27Z

> As in #1230, all instructions have been implemented except the following ones and those that use 16-bit floating-point:
>
> 1. The following instructions are only available on aarch64 for now, because the corresponding `target_feature` cannot be found among the available features for arm:
>    `vcadd_rot`, `vcmla`, `vdot`

On LLVM's ARM backend, vcadd_rot and vcmla are under the v8.3a feature. vdot is under the dotprod feature. I got this information from llvm-project/llvm/lib/Target/ARM/ARMInstrNEON.td.

> 2. The feature `i8mm` is not valid:
>    `vmmla`, `vusmmla`: [rust.godbolt.org/z/8GbKW5ef4](https://rust.godbolt.org/z/8GbKW5ef4)

Already discussed in rust-lang/rust#90079.

> 3. LLVM ERROR (can be reproduced in godbolt):
>    `vsm4e`: [rust.godbolt.org/z/xhT1xvGTP](https://rust.godbolt.org/z/xhT1xvGTP)

Use llvm.aarch64.crypto.sm4ekey instead of llvm.aarch64.sve.sm4ekey.

> 4. LLVM ERROR (fine in godbolt, but `LLVM ERROR: Cannot select: intrinsic` is raised at runtime):
>    `vsudot`, `vusdot`: [rust.godbolt.org/z/aMnEvab3n](https://rust.godbolt.org/z/aMnEvab3n)
>    `vqshlu`: [rust.godbolt.org/z/hvGhrhdMT](https://rust.godbolt.org/z/hvGhrhdMT)

You need to make your test function pub in godbolt; otherwise rustc will optimize it away as unreachable before it ever reaches LLVM.

vsudot/vusdot require the i8mm target feature. vqshlu seems to work fine in godbolt after making the function pub.

> 5. Not implemented in LLVM and cannot be implemented manually:
>    `vmull_p64` (for arm), `vsm3`, `vrax1q_u64`, `vxarq_u64`, `vrnd32`, `vrnd64`, `vsha512`

These all seem to exist in LLVM at least for AArch64. For ARM we can just leave these out for now.

SparrowLii · 2021-10-25T02:52:20Z

Hope someone can help implement the remaining instructions.

SparrowLii · 2021-11-09T11:16:38Z

@Amanieu The v8.5a feature cannot be detected at runtime, so we can't use #[simd_test(enable = "neon,v8.5a")]. How should we add tests for instructions that require v8.5a, like vrnd32x and vrnd64x?

hkratz · 2021-11-09T11:38:29Z

@SparrowLii Shouldn't that work with the frintts feature?

SparrowLii · 2021-11-09T12:26:22Z

> @SparrowLii Shouldn't that work with the frintts feature?

Looks useful: https://rust.godbolt.org/z/894W8cndG

Amanieu · 2021-11-09T12:43:00Z

LLVM only supports frintts on AArch64, so it's fine to not support this intrinsic on ARM.

gnzlbg mentioned this issue Dec 14, 2017

Find inconsistencies between the intel intrinsics XML file and the Rust code #240

Closed

quininer mentioned this issue Jan 28, 2018

Hardware accelerated AES for ARM RustCrypto/block-ciphers#10

Closed

alexcrichton added the A-arm label Jan 29, 2018

gnzlbg mentioned this issue Apr 18, 2018

vld1q_u32/vst1q_u32 etc. #429

Closed

This was referenced May 24, 2018

Make SIMD tracking issue marked for stdsimd too #460

Merged

What should be do about the stdsimd feature? #461

Open

valpackett mentioned this issue May 6, 2019

NEON/AdvSIMD comparison intrinsics #754

Closed

jmaibaum mentioned this issue Jun 29, 2019

Add ARM Neon vmvn_*/vmvnq_* bitwise not intrinsics #770

Merged

This was referenced Jul 31, 2019

ARM NEON support simd-lite/simd-json#32

Merged

Add more ARM SIMD intrinsics #792

Merged

aloucks mentioned this issue May 2, 2020

ARM NEON Intrinsics aloucks/directx_math#1

Open

calebzulawski mentioned this issue Aug 24, 2020

Add ARM NEON support calebzulawski/generic-simd#1

Open

6 tasks

bluss mentioned this issue Dec 7, 2020

std::arch does not implement some Neon SIMD intrinsics rust-lang/rust#75373

Closed

dragostis mentioned this issue Jan 7, 2021

Feature Tracking raygon-renderer/thermite#1

Open

31 tasks

This was referenced Mar 10, 2021

add vcgez, vcgtz, vclez, vcltz neon instructions #1069

Merged

add vcvt, vcvta, vcvtn, vcvtm, vcvtp neon instructions #1084

Merged

hkratz mentioned this issue Sep 29, 2021

Neon ejmahler/RustFFT#78

Merged

SparrowLii mentioned this issue Oct 13, 2021

Complete the remaining neon instructions #1230

Merged

Amanieu mentioned this issue Nov 9, 2021

Add remaining insturctions #1250

Merged

skewballfox mentioned this issue May 30, 2022

reworking the FFT in power_spectrum secretsauceai/mfcc-rust#2

Closed

mberry mentioned this issue Sep 21, 2022

Optimised Neon Arm-v8 Argyle-Software/kyber#11

Open

Nugine mentioned this issue Sep 23, 2022

Inlining failure for arm neon Nugine/simd#10

Closed

eirnym mentioned this issue Feb 12, 2024

Add Arm's NEON vectorization DoumanAsh/xxhash-rust#34

Closed
