ARM NEON support #32
When you have something that builds, please let me and @WorksOnArm know - would love to provide test cycles and diverse hardware to check out performance on. |
Thank you so much :D that's awesome! |
Nooo! it seems like we're going to be blocked on missing intrinsics :( https://doc.rust-lang.org/core/arch/aarch64/index.html / rust-lang/stdarch#148 |
@Licenser Do you have an inventory yet of intrinsics that you need / intrinsics that are missing? Reading the linked issue, sounds like there's slow progress. |
Ah yes I made a list and then posted it to the wrong ticket ... silly me ... Those are the intrinsics I found in @lemire's arm64 implementation
(there is a full list of missing instructions on the rust ticket - those are the required ones for porting simdjson.rs) |
@Licenser alas! Would you be open to the possibility of a PR that uses assembly macros in the meantime? Maybe it won't be that far off from the intrinsic version... |
Absolutely, I also gave you contributor permission so no or required ;) I might take a look on the weekend to see what is required to get the intrinsics at least into nightly |
@Licenser that's awesome - I'll take a pass at defining some intrinsics in 'src/neon/intrinsics.rs', and we can compare notes as you work with nightly! |
I started working on a pull request: rust-lang/stdarch#792 |
We just published simdjson 0.2.0 with NEON support... |
Huzza! |
* feat: neon support
* feat: temp stub replacements for neon intrinsics (pending rust-lang/stdarch#792)
* fix: drone CI rustup nightly
* feat: fix guards, use rust stdlib for bit count operations
* fix: remove double semicolon
* feat: fancy generic generator functions, thanks @Licenser
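For readers wondering what "use rust stdlib for bit count operations" can look like, here is a minimal sketch. The function names `popcount` and `set_bit_positions` are made up for illustration and are not taken from the crate; only the `u64` methods `count_ones` and `trailing_zeros` are real standard-library calls.

```rust
/// Number of set bits in a 64-bit mask, using the standard library
/// instead of a platform-specific intrinsic.
pub fn popcount(bits: u64) -> u32 {
    bits.count_ones()
}

/// Hypothetical helper: positions of all set bits, lowest first.
/// A JSON parser might walk a bitmask of "interesting" bytes this way.
pub fn set_bit_positions(mut bits: u64) -> Vec<u32> {
    let mut out = Vec::with_capacity(bits.count_ones() as usize);
    while bits != 0 {
        // Index of the lowest set bit.
        out.push(bits.trailing_zeros());
        // Clear the lowest set bit; safe because `bits` is non-zero here.
        bits &= bits - 1;
    }
    out
}
```

Because these are plain integer methods, the same code compiles on any target, and the compiler is left to choose suitable instructions for each architecture.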
OMG OMG OMG! this is great! :D |
@Licenser are you thinking we might be able to merge this today and then have a subsequent PR to delete the intrinsics once everything's available in nightly? Thank you again for all your help. PS the new UTF8 tests look great! |
I'd rather not. I could see that resulting in some headaches downstream if the intrinsics make it in, and that'd be very, very, very hacky for a crate. That said, brave people can already use it as a git dependency by pointing to the git branch. |
Ah, that makes sense. What is taking you so long?!! ;) |
Maybe I found something? Let me know what you think... https://godbolt.org/z/36hnUE

    #[cfg(target_arch = "aarch64")]
    #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
    #[rustc_args_required_const(1)]
    pub unsafe fn vget_lane_u8(a: uint8x8_t, n: u32) -> u8 {
        if n < 0 || n > 7 {
            unreachable_unchecked();
        };
        match n {
            0 => a.0,
            1 => a.1,
            2 => a.2,
            3 => a.3,
            4 => a.4,
            5 => a.5,
            6 => a.6,
            7 => a.7,
            _ => unreachable_unchecked()
        }
    }

(Also clang: https://clang.godbolt.org/z/TpqJIp) |
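As a point of comparison for the lane-extraction idea above, here is a portable sketch that makes the lane a const generic parameter, so an out-of-range lane is rejected at compile time rather than hitting `unreachable_unchecked`. The `U8x8` type, `LaneCheck`, and `get_lane_u8` are assumptions for illustration; they are not the stdarch types or the signature proposed in this thread.

```rust
/// Portable stand-in for NEON's `uint8x8_t` (an assumption for illustration,
/// not the stdarch type): eight `u8` lanes stored in an array.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U8x8(pub [u8; 8]);

/// Hypothetical helper that turns an out-of-range lane into a compile error.
struct LaneCheck<const LANE: usize>;

impl<const LANE: usize> LaneCheck<LANE> {
    const IN_RANGE: () = assert!(LANE < 8, "lane index must be in 0..=7");
}

/// Extract one lane; `LANE` must be known at compile time.
pub fn get_lane_u8<const LANE: usize>(v: U8x8) -> u8 {
    // Referencing the constant forces the check for this particular LANE.
    let () = LaneCheck::<LANE>::IN_RANGE;
    v.0[LANE]
}
```

With this shape, `get_lane_u8::<3>(v)` compiles, while `get_lane_u8::<8>(v)` fails to build, which avoids any undefined behaviour on a bad lane index.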
@Licenser I think your vld1q is all set, since the intrinsic turns into ldr anyway? |
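To illustrate why a 16-byte vector load may not need a dedicated intrinsic, here is a minimal portable sketch. `load_u8x16` is a hypothetical name, and the `[u8; 16]` return type stands in for `uint8x16_t`; whether the compiler emits a single `ldr` for it depends on target and optimization level, so treat that as an expectation and not a guarantee.

```rust
/// Hypothetical stand-in for a `vld1q_u8`-style load: copy the first
/// 16 bytes of `src` into a fixed-size array, with no alignment requirement.
pub fn load_u8x16(src: &[u8]) -> [u8; 16] {
    assert!(src.len() >= 16, "need at least 16 bytes to load");
    // SAFETY: the length check above guarantees 16 readable bytes, and
    // `read_unaligned` places no alignment requirement on the pointer.
    unsafe { core::ptr::read_unaligned(src.as_ptr() as *const [u8; 16]) }
}
```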
Oh that's a very good catch! then the ld1 commands are indeed done :D for |
This leaves only those two functions:

    // uint64_t vget_lane_u64 (uint64x1_t v, const int lane)
    arm_vget_lane!(vget_lane_u64, uint64x1_t, u64, 0);

    #[simd_test(enable = "neon")]
    unsafe fn test_vget_lane_u64() {
        let v = i64x1::new(1);
        let lane = 0;
        let r = vget_lane_u64(transmute(v), lane);
        assert_eq!(r, 1);
    }

    // uint32_t vgetq_lane_u32 (uint32x4_t v, const int lane)
    arm_vget_lane!(vgetq_lane_u32, uint32x4_t, u32, 3);

    #[simd_test(enable = "neon")]
    unsafe fn test_vgetq_lane_u32() {
        let v = i32x4::new(1, 2, 3, 4);
        let lane = 1;
        let r = vgetq_lane_u32(transmute(v), lane);
        assert_eq!(r, 2);
    }
|
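The `arm_vget_lane!` invocations above take a function name, a vector type, an element type, and a maximum lane index. The macro body is not shown in this thread, so the following is only a guess at the general shape, written against plain arrays so it runs on any target. `lane_getter!`, `U64x1`, `U32x4`, and the generated function names are all made up for illustration, and the extra lane-count argument is needed only because this sketch also defines the stand-in vector type.

```rust
/// Hypothetical macro: defines an array-backed vector type and a checked
/// lane getter for it, given the element type, lane count, and max lane.
macro_rules! lane_getter {
    ($name:ident, $vec:ident, $elem:ty, $lanes:expr, $max_lane:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct $vec(pub [$elem; $lanes]);

        pub fn $name(v: $vec, lane: usize) -> $elem {
            // Runtime check here; the real intrinsics require a constant lane.
            assert!(lane <= $max_lane, "lane index out of range");
            v.0[lane]
        }
    };
}

// Stand-ins mirroring the two functions discussed above.
lane_getter!(get_lane_u64, U64x1, u64, 1, 0);
lane_getter!(getq_lane_u32, U32x4, u32, 4, 3);
```

The assertions in the two tests above translate directly: `get_lane_u64(U64x1([1]), 0)` yields `1`, and `getq_lane_u32(U32x4([1, 2, 3, 4]), 1)` yields `2`.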
@Licenser that's awesome work... very nice! I think some of the "ldr" confusion is because the operands are Does this look good? Let me know what you think! All the best, -Sunny |
* Use simd-lite
* Update badge
* Update badge
* Get rid of transmutes
* Use NeonInit trait
* vqsubq_u8 fix
* vqsubq_u8 fix pt. 2
* use reexported values from simd-lite