Implement SIMD-specific functions #16

Open
2 of 4 tasks
calebzulawski opened this issue Oct 1, 2020 · 11 comments

Comments

@calebzulawski
Member

calebzulawski commented Oct 1, 2020

An incomplete list (motivated by a reddit comment)

  • scatter/gather
  • [ ] nontemporal load/store (not really SIMD, but maybe. see nontemporal_store)
  • lookup tables (possibly related to Do The SIMD Shuffle #11)
  • masked load/stores
  • reductions/horizontal fns, such as horizontal add
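To make the first item concrete, here is a rough scalar model of what gather and scatter do lane by lane. This is only a sketch in plain Rust: the names `gather_select` and `scatter` are chosen for illustration, and skipping out-of-bounds indices is an assumption of this model, not a statement about any final API.

```rust
/// Scalar model of a masked gather: lane i reads `src[idx[i]]` when the lane
/// is enabled and the index is in bounds; otherwise it keeps `or[i]`.
/// (Hypothetical helper, not the std::simd implementation.)
fn gather_select<const N: usize>(
    src: &[i32],
    enable: [bool; N],
    idx: [usize; N],
    or: [i32; N],
) -> [i32; N] {
    let mut out = or;
    for lane in 0..N {
        if enable[lane] {
            if let Some(&v) = src.get(idx[lane]) {
                out[lane] = v;
            }
        }
    }
    out
}

/// Scalar model of a scatter: lane i writes `values[i]` to `dst[idx[i]]`.
/// Out-of-bounds indices are skipped. Lanes are written in order, so when
/// two lanes share an index the higher lane wins in this model.
fn scatter<const N: usize>(dst: &mut [i32], idx: [usize; N], values: [i32; N]) {
    for lane in 0..N {
        if let Some(slot) = dst.get_mut(idx[lane]) {
            *slot = values[lane];
        }
    }
}
```

A hardware gather/scatter performs the same per-lane accesses with one instruction; the loop form is only meant to pin down the semantics.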
@Lokathor
Contributor

Lokathor commented Oct 1, 2020

i saw the non-temporal stuff in std::arch when i wrote safe_arch but couldn't understand what was going on enough to confidently make a safety claim, so i skipped it.

@workingjubilee
Member

workingjubilee commented Oct 1, 2020

(Non-)Temporal hinting lets you indicate loads and stores that should not be cached, because you are only reading once and storing once to that location, so you have no need in the rest of the algorithm to frequently read and update it. Basically, it's a way to indicate "please do not shatter my cache with these particular read/writes, I am using that for the ENTIRE REST of my algorithm."

This is a good link: https://vgatherps.github.io/2018-09-02-nontemporal/ My apologies if you already knew that much, but if you did, I would be interested to know what other information you feel would be needed?
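For illustration, a minimal sketch of a non-temporal store using the existing `std::arch` intrinsics `_mm_stream_si32` and `_mm_sfence` on x86_64. The wrapper name `copy_streaming` is hypothetical, other targets fall back to an ordinary copy, and whether the hint actually helps depends on the workload.

```rust
/// Copies `src` into `dst`. On x86_64 each element is written with a
/// non-temporal store, hinting that the data will not be reused soon.
/// (Hypothetical wrapper for illustration only.)
fn copy_streaming(dst: &mut [i32], src: &[i32]) {
    assert_eq!(dst.len(), src.len());

    #[cfg(target_arch = "x86_64")]
    {
        use std::arch::x86_64::{_mm_sfence, _mm_stream_si32};
        for (d, &s) in dst.iter_mut().zip(src) {
            // SAFETY: `d` is a valid, aligned, exclusive reference to an i32,
            // and SSE2 is part of the x86_64 baseline.
            unsafe { _mm_stream_si32(d as *mut i32, s) };
        }
        // Fence so the streaming stores are ordered before later memory
        // operations that may observe `dst`.
        unsafe { _mm_sfence() };
    }

    // Fallback: targets without this intrinsic just do a normal copy.
    #[cfg(not(target_arch = "x86_64"))]
    dst.copy_from_slice(src);
}
```

The result is the same as a plain copy; only the cache behaviour is meant to differ.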

@workingjubilee
Member

workingjubilee commented Oct 1, 2020

I do not see nontemporal loads and stores as "SIMD-specific" (but maybe we just meant "SIMD loads"!) but I will acknowledge their heightened usage in SIMD. Obviously scatter/gather is!

@programmerjake
Member

other load/stores that should be implemented are masked load/stores, strided load/stores (often faster than gather/scatter), and combinations thereof.

There's also vector reductions, e.g. reduce-add for things like dot-product.
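As a rough scalar model of these two ideas (a sketch only; `strided_load`, `reduce_add`, and `dot` are hypothetical names, and returning `None` on an out-of-bounds lane is an assumption of this model):

```rust
/// Scalar model of a strided load: lane i reads `src[start + i * stride]`.
/// Returns `None` if any lane would read out of bounds.
fn strided_load<const N: usize>(src: &[f32], start: usize, stride: usize) -> Option<[f32; N]> {
    let mut out = [0.0; N];
    for lane in 0..N {
        out[lane] = *src.get(start + lane * stride)?;
    }
    Some(out)
}

/// Reduce-add (horizontal add): collapse all lanes into one scalar.
fn reduce_add<const N: usize>(v: [f32; N]) -> f32 {
    v.iter().sum()
}

/// Dot product: a lane-wise multiply followed by a reduce-add.
fn dot<const N: usize>(a: [f32; N], b: [f32; N]) -> f32 {
    let mut prod = [0.0; N];
    for lane in 0..N {
        prod[lane] = a[lane] * b[lane];
    }
    reduce_add(prod)
}
```

Unlike a gather, the strided form needs only a base and a stride rather than a full index vector.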

@workingjubilee
Member

workingjubilee commented Oct 2, 2020

The nontemporal_store docs say: "Emits a !nontemporal store according to LLVM (see their docs). Probably will never become stable."

I am feeling pretty doubtful about exposing an interface to this in std::simd if this is not planned for eventual stabilization.
This concern probably doesn't matter, in retrospect. The note is still curious, however.

@workingjubilee
Member

In #45 we found out that nontemporal stores are not really supported in the codegen backends (not in LLVM and, as far as I'm aware, not in cranelift either), and also that there are no plans to expose this as a scalar API yet, so I took it off the list. That still leaves all the other stuff, of course.

@farnoy
Contributor

farnoy commented Nov 15, 2023

How difficult would it be to expose masked loads & stores? Gather/scatter are already supported, while being more complex. Both are backed by LLVM intrinsics.

I tried working around it by using those gather operations, but the only optimization LLVM seems to do on them is recognizing a broadcast/splat operation:

u8x8::gather_select_unchecked(
    bytestring,
    Mask::splat(true),
    usizex8::from_array([
        0, 0, 0, 0, 0, 0, 0, 0
    ]),
    u8x8::splat(0),
);

This translates to a single broadcast instruction on AVX, but any divergence in the indices (for example, indices 0 through 7) causes inefficient emulation, even when compiling specifically for a target-cpu that supports AVX-512 masked loads.
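For reference, the semantics being asked for can be modelled in scalar Rust as follows. This is a sketch under assumptions: `masked_load` and `prefix_mask` are hypothetical names, and the eventual std::simd methods may look different.

```rust
/// Scalar model of a masked load: enabled lane i reads `src[i]`; disabled
/// lanes, and lanes past the end of `src`, keep the fallback `or[i]`.
/// (Hypothetical helper for illustration.)
fn masked_load<const N: usize>(src: &[u8], enable: [bool; N], or: [u8; N]) -> [u8; N] {
    let mut out = or;
    for lane in 0..N {
        if enable[lane] {
            if let Some(&b) = src.get(lane) {
                out[lane] = b;
            }
        }
    }
    out
}

/// Builds a mask that enables only the first `len` lanes, which is the usual
/// shape when reading a tail shorter than the vector.
fn prefix_mask<const N: usize>(len: usize) -> [bool; N] {
    let mut mask = [false; N];
    for lane in 0..std::cmp::min(N, len) {
        mask[lane] = true;
    }
    mask
}
```

This is equivalent to a gather whose indices are 0 through N - 1, but a dedicated masked load can map to a single instruction on targets that have one.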

@calebzulawski
Member Author

I believe it's straightforward; like you said, LLVM supports it. We only need to add an intrinsic to rustc that emits it.

@programmerjake
Member

alternatively we could add an optimization to rustc and/or llvm that detects a vector of successive indices/pointers and converts to a masked load/store.
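The detection step could look roughly like this on constant indices. This is a scalar illustration only: `consecutive_base` is a hypothetical name, and a real pass would work on compiler IR rather than on arrays.

```rust
/// Returns `Some(base)` if `idx` is exactly `base, base + 1, ..., base + N - 1`,
/// meaning the gather/scatter could be replaced by a masked load/store
/// starting at `base`. Returns `None` otherwise (including for N == 0).
fn consecutive_base<const N: usize>(idx: [usize; N]) -> Option<usize> {
    let base = *idx.first()?;
    for (lane, &i) in idx.iter().enumerate() {
        // `checked_add` avoids overflow when `base` is close to usize::MAX.
        if i != base.checked_add(lane)? {
            return None;
        }
    }
    Some(base)
}
```

A splat of one index (all zeros, say) is deliberately not matched here, since that case is already recognized as a broadcast.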

@farnoy
Contributor

farnoy commented Nov 15, 2023

That sounds intimidating to me. I'll focus on getting the masked load/store intrinsics running and adding new methods to impl<...> Simd

@farnoy
Contributor

farnoy commented Nov 15, 2023

I've submitted the rustc changes here: rust-lang/rust#117953 (so far only masked loads, no stores yet)


5 participants