Implement SIMD-specific functions #16
Comments
I saw the non-temporal stuff in std::arch when I wrote
(Non-)temporal hinting lets you mark loads and stores that should not be cached, because you read and write that location only once, so the rest of the algorithm has no need to frequently revisit it. Basically, it's a way to say "please do not shatter my cache with these particular reads/writes; I am using the cache for the ENTIRE REST of my algorithm." This is a good link: https://vgatherps.github.io/2018-09-02-nontemporal/ My apologies if you already knew that much, but if you did, I would be interested to know what other information you feel would be needed?
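To make the hint concrete (my illustration, not code from the thread): a minimal sketch using the non-temporal store intrinsic that std::arch already exposes on x86-64. The function name `stream_zeroes` and the zero-fill workload are invented for the example.

```rust
// Sketch: non-temporal stores via std::arch on x86-64 (SSE2 is baseline there).
// Assumes `dst` is 16-byte aligned and `len` is a multiple of 16.
#[cfg(target_arch = "x86_64")]
unsafe fn stream_zeroes(dst: *mut u8, len: usize) {
    use std::arch::x86_64::{__m128i, _mm_setzero_si128, _mm_sfence, _mm_stream_si128};
    let zero: __m128i = _mm_setzero_si128();
    let mut p = dst as *mut __m128i;
    let end = dst.add(len) as *mut __m128i;
    while p < end {
        // Store bypassing the cache: a hint that we will not read this again soon.
        _mm_stream_si128(p, zero);
        p = p.add(1);
    }
    // Non-temporal stores are weakly ordered; fence before other code observes them.
    _mm_sfence();
}
```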
I do not see nontemporal loads and stores as "SIMD-specific" (but maybe we just meant "SIMD loads"!), but I will acknowledge their heightened usage in SIMD. Obviously scatter/gather is!
Other loads/stores that should be implemented are masked loads/stores, strided loads/stores (often faster than gather/scatter), and combinations thereof. There are also vector reductions, e.g. reduce-add for things like dot products; see the sketch below.
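As an illustration of the reduce-add case (my sketch, not code from the thread): a dot product with nightly portable SIMD. The function `dot` and the lane count are assumptions, and the `SimdFloat` trait path has moved between nightlies.

```rust
// Nightly-only; crate root needs #![feature(portable_simd)].
use std::simd::{f32x8, num::SimdFloat};

fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = f32x8::splat(0.0);
    let chunks = a.len() / 8;
    for i in 0..chunks {
        let x = f32x8::from_slice(&a[i * 8..]);
        let y = f32x8::from_slice(&b[i * 8..]);
        acc += x * y; // elementwise multiply-accumulate
    }
    // Horizontal reduction: one reduce-add at the end, not one per iteration.
    let mut sum = acc.reduce_sum();
    for i in chunks * 8..a.len() {
        sum += a[i] * b[i]; // scalar tail
    }
    sum
}
```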
In #45 we found out that nontemporal stores are not really supported in the codegen backends: not in LLVM, and as far as I'm aware not in Cranelift either. There are also no plans to expose this as a scalar API yet, so I took it off the list. That still leaves all the other stuff, of course.
How difficult would it be to expose masked loads & stores? Gather/scatter are already supported, despite being more complex, and both are backed by LLVM intrinsics. I tried working around it by using those gather operations, but the only optimization LLVM seems to do on them is recognizing a broadcast/splat operation:

```rust
// Nightly-only; crate root needs #![feature(portable_simd)].
use std::simd::{u8x8, usizex8, Mask};

let bytestring: &[u8] = b"01234567";
// SAFETY: every enabled index (all zeroes) is in bounds of `bytestring`.
let broadcast = unsafe {
    u8x8::gather_select_unchecked(
        bytestring,
        Mask::splat(true),                              // every lane enabled
        usizex8::from_array([0, 0, 0, 0, 0, 0, 0, 0]),  // all lanes read index 0
        u8x8::splat(0),                                 // fallback for disabled lanes
    )
};
```

This translates to a single broadcast instruction on AVX, but any divergence in the indices causes inefficient emulation, like when the indices are 0 through 7. Even when specifically compiling for a
I believe it's straightforward; like you said, LLVM supports it. We only need to add an intrinsic to rustc that emits it.
Alternatively, we could add an optimization to rustc and/or LLVM that detects a vector of successive indices/pointers and converts it to a masked load/store.
That sounds intimidating to me. I'll focus on getting the masked load/store intrinsics running and adding new methods to
I've submitted the rustc changes here: rust-lang/rust#117953 (so far only masked loads, no stores yet)
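To make the semantics concrete (my illustration, not the PR's code): a scalar model of a masked load, where disabled lanes take a fallback value instead of touching memory, which is what lets the tail of a buffer be loaded without reading out of bounds. The function name `masked_load_model` is hypothetical.

```rust
// Scalar model of masked-load semantics: lanes with a false mask bit yield
// `or[i]` and never touch memory. As with the real intrinsic, the caller
// must mask off any lane whose index would be out of bounds.
fn masked_load_model<const N: usize>(src: &[i32], mask: [bool; N], or: [i32; N]) -> [i32; N] {
    core::array::from_fn(|i| if mask[i] { src[i] } else { or[i] })
}

// Usage: load a 3-element tail into 8 lanes; lanes 3.. take the fallback 0.
let tail = [10, 20, 30];
let mask = core::array::from_fn(|i| i < tail.len());
let lanes: [i32; 8] = masked_load_model(&tail, mask, [0; 8]);
assert_eq!(lanes, [10, 20, 30, 0, 0, 0, 0, 0]);
```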
An incomplete list (motivated by a reddit comment)
[ ] nontemporal load/store (not really SIMD, but maybe; see `nontemporal_store`)