-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD groundwork #1199
SIMD groundwork #1199
Conversation
Not really necessary: the type safety it offers can be provided by libraries.
X-posting my late comment from the pre-RFC: What I was saying was a bit more than that.
And yeah, I edited "alignment voodoo" into my post before you mentioned that 😛 Anyway, the result of the above is that one only really needs two changes to the compiler:
But as far as benefits, this avoids a large mass of worryingly magical (regarding parameter types) intrinsics being added to the compiler. |
Unfortunately if we want to get SIMD on stable Rust this would necessitate a stable
I believe a core aspect of this RFC is that it's stabilizing the absolute bare minimum of what the compiler needs to support SIMD. If we add in a few language traits and other various types here and there it's more surface area that will have to be stabilized. If this stuff can be built externally in a library, that'd be great! This RFC, however, is just focused on the compiler support. |
Sure! I just think that the way it makes the intrinsics unify types with sufficiently similar layouts under Rust currently has a nominal type system rather than a structural one; some subset of intrinsics magically ignoring that feels very questionable to me. |
Hm, maybe I haven't been clear. They're not really ignoring it. Suppose we have: #[repr(simd)]
struct A(f64, f64);
#[repr(simd)]
struct B(f64, f64);
extern {
fn some_simd_intrinsic(x: A);
} It's not legal to call (That said, it's slightly different, since the compiler doesn't enforce that arguments have the right layout for C functions (has to be done by the programmer), whereas the SIMD intrinsics do have layout-enforcement.) |
Ah, that does make rather more sense, and is not the impression I had gotten. |
The Motivation section mentions how this RFC aims to provide just some ground-work on top of which nice SIMD functionality could be built. While builtin arithmetic, shuffles etc for
I have a bad feeling about this. A regular method call shouldn't require parameters to be compile time constants. Using generics to express this requirement as shown here depends on #1062, but it would be a much cleaner solution.
Wouldn't a compile error be nicer here? That aside, when reading this RFC I had the same thought as @eternaleye: Why implement compiler magic when these functions could be implemented in plain rust with Additionally, inline asm allows the programmer to influence things like instruction scheduling and register allocation (within the asm section), in case the compiler is doing a bad job in that regard. So I'd suggest solving the "Operations" section of the RFC in a way that doesn't require any compiler changes (at least not specifically for SIMD). I'm unsure about what the People often argue that vectorization is best left to the compiler and as Rust uses LLVM, many simple cases can benefit from great optimizations. But unfortunately optimizers aren't perfect and simply unable to handle sufficiently complex code, so obviously an explicit way for programmers to express how they want code to be vectorized is necessary. Somewhere above, I suggested removing even those basic operations from So to summarize, I'm in favor of simplifying/breaking up the "types" and the "operations" sections of the RFC by building on top of much less specific compiler features and pushing as much work as possible into the to-be-written SIMD crate (and moving the "platform detection" section into its own RFC). This feels appropriate as the RFC's intention was just basic groundwork. Or in other words: I'm arguing that "the absolute bare minimum of what the compiler needs to support SIMD" is zero. The missing parts aren't necessarily SIMD-specific. |
@huonw, did I miss anything? ISTR you mentioning it didn't validate things like that they need to be homogenous, and that such things would be left up to |
Arithmetic identity-based optimizations apply just as well to SIMD operations as they do to non-SIMD operations. By using the intrinsics as opposed to inline asm we give LLVM the ability to do those optimizations. |
@pcwalton Oh, I didn't know that! Yes, that's a big advantage of the intrinsics then. |
Concerning the structural typing when importing the intrinsics: Please be careful that this does not end up allowing people to "peek through" private abstractions of data-types. That would be a horrible mess of a safety issue. Essentially, such merely structural typing should only be allowed if the module had access to all the fields of the type anyway: Either because they were all public (all the way down), or because the type was defined in the same module. |
Just a note to all the people that keep mentioning inline assembly: |
I agree that it isn't totally necessary to actually use the arithmetic operators: we could instead use a generic intrinsic similar to the comparison operators. However, I think it is important we do more than the platform intrinsics: LLVM (and compilers in general) knows more about its internal For shuffles, the optimisation applies: the compiler can e.g. simplify a sequence of shuffles into a single one. Also. the RFC discusses this. One point in it is the compiler synthesizing an optimal (/close to optimal) sequence of instructions for an arbitrary shuffle, instead of forcing the programmer to think about doing that themselves.
This isn't a regular method call: intrinsics are special in many ways. Note that my solution on #1062 that you link to just calls the intrinsic. This is the low-level API, people generally won't be calling the intrinsics directly.
Yes, sort of. However, using the trick mentioned in #1062 would result in very poor error messages, since the shuffle order may be passed through multiple layers of generic function calls possibly in external crates, meaning the out-of-bounds error is generated deep inside code that the programmer didn't write.
As other have said,
Neither of these apply to this: the API is essentially exposing individual CPU instructions, i.e. each These reasons apply if one was, say, writing an entire inner loop as a single
Yes.
It sort-of does, but in a very very restricted way, that's already possible with
Tuples and arrays don't have the right low-level details.
|
Transmute requires unsafe. It shouldn't be possible for safe code to violate abstraction boundaries. What you are proposing (if I follow your RFC correctly) is essentially that The restriction that the type used for the intrinsic has to be defined in the same module, shouldn't be a problem for the implementations you envision (with some crate(s) taking care of providing a descent abstraction) , right? |
Hm, I misunderstood what you were talking about. I'm unsure what the problematic situation you're envisioning could be. Is it something like: Crate (I'm not against the privacy restriction, I'm just trying to understand the motivation more concretely.) |
Is a lang item the best intuition here? Wouldn't a closer analogy be
Would it be possible to just make this best-effort, and fall back to laying the type out normally if it doesn't meet the SIMD requirements (perhaps with a warning)? That seems cleaner than implicit requirements at the code generator level, which feels like a contract or layering violation (if it passes the typechecker, it should compile), and perhaps more in the spirit of
I wonder if the fact that the borrow checker enforces limits on observability wouldn't actually let us support the interior reference-taking, just in a less efficient way, by first copying the field out onto the stack when taking a shared In this way it would be even more truly the case that
I might be more comfortable with this if you had to explicitly write |
This essentially means its much less of an issue than I thought.
Which part is unclear - whether these rules allow code to get around the restriction that usually apply to private fields, or whether getting access to private fields is an issue? The answer to the latter is that it violates parametricity - for now I'll just assume that as accepted. Please tell me if I should elaborate on that. Unsafe code (through transmute) can violate parametricity anyways, but I would still prefer if no additional violations would be introduced. Regarding the first part, let me try to come up with some examples. I assume "A::Simd" is some other crate's |
Ah, I think you might be working with the same misunderstanding as @eternaleye, that the intrinsics can be called with any type that matches structurally, it's just that they can be declared with any type that matches structurally. See #1199 (comment) . (I 100% agree that being able to write a transmute via SIMD intrinsics would be unfortunate.) |
I had that misunderstanding at first, but the post above was written without that assumption. The first one, regarding add, can't I declare this with |
I agree that
I think we can relax this in future if we find the RFC doesn't work well in practice. (It's part of the reason I proposed a hard error.)
Interesting idea, however it seems relatively complicated, and not worth it for SIMD: efficient SIMD code won't be handling/mutating individual elements like this very much.
Another alternative is just providing arithmetic intrinsics.
Oh, I see. It seems sensible to disallow it. I.e. have nominal equality constraints within a definition. And for the Shuffle is generic because it can be used with literally any SIMD type, i.e. it's not restricted to some subset of types with the same structure. |
It's valid to `extern` them, though.
Automatic crazy boolean bit-packing is crazy.
Updated. Highlights:
|
This implements rust-lang/rfcs#1199 (except for doing all the platform intrinsics). Things remaining for SIMD (not necessarily in this PR): - [x] I (@huonw) am signed up to ensure the compiler matches the RFC, when it lands - [x] the platform specific intrinsics aren't properly type checked at the moment (LLVM will throw a "random" assertion) - [ ] there's a lot of useful intrinsics that are missing, including whole platforms (mips, powerpc) - [ ] the target-feature `cfg` detection/adding is not so great at the moment - [x] I think the platform specific intrinsics should go in their own `extern` ABI (i.e. not `"rust-intrinsic"`) (I'm adjusting the RFC to reflect the latter.) I think it would be very nice for this to land without requiring the RFC to land first, because of the first point, and because this is the only way for any further work to happen/be experimented with, without requiring people to build/install/multirust a compiler from a custom branch. r? @alexcrichton
This implements rust-lang/rfcs#1199 (except for doing all the platform intrinsics). Things remaining for SIMD (not necessarily in this PR): - [x] I (@huonw) am signed up to ensure the compiler matches the RFC, when it lands - [x] the platform specific intrinsics aren't properly type checked at the moment (LLVM will throw a "random" assertion) - [ ] there's a lot of useful intrinsics that are missing, including whole platforms (mips, powerpc) - [ ] the target-feature `cfg` detection/adding is not so great at the moment - [x] I think the platform specific intrinsics should go in their own `extern` ABI (i.e. not `"rust-intrinsic"`) (I'm adjusting the RFC to reflect the latter.) I think it would be very nice for this to land without requiring the RFC to land first, because of the first point, and because this is the only way for any further work to happen/be experimented with, without requiring people to build/install/multirust a compiler from a custom branch. r? @alexcrichton
shuffles are exposed generally: intrinsics that represent arbitrary | ||
shuffles. | ||
|
||
This may violate the "one instruction per instrinsic" principal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"principal" seems to be a typo (for "principle")
Hear ye, hear ye. This RFC is entering final comment period. |
What's the basic plan with respect to stabilization of these features? As long as the idea is just to get these into the unstable compiler, so that the implementation PR can land and we can start experimenting more seriously with higher-level APIs, further evolving the lower-level ones, and whatever else, that's completely cool. But the number of places (there's a few) where we're currently forced to say "enforcing this in the type system is difficult-to-impossible right now, so let's punt it to the backend" still bothers me if/when it's something we'd be committing to support as a stable feature forever (or until 2.0, anyways). |
@glaebhoerl I think there is no plan to stabilize until experience has been gained, but of course the further we go down this road, the less likely we'll back up and start from another. |
Basically, I suspect the approach I'd prefer is that once we have this infrastructure, and we've used it to gain some experience and to figure out the best way to formulate higher-level "type-safe" (i.e. without checks deferred to codegen) abstractions for SIMD (various |
To be clear, I agree with you Gabor -- I'd like to gain experience in what On Thu, Sep 10, 2015 at 5:45 PM, Gábor Lehel [email protected]
|
Huzzah! The language subteam has decided to accept this RFC. |
Tracking issue is rust-lang/rust#27731 |
Rendered.