SIMD vector type syntax: [|N|]T #6771

andrewrk · 2020-10-22T23:40:46Z

Currently we have @Vector for this, however, see #5207 and #6209.

Array syntax is [N]T. This is a proposal for SIMD vector syntax to be [|N|]T instead of @Vector(N, T). For example, a vector of four 32-bit integers would be [|4|]i32.
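
For reference, a minimal sketch of what the status-quo spelling looks like in use. The alias name `Vec4i` is only illustrative; the proposed `[|4|]i32` spelling appears in a comment because no released compiler accepts it.

```zig
const std = @import("std");

// Status-quo spelling; under this proposal the same type would read `[|4|]i32`.
const Vec4i = @Vector(4, i32);

test "element-wise add on a four-lane i32 vector" {
    const a: Vec4i = .{ 1, 2, 3, 4 };
    const b: Vec4i = .{ 10, 20, 30, 40 };
    const sum = a + b; // one addition per lane
    try std.testing.expectEqual(@as(i32, 11), sum[0]);
    try std.testing.expectEqual(@as(i32, 44), sum[3]);
}
```

This can be run with `zig test`.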

The main motivation for this would be that the compiler needs to be able to talk about primitive types in type names and in compile errors. Without syntax for this primitive type, in order to do this the compiler would introduce a dependency on the std lib such as std.meta.Vector(4, i32) which is verbose and can make compile errors and types more difficult to read at a glance, or it would have to do something like @Type(.{.Vector = .{.len = 4, .child = i32}}) which is even more verbose, making people wonder whether simd vectors really are first-class types in zig after all.

I chose | because it is already associated with bitwise operations, and because it looks OK when symmetrically positioned against the [ and ].

Add SIMD Support #903


tadeokondrak · 2020-10-23T00:21:46Z

> The main motivation for this would be that the compiler needs to be able to talk about primitive types in type names and in compile errors.

Note @typeInfo(@typeInfo(@TypeOf(%s)).Fn.return_type.?).ErrorUnion.error_set is already used for inferred error sets.

I think syntax like v4f32 or f32x4 is easier to read and much better for the common case of non-pointer vectors. @Type/std.meta.Vector is available for any others.

LemonBoy · 2020-10-23T08:39:49Z

> I think syntax like v4f32 or f32x4 is easier to read and much better for the common case of non-pointer vectors.

I like the v syntax, it feels like a natural extension of the usual scalar type syntax.
On top of that we should offer a set of "sane" combinations of N and T in std.simd that make sense for the hardware and avoid people running face-first into a (performance) wall.

LLVM covers your ass when working on non-canonical vectors (where T is not i/u{1,8,16,32,64,128}), but every single load/store/op done on such vectors is slow AF, since the hardware has no native support for such wonky-sized vectors: the generated code scalarizes, performs the requested operation, masks off the unwanted bits and then re-packs the vector.

ikskuh · 2020-10-23T08:48:08Z

I also think we should add a native syntax, although I prefer TxN, so f32x4 is more readable imho, as it still conveys type information first (it's an f32, but 4 of them). But I can understand why v4f32 is preferable, as it follows the zig array decl: [4]f32 and v4f32 are similar.

another option would be <4>f32, but this would introduce ambiguities

michal-z · 2020-10-23T11:40:00Z

I think that f32x4 is more readable and easier to write.

Snektron · 2020-10-23T11:49:05Z

The problem with v4xf32 and f32x4 though is that the compiler still needs to generate calls to std.meta.Vector (or a long chain of builtin calls like with error set returns).

ikskuh · 2020-10-23T12:14:13Z

After some discussion with @Snektron we came up with another idea, utilizing already existing features and making people remember less syntax (assuming @Vector(4, f32)):

~~[|4|]*f32~~
~~*f32x4~~
~~v4*f32~~
~~<4> *i32~~
~~4 ** *i32~~
*i32 ** 4

Just use the ** operator for comptime repetition to also lift types from scalar to vector type:

T ** N == @Vector(N, T) == std.meta.Vector(N, T)

LemonBoy · 2020-10-23T13:00:03Z

> The problem with v4xf32 and f32x4 though is that the compiler still needs to generate calls to std.meta.Vector (or a long chain of builtin calls like with error set returns).

Why? Once that syntax is adopted for all the Vector types the compiler is free to use that syntax as well.
std.meta.Vector returns v4f32 (or f32v4) vectors with this proposal.

Snektron · 2020-10-23T13:08:51Z

> Why? Once that syntax is adopted for all the Vector types the compiler is free to use that syntax as well.
> std.meta.Vector returns v4f32 (or f32v4) vectors with this proposal.

This syntax doesn't allow for vectors of pointers, or vectors of some aliased type.
I probably should have clarified in my original comment, sorry about that.

LemonBoy · 2020-10-23T13:22:31Z

> This syntax doesn't allow for vectors of pointers, or vectors of some aliased type.

I always forget about vectors of pointers, thanks for reminding me.

ghost · 2020-10-23T13:23:30Z

I very strongly disapprove of **. Firstly, on values that's an array operator, so it's easily confused; secondly, a SIMD multiplier would then be the only postfix type modifier, and we'd have C-style spiral precedence. Packed-native format is also terrible, because it only covers a subset of valid use cases, so it doesn't actually eliminate any human memory overhead.

A strictly regular type modifier is a necessity in my eyes, and since we don't have a modifier application operator (and we absolutely should not ever add one), another variation on bracket syntax seems like the best option. Andrew's original proposal fits that, as well as being "augmented" enough that it's clear something else is going on.
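
To make the first objection concrete, here is a small sketch of what `**` already means on values today: comptime repetition of an array. The names `pattern` and `repeated` are only illustrative.

```zig
const std = @import("std");

test "`**` repeats an array value at comptime" {
    const pattern = [_]u8{ 1, 2 };
    // The result is a longer array, [6]u8, not a SIMD vector.
    const repeated = pattern ** 3;
    try std.testing.expectEqual(@as(usize, 6), repeated.len);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 1, 2, 1, 2, 1, 2 }, &repeated);
}
```

Reusing the same operator on types to mean "lift to a vector" would give it a second, unrelated meaning, which is the confusion described above.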

Rocknest · 2020-10-23T15:55:23Z

I don't like any of the proposals: bars, x'es and stars don't look good and are also confusing, one looks like an array and the others like an identifier. I'm fine with status quo, whenever I use vectors I make aliases. If we really really need syntax for vectors I guess this is the least confusing, since it is similar to a const array ptr:
[4]vec i32

SpexGuy · 2020-10-23T17:36:45Z

I know we removed it but personally I think @Vector(N, T) is clearer than any of these.

ghost · 2020-10-23T17:59:39Z

[4]vec i32 looks nice. Though maybe it should be [4]simd i32? To more explicitly signal that it is an array-like object that is meant specifically for SIMD processing.

I know this is a bit off topic, but "vector" is such an overloaded term in computing, and it usually has little to do with the original mathematical term to boot.

LemonBoy · 2020-10-23T18:03:04Z

[N]vec T is inconsistent with the array syntax, here vec applies to the whole thing while other modifiers such as const affect the type. If you want to stretch this syntax you could use something like [N]lane T that makes sense from the simd point of view.

ghost · 2020-10-23T18:11:28Z

@LemonBoy, could you clarify? I was thinking of [N]vec as an atomic modifier just like const, [N:0] or anything else.

ghost · 2020-10-23T18:32:14Z

Some more syntax variants on a slightly more complex example:

[w][h][4]vec f32

[w][h][4v]f32

[w][h][4 simd]f32

[w][h][|4|]f32

[w][h]@Vector(4, f32)

All of the bracket-based variants have the disadvantage that they only make sense on the inner-most array (you can't really have [w][|4|][h]f32), which is a bit inconsistent. In light of that, I would agree with @SpexGuy that the old @Vector syntax is still best in many cases.
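
As a point of comparison, the last variant is the only one in that list that compiles today. A minimal sketch, assuming small illustrative dimensions and the hypothetical alias names `Pixel` and `Image`:

```zig
const std = @import("std");

const w = 2; // illustrative dimensions, not taken from the thread
const h = 3;
const Pixel = @Vector(4, f32);
const Image = [w][h]Pixel; // i.e. [w][h]@Vector(4, f32)

test "outer dimensions are arrays, the innermost is the vector" {
    var img: Image = undefined;
    img[1][2] = .{ 0.5, 0.5, 0.5, 1.0 };
    // Arithmetic applies to the vector element, not to the surrounding arrays.
    const scaled = img[1][2] * Pixel{ 2, 2, 2, 2 };
    try std.testing.expect(@TypeOf(img[0]) == [h]Pixel);
    try std.testing.expectEqual(@as(f32, 1.0), scaled[0]);
    try std.testing.expectEqual(@as(f32, 2.0), scaled[3]);
}
```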

LemonBoy · 2020-10-23T18:53:30Z

> @LemonBoy, could you clarify? I was thinking of [N]vec as an atomic modifier just like const, [N:0] or anything else.

[N]const T is an N-element array of const T values, the whole array is transitively constant too.
*const T is a pointer to a constant value.
Following this logic [N]vec T is an N-element array of vector T (??), hence my suggestion to use the term lane, as [N]lane T means a bundle of N lanes of width equal to that of T.

ghost · 2020-10-23T18:56:03Z

Yeah, I guess the associativity is backwards in this case 😄

Snektron · 2020-10-23T19:06:40Z

Following this reasoning the modifier could simply be placed to the left of the array type: simd [N]T

ghost · 2020-10-23T19:14:44Z

[4x]T anyone?

kyle-github · 2020-10-23T20:39:19Z

Delurking for a minute.

Is there any projected impact on Zig's use of SIMD vectors from things like Arm's SVE? The examples I have seen of what compilers can do to automatically vectorize normal arrays using tools like SVE and RISC-V's V extension are quite impressive.

ghost · 2020-10-24T12:32:04Z

Interesting question. From my superficial understanding of ARM-SVE, it represents a very significant departure from the SIMD paradigm. It is designed to operate directly on large arrays with runtime-known length, rather than manually partitioned fixed-sized chunks. In particular, array length does not need to be a multiple of the native vector size, thanks to the ability to load and operate on incomplete vectors. SVE also relies heavily on a separate bank of predicate registers that don't have a direct counterpart in traditional SIMD.

My cautious conclusion would be that SVE is not urgently relevant to the present bikeshedding session, since we are discussing syntax sugar for a fixed-width SIMD data type. It should also be kept in mind that the availability of SVE-supporting commodity hardware is still pretty much zero (I'm not going to count the Fujitsu A64FX), so introducing special syntax for it may be premature. All in all, it would probably be best to extract this question into a separate issue.

kyle-github · 2020-10-24T15:54:44Z

Availability, yeah, that is an issue today, but probably not within a year or so. Even availability of Arm servers has gone from zero to lots with AWS being so cheap for Graviton instances. Arm is clearly pushing (we'll see what NVidia does) SVE/Helium everywhere in their next generations of cores. Everything is going to have some form of VLA (variable length array) support.

It was precisely these facts that made me wonder a bit if Zig was skating to where the puck is today and not where it will be in a few years:

- VLA extensions toss out a lot (most?) of the pain of dealing with SIMD.
- Arm is used in supercomputers, cloud instances (AWS), and will be in mainstream laptops shortly (Apple), as well as the usual area of phones where it has more than 99% market share.
- RISC-V is not mainstream yet, but gaining surprising wins and there is only one accepted SIMD extension and that is VLA too.

Where is x86 in this? No idea but with Arm now entering into the Supercomputer 500 list due to SVE... There are so many, many advantages to VLA support.

But I agree that this is a different discussion point. Sorry for the diversion! I am really excited about VLA support in CPUs because of the ability to write code once that just works across a large range of hardware and it means far less support for Intel's idiotic market segmentation by ISA version (try to figure out which AVX512 instructions are supported on which processor!).

I'll go back to lurking 😃

ghost · 2020-12-05T08:07:21Z

@andrewrk There is an ambiguity in the proposed syntax: if the length is the result of a bitwise or, the lexer will need to look ahead to know that the pipe does not pair with a close bracket. We don't have this problem with captures because they can only be alphanumeric, but integers can be arbitrary expressions. We could potentially make use of the unused #, $ sigils, but that would be ugly.

Re: VLA, I think our fixed-length paradigm can be adapted, if we relax the requirement of corresponding strictly to hardware SIMD, like we already do with integers. So, we have a vector corresponding to the size of our problem, which we can make as big as we like, and then the compiler is free to split it up into appropriately-sized chunks. Lane predication could be handled by vectors of bool and overloading of index syntax.

Snektron · 2020-12-05T16:04:22Z

> There is an ambiguity in the proposed syntax: if the length is the result of a bitwise or, the lexer will need to look ahead to know that the pipe does not pair with a close bracket. We don't have this problem with captures because they can only be alphanumeric, but integers can be arbitrary expressions. We could potentially make use of the unused #, $ sigils, but that would be ugly.

Adding a separate token for [| and |] would solve that.

ghost · 2020-12-16T10:53:40Z

Both this and #1974 are talking about the same issue: number formats hold a lot of information. We kind of want some sort of compositional syntax that's easy to read and easy to type, if only for ease of standard communication about boilerplate.

This makes nobody happy, and I'm just going to use the equivalent of const Worldspace = FixedPoint(.{ .signed = true, .integer_bits= 16, .fractional_bits = 16, .simd_width = 8}); anyway, but I feel this commonality between issues is worth pointing out.

pixelherodev · 2021-06-27T10:52:39Z

I'd go with

[-N-]T

personally, for the same reasons of consistency I outlined earlier.

[*]T pointer-to-many
[]T slice
[N]T array
[-N-]T vector

vs

[*]T pointer-to-many
[]T slice
[N]T array
N[-T-]

lemaitre · 2021-06-28T09:18:23Z

I would like to highlight an argument against consistency.

Pointers and slices are "decorators" of any types, even user defined ones. However, SIMD should most likely be restrained to primitive types (uX, iX, fX, and maybe to pointers and slices). So giving a syntax close to pointers and slices could make people believe they can use it for any types, even their own. Giving a totally different syntax (like @simd(T, N) or i8x16) will make it explicit that it cannot be used by custom types.

The reason SIMD should most likely not be defined on custom types is that it is unclear what methods on such vectors would mean.
Methods could be forbidden, but it would make the resulting type a bit useless. Methods could be kept, but with what semantics?

All in all, I'm not saying we should stay away from a consistent syntax, just that we need to be careful with it because of this distinction.

pixelherodev · 2021-06-29T00:41:22Z

That is a good point. However, I think the advantage of consistency is more important, regardless. If I attempt to, say, make a vector of a structure, the compiler will reject it, and it will be clear that is not allowed. Moreover, anyone using vectors should by necessity understand how they work anyways - the documentation should render "vectors can only be made of primitives" clear, so it shouldn't be a concern.

That said, making it more immediately obvious has clear benefits as well. If a different syntax is desired, that is reasonable - however, using builtins is still a horrid solution, since it continues to leave vectors as second-class types.

andrewrk · 2021-12-19T04:35:26Z

We're definitely going to have SIMD vector syntax, and get rid of std.meta.Vector as well as @Vector. The only question is what color the bikeshed should be.

haoyu234 · 2021-12-20T23:58:51Z

how about [^N]T

InKryption · 2021-12-21T01:34:07Z

Is this open to more bikeshedding? If so, then I'll throw mine out there: [simd N]T.

kenaryn · 2021-12-22T17:43:58Z

I suggest Vect N T like Idris2 :D

nektro · 2021-12-25T20:39:49Z

I like @Vector(T, N) or [[N]]T

topolarity · 2022-07-14T22:19:52Z

One interesting thought:

Layout-wise, vectors are what you get when you pack arrays without padding and store them in integers. In this sense, they are just like integer-backed structs (#5049). Maybe packed(u256) [8]f32, or simply packed [8]f32?

Notice for both packed [N]T and packed struct:

- Elements are laid out contiguously from LSB to MSB
- They can be expected to be held in machine registers (if small enough)
- They have increased alignment, corresponding to their backing integer
- They can bitcast to/from their backing integer type
- They cross C ABI boundaries like their respective backing integer

To my knowledge, all of this is already true about @Vector(N,T) (except for bitcasts for which Zig is overly strict right now, and reversed indexing on big-endian systems). It's just not at all obvious from the existing syntax.

ifreund · 2022-07-14T22:52:53Z

> (except for bitcasts for which Zig is overly strict right now)

Note that @ptrCast()ing can be used to work around this and is currently the only way I know of to get from e.g. a @Vector(16, bool) to an often more useful u16. I'm not sure if this is 100% intended in the Zig language design but the stage1 generated LLVM IR is valid and does what I want.
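
A hedged sketch of the same conversion, under the assumption of a newer compiler (Zig 0.11 or later) in which `@bitCast` is accepted directly for this case; that was not so when the comment above was written. The test string and variable names are only illustrative.

```zig
const std = @import("std");

test "pack a @Vector(16, bool) into a u16 mask" {
    const bytes: [16]u8 = "hello, simd zig!".*;
    const haystack: @Vector(16, u8) = bytes; // array coerces to vector
    const needle: @Vector(16, u8) = @splat('i');
    const hits = haystack == needle; // lane-wise compare, @Vector(16, bool)
    const mask: u16 = @bitCast(hits); // one bit per lane
    // 'i' occurs twice ("simd", "zig"), so exactly two bits are set.
    try std.testing.expect(@popCount(mask) == 2);
}
```

Only the number of set bits is checked here, since which bit corresponds to which lane is exactly the endianness question raised earlier in the thread.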

topolarity · 2022-07-15T00:26:46Z

A natural extension would be to allow packed [N]T in a packed struct

That would give us back arrays in packed structs, which are currently unsupported under #5049

xdBronch · 2023-07-14T10:42:31Z

i noticed this got moved up recently so thought i'd give my 2 cents

- @Vector(N, T) is clear but quite verbose; with #16346 (Apply RLS to @splat builtin, eliminating its length parameter) sometimes annoyingly so, in cases where @as is needed (no one wants to write @as(@Vector(4, u32), @splat(5)))
- [|N|]T isn't bad but as someone else pointed out, [ and | are both vertical, which can make it slightly less readable. very minor but with some fonts, | is noticeably taller than [, not a fan of how it looks
- f32x4 and friends look good but would disallow aliased types and lengths, i suspect in most cases this wouldn't be a big deal but if vectors are meant to be treated like fancy arrays, i don't think this would work
- [-N-]T isn't a bad contender but imo just doesn't seem fitting. i don't think i have further justification for why
- packed [N]T, [simd N]T, [vec N]T, and friends help with clarity but begin to become verbose like the current @Vector, not much of a fan and "packed" doesn't convey the SIMD part of vectors well imo
- <N>T is probably my personal pick, it's very short to type and makes it obvious that it's similar to arrays but still a distinct type. @as(<4>u32, @splat(5)) is even almost pleasant to type where needed. possibly problematic for newcomers since <> is used for templating in languages like C++ and Rust.

any of the other suggestions either fall into one of these categories and/or have already been discussed enough
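
For context on the verbosity point about @splat, a minimal sketch (assuming Zig 0.11 or later, where @splat takes its length from the result type) of the places a result type can come from. The alias `V` and the helper `sum` are hypothetical names for illustration.

```zig
const std = @import("std");

const V = @Vector(4, u32);

// Hypothetical helper: adds up all lanes.
fn sum(v: V) u32 {
    return @reduce(.Add, v);
}

test "@splat takes its length from the result type" {
    const a: V = @splat(5); // a type annotation supplies the result type
    const b = @as(V, @splat(5)); // the verbose form, needed when nothing else does
    try std.testing.expect(@reduce(.And, a == b));
    try std.testing.expect(sum(@splat(5)) == 20); // a parameter type supplies it
}
```

The `@as` form is only required where no annotation, parameter, or other result location is available, which is where a shorter type spelling would matter most.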

sidenote, regarding "We're definitely going to have SIMD vector syntax, and get rid of std.meta.Vector as well as @Vector. The only question is what color the bikeshed should be.": maybe change the title to a more general "Change SIMD vector type syntax" and add the accepted label?

ghost · 2023-07-17T10:55:13Z

Something to consider about pointer vectors:
AVX2 gather instructions come in 32-bit and 64-bit-index variants, and this distinction would be lost if the indices were instead pointers. More generally, pointers are a concept that is more suitable for single-item references and complex data structures. When working with blocks of uniform data (even if it's strided), indices and slices tend to be more appropriate. So overall I was wondering whether pointers are all that useful a concept when it comes to SIMD. Maybe we're trying to overengineer a solution here and would be better off limiting the design to basic numeric types only?

expikr · 2023-10-28T13:25:36Z

Since SIMD only works with a handful of primitive types, why not make them individual built-in functions?

i.e.

@f32(n)
@f64(n)

Snektron · 2023-10-29T19:44:31Z

> why not make them individual built-in functions?
>
> i.e.
>
> @f32(n)
> @f64(n)

This syntax doesn't allow for vectors of pointers, or vectors of some aliased type.

Also,

> <N>T is probably my personal pick, its very short to type and makes it obvious that its similar to arrays but still a distinct type.

This has parsing issues like C++ templates. That's a hard pass from me.

Snektron · 2023-10-29T19:47:41Z

To be honest, I don't really see the problem with keeping @Vector other than the confusion that it creates with math vectors. Maybe all that it needs is renaming it to @Simd? I don't see why we need separate syntax for it. Keeping it as named function makes it much clearer for the user what is happening - no other language that I know of implements syntax like [|N|]T. Most of the other solutions in this thread are either some low-effort variants of this, or solutions that exhibit the problems outlined in my previous comments.

> Currently we have @Vector for this, however, see #5207

By the way, the original reason why this issue was opened has been rejected. Is this still relevant at all?

expikr · 2023-10-31T10:05:55Z

> By the way, the original reason why this issue was opened has been rejected. Is this still relevant at all?

Did some digging, do I have this timeline correct?

1. remove builtin functions for creating types that are redundant with @Type #5207
   - "we have a general-purpose @Type, so dedicated builtins are redundant"
2. Remove @Vector #6209
   - starts process of removing dedicated builtins
3. SIMD vector type syntax: [|N|]T #6771
   - since [1] (#5207) is deleting @Vector, we'll need a replacement syntax for SIMD
4. remove @Type from the language, replacing it with individual type-creating builtins #10710
   - "nvm using @Type sucks, let's go back to short builtins"
5. closed [2]
6. (we are here) should close [3], since there is no longer a [1] (#5207) deleting @Vector

But if a concise syntax is still needed, I'd like to propose the following:

`type^dim` (i.e. $\mathbb{R}^n$ , $\mathbb{Z}^n$ etc)

const Color = u8^4;
const Vec3 = f64^3;

const up = f32^3 {0,1,0};

- leverage Zig's unique "types are values" semantic --> types can be exponentiated like values, resulting in a product type
- reuse existing operator precedence for ^
- basically the "f32x4" syntax but without the drawback @xdBronch mentioned as it's using an operator symbol.
- mathematically meaningful ($\mathbb{R}^n$ for set of multidim floats, $\mathbb{Z}^n$ for set of multidim ints) in denoting vector properties, formally distinct from simple list of data without special symmetries such as regular arrays.

| Symbol | Set |
| --- | --- |
| `u8^4` | $\underset{[0,256)}{\mathbb{Z}^4}$ |
| `i8^3` | $\underset{[-128,128)}{\mathbb{Z}^3}$ |
| `f64^3` | $\underset{[-2^{53},2^{53})\cdot 2^{[-1024,1024)}}{\mathbb{R}^3}$ |
| `f32^4` | $\underset{[-2^{24},2^{24})\cdot 2^{[-128,128)}}{\mathbb{R}^4}$ |

nektro · 2023-10-31T10:13:17Z

yes but the existence of 10710 doesn't invalidate this. this issue is independently about replacing @Vector(N, T) with [|N|]T

RetroDev256 · 2024-10-08T22:54:50Z

With the accepted issue #21635, it got me thinking.

Why not simply have the idea of "vector" represent a restricted definition of an array? A vector could be defined as [8]vector u8 for example, where "vector" would be an attribute of the array [8]u8.

What are the main differences between raw arrays and vector types? It seems to be this:

- Arrays are aligned to the alignment of their element type, whereas @Vector types are aligned to their total size (I think)
- Vector types allow certain operators to be used on them (such as +, -, *, /, etc.), while arrays only have standard concatenation operators (++ and **).
- Vector types can only hold boolean, integer, and float types, whereas arrays can hold a much wider variety of types.

The "vector" annotation would just mean that these 3 restrictions/allowances are in place for the types, and would help the mental model of why arrays and vectors can generally coerce to each other, and also why certain operations (such as @splat) could work for both.
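
A small sketch of the coercion and operator differences listed above, as they behave with the existing @Vector type. Variable names are illustrative only, and alignment is left out because it depends on the target.

```zig
const std = @import("std");

test "arrays and vectors coerce both ways; only vectors get arithmetic" {
    const arr = [4]u8{ 1, 2, 3, 4 };
    const vec: @Vector(4, u8) = arr; // array -> vector
    const doubled = vec + vec; // element-wise; `arr + arr` would not compile
    const back: [4]u8 = doubled; // vector -> array
    try std.testing.expectEqualSlices(u8, &[_]u8{ 2, 4, 6, 8 }, &back);
}
```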

Personally I feel this way of defining a vector makes some good sense, but I would love to hear some other opinions.

EDIT: With this definition of a vector, you could directly slice a vector, but the "vector" attribute would be lost, acting as if you sliced a normal array (idk if this is already implemented, but it makes sense).

Lucifer-02 · 2024-10-10T15:42:52Z

You can also refer to the implementation in the Mojo language.

andrewrk added the proposal label Oct 22, 2020

andrewrk added this to the 0.8.0 milestone Oct 22, 2020

andrewrk mentioned this issue Oct 22, 2020

Remove @Vector #6209

Closed

ghost mentioned this issue Nov 11, 2020

Proposal: Generalise SIMD to arbitrary tensor types, remove footguns in vector syntax #7076

Closed

andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021

andrewrk mentioned this issue Nov 30, 2021

make overflow arithmetic builtins return a tuple instead of using a pointer parameter and bool return value #10248

Closed

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022

topolarity mentioned this issue Jul 15, 2022

Builtin Matrix type #4960

Open

andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023

andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
