
VROT Vector Rotate

AndyGlew edited this page Jul 23, 2020 · 3 revisions

Vector rotate instruction(s) summary

vrot.vv vd,vs2,vs1,vm

No: .vi, .vs, .vx

Maybe: .vv8 or .vv16 - where the rotate count is a fixed width, not SEW

vector-vector single width rotate

vrot.vv vd,vs2,vs1,vm

Functional notation

I'll use my notation, since the V-spec doesn't consistently use a notation to describe functionality. (Eventually this will be SAIL, but need the V-spec in SAIL first).

vd.eew[i] := trunc.eew(concat(vs2.eew[i],vs2.eew[i]).eew2x >> vs1.eew[i]) if vm[i] else vd.eew[i]

where EEW = the effective element width. (I'm being explicit about EEW here, because some of the variants below might differ in their handling of SEW and EEW.)

If you don't like my notation, then just vd[i] := (vm[i]? concat(vs2[i],vs2[i]) >> vs1[i] : vd[i])

If you don't like the concat notation, then rotate(vs2[i],vs1[i]). (I'm using the concat notation because it is convenient for some other instructions like funnel shift. But these are not crypto.)
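As a rough illustration of the notation above (not the TBD C or SAIL model), here is a minimal C sketch for EEW = 32. The function names `rotr32_concat` and `vrot_vv_e32`, the one-byte-per-element mask array, and the reduction of the count modulo EEW are all assumptions made for the sketch; the text above does not say how counts of EEW or more are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate right one 32-bit element using the concat formulation:
 * trunc.eew(concat(x, x) >> n).
 * ASSUMPTION: the count is reduced modulo 32. */
static uint32_t rotr32_concat(uint32_t x, uint32_t n)
{
    uint64_t doubled = ((uint64_t)x << 32) | x;   /* concat(x, x), 2*EEW wide */
    return (uint32_t)(doubled >> (n & 31u));      /* shift, then truncate to EEW */
}

/* Hypothetical model of vrot.vv for EEW = 32.
 * vm[i] != 0 means element i is active; inactive elements keep vd[i]
 * (the "unchanged" mask policy, not the zeroing one). */
static void vrot_vv_e32(uint32_t *vd, const uint32_t *vs2,
                        const uint32_t *vs1, const uint8_t *vm, size_t vl)
{
    assert(vd && vs2 && vs1 && vm);
    for (size_t i = 0; i < vl; i++) {
        if (vm[i]) {
            vd[i] = rotr32_concat(vs2[i], vs1[i]);
        }
    }
}
```

The shift is applied to the double-width value and only then truncated, which is what makes the concat form a rotate rather than a plain right shift.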

TBD: C, SAIL

TBD: V spec says that it is an implementation option as to whether mask 0 leaves unchanged or zeroes. I'll leave that out for now.

Other forms - IMHO optional

vector-scalar/imm single width rotate

GLEW OPINION: not really worth spending the instruction encoding space on.

# vector-scalar single width rotate
# vd[i] = concat(vs2[i],vs2[i]) >> rs1 if vm[i] else unchanged
vrot.vx vd,vs2,rs1,vm

# vector-imm single width rotate
# vd[i] = concat(vs2[i],vs2[i]) >> (imm & 0x7F) if vm[i] else unchanged
vrot.vi vd,vs2,imm,vm

imm & 0x7F so that it works up to 128-bit
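For what it's worth, the immediate form could be modelled along these lines for EEW = 64. This is only a sketch: `rotr64` and `vrot_vi_e64` are made-up names, the mask is one byte per element for simplicity, and reducing the masked count modulo EEW is an assumption. Since C has no standard 128-bit type for concat(x, x), the sketch uses the equivalent two-shift form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate right a 64-bit element with two shifts.
 * n == 0 is special-cased so we never shift by 64 (undefined in C).
 * ASSUMPTION: counts are reduced modulo EEW. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63u;
    return n ? (x >> n) | (x << (64u - n)) : x;
}

/* Hypothetical model of the immediate form for EEW = 64.
 * The count is first masked with 0x7F as in the text above.
 * The scalar form would be the same loop with the count taken from rs1. */
static void vrot_vi_e64(uint64_t *vd, const uint64_t *vs2, unsigned imm,
                        const uint8_t *vm, size_t vl)
{
    assert(vd && vs2 && vm);
    unsigned count = imm & 0x7Fu;
    for (size_t i = 0; i < vl; i++) {
        if (vm[i]) {
            vd[i] = rotr64(vs2[i], count);
        }
    }
}
```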

vector-vector mixed width rotate

This is motivated by the observation that the rotation count really doesn't need to be a full SEW/EEW wide. The upper bits are just plain wasted.

The only hard part about this is that the RISC-V vector spec's notation for mixed width is clumsy - it is suitable for 2X and 4X narrowing, but as far as I know there is no notation to say "interpret this vector as a vector of 8 or 16 bit elements". So I will invent such a notation.

# EEW vector - 8b vector rotate
vrot.vv8 vd,vs2,vs1,vm
vd.eew[i] = concat(vs2.eew[i],vs2.eew[i]) >> vs1.8[i] if vm[i] ...

i.e. vs2 is interpreted as a vector of whatever the effective element width is, in the normal manner, taking into account SEW and LMUL, while vs1 is interpreted as a vector of byte elements, no matter what the current SEW is.

basically, the vs1 that you would get from

vsetvli a0, a0, e8,m1  # Byte vector; m = Whatever the current LMUL is
vle8.v v1,(a1)         # where memory at (a1) contains a byte vector, equivalent to C uint8_t rotcnt[VLEN/8]

Similarly, 16, 32...

vrot.vv16 vd,vs2,vs1,vm
vrot.vv32 vd,vs2,vs1,vm
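To make the mixed-width idea concrete, here is a small C sketch of vrot.vv8 with 32-bit data elements. The names `rotr32` and `vrot_vv8_e32` are hypothetical, the mask is again one byte per element, and the count is assumed to be reduced modulo EEW. The point is only that the count operand is a packed byte array indexed by element number, independent of the data element width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate right a 32-bit element; n == 0 is special-cased to avoid
 * shifting by 32. ASSUMPTION: counts are reduced modulo EEW. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical model of vrot.vv8 for EEW = 32.
 * vs2 and vd hold 32-bit elements; vs1_bytes holds one 8-bit rotate
 * count per element, so element i uses byte i of the count register. */
static void vrot_vv8_e32(uint32_t *vd, const uint32_t *vs2,
                         const uint8_t *vs1_bytes, const uint8_t *vm,
                         size_t vl)
{
    assert(vd && vs2 && vs1_bytes && vm);
    for (size_t i = 0; i < vl; i++) {
        if (vm[i]) {
            vd[i] = rotr32(vs2[i], vs1_bytes[i]);
        }
    }
}
```

With this layout, the counts for vl 32-bit elements occupy vl bytes rather than 4*vl bytes, which is the register file space saving argued for here.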

RATIONALE:

IMHO there is little value in this being a widening or narrowing instruction where the widening or narrowing is expressed relative to the current SEW/EEW. Relative widening/narrowing is useful for calculations that require guard bits. Here, we don't need guard bits; we just want to save space, specifically space in the vector register file.

I think that single width SEW/EEW rotate count is the most "natural" extension to the current vector specification, even though it is the least efficient with respect to space.

8-bit byte vector element rotate count is the most space efficient, supporting rotations of elements up to 256 bits, larger than we need.

Any intermediate vector element rotate counts are gravy - nice to have, but not really needed.

Similarly, scalar or immediate rotate counts are nice to have, but are not really necessary. So if there is any chance at all that we want to conserve vector instruction encoding space, we should not have them.

PRIORITIES:

vrot.vv
    EEW=XLEN, 128,64,32
           16, 8,  nice but not necessary
vrot.vv8
    same EEW/SEW values
vrot.vv16
vrot.vv32
vrot.vi
vrot.vx 