The vector specification requires that, when vstart≥vl (which includes vl=0), nearly all operations leave tail elements undisturbed, even if tail-agnostic is set.
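To make the rule concrete, here is a minimal Python sketch of an unmasked element-wise add as described above. It is a toy model, not spec pseudocode: the function name `vadd_vv`, the 8-bit element width, and the `fill_tail` switch (which picks one of the two choices a tail-agnostic implementation is allowed to make) are all assumptions made for illustration.

```python
ALL_ONES = 0xFF  # assumed 8-bit elements, purely for brevity


def vadd_vv(vd, vs2, vs1, vl, vstart=0, tail_agnostic=True, fill_tail=True):
    """Toy model: return the new contents of vd after an unmasked vector add.

    fill_tail selects one legal tail-agnostic behaviour (overwrite the tail
    with all-1s); fill_tail=False models the other (leave the tail alone).
    """
    out = list(vd)
    if vstart >= vl:
        # No body elements: nothing in vd is updated, even under ta.
        return out
    for i in range(vstart, vl):
        # Body elements get the computed result; prestart elements are kept.
        out[i] = (vs2[i] + vs1[i]) & ALL_ONES
    if tail_agnostic and fill_tail:
        for i in range(vl, len(out)):
            out[i] = ALL_ONES
    return out


old = [1, 2, 3, 4]
print(vadd_vv(old, [10] * 4, [1] * 4, vl=2))  # tail may become all-1s
print(vadd_vv(old, [10] * 4, [1] * 4, vl=0))  # vd must stay exactly as it was
```

In this model the vl=0 call is the only one where the hardware may not treat vd as a pure output, which is the special case the question is about.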
This seems to me like a rather odd requirement, especially considering that it forces vl=0 to be a special case for register-renaming implementations. (I am not a hardware designer, but I did notice one repo having a good number of commits fixing vl=0.)
Is there some significant benefit to software from this (as compared to allowing tail elements to be replaced with all-1s under ta if hardware so wants)? I can't come up with any for the general vstart≥vl case, considering that vstart is intended to be non-zero only when resuming a previously interrupted instruction, which could already have trashed the tail.
There are some cases that software could somewhat reasonably rely on (primarily just to work around those instructions not working as desired at vl=0), namely reductions and vmv.s.x, that are perhaps too late to relax. But, if desired, I feel like it wouldn't be too unreasonable to relax everything else even now (especially considering that software has already been rather severely misled on RVV in a different aspect).
Of note is that the C/C++ RVV intrinsics have their own relaxed behavior on agnostic elements, which I believe means that they would be unaffected by the change, even reductions and vmv.s.x (those two don't even have a destination input outside of explicit tu).
(Reductions still having a false dependency on their destination wouldn't be particularly nice, but not catastrophic, considering that the common vd==vs1 usage isn't affected; vmv.s.x is worse off though. Perhaps an option would be to allow those (or just vmv.s.x) to set the first element to either the old value or the newly calculated one. That would keep all existing hardware compliant and allow unconditional register renaming under ta in the future, without affecting any software use cases that I can think of. Anyway, I'm not one to request a spec change (that'd be for those actually making OoO vector hardware, if they have design conditions where this is actually problematic); my primary question is really just what reason there is for the strictness in the first place.)
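The relaxation floated for vmv.s.x can be sketched as a set of permitted outcomes for element 0. This is only one reading of the idea, not spec text; the helper name `allowed_elem0` and the `relaxed` flag are hypothetical.

```python
def allowed_elem0(old, new, vl, vstart=0, relaxed=False):
    """Toy model: values element 0 of vd may hold after a vmv.s.x-style write.

    relaxed=False models the current rule (no update when vstart >= vl);
    relaxed=True models the suggestion that either the old or the newly
    computed value would be acceptable in that case.
    """
    if vstart < vl:
        return {new}  # normal case: element 0 is written
    return {old, new} if relaxed else {old}


strict = allowed_elem0(old=7, new=42, vl=0)
loose = allowed_elem0(old=7, new=42, vl=0, relaxed=True)
# Every outcome permitted today is still permitted under the relaxed rule,
# so hardware that implements the strict rule would remain compliant.
print(strict <= loose)
```

The subset check is the point of the sketch: the relaxed rule only adds permitted outcomes, so it changes nothing for existing implementations and only affects software that depends on the old value surviving at vl=0.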
Since this question is very much a question about the architecture and a particular architectural design choice that was made (and not a question about some mistake or ambiguity in the arch spec), this question should instead be posted to the [email protected] and/or [email protected] email lists.