Reason for vstart≥vl requiring undisturbed tail elements even with ta vtype #1715

Open
dzaima opened this issue Nov 7, 2024 · 1 comment


dzaima commented Nov 7, 2024

The vector specification requires that, when vstart≥vl (which includes vl=0), nearly all operations leave tail elements undisturbed, even if tail-agnostic is set.
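To make the rule concrete, here is a minimal Python sketch of how an unmasked instruction's destination gets updated. It is only a toy model under stated assumptions: 8-bit elements, and a hypothetical `fills_tail_with_ones` flag standing in for an implementation's choice under ta (real hardware may also make that choice per element). The function name and parameters are made up for illustration.

```python
ALL_ONES = 0xFF  # assumption: 8-bit elements (SEW=8), to keep the model small


def write_dest(old, result, vstart, vl, ta, fills_tail_with_ones):
    """Toy model of one unmasked vector instruction's destination update.

    old and result are lists of VLMAX elements. fills_tail_with_ones is a
    hypothetical knob for what an implementation does with a ta tail.
    """
    if vstart >= vl:
        # No body elements: nothing is written, not even agnostic tail fills.
        return list(old)
    new = list(old)                  # prestart elements stay undisturbed
    for i in range(vstart, vl):      # body elements take the computed result
        new[i] = result[i]
    if ta and fills_tail_with_ones:  # ta permits overwriting the tail with all 1s
        for i in range(vl, len(old)):
            new[i] = ALL_ONES
    return new


old = [1, 2, 3, 4]
res = [10, 20, 30, 40]
print(write_dest(old, res, vstart=0, vl=2, ta=True, fills_tail_with_ones=True))
# -> [10, 20, 255, 255]
print(write_dest(old, res, vstart=0, vl=0, ta=True, fills_tail_with_ones=True))
# -> [1, 2, 3, 4]
```

The second call is the case in question: the same hardware that is free to overwrite the tail at vl=2 has to preserve the whole old register at vl=0.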

This to me seems like a rather odd requirement, especially considering that it forces vl=0 to be very special for register-renaming implementations. (I am not a hardware designer, but I did notice one repo having a good amount of commits fixing vl=0)

Is there some significant benefit to software from this (as compared to allowing tail elements to be replaced with all-1s under ta, if hardware so wants)? I can't come up with any for the general vstart≥vl case, considering that vstart is intended to only be non-zero when restoring from a previously-interrupted instruction, which could already have trashed the tail.

There are some cases that software could somewhat reasonably rely on (primarily just to work around those instructions not working as desired at vl=0, but whatever), namely reductions and vmv.s.x, which are perhaps too late to relax. But, if desired, I feel like it wouldn't be too unreasonable to relax everything else even now (especially considering that software has already been rather severely misled on RVV in a different aspect).

Of note is that the C/C++ RVV intrinsics have their own relaxed behavior on agnostic elements, which I believe means that they would be unaffected by the change, even reductions and vmv.s.x (those two don't even have a destination input outside of explicit tu).

(Reductions still having a false dependency on their destination wouldn't be particularly nice, but not catastrophic, considering that the common vd==vs1 usage isn't affected; vmv.s.x is worse off though. Perhaps an option would be to allow those (or just vmv.s.x) to set the first element to either the old value or the newly-calculated one. That would keep all existing hardware compliant and allow unconditional register renaming under ta in the future, without affecting any software use cases that I can think of. Anyway, I'm not one to request a spec change (that'd be for those actually making OoO vector hardware, if they have design conditions where this is actually problematic); my primary question is really just what reason there is for the strictness in the first place.)
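As a rough illustration of that vmv.s.x option, the Python sketch below lists which values element 0 would be allowed to hold afterwards, under the current rule and under the hypothetical relaxed one. The function name and the `relaxed` flag are invented for this example, and vstart is assumed to be 0.

```python
def vmv_s_x_outcomes(old0, x, vl, relaxed):
    """Set of values element 0 may hold after vmv.s.x, assuming vstart=0.

    relaxed=False models the current rule; relaxed=True models the
    hypothetical option where vl=0 may keep the old value or write x.
    """
    if vl > 0:
        return {x}  # normal case: element 0 receives the scalar
    # vl == 0: the current rule requires the destination to stay untouched.
    return {old0, x} if relaxed else {old0}


print(vmv_s_x_outcomes(old0=7, x=42, vl=0, relaxed=False))  # -> {7}
print(vmv_s_x_outcomes(old0=7, x=42, vl=1, relaxed=True))   # -> {42}
```

Since the relaxed set always contains the currently required value, hardware that follows today's rule would still be compliant; only software that depends on the old value surviving at vl=0 could notice a difference.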

Collaborator

gfavor commented Nov 8, 2024

Since this question is very much a question about the architecture and a particular architectural design choice that was made (and not a question about some mistake or ambiguity in the arch spec), this question should instead be posted to the [email protected] and/or [email protected] email lists.


2 participants