Generate SVE for 80bit load/stores when possible #4166
Probably not a big win in practice. Previously, a single 80-bit store took two stores (64-bit + 16-bit). With SVE it takes three instructions: mov + whilelt + st1b. You need three 80-bit stores in a block to reach a draw instruction-wise. The next step, which I will do next, is to avoid assembling the predicate register every time, but in practice we'll still need at least three stores per block for it to "win" instruction-wise.
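To make the instruction-count argument concrete, here is a minimal sketch of the arithmetic. It is a hypothetical cost model, not FEX code: the function names are invented, and it assumes exactly two instructions per split store, three per SVE store when the predicate is rebuilt each time, and a one-off mov + whilelt when the predicate is reused within a block.

```cpp
#include <cstdint>

// Hypothetical cost model (not FEX code): instructions emitted for
// NumStores 80-bit stores inside a single block.

// Baseline: every 80-bit store is split into a 64-bit and a 16-bit store.
constexpr uint32_t SplitStoreCost(uint32_t NumStores) {
  return 2 * NumStores;
}

// SVE with the predicate rebuilt per store: mov + whilelt + st1b each time.
constexpr uint32_t SVECostUncached(uint32_t NumStores) {
  return 3 * NumStores;
}

// SVE with the predicate built once per block (mov + whilelt),
// followed by one st1b per store.
constexpr uint32_t SVECostCached(uint32_t NumStores) {
  return NumStores == 0 ? 0 : 2 + NumStores;
}

// With a reused predicate, three stores come out one instruction ahead.
static_assert(SVECostCached(3) < SplitStoreCost(3));
// A single store is always cheaper with the split path.
static_assert(SVECostCached(1) > SplitStoreCost(1));
```

Under these assumptions the reused-predicate variant is strictly ahead from three stores per block onward, which lines up with the "at least three stores" threshold for a win. The model ignores any extra moves or address setup the real backend may emit.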
Converting to draft, so it's not merged by mistake.
In preparation for FEX-Emu#4166 which should improve on these results.
This is almost ready. The missing bit is that the predicate register is not yet being cached. I thought RA would take care of this, but apparently there's still some magic missing.
@Sonicadvance1 pushed changes with the predicate register caching. It's more general than what we need now, but if we generate predicate registers from patterns for other purposes, we can use the cache for that as well.
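As a rough illustration of what a pattern-keyed predicate cache could look like, here is a minimal sketch. Everything in it is hypothetical: `PredicateKey`, `PredicateCache`, and the choice of element size plus active element count as the key are assumptions for illustration and are not taken from the FEX implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical sketch, not FEX code. A predicate is identified by the
// pattern used to build it: element size and number of active elements.
struct PredicateKey {
  uint8_t ElementSizeBytes;
  uint8_t ActiveElements;
  bool operator==(const PredicateKey& Other) const {
    return ElementSizeBytes == Other.ElementSizeBytes &&
           ActiveElements == Other.ActiveElements;
  }
};

struct PredicateKeyHash {
  std::size_t operator()(const PredicateKey& Key) const noexcept {
    return (std::size_t(Key.ElementSizeBytes) << 8) | Key.ActiveElements;
  }
};

class PredicateCache {
public:
  // Returns the register that already holds this pattern, if any.
  std::optional<uint8_t> Lookup(PredicateKey Key) const {
    auto It = Map.find(Key);
    if (It == Map.end()) return std::nullopt;
    return It->second;
  }

  // Records that Reg now holds Key. A register holds one pattern at a
  // time, so any older entry pointing at Reg is dropped first.
  void Insert(PredicateKey Key, uint8_t Reg) {
    for (auto It = Map.begin(); It != Map.end();) {
      if (It->second == Reg) It = Map.erase(It);
      else ++It;
    }
    Map[Key] = Reg;
  }

  // Forget everything, e.g. at a block boundary.
  void Reset() { Map.clear(); }

private:
  std::unordered_map<PredicateKey, uint8_t, PredicateKeyHash> Map;
};
```

For the 80-bit case, the key would be something like `PredicateKey{1, 10}` (ten active byte lanes): the first store in a block misses, builds the predicate, and inserts it; later stores in the same block hit and skip the mov + whilelt pair.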
I will add generation of SVE instructions for loading/storing the x87 environment.
Nice!
Just one super minor nit, but looks good overall
Actually, this unfortunately is not worth it, because predicated SVE loads/stores seem to have more restrictive memory addressing than 128-bit loads/stores. This means we lose on address computation if we try to use this in fnsave and fnrstor. I have some local work that adds offsets to Load/Store MemPredicate, folds Constants into InlineConstants for these in ConstProp, and generates these loads/stores from FNSAVE and FNRSTOR, but it turns out not to be worth it, so I think this work is now complete. Thanks for the reviews.
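The addressing restriction can be sketched as follows. This is illustrative only and rests on assumptions not stated in the thread: it encodes the A64 immediate-offset ranges as I understand them (unsigned 12-bit scaled by 16 for `STR Qt`, signed 9-bit unscaled for `STUR Qt`, signed 4-bit scaled by the vector length for predicated `ST1B`), and it assumes the 32-bit FNSAVE layout where the x87 registers start at byte 28 and are 10 bytes apart. The function names are hypothetical.

```cpp
#include <cstdint>

// Illustrative only, not FEX code: can a byte offset be folded into the
// immediate field of each store form?

// STR/LDR Qt, [Xn, #imm]: unsigned 12-bit immediate scaled by 16 bytes.
constexpr bool FitsQScaledOffset(int64_t Offset) {
  return Offset >= 0 && Offset <= 4095 * 16 && Offset % 16 == 0;
}

// STUR/LDUR Qt, [Xn, #imm]: signed 9-bit byte offset, unscaled.
constexpr bool FitsQUnscaledOffset(int64_t Offset) {
  return Offset >= -256 && Offset <= 255;
}

// ST1B/LD1B {Zt.B}, Pg, [Xn, #imm, MUL VL]: signed 4-bit immediate
// scaled by the vector length in bytes.
constexpr bool FitsSVEPredicatedOffset(int64_t Offset, int64_t VLBytes) {
  return VLBytes > 0 && Offset % VLBytes == 0 &&
         Offset / VLBytes >= -8 && Offset / VLBytes <= 7;
}

// Assumed FNSAVE layout: register i lives at byte 28 + 10 * i.
constexpr int64_t FNSaveRegOffset(int64_t Index) {
  return 28 + 10 * Index;
}

// Register 1 sits at offset 38: encodable as an unscaled 128-bit offset,
// but not as a multiple of a 16-byte vector length.
static_assert(FitsQUnscaledOffset(FNSaveRegOffset(1)));
static_assert(!FitsSVEPredicatedOffset(FNSaveRegOffset(1), 16));
```

Under these assumptions, none of the 10-byte-strided register slots is a multiple of the vector length, so each predicated access would need a separate address computation (or a register offset), which eats into the instruction savings.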
Fixes #4126