Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate SVE for 80bit load/stores when possible #4166

Merged
merged 5 commits into from
Dec 6, 2024

Conversation

pmatos
Copy link
Collaborator

@pmatos pmatos commented Nov 20, 2024

Fixes #4126

@pmatos
Copy link
Collaborator Author

pmatos commented Nov 20, 2024

Probably not a big win in practice. A single 80-bit store required 2 stores (64 + 16). Now, we require three instructions: mov + whilelt + st1b. You need three 80-bit stores in a block to get to a draw instruction-wise. The next step would be not to assemble the predicate register every time, which I will do next, but we'll still require in practice at least three stores per block for it to "win" instruction-wise.

@pmatos
Copy link
Collaborator Author

pmatos commented Nov 20, 2024

Converting to draft, so it's not merged by mistake.

@pmatos pmatos marked this pull request as draft November 20, 2024 09:55
pmatos added a commit to pmatos/FEX that referenced this pull request Nov 22, 2024
In preparation for FEX-Emu#4166 which should improve on these results.
@pmatos pmatos force-pushed the HostFeaturesInPass branch from 106dc77 to 163f2a2 Compare November 22, 2024 13:32
@pmatos pmatos changed the title Generate SVE for 80bit stores when possible Generate SVE for 80bit load/stores when possible Dec 1, 2024
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 1, 2024
In preparation for FEX-Emu#4166 which should improve on these results.
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 1, 2024
@pmatos pmatos force-pushed the HostFeaturesInPass branch from 3836479 to 4dff75c Compare December 1, 2024 11:54
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 1, 2024
In preparation for FEX-Emu#4166 which should improve on these results.
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 1, 2024
In preparation for FEX-Emu#4166 which should improve on these results.
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 1, 2024
@pmatos pmatos force-pushed the HostFeaturesInPass branch from 4dff75c to e01a568 Compare December 1, 2024 11:59
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 1, 2024
In preparation for FEX-Emu#4166 which should improve on these results.
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 2, 2024
In preparation for FEX-Emu#4166 which should improve on these results.
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 2, 2024
In preparation for FEX-Emu#4166 which should improve on these results.
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 2, 2024
In preparation for FEX-Emu#4166 which should improve on these results.
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 2, 2024
@pmatos pmatos force-pushed the HostFeaturesInPass branch from af05434 to e0ccc9c Compare December 2, 2024 14:03
pmatos added a commit to pmatos/FEX that referenced this pull request Dec 2, 2024
@pmatos pmatos force-pushed the HostFeaturesInPass branch from e0ccc9c to 070e833 Compare December 2, 2024 17:41
@pmatos pmatos marked this pull request as ready for review December 2, 2024 17:50
@pmatos
Copy link
Collaborator Author

pmatos commented Dec 2, 2024

This is almost ready - the bit missing is really that the predicate register is not yet being cached. I thought RA would take care of this but... apparently there's still some magic missing.

@pmatos
Copy link
Collaborator Author

pmatos commented Dec 3, 2024

@Sonicadvance1 pushed changes with the predicate register caching. It's more general than what we need now but if we generate predicate registers from patterns for other purposes we can use the cache as for that as well.

@pmatos
Copy link
Collaborator Author

pmatos commented Dec 3, 2024

I will add the generation of SVE instructions to load/store x87 environment.

@pmatos pmatos force-pushed the HostFeaturesInPass branch from 1cf3405 to be1467c Compare December 4, 2024 09:16
Copy link
Member

@Sonicadvance1 Sonicadvance1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@lioncash lioncash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one super minor nit, but looks good overall

FEXCore/Source/Interface/Core/JIT/MemoryOps.cpp Outdated Show resolved Hide resolved
@pmatos pmatos force-pushed the HostFeaturesInPass branch from be1467c to 8f8aa55 Compare December 6, 2024 09:15
@pmatos
Copy link
Collaborator Author

pmatos commented Dec 6, 2024

I will add the generation of SVE instructions to load/store x87 environment.

Actually this unfortunately is not worth it due to the fact that predicated SVE ldst seem to be more restrictive than 128bit ldst memory addressing. This means that we lose in address computation if we try to use this in fnsave and fnrstor.

I have some work locally to add offsets to Load/Store MemPredicate and then fold Constants into InlineConstants in ConstProp into these and generate these loads/stores from FNSAVE and FNRSTOR but it turns out not to be worth it so I think this work is now complete.

Thanks for the reviews.

@Sonicadvance1 Sonicadvance1 merged commit 8427731 into FEX-Emu:main Dec 6, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

80-bit x87 Loadstores can use SVE masked loadstores
3 participants