-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEXCore: Add non-atomic Memcpy and Memset IR fast paths #3478
Conversation
Seems as if this is broken only for 32 bit code |
Ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes a crash in backwards stos/movs benchmark that needs to be fixed.
https://gist.github.com/Sonicadvance1/d893db3dd2c96f272f8639b4b24da234 for the stos memset bench.
This unit test recreates the error condition that FEX-Emu#3478 causes. With a string operation that is a backwards copy then the optimization will read past the end of the page and result in a crash. Seemingly only happens with backwards string operations, but test forward and backward in this test.
unittests/ASM: Implements a unit test for #3478
When TSO is disabled, vector LDP/STP can be used for a two instruction 32 byte memory copy which is significantly faster than the current byte-by-byte copy. Performing two such copies directly after oneanother also marginally increases copy speed for all sizes >=64.
Fixed the issue with backwards copies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a bunch of testing on this and couldn't find a regression.
Seems like pretty much entirely a win. Good job!
Caused by FEX-Emu#3478 This was missed in the review that it could cause issues. bylaws already has a fix incoming that will get this unit test working.
Caused by FEX-Emu#3478 This was missed in the review that it could cause issues. bylaws already has a fix incoming that will get this unit test working.
Caused by FEX-Emu#3478 This was missed in the review that it could cause issues. bylaws already has a fix incoming that will get this unit test working.
When TSO is disabled, vector LDP/STP can be used for a two instruction 32 byte memory copy which is significantly faster than the current byte-by-byte copy. Performing two such copies directly after oneanother also marginally increases copy speed for all sizes >=64.
Before:
After: