[TESTERS NEEDED AGAIN] SPU: PUTLLC16 Optimization, SPU Analyzer capabilities upgrade #15429
Also, please put the SPU stuff in a separate PR from all the progress stuff.
b3b71fc to b72bfb1
rpcs3/Emu/System.cpp
fs::file to_close_file;
{
    auto reset = init_mtx->reset();
    to_close_file = std::move(file.file);
Don't forget to reset file.file after move
dd707ec to 9041e98
Can someone retest this? I pushed many changes.
I could give it a try, do you recommend some games?
Metal Gear Online hangs on building the SPU cache. I'm also forced to close RPCS3 via the task manager. Log contains a bunch of these:
F {SPU Worker 7} SIG: Thread terminated due to fatal error: Verification failed (in file F:\rpcs3\rpcs3\Emu\Cell\SPUCommonRecompiler.cpp:6968[:7], in function evaluate_start_state)
as well as
S SPU: PUTLLC16 Pattern Detected! (put_pc=0x6814, is_pc_rel=0, offset=0x0, is_const=0, Gd63vsaR9xJkYQH5C22uKCF6tXbR) (putllc0=0, putllc16+0=37, all=38)
112b952 to b3e3dc0
Do you want some tests?
Sure.
On Uncharted, the game crashes after I start a new game from the menu.
Relaxed ZCULL causes the game to crash. Overall you are using pretty weird settings, so set them back to default and only use the wiki ones.
That's what I did, those are the options on the wiki and I didn't change anything else.
Relaxed ZCULL and accurate xfloat are not listed on the game's wiki page lol
Those are the default options I get when I load it. I gave it a retry: I deleted the rpcs3 folder to restart from zero, and now the game runs, but the audio still doesn't work. ZCULL has nothing to do with it.
Also, do not check LR event if already raised in PUTLLC
Identify them using their unique error codes.
Any other issues?
YOLO
Tested with 2 games on a 6-core FX-6300 using default settings with accurate RSX reservations.
Need to note that from now on the PUTLLC16 optimization is disabled with RSX reservations.
It's been a few days since I last played GT5, and I must disclose I use the GT5 online mod. Nevertheless, I had put in quite a few hours without a problem, but one of these latest updates absolutely broke it. RPCS3 config: PC SPECS: I also tested Skate 3 for a couple of minutes but it had no problem.
I need logs and a proper report.
How to replicate my crash: Logs and game config attached below.
For a while, I had a few complex SPU optimizations in mind. One of them was the "PUTLLC16" loop optimization (see #8703).
The concept itself was great: detect atomic loops in SPU code that update at most 16 bytes of data, in order to bridge the gap between the atomic operation capacity of x86 and ARM, which is 16 bytes, and that of the Cell BE's SPU architecture, which is a whopping 128 bytes.
So in theory, if we can analyse the code to detect when the atomic loop can only update 16 bytes (about a third of all SPU atomic loops in games are coded this way), the performance of that code would increase dramatically, especially on non-TSX CPUs, for which the implementation is slower compared to TSX. But as I started implementing analysis to detect this pattern across a variety of code from games, things started to get entangled, and many hacks were put in the original pull request in order to support as many code variations as possible for different code flows (mainly single backward loops and a single forward if inside the atomic update). This is both hacky and less valuable than equipping the SPU analyzer with cross-block analysis, which allows more optimizations to derive from it in the future as well as detection of all possible 16-byte atomic loop cases.
This was no simple task, though: the underlying algorithm was difficult as hell to work out, and it took me a whole year to do it.
It was worth it though.
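To illustrate the property being detected, here is a minimal C++ sketch. It is not the code from this PR: the PR detects the pattern by static analysis of SPU code, while this sketch simply compares data at run time, and the names `single_modified_chunk` and `apply_chunk` are hypothetical. It only shows what "16 bytes of the reservation modified" means: out of the 128 bytes covered by a reservation, exactly one aligned 16-byte chunk differs between what was loaded and what is about to be stored, so a 16-byte atomic operation on the host would be enough to commit it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// An SPU reservation covers a full 128-byte line.
constexpr std::size_t k_line_size = 128;
// Assumed width of the widest host atomic operation (16 bytes on x86 and ARM).
constexpr std::size_t k_chunk_size = 16;
static_assert(k_line_size % k_chunk_size == 0);

using line_t = std::array<std::uint8_t, k_line_size>;

// Hypothetical helper: returns the byte offset of the only 16-byte chunk that
// differs between the loaded line and the line to store. Returns nullopt when
// no chunk differs or when more than one chunk differs.
std::optional<std::size_t> single_modified_chunk(const line_t& loaded, const line_t& to_store)
{
    std::optional<std::size_t> found;

    for (std::size_t off = 0; off < k_line_size; off += k_chunk_size)
    {
        if (std::memcmp(loaded.data() + off, to_store.data() + off, k_chunk_size) != 0)
        {
            if (found)
            {
                return std::nullopt; // second modified chunk: not a 16-byte update
            }

            found = off;
        }
    }

    return found;
}

// Hypothetical helper: commits only the modified chunk. A plain copy stands in
// for what would be a 16-byte compare-exchange in a real emulator.
void apply_chunk(line_t& memory, const line_t& to_store, std::size_t off)
{
    assert(off % k_chunk_size == 0 && off < k_line_size);
    std::memcpy(memory.data() + off, to_store.data() + off, k_chunk_size);
}
```

If `single_modified_chunk` returns an offset, only that chunk needs to be written back; otherwise the full 128-byte path is still required.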
Please test the performance of games. The difference will probably not be huge, but it should be noticeable in titles that have performance gaps between TSX and non-TSX CPUs.
Significant performance improvements have been noted in Red Dead Redemption, Spider-Man Web Of Shadows, Metal Gear Solid 4 and Metal Gear Solid Online. Do note that the changes are CPU-dependent.
What to expect and test:
Example of a simple SPU atomic loop with only 16 bytes of the reservation modified (notice how both STQR and LQR address the same offset and no other store/load types are used):