LV2 Optimizations #12378

Merged
merged 13 commits into from
Aug 7, 2022

Conversation

elad335
Contributor

@elad335 elad335 commented Jul 21, 2022

  • Reduce time spent while owning mutexes.
  • Avoid situations in which the notified thread gets blocked again on the same mutex the notifier thread still holds.
  • Add busy waiting before entering atomic wait.
  • Make LV2 synchronization syscalls (nearly) allocation-free using a node system.
  • Move memory unlocking outside of mutex ownership and make it conditional.
  • Make sys_mutex lock-free, add some busy waiting in sys_mutex_lock.
  • Make sys_lwmutex nearly lock-free.
  • Do not lock IDM mutex if the preliminary ID check failed.
  • Optimize shared_mutex for more than 2 concurrent threads.
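
As an illustration only (this is not code from the pull request), the second item in the list above can be sketched with standard C++ primitives. The `mailbox` class and `run_mailbox_demo` are hypothetical names; the point is that `notify_one()` is called after the lock is released, so the woken thread does not immediately block on the mutex the notifier still holds.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical mailbox used only to illustrate deferred notification.
class mailbox
{
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<int> m_items;

public:
    void push(int value)
    {
        {
            // Hold the mutex only while touching shared state.
            std::lock_guard<std::mutex> lock(m_mutex);
            m_items.push(value);
        }
        // Notify after the lock is released, so the woken thread can
        // take m_mutex right away instead of blocking on it again.
        m_cond.notify_one();
    }

    int pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_items.empty(); });
        const int value = m_items.front();
        m_items.pop();
        return value;
    }
};

// Pushes 1..count from a second thread and returns the sum received.
int run_mailbox_demo(int count)
{
    assert(count >= 0);
    mailbox box;
    std::thread producer([&] {
        for (int i = 1; i <= count; i++)
            box.push(i);
    });
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += box.pop();
    producer.join();
    return sum;
}
```

Calling `run_mailbox_demo(100)` should return 5050 regardless of how the two threads interleave. The actual LV2 code in RPCS3 uses its own primitives, so treat this as an analogy rather than a description of the implementation.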

In addition to improving performance on low-thread-count CPUs (8 threads or fewer), this pull request helps performance when using more than 2 PPU threads (2 is the most accurate value and the default, yet it may be slower than using more threads).
This PR may also improve loading times due to improved PPU performance.
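
For readers unfamiliar with the "busy waiting before entering atomic wait" item listed earlier, here is a minimal sketch under stated assumptions. `spin_then_block_event` and `spin_limit` are hypothetical names, and a `std::condition_variable` stands in for the atomic wait that the pull request refers to. The waiter polls a flag for a bounded number of iterations and only falls back to a blocking wait if the flag is still unset.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event: spin briefly, then block.
class spin_then_block_event
{
    std::atomic<bool> m_set{false};
    std::mutex m_mutex;
    std::condition_variable m_cond;

public:
    void set()
    {
        {
            // Store under the mutex so a waiter that is about to block
            // cannot miss the update.
            std::lock_guard<std::mutex> lock(m_mutex);
            m_set.store(true, std::memory_order_release);
        }
        m_cond.notify_all();
    }

    // Returns true if the event was observed during the spin phase,
    // false if the caller had to go through the blocking path.
    bool wait(int spin_limit = 1000)
    {
        assert(spin_limit >= 0);
        for (int i = 0; i < spin_limit; i++)
        {
            if (m_set.load(std::memory_order_acquire))
                return true;
            std::this_thread::yield();
        }
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_set.load(std::memory_order_acquire); });
        return false;
    }
};
```

The spin phase avoids the cost of a sleep and wake-up when the event arrives quickly; the right spin limit depends on the workload and is an arbitrary choice here.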

Results of a test locking and unlocking 4 LV2 mutexes on 8 threads.
ppumain.self.zip

Master, 2 PPU threads: sample finished in 24.269772 seconds.
Master, 4 PPU threads: sample finished in 59.015042 seconds.
Pull request, 2 PPU threads: sample finished in 2.951544 seconds.
Pull request, 4 PPU threads: sample finished in 5.736856 seconds.
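
To make the ratios explicit, the speedup implied by the timings above is simply the master time divided by the pull request time. The helper below is a hypothetical convenience function, not part of the test.

```cpp
#include <cassert>
#include <cmath>

// Speedup = time on master / time with the pull request.
double speedup(double master_seconds, double pr_seconds)
{
    assert(pr_seconds > 0.0);
    return master_seconds / pr_seconds;
}
```

With the figures reported above, `speedup(24.269772, 2.951544)` is about 8.22 for 2 PPU threads and `speedup(59.015042, 5.736856)` is about 10.29 for 4 PPU threads.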

@elad335 elad335 marked this pull request as ready for review July 21, 2022 09:05
@elad335 elad335 force-pushed the savestates branch 2 times, most recently from 88b6204 to 44d295d Compare July 21, 2022 14:25
@elad335
Contributor Author

elad335 commented Jul 21, 2022

The latest commit makes a synthetic test finish about 4 times faster than on master; all the test does is lock and unlock a sys_mutex_t on multiple threads. Keep in mind this won't scale as nicely to games.
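
The test itself runs on emulated PPU threads and uses LV2 mutexes. A rough host-side analog of its shape, assuming `std::mutex` in place of `sys_mutex_t` and with hypothetical names throughout, might look like this:

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

struct stress_result
{
    long long total_locks; // lock/unlock pairs completed across all threads
    double seconds;        // wall-clock duration of the run
};

// Each thread repeatedly locks one of `mutex_count` mutexes, bumps the
// counter that mutex protects, and unlocks it again.
stress_result run_mutex_stress(int thread_count, int mutex_count, int iterations)
{
    assert(thread_count > 0 && mutex_count > 0 && iterations >= 0);

    std::vector<std::mutex> mutexes(mutex_count);
    std::vector<long long> counters(mutex_count, 0);

    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; t++)
    {
        threads.emplace_back([&, t] {
            for (int i = 0; i < iterations; i++)
            {
                const int idx = (t + i) % mutex_count;
                std::lock_guard<std::mutex> lock(mutexes[idx]);
                counters[idx]++;
            }
        });
    }
    for (auto& th : threads)
        th.join();

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    long long total = 0;
    for (long long c : counters)
        total += c;
    return {total, elapsed.count()};
}
```

For example, `run_mutex_stress(8, 4, 1000)` completes 8000 lock/unlock pairs. Timings from this sketch measure the host's `std::mutex` and are not comparable to the numbers quoted in this thread.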

@elad335 elad335 changed the title from "LV2: Postpone thread notifications to afterward mutex ownership(s)" to "[Optimization] LV2: Postpone thread notifications to afterward mutex ownership(s)" Jul 21, 2022
@Megamouse Megamouse added the Optimization Optimizes existing code label Jul 21, 2022
@elad335 elad335 force-pushed the savestates branch 2 times, most recently from debd36c to 392584f Compare July 21, 2022 18:35
@elad335 elad335 changed the title from "[Optimization] LV2: Postpone thread notifications to afterward mutex ownership(s)" to "[Optimization] LV2/SPU: Postpone thread notifications to afterward mutex ownership(s)" Jul 21, 2022
@solarmystic

solarmystic commented Jul 22, 2022

Performance remains roughly the same, with similar CPU utilization, in the games tested below:

GOW 3
This PR - 99 FPS/72 1% FPS/60 0.1% FPS - 54% CPU Usage
GOW3 12378

Current master - 99 FPS/72 1% FPS/59 0.1% FPS - 54% CPU Usage
GOW3 master

Persona 5
This PR - 80 FPS/66 1% FPS/58 0.1% FPS - 67% CPU Usage
P5 12378

Current master - 81 FPS/67 1% FPS/61 0.1% FPS - 67% CPU Usage
P5 master

Demon's Souls
This PR - 81 FPS/58 1% FPS/52 0.1% FPS - 33% CPU usage
DS 12378

Current master - 81 FPS/58 1% FPS/51 0.1% FPS - 34% CPU usage
DS master

Tested on an i5-12400F/1070ti system

@elad335
Contributor Author

elad335 commented Jul 22, 2022

Is it the same when increasing the PPU thread count (debug tab)? Especially on CPUs with a low thread count.

@elad335 elad335 marked this pull request as draft July 22, 2022 09:31
@elad335 elad335 marked this pull request as ready for review July 25, 2022 16:00
@elad335 elad335 force-pushed the savestates branch 3 times, most recently from 919cb02 to aa197e6 Compare July 25, 2022 17:15
@elad335
Contributor Author

elad335 commented Jul 25, 2022

Added perf diff.

@elad335 elad335 changed the title from "[Optimization] LV2/SPU: Postpone thread notifications to afterward mutex ownership(s)" to "LV2 Optimizations" Jul 25, 2022
@elad335 elad335 force-pushed the savestates branch 6 times, most recently from f4e53d9 to 4f4682f Compare July 26, 2022 18:45
@Nekotekina
Member

What's the PS3 performance of the test? Also, has any game been found that benefits from it?

@elad335 elad335 marked this pull request as draft July 27, 2022 17:22
@elad335
Contributor Author

elad335 commented Aug 7, 2022

The latest commit greatly narrows the gap between 4 PPU threads and 2. Keep in mind that this is a stress test; 4 PPU threads should still be faster in compatible games if you have enough threads. It also improves the performance of other code that uses mutexes.

…teal the notifying bit

Add an unused has_waiters() method.
@Nekotekina Nekotekina merged commit 2ec0393 into RPCS3:master Aug 7, 2022
@Augusto7743

Perhaps it is possible to see better performance on low-end CPUs (AMD FX).

@coolllman

coolllman commented Aug 8, 2022

With this PR, Infamous 2 does not go in-game; it freezes on loading.
#12485

@Kravickas Kravickas mentioned this pull request Aug 8, 2022
elad335 added a commit to elad335/rpcs3 that referenced this pull request Aug 8, 2022
elad335 added a commit to elad335/rpcs3 that referenced this pull request Aug 9, 2022
@NFSROADCHALLENGE

Gran Turismo 6 (BCES01893, ver 1.5): no controller works. I tried Xbox and PS4 controllers and neither works :(

@Augusto7743

Has anyone tested on an Intel TSX CPU with all OS security mitigations disabled (Spectre, Meltdown, etc.)?

elad335 added a commit to elad335/rpcs3 that referenced this pull request Aug 10, 2022
elad335 added a commit to elad335/rpcs3 that referenced this pull request Aug 11, 2022
@elad335 elad335 mentioned this pull request Aug 11, 2022
Labels
Optimization Optimizes existing code

10 participants