vk: Rework async texture uploads #12389
This reintroduces some bugs that were already fixed. #9968
Fixes the performance issue with async in Killzone 3, and performance is better overall. However, it introduces visual issues in cutscenes and in-game.
- Use CONCURRENT queue access instead of fighting with queue acquire/release via submit chains. The minor benefits of forcing EXCLUSIVE mode are buried under the huge penalty of multiple vkQueueSubmit. Batching submits does not help alleviate this situation. We simply must avoid interrupting execution.
- Fix up flush sequence in DMA handling (WCB)
- Do not request resource sharing if queue family is not different!
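The sharing-mode decision described in these commits can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SharingMode`, `SharingInfo` and `select_sharing` are hypothetical stand-ins for `VkSharingMode` and the `sharingMode` / `queueFamilyIndexCount` / `pQueueFamilyIndices` fields of the Vulkan create-info structs, written without the Vulkan headers so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VkSharingMode (not the real Vulkan enum).
enum class SharingMode { Exclusive, Concurrent };

// Hypothetical stand-in for the sharing-related fields of a
// VkImageCreateInfo / VkBufferCreateInfo.
struct SharingInfo {
    SharingMode mode;
    std::vector<std::uint32_t> queue_families; // only meaningful for Concurrent
};

// Pick sharing parameters for a resource used by both the graphics queue
// and the async transfer queue.
SharingInfo select_sharing(std::uint32_t graphics_family,
                           std::uint32_t transfer_family) {
    if (graphics_family == transfer_family) {
        // Same family: no cross-family sharing is needed, so stay exclusive.
        // Vulkan also requires more than one unique family index for
        // concurrent sharing, so requesting it here would be invalid.
        return { SharingMode::Exclusive, {} };
    }
    // Different families: concurrent access avoids the ownership
    // release/acquire barriers and the extra submits they bring.
    return { SharingMode::Concurrent, { graphics_family, transfer_family } };
}
```

With real Vulkan types, the `Concurrent` branch would correspond to setting `VK_SHARING_MODE_CONCURRENT` and passing both family indices at resource creation time.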
Reimplemented to just use concurrent access. This usually disables some optimizations like DCC but it works just fine. There is a small penalty but nothing compared to the submit hell that was there before.
Graphics fixed in latest commit, but performance seems to have gone down a bit.
Hmm. I thought maybe it was just Pascal, but it seems all NVIDIA cards have this impact. I'll try to tune it a bit more, but it's fine if it performs as well as async off. Anything as long as performance is not worse.
@kd-11 Someone is claiming this PR broke Code Veronica X for them; the last working build was 0.0.23-13964. This is their log, which includes a fatal rsx::thread-related crash. They had some bad settings but claimed it still happened even after fixing them. It seems potentially legit.
That game works fine for me on both AMD and NVIDIA using the latest master, so I cannot reproduce the error to fix it. I'll need more information, including savegames and/or screen recordings showing the steps needed to recreate the crash.
The purpose of flushing the CB for every upload was to improve parallelism, but it triggers a driver bottleneck handling queue submits. There are much better ways to optimize job parallelism if it is indeed required.
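To make the submit-count difference concrete, here is a toy model that only counts queue submissions. It is a sketch under assumptions, not RPCS3's scheduler: `UploadQueue`, `record_upload`, `flush` and `count_submits` are hypothetical names, and a "submit" here is just a counter standing in for a `vkQueueSubmit` call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of the async upload command buffer.
// It records uploads and counts how many queue submits they cause.
class UploadQueue {
public:
    explicit UploadQueue(bool flush_per_upload)
        : flush_per_upload_(flush_per_upload) {}

    // Record one texture upload. In the eager mode, every upload is
    // submitted to the queue immediately.
    void record_upload(std::size_t bytes) {
        pending_.push_back(bytes);
        if (flush_per_upload_) {
            flush();
        }
    }

    // Submit everything recorded so far as a single queue submission.
    // Does nothing if there is no pending work.
    void flush() {
        if (pending_.empty()) {
            return;
        }
        ++submit_count_;
        pending_.clear();
    }

    std::size_t submit_count() const { return submit_count_; }

private:
    bool flush_per_upload_;
    std::vector<std::size_t> pending_;
    std::size_t submit_count_ = 0;
};

// Run n uploads followed by one final flush; return the number of submits.
std::size_t count_submits(bool flush_per_upload, std::size_t n) {
    UploadQueue queue(flush_per_upload);
    for (std::size_t i = 0; i < n; ++i) {
        queue.record_upload(4096); // arbitrary upload size for illustration
    }
    queue.flush();
    return queue.submit_count();
}
```

In this model, eight uploads cost eight submits when flushing per upload and one submit when the flush is deferred, which is the kind of driver-side overhead the paragraph above refers to.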
Performance now matches the 'fast' mode on AMD and I can see a healthy uptick in performance on NVIDIA hardware as well, accompanied by lower GPU usage (yes, this is a good thing) and lower RSX latency when async streaming is enabled.
If there are any regressions, I have several ideas on how to handle job scheduling to avoid building long dependent chains. Ideally this shouldn't matter too much as a long tail on the leading submit actually improves parallelism with the next leading sequence.
Tested on AMD 22.5.1 and NVIDIA 512.15.
Fixes #11707