Vulkan: Depalettize in shaders #10911
Well, at least this did make it super easy to "fix" the performance of those games :) (Vulkan-only). Maybe I should risk a merge after all...
@unknownbrackets Just realized this partly reimplements #8246. Oh well :)
Merging, but disabled for GL until 1.7.0. It can easily be disabled if there's trouble, too.
Well, #8246 is more about taking a palette that was rendered, and using the GPU for regular CLUT textures. I think fixing it might mean reinterpreting the bits within the shader.
The hacks are a bit ugly, but the games have been unplayably slow for a while, so what are you going to do, I guess.
-[Unknown]
@@ -422,6 +428,66 @@ void VulkanQueueRunner::RunSteps(VkCommandBuffer cmd, const std::vector<VKRStep
}
}

void VulkanQueueRunner::ApplyMGSHack(std::vector<VKRStep *> &steps) {
// We want to turn a sequence of copy,render(1),copy,render(1),copy,render(1) to copy,copy,copy,render(n).
In theory, since we render bind with UV, we may be able to detect if these copies are non-overlapping and do this in a general case, right?
-[Unknown]
Yeah, it can definitely be done, might at some point. Not sure how many games benefit though, and it's more work :)
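As a rough illustration of the general-case check discussed above, here is a minimal sketch of testing whether a set of copy regions is pairwise non-overlapping. `CopyRect`, `RectsOverlap` and `AllDisjoint` are hypothetical names, not PPSSPP's actual types, and the sketch assumes each draw only writes inside the region that was copied for it, which a real implementation would have to verify.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical copy region in framebuffer pixels (not PPSSPP's real type).
struct CopyRect {
    int x, y, w, h;
};

// True if the two rectangles share at least one pixel.
inline bool RectsOverlap(const CopyRect &a, const CopyRect &b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

// True if no pair of copy regions overlaps. Under the assumption stated
// above, this is the condition for hoisting the copies ahead of the draws.
inline bool AllDisjoint(const std::vector<CopyRect> &rects) {
    for (size_t i = 0; i < rects.size(); i++) {
        for (size_t j = i + 1; j < rects.size(); j++) {
            if (RectsOverlap(rects[i], rects[j]))
                return false;
        }
    }
    return true;
}
```

The pairwise loop is quadratic, which is probably fine for the handful of copies seen per pass; anything larger would want a smarter structure.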
@@ -244,4 +256,7 @@ class VulkanQueueRunner {
VkDeviceMemory readbackMemory_ = VK_NULL_HANDLE;
VkBuffer readbackBuffer_ = VK_NULL_HANDLE;
VkDeviceSize readbackBufferSize_ = 0;

// TODO: Enable based on compat.ini.
Outdated TODO?
-[Unknown]
Indeed.
@@ -488,6 +495,84 @@ void VulkanQueueRunner::ApplyMGSHack(std::vector<VKRStep *> &steps) {
}
}

void VulkanQueueRunner::ApplySonicHack(std::vector<VKRStep *> &steps) {
// We want to turn a sequence of render(3),render(1),render(6),render(1),render(6),render(1),render(3) to
// render(1), render(1), render(1), render(6), render(6), render(6)
This one is definitely more gross, heh. Reminds me of a certain other emulator's "skip codes"...
-[Unknown]
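To make the reordering in that comment a bit more concrete, here is a minimal sketch of grouping interleaved render steps by target while keeping their relative order. It is not the code from this PR: `PassStep`, its fields and `GroupByTarget` are hypothetical, and the reorder is only valid under the assumption that the two groups of passes do not depend on each other's results.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical render step: which framebuffer it targets and how many draws it holds.
struct PassStep {
    int target;
    int draws;
};

// Moves every step that renders to `firstTarget` to the front, preserving the
// relative order inside both groups (std::stable_partition guarantees this).
inline void GroupByTarget(std::vector<PassStep> &steps, int firstTarget) {
    std::stable_partition(steps.begin(), steps.end(),
        [firstTarget](const PassStep &s) { return s.target == firstTarget; });
}
```

Once the steps for one target are adjacent, a backend could go on to merge them into a single pass; that merge is left out here.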
I checked a few games and indeed this is looking good. In most games it's just slightly faster, but that's still a very good thing. Of course, I assume anisotropic filtering won't apply but that seems likely fine for the use case. Anyway, I did notice one thing: 3rd Birthday had glitches that were unexpectedly fixed. Only 1/4 of the screen or so had the overlay. Fixed is great, but it seems to be a state issue because when I stepped through the bloom manually it fixed it, permanently. I wonder if it's related to #10634.
-[Unknown]
Should speed things up a little in some cases (and a lot in others): we save a render pass (and the associated switches) at the cost of a rather heavy pixel shader, so in theory it could be a loss in some high-resolution cases.
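For readers unfamiliar with the term, here is a CPU-side sketch of the per-texel work a depalettizing shader conceptually does: read the raw texel, derive a palette index, then fetch the color from the CLUT. The `DepalParams` fields and the function name are illustrative assumptions, not the actual shader from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative parameters (assumed for this sketch, not PPSSPP's real types).
struct DepalParams {
    uint32_t shift;  // bits to shift the raw texel right before masking
    uint32_t mask;   // mask applied after the shift
};

// Maps one raw texel to a color through the palette (CLUT).
inline uint32_t DepalettizeTexel(uint32_t rawTexel, const DepalParams &p,
                                 const std::vector<uint32_t> &clut) {
    assert(!clut.empty());
    size_t index = (rawTexel >> p.shift) & p.mask;
    // Clamp so that a short palette cannot be read out of bounds.
    if (index >= clut.size())
        index = clut.size() - 1;
    return clut[index];
}
```

Doing this lookup at sample time is what removes the separate depalettize pass, at the price of extra work per shaded pixel.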
Sonic Rivals is about four times faster with this. Unfortunately, it still does bad things that prevent the full impact of this optimization: it switches to another, seemingly unused framebuffer and does a draw, many times per frame.
I thought this would also massively improve Metal Gear Acid 2, but it does this using shader blending in small chunks, which forces us to copy the framebuffer before each draw. To do each copy we must end the render pass anyway, so there are still a ton of passes.
To solve these, we might have to post-process the command stream in the queue runner to reorder things in a sane way.
EDIT: Did just that! MGS Acid 2 and Sonic Rivals are now fast. See #10908.
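A minimal sketch of what such post-processing could look like for the copy/render interleaving described above: hoist the copies to the front and fold the single-draw renders into one pass. `StepKind`, `QueueStep` and `HoistCopiesAndMergeRenders` are hypothetical stand-ins rather than the real `VKRStep` machinery, and the transform assumes no copy reads pixels written by a render that originally preceded it.

```cpp
#include <vector>

enum class StepKind { Copy, Render };

// Hypothetical queue step: `draws` is the draw count for Render steps, 0 for copies.
struct QueueStep {
    StepKind kind;
    int draws;
};

// Turns copy,render(1),copy,render(1),... into copy,copy,...,render(n).
inline std::vector<QueueStep> HoistCopiesAndMergeRenders(const std::vector<QueueStep> &steps) {
    std::vector<QueueStep> out;
    QueueStep merged{StepKind::Render, 0};
    for (const QueueStep &s : steps) {
        if (s.kind == StepKind::Copy)
            out.push_back(s);          // copies keep their original order
        else
            merged.draws += s.draws;   // renders are folded into one pass
    }
    if (merged.draws > 0)
        out.push_back(merged);
    return out;
}
```

The win comes from ending the render pass once instead of once per draw.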
Note: This approach can be easily ported to GLES 3.