Vulkan: Depalettize in shaders #10911

Merged
merged 8 commits into from
Apr 13, 2018

@hrydgard hrydgard commented Apr 13, 2018

Should speed things up a little in some cases (and a lot in others): we save a render pass (and switches) at the cost of a rather heavy pixel shader, so in theory it could be a loss in some high-resolution cases.
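As a rough illustration of what "depalettizing" means here, the sketch below does the lookup on the CPU: each raw index is shifted, masked and offset, then used to fetch a color from the CLUT. The `ClutParams` struct, its field names and the `Depalettize` helper are hypothetical and not taken from this PR; the shader version would do the same lookup per fragment while texturing, instead of in a separate pass.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified CLUT state. Field names are illustrative only.
struct ClutParams {
	uint32_t shift;   // right shift applied to the raw index
	uint32_t mask;    // mask applied after the shift
	uint32_t offset;  // base entry OR-ed into the result
};

// Map one raw index value to a palette entry number.
inline uint32_t ClutIndex(uint32_t raw, const ClutParams &p) {
	return ((raw >> p.shift) & p.mask) | p.offset;
}

// Depalettize a buffer of 8-bit indices into 32-bit colors.
// Assumption: out-of-range indices are clamped to the last CLUT entry,
// which is a choice made for this sketch only.
std::vector<uint32_t> Depalettize(const std::vector<uint8_t> &indices,
                                  const std::vector<uint32_t> &clut,
                                  const ClutParams &p) {
	std::vector<uint32_t> out;
	out.reserve(indices.size());
	for (uint8_t raw : indices) {
		if (clut.empty()) {
			out.push_back(0u);
			continue;
		}
		uint32_t idx = ClutIndex(raw, p);
		if (idx >= clut.size())
			idx = static_cast<uint32_t>(clut.size()) - 1;
		out.push_back(clut[idx]);
	}
	return out;
}
```

Doing this lookup inside the draw's own pixel shader is what removes the extra pass, at the price of more work per fragment.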

Sonic Rivals is like four times faster with this. Unfortunately the game still does bad things that prevent the full impact of this optimization: it switches to another, seemingly unused framebuffer and does a draw, many times per frame.

I thought this would also massively improve Metal Gear Acid 2, but it does this using shader blending in small chunks, which forces us to copy the framebuffer before each draw. To do each copy we must end the render pass anyway, so there's still a ton of passes.

To solve these we might have to post process the command stream in the queue runner to reorder things in a sane way.

EDIT: Did just that! Metal Gear Acid 2 and Sonic Rivals are now fast. See #10908.
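A minimal sketch of that kind of post-processing, assuming a simplified step list: runs of copy,render(1),copy,render(1),... are rewritten to copy,copy,...,render(n). The `Step` type and `MergeCopyRenderRuns` are hypothetical names for illustration, not the queue runner's real types, and the sketch assumes without checking that the copies and draws touch disjoint regions.

```cpp
#include <cstddef>
#include <vector>

enum class StepType { Copy, Render };

// Hypothetical, simplified stand-in for a queued step.
struct Step {
	StepType type;
	int draws;  // number of draw commands; only meaningful for Render
};

// Rewrite runs of copy,render,copy,render,... into copy,copy,...,render(n).
// Runs with fewer than two copy/render pairs are left untouched.
std::vector<Step> MergeCopyRenderRuns(const std::vector<Step> &steps) {
	std::vector<Step> out;
	size_t i = 0;
	while (i < steps.size()) {
		// Measure the alternating copy/render run starting at i.
		size_t j = i;
		size_t pairs = 0;
		int totalDraws = 0;
		while (j + 1 < steps.size() &&
		       steps[j].type == StepType::Copy &&
		       steps[j + 1].type == StepType::Render) {
			totalDraws += steps[j + 1].draws;
			pairs++;
			j += 2;
		}
		if (pairs >= 2) {
			// Emit all the copies first, then one merged render step.
			for (size_t k = 0; k < pairs; k++)
				out.push_back(steps[i + 2 * k]);
			out.push_back({StepType::Render, totalDraws});
			i = j;
		} else {
			out.push_back(steps[i]);
			i++;
		}
	}
	return out;
}
```

With all copies hoisted in front, the draws can share a single render pass instead of ending and restarting one per copy.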

Note: This approach can be easily ported to GLES 3.

@hrydgard hrydgard changed the title Depalettize in shaders Vulkan: Depalettize in shaders Apr 13, 2018
@hrydgard hrydgard added this to the v1.7.0 milestone Apr 13, 2018
@hrydgard
Owner Author

hrydgard commented Apr 13, 2018

Well, at least this did make it super easy to "fix" the performance of those games :) (Vulkan-only). Maybe should risk a merge after all...

@hrydgard
Owner Author

@unknownbrackets Just realized this partly reimplements #8246. Oh well :)

@hrydgard
Owner Author

Merging, but disabled for GL until 1.7.0. Can be easily disabled if there's trouble, too.

@hrydgard hrydgard merged commit 21b2cbc into master Apr 13, 2018
@hrydgard hrydgard deleted the shader-depal branch April 13, 2018 19:22
@hrydgard hrydgard modified the milestones: v1.7.0, v1.6.0 Apr 13, 2018
Collaborator

@unknownbrackets unknownbrackets left a comment

Well, #8246 is more about taking a palette that was rendered, and using the GPU for regular CLUT textures. I think fixing it might mean reinterpreting the bits within the shader.

The hacks are a bit ugly, but the games have been unplayably slow for a while, so what are you going to do, I guess.

-[Unknown]

@@ -422,6 +428,66 @@ void VulkanQueueRunner::RunSteps(VkCommandBuffer cmd, const std::vector<VKRStep
}
}

void VulkanQueueRunner::ApplyMGSHack(std::vector<VKRStep *> &steps) {
	// We want to turn a sequence of copy,render(1),copy,render(1),copy,render(1) to copy,copy,copy,render(n).

In theory, since we render bind with UV, we may be able to detect if these copies are non-overlapping and do this in a general case, right?

-[Unknown]

Owner Author

Yeah, it can definitely be done, might at some point. Not sure how many games benefit though, and it's more work :)
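The general-case check discussed in this thread could start from a plain rectangle test like the one below. This is only a sketch: the `Rect` type and both function names are hypothetical, and how the copy regions would actually be derived from the UV bounds is not covered.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned region in pixels; w and h are sizes, not end coordinates.
struct Rect {
	int x, y, w, h;
};

// True if the two rectangles share at least one pixel.
// Rectangles that merely touch along an edge do not count as overlapping.
inline bool Overlaps(const Rect &a, const Rect &b) {
	return a.x < b.x + b.w && b.x < a.x + a.w &&
	       a.y < b.y + b.h && b.y < a.y + a.h;
}

// True if no pair of rectangles overlaps. Quadratic, which should be fine
// for the handful of copies in one run.
bool AllDisjoint(const std::vector<Rect> &rects) {
	for (size_t i = 0; i < rects.size(); i++) {
		for (size_t j = i + 1; j < rects.size(); j++) {
			if (Overlaps(rects[i], rects[j]))
				return false;
		}
	}
	return true;
}
```

If every copy in a run passes `AllDisjoint`, reordering the copies ahead of the draws would not change what any draw reads, under the assumption that each draw only reads its own copied region.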

@@ -244,4 +256,7 @@ class VulkanQueueRunner {
	VkDeviceMemory readbackMemory_ = VK_NULL_HANDLE;
	VkBuffer readbackBuffer_ = VK_NULL_HANDLE;
	VkDeviceSize readbackBufferSize_ = 0;

	// TODO: Enable based on compat.ini.

Outdated TODO?

-[Unknown]

Owner Author

@hrydgard hrydgard Apr 14, 2018

Indeed.

@@ -488,6 +495,84 @@ void VulkanQueueRunner::ApplyMGSHack(std::vector<VKRStep *> &steps) {
}
}

void VulkanQueueRunner::ApplySonicHack(std::vector<VKRStep *> &steps) {
	// We want to turn a sequence of render(3),render(1),render(6),render(1),render(6),render(1),render(3) to
	// render(1), render(1), render(1), render(6), render(6), render(6)

This one is definitely more gross, heh. Reminds me of a certain other emulator's "skip codes"...

-[Unknown]
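For reference, the regrouping described in the `ApplySonicHack` comment can be sketched as a stable partition over render steps that alternate between two targets. Everything here is an assumption for illustration: `RenderStep`, its `target` field and `HoistTarget` are hypothetical, the sketch takes for granted that steps to one target do not depend on the other within the run, and it simply keeps the render(3) steps in place among the second group because the code comment does not say what happens to them.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified render step.
struct RenderStep {
	int target;  // illustrative framebuffer id
	int draws;   // number of draw commands in the step
};

// Move every step that renders to hoistedTarget ahead of all other steps,
// preserving relative order inside both groups.
void HoistTarget(std::vector<RenderStep> &steps, int hoistedTarget) {
	std::stable_partition(steps.begin(), steps.end(),
		[hoistedTarget](const RenderStep &s) {
			return s.target == hoistedTarget;
		});
}
```

Once grouped this way, consecutive steps to the same target become candidates for merging into one pass.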

@unknownbrackets
Collaborator

I checked a few games and indeed this is looking good. In most games it's just slightly faster, but that's still a very good thing. Of course, I assume anisotropic filtering won't apply but that seems likely fine for the use case.

Anyway, I did notice one thing: 3rd Birthday had glitches that were unexpectedly fixed. Only 1/4 of the screen or so had the overlay.

Fixed is great, but it seems to be a state issue because when I stepped through the bloom manually it fixed it, permanently. I wonder if it's related to #10634.

-[Unknown]
