Vulkan: Write descriptor sets on the render thread #18328

Merged
hrydgard merged 12 commits into master from desc-on-render-thread-2 on Oct 10, 2023

Conversation

hrydgard
Owner

@hrydgard hrydgard commented Oct 9, 2023

A very measurable speed boost even on PC, but even more so on, for example, PowerVR - on one test device it really smoothed out Tekken 6 (well, except for the shader compiles...)

Not super optimized so far (could do batching) and it only does the simplest possible de-duplication, but getting work off the main thread is a win.
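Roughly, the idea in sketch form (made-up names, not the actual structures): the main thread only records what each draw binds, and the render thread allocates and writes the real VkDescriptorSets just before replaying the draws.

#include <vector>
#include <vulkan/vulkan.h>

// Hypothetical record queued by the main thread - no Vulkan calls happen there.
struct PendingDescSet {
    VkImageView imageView;
    VkSampler sampler;
    VkBuffer buffer;
    VkDescriptorSet set = VK_NULL_HANDLE;  // filled in on the render thread
};

// Render thread, once per frame, before the queued draws are replayed.
void FlushDescriptorsSketch(VkDevice device, VkDescriptorPool pool,
                            VkDescriptorSetLayout layout,
                            std::vector<PendingDescSet> &descSets) {
    for (auto &d : descSets) {
        VkDescriptorSetAllocateInfo alloc{ VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO };
        alloc.descriptorPool = pool;
        alloc.descriptorSetCount = 1;
        alloc.pSetLayouts = &layout;
        vkAllocateDescriptorSets(device, &alloc, &d.set);  // error handling omitted

        VkDescriptorImageInfo img{ d.sampler, d.imageView, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };
        VkDescriptorBufferInfo buf{ d.buffer, 0, VK_WHOLE_SIZE };
        VkWriteDescriptorSet writes[2]{};
        writes[0].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
        writes[0].dstSet = d.set;
        writes[0].dstBinding = 0;
        writes[0].descriptorCount = 1;
        writes[0].descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
        writes[0].pImageInfo = &img;
        writes[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
        writes[1].dstSet = d.set;
        writes[1].dstBinding = 1;
        writes[1].descriptorCount = 1;
        writes[1].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC;
        writes[1].pBufferInfo = &buf;
        vkUpdateDescriptorSets(device, 2, writes, 0, nullptr);
    }
}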

@hrydgard hrydgard added Vulkan GE emulation Backend-independent GPU issues labels Oct 9, 2023
@hrydgard hrydgard added this to the v1.17.0 milestone Oct 9, 2023
@hrydgard hrydgard force-pushed the desc-on-render-thread-2 branch 2 times, most recently from be4631a to 78c3c3b, on October 9, 2023 19:38
@hrydgard hrydgard marked this pull request as ready for review October 9, 2023 21:20
@@ -68,14 +69,17 @@ VkDescriptorSet VulkanDescSetPool::Allocate(int n, const VkDescriptorSetLayout *
result = vkAllocateDescriptorSets(vulkan_->GetDevice(), &descAlloc, &desc);
_assert_msg_(result == VK_SUCCESS, "Ran out of descriptor space (frag?) and failed to allocate after recreating a descriptor pool. res=%d", (int)result);
}
usage_++;
Collaborator

There's another one just below. Intentional to have two? Seems probably better in the other place.

-[Unknown]

Owner Author

Oh, this was a merge between two branches where I did the same fix in both..

@@ -1336,7 +1338,8 @@ void VulkanQueueRunner::PerformRenderPass(const VKRStep &step, VkCommandBuffer c

case VKRRenderCommand::DRAW_INDEXED:
if (pipelineOK) {
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, 0, 1, &c.drawIndexed.ds, c.drawIndexed.numUboOffsets, c.drawIndexed.uboOffsets);
VkDescriptorSet set = (*descSets)[c.drawIndexed.descSetIndex].set;
Collaborator

Maybe deserves a _dbg_assert_(set != VK_NULL_HANDLE); here as well, just for symmetry and to ensure no false sense of security when skimming the code?

-[Unknown]

Owner Author

Yeah.
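Something like this, reusing the lines from the hunk above with the suggested assert added (the bind presumably switches to the locally fetched set; exact placement may differ):

case VKRRenderCommand::DRAW_INDEXED:
    if (pipelineOK) {
        VkDescriptorSet set = (*descSets)[c.drawIndexed.descSetIndex].set;
        // The suggested debug-only check: fail loudly if the set was never written.
        _dbg_assert_(set != VK_NULL_HANDLE);
        vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, 0, 1, &set,
                                c.drawIndexed.numUboOffsets, c.drawIndexed.uboOffsets);
        // ...the indexed draw itself follows here.
    }
    break;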

@@ -1462,6 +1477,11 @@ void VulkanRenderManager::Run(VKRRenderThreadTask &task) {
}
frameData.SubmitPending(vulkan_, FrameSubmitType::Pending, frameDataShared_);

// Flush descriptors.
double descStart = time_now_d();
FlushDescriptors(task.frame);
Collaborator

In theory, we could create these on a separate thread even, and wait for them, somewhat like shaders (of course, they'd need to be continuous, so this would only help if it could complete between the time drawing was enqueued and when the frame was picked up by the renderer.)

Not sure if the thread waits would definitely be worth it, but if this was a decent win, most devices nowadays have more cores than we'll ever be able to use effectively (I've been meaning to move atrac to a thread for this reason - even if it only helps a tiny bit, I think it should help... it does actually show up in profiles these days.)

-[Unknown]

Owner Author
@hrydgard hrydgard Oct 10, 2023

Yep, I thought about that, but it doesn't really seem necessary - there's usually plenty of time left to do these on the render thread without slowing anything down, even on really slow devices. I suppose it slightly increases latency to do them in a big block before submitting draws.. But it might be worth experimenting with - the more parallelism, the more this crowd of CPUs can clock down...

@@ -1625,10 +1645,153 @@ VKRPipelineLayout *VulkanRenderManager::CreatePipelineLayout(BindingType *bindin

vulkan_->SetDebugName(layout->descriptorSetLayout, VK_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT, tag);
vulkan_->SetDebugName(layout->pipelineLayout, VK_OBJECT_TYPE_PIPELINE_LAYOUT, tag);

for (int i = 0; i < VulkanContext::MAX_INFLIGHT_FRAMES; i++) {
layout->frameData[i].pool.Create(vulkan_, bindingTypes, (uint32_t)bindingTypesCount, 512);
Collaborator

Is 512 enough? Just asking because we've had to raise these numbers before. I suppose it's for each pipeline... not really sure what the cost is here of increasing it.

Well, maybe it was 512 before...

-[Unknown]

Owner Author

It's not for each pipeline, it's for each pipeline layout, and we only use two - one for UI and one for in-game.

1024 might be a more reasonable starting point - a lot of games end up there, while far fewer go all the way to 2048 and beyond (basically GTA).
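In other words, the only change being discussed here would be the starting size passed to the per-frame pools in the hunk above:

for (int i = 0; i < VulkanContext::MAX_INFLIGHT_FRAMES; i++) {
    // Start each per-layout, per-inflight-frame pool at 1024 sets instead of 512.
    layout->frameData[i].pool.Create(vulkan_, bindingTypes, (uint32_t)bindingTypesCount, 1024);
}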

@@ -83,8 +83,6 @@ void UIContext::BeginPipeline(Draw::Pipeline *pipeline, Draw::SamplerState *samp
// Also clear out any other textures bound.
Draw::SamplerState *samplers[3]{ samplerState };
draw_->BindSamplerStates(0, 3, samplers);
Draw::Texture *textures[2]{};
draw_->BindTextures(1, 2, textures);
Collaborator

Hm, what was the reason for removing these? I feel like we've hit errors and problems when bad things are left bound, so I think this was intended to wipe it. Note that this is clearing slots 1 and 2, and RebindTexture() is setting slot 0.

Maybe it's not needed on every BeginPipeline or something, but I'm assuming it was added to fix some error...

-[Unknown]

Owner Author
@hrydgard hrydgard Oct 10, 2023

Oh, I had some wacky bugs initially from using the wrong offset in the descriptor writer - this was meant to be a temporary change, oops...
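So presumably the clearing comes back, i.e. the removed lines from the hunk above, with a note on which slots they cover:

// Also clear out any other textures bound.
Draw::SamplerState *samplers[3]{ samplerState };
draw_->BindSamplerStates(0, 3, samplers);
Draw::Texture *textures[2]{};         // null textures...
draw_->BindTextures(1, 2, textures);  // ...clearing slots 1 and 2; slot 0 is handled by RebindTexture()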


// This is where we look up to see if we already have an identical descriptor previously in the array.
// We could do a simple custom hash map here that doesn't handle collisions, since false positives aren't too bad.
// Instead, for now we just check history one item backwards. Good enough, it seems.
Collaborator

Yeah, I'd guess it being a match to the one previous happens a ton. The next most likely would be the one before the most recent string of them (which I think we'd need a local var to track.) But that only happens in some games, where they'll flip flop between textures. But not sure if that happens enough to be worth it...

-[Unknown]
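For illustration, a rough sketch of both candidates (made-up types and names, not the actual code): match against the immediately preceding entry, and also remember the set in use before the current run started, to catch the flip-flop case.

#include <utility>
#include <vector>
#include <vulkan/vulkan.h>

// Hypothetical key describing what a queued descriptor set binds.
struct DescSetKey {
    VkImageView imageView;
    VkSampler sampler;
    VkBuffer buffer;
    bool operator==(const DescSetKey &o) const {
        return imageView == o.imageView && sampler == o.sampler && buffer == o.buffer;
    }
};

// lastIndex: set used by the previous draw. prevRunIndex: set in use before the
// current run of identical draws started. Both start out as -1.
int DedupDescSet(std::vector<DescSetKey> &sets, int &lastIndex, int &prevRunIndex,
                 const DescSetKey &key) {
    if (lastIndex >= 0 && sets[lastIndex] == key) {
        return lastIndex;                    // one item back - the common case
    }
    if (prevRunIndex >= 0 && sets[prevRunIndex] == key) {
        std::swap(lastIndex, prevRunIndex);  // the A/B/A/B texture flip-flop case
        return lastIndex;
    }
    prevRunIndex = lastIndex;                // a genuinely new set starts a new run
    sets.push_back(key);
    lastIndex = (int)sets.size() - 1;
    return lastIndex;
}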

}
}

// TODO: Allocate in batches.
Collaborator

How bad is it to overallocate? It'd probably be relatively easy to allocate just in chunks of say 8 or something, exhaust and refill. Or could just estimate a fraction of descSets.size() at the start, and fall back to one-at-a-time if it's somehow over the limit.

Wonder if there's benefit or downside to putting more distance between the allocation and writing (i.e. do we want all writes for the small batches to happen right after the alloc, or better to allocate everything, then do writes in batched chunks...) Well, just pondering, I'm sure the answer is virtually unknowable with all the drivers out there...

-[Unknown]

Owner Author
@hrydgard hrydgard Oct 10, 2023

Yeah, small batch allocations might be the way to go just to reduce function call overhead to basically nothing.

Another variant, advocated by ARM for cases like this where we start over every frame, is to simply not reset and reallocate at all, but just stomp over and reuse the already allocated descriptor sets.

Many things to try, but now that this is no longer on the main thread, it just doesn't seem to matter hugely and as you say it's unknowable - so I think just doing small block allocations here is the safest way. I'm thinking 16 - or maybe 8 is enough indeed. I'll do that in a followup though.
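A rough sketch of what the small-block variant could look like, assuming all sets in a batch share one layout (in the real code this would be per pipeline layout, and would keep the pool-recreation fallback that VulkanDescSetPool::Allocate already has):

#include <algorithm>
#include <vector>
#include <vulkan/vulkan.h>

constexpr uint32_t kChunkSize = 8;  // or 16, as discussed

bool AllocateInChunks(VkDevice device, VkDescriptorPool pool, VkDescriptorSetLayout layout,
                      std::vector<VkDescriptorSet> &sets, size_t count) {
    VkDescriptorSetLayout layouts[kChunkSize];
    std::fill(layouts, layouts + kChunkSize, layout);

    sets.resize(count);
    size_t done = 0;
    while (done < count) {
        uint32_t n = (uint32_t)std::min<size_t>(kChunkSize, count - done);
        VkDescriptorSetAllocateInfo alloc{ VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO };
        alloc.descriptorPool = pool;
        alloc.descriptorSetCount = n;  // one vkAllocateDescriptorSets call per chunk
        alloc.pSetLayouts = layouts;
        if (vkAllocateDescriptorSets(device, &alloc, sets.data() + done) != VK_SUCCESS) {
            return false;              // real code would grow/recreate the pool and retry
        }
        done += n;
    }
    return true;
}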

@hrydgard hrydgard force-pushed the desc-on-render-thread-2 branch from 04b8c50 to 9fdc7e2 on October 10, 2023 07:05
@hrydgard hrydgard merged commit 9767f43 into master Oct 10, 2023
@hrydgard hrydgard deleted the desc-on-render-thread-2 branch October 10, 2023 07:49