
Pipeline layout changes invalidate all bindings on DX12 #1926

Closed
Tracked by #1925
kvark opened this issue Jul 13, 2021 · 6 comments

kvark (Contributor) commented Jul 13, 2021

D3D12 has an interesting command: SetGraphicsRootSignature.

In WebGPU terms, that would be equivalent to something like setPipelineLayout(required GPUPipelineLayout layout).

It's documented that an application has to set the root signature before issuing a draw call; it doesn't matter whether this happens before or after the pipeline is set.

Here is what particularly worries me:

If a root signature is changed on a command list, all previous root arguments become stale and all newly expected arguments must be set before Draw/Dispatch otherwise behavior is undefined. If the root signature is redundantly set to the same one currently set, existing root signature bindings do not become stale.

Perhaps we could get some clarification from Microsoft about the "behavior is undefined" part, i.e. why it's undefined and what happens under the hood.

Since a WebGPU user doesn't have direct access to SetGraphicsRootSignature, the user agent has one of the following options:

Option-1

Set the root signature on the pipeline change (setPipeline). Re-bind all the resources in bind groups at this point.
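
Roughly, the user agent would have to do something like the following sketch (TypeScript-flavored pseudocode; the d3d12* helpers and the EagerEncoderState class are made-up stand-ins for the native calls, not real APIs, and the GPU* types assume WebGPU type declarations):

declare function d3d12SetGraphicsRootSignature(layout: GPUPipelineLayout): void;
declare function d3d12SetRootDescriptorTable(slot: number, group: GPUBindGroup): void;
declare function d3d12SetPipelineState(pipeline: GPURenderPipeline): void;

class EagerEncoderState {
  private currentLayout: GPUPipelineLayout | null = null;
  private boundGroups: (GPUBindGroup | null)[] = [null, null, null, null];

  // The layout is passed alongside the pipeline purely for illustration;
  // a real implementation would track it from pipeline creation.
  setPipeline(pipeline: GPURenderPipeline, layout: GPUPipelineLayout): void {
    if (layout !== this.currentLayout) {
      // A new root signature makes every previously set root argument stale,
      // so everything that was bound gets re-bound right here.
      d3d12SetGraphicsRootSignature(layout);
      this.currentLayout = layout;
      this.boundGroups.forEach((group, i) => {
        if (group) d3d12SetRootDescriptorTable(i, group);
      });
    }
    d3d12SetPipelineState(pipeline);
  }

  setBindGroup(index: number, group: GPUBindGroup): void {
    this.boundGroups[index] = group;
    d3d12SetRootDescriptorTable(index, group); // bound eagerly, right away
  }
}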

This option has a problem: it makes setPipeline performance difficult to reason about. It would be lightweight on Vulkan/Metal but very heavy on D3D12, and even then only when the pipeline layout changes.

Note: On Vulkan specifically, we also have to re-bind a portion of the bound descriptor sets, based on the pipeline layout compatibility rules.
At least it's possible to control the costs of it: if I know that my pipelines' layouts are only different in bind group number 3, then I can be assured that bind groups 0, 1, and 2 aren't going to be rebound internally when I do the pipeline switch.
On D3D12, however, it appears that any minor change in the pipeline layout leads to a performance cliff.

Option-2

Don't bind any resources at setBindGroup. Instead, lazily bind everything at draw time.
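
A minimal sketch of the lazy approach, again in TypeScript-flavored pseudocode with made-up d3d12* stand-ins: nothing is emitted until draw, at which point the dirty state is flushed.

declare function d3d12SetGraphicsRootSignature(layout: GPUPipelineLayout): void;
declare function d3d12SetRootDescriptorTable(slot: number, group: GPUBindGroup): void;
declare function d3d12DrawInstanced(vertexCount: number): void;

class LazyEncoderState {
  private pendingLayout: GPUPipelineLayout | null = null;
  private appliedLayout: GPUPipelineLayout | null = null;
  private groups: (GPUBindGroup | null)[] = [null, null, null, null];
  private dirty: boolean[] = [false, false, false, false];

  setPipeline(layout: GPUPipelineLayout): void {
    this.pendingLayout = layout; // nothing is recorded into the command list yet
  }

  setBindGroup(index: number, group: GPUBindGroup): void {
    this.groups[index] = group; // just remember it
    this.dirty[index] = true;
  }

  draw(vertexCount: number): void {
    if (this.pendingLayout && this.pendingLayout !== this.appliedLayout) {
      // Setting a new root signature stales all root arguments,
      // so every remembered bind group has to be re-applied.
      d3d12SetGraphicsRootSignature(this.pendingLayout);
      this.appliedLayout = this.pendingLayout;
      this.dirty.fill(true);
    }
    this.dirty.forEach((isDirty, i) => {
      const group = this.groups[i];
      if (isDirty && group) {
        d3d12SetRootDescriptorTable(i, group);
        this.dirty[i] = false;
      }
    });
    d3d12DrawInstanced(vertexCount);
  }
}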

This option has a similar problem: it's difficult to reason about the cost of setBindGroup and draw when any lazy binding is taking place. It puts DX12 at a disadvantage, since Vulkan and Metal draw calls aren't encumbered by this lazy state.

magcius commented Jul 13, 2021

So this to me seems partially like a developer education issue: users shouldn't be switching layouts many times per frame (yes, I know about getBindGroupLayout; I consider it mostly a misfeature). We should teach users that switching pipeline layouts is slow, and that if they share pipeline layouts a lot, their code will be faster.
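
To make that advice concrete, here is a small example of sharing one explicit pipeline layout between two pipelines (the device, shader modules, bind group layout entries, and all names here are assumptions for illustration):

declare const device: GPUDevice;
declare const moduleA: GPUShaderModule;
declare const moduleB: GPUShaderModule;

// One explicit layout, created once and shared by both pipelines,
// instead of relying on getBindGroupLayout() of implicit layouts.
const sharedBGL = device.createBindGroupLayout({
  entries: [
    { binding: 0, visibility: GPUShaderStage.FRAGMENT, buffer: { type: 'uniform' } },
  ],
});
const sharedLayout = device.createPipelineLayout({ bindGroupLayouts: [sharedBGL] });

const pipelineA = device.createRenderPipeline({
  layout: sharedLayout,
  vertex: { module: moduleA, entryPoint: 'vs_main' },
  fragment: { module: moduleA, entryPoint: 'fs_main', targets: [{ format: 'bgra8unorm' }] },
});
// Same layout object: switching between pipelineA and pipelineB never
// changes the root signature on D3D12.
const pipelineB = device.createRenderPipeline({
  layout: sharedLayout,
  vertex: { module: moduleB, entryPoint: 'vs_main' },
  fragment: { module: moduleB, entryPoint: 'fs_main', targets: [{ format: 'bgra8unorm' }] },
});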

kvark (Contributor, Author) commented Jul 13, 2021

You are totally correct. However, I think this is more than just an education issue. An API is good when it surfaces its costs. So, suppose I read the following code:

setPipeline(A);
setBindGroup(1, X);
draw();
setPipeline(B);
setBindGroup(1, Y);
draw();

How can I reason about performance? It would require me to do this series of steps:

  1. look up the pipeline creation
  2. look up the pipeline layout passed in each pipeline descriptor
  3. look up the pipeline layout creation (since pipeline layouts are compatible by value, not by pointer)
  4. potentially, look up the bind group layout creation for each of those (since bind group layouts are also compatible by value; see the sketch below)
  5. compare and see if anything is different
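
For instance, assuming a GPUDevice named device, two bind group layouts created from the same descriptor are distinct objects yet compatible by value, which is why object identity tells you nothing:

declare const device: GPUDevice;

const desc: GPUBindGroupLayoutDescriptor = {
  entries: [
    { binding: 0, visibility: GPUShaderStage.FRAGMENT, buffer: { type: 'uniform' } },
  ],
};
// Two separately created layouts with identical descriptors:
const bglA = device.createBindGroupLayout(desc);
const bglB = device.createBindGroupLayout(desc);
console.log(bglA === bglB); // false, yet they are compatible by value

// Pipeline layouts built from them are likewise compatible, so the only way
// to know whether a pipeline switch changes the effective layout is to trace
// back to the descriptors used at creation time.
const layoutA = device.createPipelineLayout({ bindGroupLayouts: [bglA] });
const layoutB = device.createPipelineLayout({ bindGroupLayouts: [bglB] });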

This is very well hidden. Basically, one can't look at the code and say how expensive it is.

If we consider WebGPU on Vulkan and Metal, the situation is much better: if I see setBindGroup(1, X), I know that the bind group at index 0 is fully preserved; the call sets an upper bound on what actually gets invalidated in the layout space.
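
As a sketch of that reasoning (all objects here are hypothetical; the assumption is that the two pipelines' layouts are identical at group 0 and differ only at group 1):

declare const pass: GPURenderPassEncoder;
declare const pipelineA: GPURenderPipeline;
declare const pipelineB: GPURenderPipeline;
declare const sharedGroup0: GPUBindGroup;
declare const groupX: GPUBindGroup;
declare const groupY: GPUBindGroup;

pass.setBindGroup(0, sharedGroup0); // compatible with both pipeline layouts
pass.setPipeline(pipelineA);
pass.setBindGroup(1, groupX);
pass.draw(3);

pass.setPipeline(pipelineB);
// On a Vulkan or Metal backend, only groups 1 and up can be disturbed by this
// switch, so group 0 never needs internal re-binding; on D3D12, the root
// signature change makes even group 0 stale under the hood.
pass.setBindGroup(1, groupY);
pass.draw(3);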

Kangz (Contributor) commented Jul 15, 2021

In Dawn we chose to go with option 2) and lazily re-bind descriptor tables as needed.

Looking at Vulkan drivers, they also lazily apply descriptor sets at draw time, and setting the pipeline invalidates the descriptor sets.

While this shows that Vulkan's descriptor sets aren't a representative abstraction of what the hardware can do, I think that for WebGPU it is useful to help developers by making the bind group state persistent, so that 1) they can make fewer setBindGroup calls and 2) on a pipeline change they don't have to figure out which things to rebind, or whether they need to rebind at all.

kvark (Contributor, Author) commented Jul 15, 2021

Thanks for the links! I find the code fairly convincing.
So for general understanding, we can assume that changing the pipeline invalidates bindings internally, which is even stronger than what DX12 requires with root signature invalidation.
It makes me deeply sad that even the "low level" APIs we build upon don't expose the hardware well, but fixing that is a rather bigger problem than shipping WebGPU.

kvark closed this as completed Jul 15, 2021
kvark (Contributor, Author) commented Jul 15, 2021

It could be that Mesa's drivers are just not optimized enough.
I got confirmation from NVidia that they take full advantage of Vulkan pipeline compatibility, and are not rebinding everything blindly on a pipeline change.

Kangz (Contributor) commented Jul 16, 2021

There are probably things you can do on very bindless hardware to decouple the two. I don't see anything in AMD hardware that would prevent implementing Vulkan in a way that keeps the bindings on a pipeline change. However, what I think this shows is that drivers decided that the opportunity to do more optimizations (promoting some things to inline descriptors, other reductions of indirection, etc.) has more value than the pure CPU cost of emitting commands again. CPU perf is no longer the bottleneck: GPU perf is.
