-
Notifications
You must be signed in to change notification settings - Fork 959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apparently miscompiled barrier in compute shader #3181
Comments
More digging: I've made a minimal Metal repro that triggers the behavior. It only triggers when buffer-bounds-check-policy is set to ReadZeroSkipWrite, otherwise not. |
Even more curious: if I replace the expression |
And more insight into why it works in Chrome Canary: the tint transform for buffer robustness uses |
The miscompilation has been confirmed by a contact at Apple and my understanding is that they're working on it. That still leaves the question of how to mitigate it on our end. Possibilities include:
I also want to make sure this case is adequately covered by tests. |
I'm investigating what appears to be a miscompiled shader from piet-gpu, and have generated a very reduced test case. There's more detail at linebender/vello#199 but this issue is intended to be self-contained.
Description
My shader does bump allocation with atomics in one thread (WG_SIZE - 1), then stores the result in a workgroup shared variable to broadcast to other threads. In the shader source, a
workgroupBarrier()
separates the write from the read. What I'm seeing is consistent with the barrier being elided altogether.Repro steps
Check out the tree from linebender/vello#199,
cd piet-wgsl
, thencargo run
.Expected vs observed behavior
Correct output is as follows:
Actual output appears to be garbage values (0x4a150041 is typical), at least when run on mac M1 Max.
Extra materials
I also prepared a WebGPU version of this at https://gist.github.com/raphlinus/f32d9602241ea5c99b480e721f1ea834 . It does not repro on Chrome Canary.
I've also looked at the naga output from compiling the shader and don't see anything wrong with that - the barrier is clearly there.
Some perturbations to the test make the behavior go away (for example, removing the last
if
gating the write), but this is common with barrier problems in my experience. Also when I perturb the test so the reader threads are in the same subgroup (or if the workgroup size is reduced to 32, which appears to be the subgroup size), the behavior goes away. If I change the test to store the value in a storage buffer, and use astorageBarrier()
, the behavior goes away; this is the current workaround I'm using in piet-gpu itself.Platform
macOS 12.6, mac M1 Max, wgpu 0.14
The text was updated successfully, but these errors were encountered: