Apparently miscompiled barrier in compute shader #3181

Open
raphlinus opened this issue Nov 6, 2022 · 4 comments
Labels
api: metal (Issues with Metal) · external: driver-bug (A driver is causing the bug, though we may still want to work around it)

Comments

@raphlinus (Contributor)
I'm investigating what appears to be a miscompiled shader from piet-gpu and have generated a very reduced test case. There's more detail at linebender/vello#199, but this issue is intended to be self-contained.

Description
My shader does bump allocation with atomics in one thread (local index `WG_SIZE - 1`), then stores the result in a workgroup-shared variable to broadcast it to the other threads. In the shader source, a `workgroupBarrier()` separates the write from the read. What I'm seeing is consistent with the barrier being elided altogether.
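
A minimal WGSL sketch of the pattern (the names, bindings, and workgroup size here are hypothetical stand-ins; the actual reduced shader lives in the linebender/vello#199 tree):

```wgsl
// Sketch of the failing pattern, not the actual reduced test case.
const WG_SIZE: u32 = 256u;

@group(0) @binding(0) var<storage, read_write> bump: atomic<u32>;
@group(0) @binding(1) var<storage, read_write> output: array<u32>;

var<workgroup> broadcast: u32;

@compute @workgroup_size(WG_SIZE)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    // One thread bump-allocates and publishes the result to workgroup memory.
    if (local_id.x == WG_SIZE - 1u) {
        broadcast = atomicAdd(&bump, 1u);
    }
    // This barrier separates the write from the reads; the observed
    // behavior is consistent with it being elided.
    workgroupBarrier();
    // The other threads read the broadcast value.
    if (local_id.x < 3u) {
        output[local_id.x] = broadcast;
    }
}
```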

Repro steps
Check out the tree from linebender/vello#199, `cd piet-wgsl`, then `cargo run`.

Expected vs observed behavior
Correct output is as follows:

```
0: 1
1: 1
2: 1
```

The actual output appears to be garbage values (`0x4a150041` is typical), at least when run on an M1 Max Mac.

Extra materials
I also prepared a WebGPU version of this at https://gist.github.com/raphlinus/f32d9602241ea5c99b480e721f1ea834. It does not repro on Chrome Canary.

I've also looked at the naga output from compiling the shader and don't see anything wrong with that - the barrier is clearly there.

Some perturbations to the test make the behavior go away (for example, removing the last `if` gating the write), but in my experience that is common with barrier problems. Also, when I perturb the test so that the reader threads are in the same subgroup (or when the workgroup size is reduced to 32, which appears to be the subgroup size), the behavior goes away. And if I change the test to store the value in a storage buffer and use a `storageBarrier()`, the behavior goes away; this is the current workaround I'm using in piet-gpu itself (sketched below).
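
A sketch of that workaround, under the same hypothetical declarations as the sketch above: route the broadcast through a storage buffer and synchronize with a storage barrier instead.

```wgsl
// Hypothetical extra binding, used only to broadcast the allocated value
// through storage memory rather than workgroup memory.
@group(0) @binding(2) var<storage, read_write> broadcast_buf: u32;

@compute @workgroup_size(WG_SIZE)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    if (local_id.x == WG_SIZE - 1u) {
        broadcast_buf = atomicAdd(&bump, 1u);
    }
    // storageBarrier() synchronizes storage-address-space accesses within
    // the workgroup; with it, the test produces the expected output.
    storageBarrier();
    if (local_id.x < 3u) {
        output[local_id.x] = broadcast_buf;
    }
}
```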

Platform
macOS 12.6, Apple M1 Max, wgpu 0.14

@raphlinus (Contributor, Author) commented Nov 6, 2022

More digging: I've made a minimal Metal repro that triggers the behavior. It triggers only when the buffer bounds-check policy is set to `ReadZeroSkipWrite`; otherwise it does not.
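
For context, here is the conceptual shape of that policy, rendered as WGSL-level pseudocode (the real transform is applied by naga in the generated MSL, not at the WGSL level):

```wgsl
// A write `output[i] = v;` becomes a conditional store ("skip write"):
if (i < arrayLength(&output)) {
    output[i] = v;
}

// A read `let x = output[i];` yields zero when out of bounds ("read zero"):
var x: u32 = 0u;
if (i < arrayLength(&output)) {
    x = output[i];
}
```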

@raphlinus (Contributor, Author)
Even more curious: if I replace the expression `1 + (_buffer_sizes.size1 - 0 - 4) / 4` with the hardcoded constant `256` (which it evaluates to), I don't see the behavior. If I simplify it to `_buffer_sizes.size1 / 4`, I still do.
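
A quick sanity check of the arithmetic, assuming `size1 == 1024` (inferred from the constant 256 above); the distinguishing factor appears to be the runtime load of `size1` versus a compile-time literal, not the arithmetic itself:

```wgsl
// 1 + (size1 - 0 - 4) / 4  ==  1 + 1020 / 4  ==  256   // triggers
// size1 / 4                ==  1024 / 4      ==  256   // still triggers
// 256                          (hardcoded literal)     // does not trigger
```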

@raphlinus (Contributor, Author)
And more insight into why it works in Chrome Canary: the Tint transform for buffer robustness uses `min` rather than conditionalizing the store. I have a tweak to the Metal repro case if anybody wants to see it, but I don't know that it would be particularly helpful in tracking this down.
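
Rendered in the same WGSL-level pseudocode as above, Tint's approach looks roughly like this:

```wgsl
// The store always executes, with the index clamped into bounds, so no
// new control flow is introduced around it (which may be why the barrier
// survives compilation there).
output[min(i, arrayLength(&output) - 1u)] = v;
```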

@raphlinus (Contributor, Author)
The miscompilation has been confirmed by a contact at Apple, and my understanding is that they're working on it. That still leaves the question of how to mitigate it on our end. Possibilities include:

  • Keep the bletcherous workaround of using a storage buffer rather than a workgroup-shared variable.
  • Turn off bounds checking altogether, since in this case it actually increases risk rather than reducing it.
  • Patch naga to use `min`-style bounds enforcement rather than conditionalizing the store.

I also want to make sure this case is adequately covered by tests.
