WebGPURenderer: Workgroup Arrays and Barrier Support #29192

cmhhelgeson · 2024-08-20T23:45:53Z

Description

Adds the ability to create workgroup and private arrays within compute shaders, which can be used to accelerate compute operations. Ideally, these could back pre-written compute operations that are fast and useful out of the box (bitonic sort, prefix sum). This would probably be less useful to the end user, though those targeting only WebGPU devices may get some benefit from using this functionality directly.

If requested, I can try to provide samples for some of this functionality, like porting the existing WebGPU Bitonic sort sample or doing something with spatial hashing and prefix sums, though this will likely require the ability to query the value of local_invocation_id and workgroup_id within TSL.

Node implementation
Storage Buffer Sample Fix
Bespoke Workgroup Array Sample

github-actions · 2024-08-20T23:48:24Z

📦 Bundle size

Full ESM build, minified and gzipped.

Build	Before (min / gzip, kB)	After (min / gzip, kB)	Diff (min / gzip)
WebGL	685.24 / 169.64	685.24 / 169.64	+0 B / +0 B
WebGPU	826.41 / 221.63	827.96 / 222.09	+1.55 kB / +466 B
WebGPU Nodes	825.99 / 221.54	827.54 / 222	+1.55 kB / +461 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

Build	Before (min / gzip, kB)	After (min / gzip, kB)	Diff (min / gzip)
WebGL	462.02 / 111.48	462.02 / 111.48	+0 B / +0 B
WebGPU	525.5 / 141.64	526.17 / 141.84	+671 B / +193 B
WebGPU Nodes	482.15 / 131.46	482.83 / 131.65	+671 B / +192 B

src/nodes/gpgpu/ScopedArrayNode.js

src/renderers/webgpu/nodes/WGSLNodeBuilder.js

cmhhelgeson · 2024-08-20T23:53:03Z

I'm also not really a fan of the ScopedArrayNode name. I'm willing to take suggestions for a name that would make more sense (ComputeArrayNode, ComputeLocalArrayNode, ComputeAccess, etc.).

RenaudRohlinger · 2024-08-22T11:58:42Z

I’m not a big fan of using "Scope" in a node name either. How about WorkgroupInfoNode? It clearly indicates workgroup-level data and aligns with WebGPU and WGSL terminology. Or maybe DomainArrayNode?

For example:

export const workgroupArray = ( type, count ) => nodeObject( new WorkgroupInfoNode( 'Workgroup', type, count ) );
export const privateArray = ( type, count ) => nodeObject( new WorkgroupInfoNode( 'Private', type, count ) );

By the way are you planning on trying your WebGPU SPH Simulation with TSL @cmhhelgeson? 😄

cmhhelgeson · 2024-08-22T15:02:44Z

> I’m not a big fan of using "Scope" in a node name either. How about WorkgroupInfoNode? It clearly indicates workgroup-level data and aligns with WebGPU and WGSL terminology. Or maybe DomainArrayNode?
>
> For example:
>
> export const workgroupArray = ( type, count ) => nodeObject( new WorkgroupInfoNode( 'Workgroup', type, count ) );
> export const privateArray = ( type, count ) => nodeObject( new WorkgroupInfoNode( 'Private', type, count ) );
>
> By the way are you planning on trying your WebGPU SPH Simulation with TSL @cmhhelgeson? 😄

Naming:

I'll change the name to WorkgroupInfoNode. I'll also remove privateArray for now. I don't really see its utility when WGSLNodeBuilder already constructs all code within one function body. However, I'll leave the scope property just in case there are other potential workgroup local variable types ( var, etc ). Maybe down the line, we can decide whether we want to rename the class if we create a separate class for workgroup variables holding a single value.

Future Plans

My current plan is:

Get current pull requests merged in
Finish protoplanet port ( people seem to like ports of old samples and InstancedPointsNodeMaterial needs to be fixed)
Finish extant pull requests that can be finished (post-processing, arrayCamera, etc)
Do required reading on subgroups, physics, workgroup sync, maybe atomics for a week to get myself back up to speed.
Move onto new/more creative/more demanding uses of compute like SPH, Instanced Points FLIP, Spatial Hash Collisions with Workgroup or Subgroup Sync, etc.

So TLDR: Yes 😊

sunag · 2024-08-24T01:41:10Z

Would it be complicated to have an example using these features in this PR?

cmhhelgeson · 2024-08-24T02:16:13Z

> Would it be complicated to have an example using these features in this PR?

Shouldn't be too complicated, I can write one using invocationLocalIndex.

cmhhelgeson · 2024-08-26T16:39:39Z

Moved to draft until samples are created.

src/nodes/Nodes.js

cmhhelgeson · 2024-08-27T22:58:44Z

@sunag The WebGPUBackend side of the Storage buffer sample is now fixed with the addition of a single workgroupBarrier() call. This call prevents data from being accessed and written to at the same time. This is separate from the addition of a new sample that will complete this pull request.
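For readers unfamiliar with the hazard a barrier guards against, here is a minimal CPU-side sketch in plain JavaScript. It is not the code of the storage buffer sample, whose kernel is not shown in this thread; the function names and the "rotate by one slot" workload are hypothetical, chosen only to show why reads and writes to shared data need to be separated into phases.

```javascript
// Hypothetical CPU model of a workgroup: invocation i wants to replace
// slot i with the value its neighbour ( i + 1 ) held at the start.
// `order` is the order in which the invocations happen to run.

function rotateWithoutBarrier( values, order ) {

	const shared = values.slice();

	for ( const i of order ) {

		// Read and write are interleaved, so a later invocation may read
		// a slot that an earlier invocation has already overwritten.
		shared[ i ] = shared[ ( i + 1 ) % shared.length ];

	}

	return shared;

}

function rotateWithBarrier( values, order ) {

	const shared = values.slice();
	const registers = new Array( shared.length );

	// Phase 1: every invocation only reads.
	for ( const i of order ) registers[ i ] = shared[ ( i + 1 ) % shared.length ];

	// A barrier would sit here: no invocation proceeds to the write phase
	// until all invocations have finished reading.

	// Phase 2: every invocation only writes.
	for ( const i of order ) shared[ i ] = registers[ i ];

	return shared;

}
```

Without the phase separation the result depends on the order in which invocations run; with it, every order produces the same rotation.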

three.js.examples.-.Google.Chrome.2024-08-27.15-57-26.mp4

examples/webgpu_compute_sort_bitonic.html

cmhhelgeson · 2024-09-03T02:06:51Z

Just wanted to give a brief update since this took longer than originally expected. Two things have happened:

Moving for a new job, so haven't had the time to give PRs proper attention.
Sort seems to work under certain conditions but there's some weirdness with the uniforms that I haven't been able to figure out yet. I'll add more detail below once I've figured out what the exact issue is.

cmhhelgeson · 2024-09-04T02:00:03Z

Sort is now working:

Untitled.video.7.mp4

sunag · 2024-09-06T03:59:46Z

@cmhhelgeson Looks great! I will review and merge it soon, thanks

RenaudRohlinger · 2024-09-06T13:09:26Z

Right now the example mixes local (workgroup) and global swaps, but it might be worth considering completing all the local sorting first before moving on to global sorting.

This could better reflect how bitonic sort is typically optimized for parallel processing:

Phase 1: Perform all local (workgroup) swaps (flip and disperse) within each group.
Phase 2: Once the local sorting is done, proceed to global sorting across workgroups to finalize the order.

This approach could make the example easier to understand by clearly separating the local and global phases, making the sorting process more educational. Just a thought!

cmhhelgeson · 2024-09-06T13:28:04Z

> Right now the example mixes local (workgroup) and global swaps, but it might be worth considering completing all the local sorting first before moving on to global sorting.
>
> This could better reflect how bitonic sort is typically optimized for parallel processing:
>
> Phase 1: Perform all local (workgroup) swaps (flip and disperse) within each group.
> Phase 2: Once the local sorting is done, proceed to global sorting across workgroups to finalize the order.
>
> This approach could make the example easier to understand by clearly separating the local and global phases, making the sorting process more educational. Just a thought!

Maybe I'm misunderstanding your suggestion, but the example already does this. It will perform only local swaps until the span of a swap necessitates that the swap be performed globally. The purpose of the computeAlgo function is to ensure that the correct swap function is executed given the span length. It's not mixing global and local swaps at random.

EDIT: For instance, in the debug panel of the reference implementation, whose code I've ported into TSL, Next Step will always be a Local step until the Next Swap Span exceeds workgroup_size * 2: https://webgpu.github.io/webgpu-samples/sample/bitonicSort
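The scheduling rule described above can be sketched on the CPU in plain JavaScript. This is not the TSL code from the example nor the computeAlgo function itself; planSteps, applyStep and bitonicSort are hypothetical names, and the sketch assumes a power-of-two element count and that one workgroup covers workgroupSize * 2 elements, as the comment above suggests.

```javascript
// CPU-side sketch of flip/disperse bitonic sort step scheduling.
// A step is labelled "local" when its swap span fits inside the
// 2 * workgroupSize elements assumed to be handled by one workgroup.

function planSteps( elementCount, workgroupSize ) {

	const steps = [];

	// Each pass starts with a flip at span `height`, followed by
	// disperses at height / 2, height / 4, ... down to 2.
	for ( let height = 2; height <= elementCount; height *= 2 ) {

		for ( let span = height; span >= 2; span /= 2 ) {

			const kind = span === height ? 'flip' : 'disperse';
			const scope = span > workgroupSize * 2 ? 'global' : 'local';
			steps.push( { kind, scope, span } );

		}

	}

	return steps;

}

function applyStep( data, { kind, span } ) {

	const half = span / 2;

	// One compare-and-swap per pair, i.e. data.length / 2 "invocations".
	for ( let i = 0; i < data.length / 2; i ++ ) {

		const block = Math.floor( i / half ) * span;
		const offset = i % half;
		const a = block + offset;

		// Flip mirrors across the block; disperse compares at a fixed stride.
		const b = kind === 'flip' ? block + span - 1 - offset : a + half;

		if ( data[ a ] > data[ b ] ) [ data[ a ], data[ b ] ] = [ data[ b ], data[ a ] ];

	}

}

function bitonicSort( input, workgroupSize ) {

	const data = input.slice();
	const steps = planSteps( data.length, workgroupSize );

	for ( const step of steps ) applyStep( data, step );

	return { data, steps };

}
```

With eight elements and a workgroup size of 2, the plan is three local steps, one global flip at span 8, and then two more local disperses, so steps only go global when the span exceeds what a workgroup holds.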

RenaudRohlinger · 2024-09-06T13:48:19Z

Oh I see, the two panels were misleading me. Then all good! I guess having just the left example could be easier to understand? Since the local already swaps to global when needed, demonstrating both features.

cmhhelgeson · 2024-09-06T13:58:52Z

> Oh I see, the two panels were misleading me. Then all good! I guess having just the left example could be easier to understand? Since the local already swaps to global when needed, demonstrating both

Maybe we could color code local and global swaps somehow.

EDIT: There are probably further performance improvements that could come down the line with the implementation of switch statements, but for now, I'll try to determine a way to indicate the locality of a sort in an interesting visual manner before closing this out.

cmhhelgeson · 2024-09-08T23:43:36Z

@RenaudRohlinger The example has been updated to more clearly demonstrate when a local sort or when a global sort is occurring.

RenaudRohlinger · 2024-09-09T01:45:14Z

Probably the best example of a sorting algorithm I've ever seen! 😄

cmhhelgeson · 2024-09-10T16:45:57Z

Is there anything else that needs to be done here?

* init
* barrier, private array, workgroup array support
* clean
* Implement Renaud suggestions
* fix
* fix storage buffer example with workgroupBarrier()
* add tags and other info
* add bitonic sort example
* update
* Rebase branch
* try to fix bitonic sort shader
* simplify
* fix
* bitonic sort now works but local swap is slower than global swap :
* cleanup
* fix rebase issues
* Change display and html to make difference between global and local swap clearer. May want to improve the performance of the fragment shader by writing nextAlgo and nextBlockHeight to uniforms on the CPU side
* update (ugly?) screenshot
* cleanup

github-advanced-security bot found potential problems Aug 20, 2024

src/nodes/gpgpu/ScopedArrayNode.js (fixed)

src/renderers/webgpu/nodes/WGSLNodeBuilder.js (fixed)

RenaudRohlinger added the WebGPU label Aug 22, 2024

cmhhelgeson force-pushed the workgroup_array_node branch from a4fd875 to 277282d Compare August 25, 2024 22:22

cmhhelgeson marked this pull request as draft August 26, 2024 16:39

cmhhelgeson force-pushed the workgroup_array_node branch from 277282d to 87f076c Compare August 27, 2024 20:11

github-advanced-security bot found potential problems Aug 27, 2024

src/nodes/Nodes.js (fixed)

cmhhelgeson force-pushed the workgroup_array_node branch from 87f076c to c799910 Compare August 27, 2024 20:43

cmhhelgeson force-pushed the workgroup_array_node branch from 6720287 to 7dd9327 Compare August 28, 2024 19:46

github-advanced-security bot found potential problems Aug 28, 2024

examples/webgpu_compute_sort_bitonic.html (fixed)

examples/webgpu_compute_sort_bitonic.html (fixed)

examples/webgpu_compute_sort_bitonic.html (fixed)

examples/webgpu_compute_sort_bitonic.html (fixed)

github-advanced-security bot found potential problems Aug 28, 2024

examples/webgpu_compute_sort_bitonic.html (fixed)

cmhhelgeson force-pushed the workgroup_array_node branch from 45545dc to 3d8aa0d Compare September 2, 2024 21:34

cmhhelgeson added 9 commits September 3, 2024 22:45

eb39e85 init
35062e1 barrier, private array, workgroup array support
4d91956 clean
1d50c46 Implement Renaud suggestions
af905d2 fix
19e8dee fix storage buffer example with workgroupBarrier()
bdf8a1f add tags and other info
60d4f77 add bitonic sort example
8b3766f update

cmhhelgeson added 6 commits September 3, 2024 22:48

e581285 Rebase branch
41c068b try to fix bitonic sort shader
d5b942b simplify
969a9b9 fix
d367516 bitonic sort now works but local swap is slower than global swap :
a5df40c cleanup

cmhhelgeson force-pushed the workgroup_array_node branch from 0425ddf to a5df40c Compare September 4, 2024 05:53

53946a5 fix rebase issues

cmhhelgeson marked this pull request as ready for review September 4, 2024 06:02

sunag added this to the r169 milestone Sep 5, 2024

cmhhelgeson added 2 commits September 6, 2024 11:44

65c59fe Change display and html to make difference between global and local swap clearer. May want to improve the performance of the fragment shader by writing nextAlgo and nextBlockHeight to uniforms on the CPU side
4070f52 update (ugly?) screenshot
62ce9ae cleanup

sunag merged commit 1174d07 into mrdoob:dev Sep 10, 2024
12 checks passed

cmhhelgeson deleted the workgroup_array_node branch September 10, 2024 18:38
