RFC: WebGPURenderer prototype single uniform buffer update / pass #27388

Open
wants to merge 10 commits into base: dev

aardgoose
Contributor

Prototype mechanism to reduce the number of writeBuffer() calls by using a single large buffer for all object uniform groups, which is updated before the render pass is submitted. This approach is used in some other WebGPU engines.

All examples run correctly with this PR. The effect is greatest when large numbers of objects are rendered. The largest change is in GPU thread time, which is greatly reduced when testing with the webgpu_sprites example: from 5 ms/frame with per-object buffers to 2.5 ms/frame with a single buffer in my brief testing.

No attempt has been made to:

  • synchronize buffer updating and reading
  • allow buffer resizing or detect buffer overflow
  • recover buffer space on object deletion.
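
The approach described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `SharedUniformBuffer`, `allocate` and `flush` are hypothetical names, the 256-byte block size is an assumption, and `queue` is a stand-in that only needs a `writeBuffer( buffer, bufferOffset, data, dataOffset, size )` method shaped like `GPUQueue.writeBuffer()`. Like the prototype, it handles neither overflow, resizing nor synchronization.

```javascript
// Hypothetical sketch: one CPU-side staging array backs every object's
// uniform group, and the whole used range is uploaded once per pass.
const BLOCK_SIZE = 256; // assumed offset alignment in bytes

class SharedUniformBuffer {

	constructor( byteLength ) {

		this.staging = new Uint8Array( byteLength ); // CPU copy of all uniform groups
		this.used = 0; // bytes handed out so far

	}

	// Reserve an aligned slice for one object's uniform group.
	// byteLength is assumed to be a multiple of 4 (float data).
	allocate( byteLength ) {

		const size = Math.ceil( byteLength / BLOCK_SIZE ) * BLOCK_SIZE;
		const offset = this.used;
		this.used += size; // no overflow check, as in the prototype

		return { offset, view: new Float32Array( this.staging.buffer, offset, byteLength / 4 ) };

	}

	// Upload everything with a single call instead of one call per object.
	flush( queue, gpuBuffer ) {

		queue.writeBuffer( gpuBuffer, 0, this.staging, 0, this.used );

	}

}

// Usage with a stand-in queue that only records calls.
const calls = [];
const fakeQueue = { writeBuffer: ( ...args ) => calls.push( args ) };

const shared = new SharedUniformBuffer( 4096 );
const a = shared.allocate( 64 ); // e.g. one mat4
const b = shared.allocate( 80 ); // e.g. one mat4 plus one vec4

a.view[ 0 ] = 1;
b.view[ 0 ] = 2;
shared.flush( fakeQueue, {} );

console.log( a.offset, b.offset, calls.length );
```

Each object's bind group would then reference the shared GPU buffer at its own `offset`, so the number of uploads per pass no longer grows with the number of objects.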

@sunag
Collaborator

sunag commented Dec 18, 2023

Reducing the number of writeBuffer() calls is also part of #27134. Once that is configured, many of these calls will go away: instead of being updated per object, the data will be updated per frame.

I like your idea, but I wonder if it wouldn't be better to have this configured in Node and adjusted at setup()?

@sunag
Collaborator

sunag commented May 13, 2024

Hi @aardgoose

Could you fix the conflicts? I was thinking about merging this PR soon.

@aardgoose
Contributor Author

I'll take a look tomorrow.

@RenaudRohlinger
Collaborator

Awesome! Is it ready for review @aardgoose? Can you promote it from Draft to PR maybe? 😊

@aardgoose
Contributor Author

@RenaudRohlinger will do.

We might want to select specific uniform groups to be managed in this way, which is now possible as the buffer is passed through the NodeBuilder.

An obvious next stage is to look at reclaiming unused buffers, but we need a deallocation mechanism first, when a material is disposed of.

@aardgoose aardgoose marked this pull request as ready for review May 19, 2024 10:35
@mrdoob mrdoob modified the milestones: r165, r166 May 31, 2024
@aardgoose aardgoose force-pushed the singlebuffer branch 2 times, most recently from 34c5527 to 82107c4 Compare June 12, 2024 20:16
@aardgoose
Contributor Author

Added free lists, one per extent size (a multiple of the block size), for buffers freed when objects are removed from the scene graph. New allocations use these lists in preference to free space at the end of the buffer.

Block size is typically 256 bytes (https://web3dsurvey.com/webgpu/limits/minStorageBufferOffsetAlignment).

Added reworked example with continuous removal and addition of new objects and stats demonstrating buffer use. This only uses blocks of 256B or less.
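
As a rough illustration of the free-list scheme described above (not the PR's actual implementation), the sketch below keeps one list of freed extents per extent size, measured in blocks. `BlockAllocator` and its methods are hypothetical names, and the 256-byte block size is assumed.

```javascript
// Hypothetical sketch of block allocation with one free list per extent size.
const BLOCK_SIZE = 256; // assumed block size in bytes

class BlockAllocator {

	constructor( totalBlocks ) {

		this.totalBlocks = totalBlocks;
		this.nextFree = 0; // first never-used block at the end of the buffer
		this.freeLists = new Map(); // extent size in blocks -> freed start blocks

	}

	// Returns a byte offset, or -1 when the buffer is full.
	allocate( byteLength ) {

		const blocks = Math.ceil( byteLength / BLOCK_SIZE );
		const list = this.freeLists.get( blocks );

		// Prefer a previously freed extent of exactly this size.
		if ( list !== undefined && list.length > 0 ) return list.pop() * BLOCK_SIZE;

		if ( this.nextFree + blocks > this.totalBlocks ) return - 1;

		const start = this.nextFree;
		this.nextFree += blocks;

		return start * BLOCK_SIZE;

	}

	// Return an extent to the free list that matches its size.
	free( byteOffset, byteLength ) {

		const blocks = Math.ceil( byteLength / BLOCK_SIZE );

		if ( ! this.freeLists.has( blocks ) ) this.freeLists.set( blocks, [] );
		this.freeLists.get( blocks ).push( byteOffset / BLOCK_SIZE );

	}

}

const allocator = new BlockAllocator( 16 );
const first = allocator.allocate( 200 ); // 1 block, offset 0
const second = allocator.allocate( 300 ); // 2 blocks, offset 256
allocator.free( first, 200 );
const third = allocator.allocate( 100 ); // reuses the freed 1-block extent, offset 0
const fourth = allocator.allocate( 100 ); // nothing free of this size, offset 768

console.log( first, second, third, fourth );
```

Keying the lists by exact extent size keeps allocation and release cheap, at the cost of never splitting or merging extents of different sizes.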


github-actions bot commented Jul 14, 2024

📦 Bundle size

Full ESM build, minified and gzipped.

| Filesize dev | Filesize PR | Diff |
| --- | --- | --- |
| 685.1 kB (169.6 kB) | 685.1 kB (169.6 kB) | +0 B |

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

| Filesize dev | Filesize PR | Diff |
| --- | --- | --- |
| 462 kB (111.4 kB) | 462 kB (111.4 kB) | +0 B |

<script type="module">

import * as THREE from 'three';
import { texture, uv, userData, rangeFog, color, SpriteNodeMaterial, reference } from 'three/tsl';

Check notice

Code scanning / CodeQL

Unused variable, import, function or class Note

Unused imports SpriteNodeMaterial, reference.
@mrdoob mrdoob modified the milestones: r167, r168 Jul 25, 2024
@RenaudRohlinger
Collaborator

I've been conducting performance benchmarks and believe this PR could significantly enhance the webgpu_performances.html example, particularly within the WebGL backend. It could potentially boost performance from around 30fps to over 120fps.

Due to the force-push, I'm unable to check out the PR myself. If possible, could you give it a try?

Additionally, to address the performance issues in webgpu_performances.html, I’m considering using gl.bindBufferRange and gl.bufferSubData instead of gl.bufferData( gl.UNIFORM_BUFFER, data, gl.DYNAMIC_DRAW ). This will not solve the problem on its own, but it should improve the overall UBO strategy in the WebGL backend.
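
A minimal sketch of that idea, assuming a WebGL 2 context and a uniform buffer whose store was allocated once up front. `updateAndBindRange` is a hypothetical helper, not part of three.js, and the `fakeGl` object is only a recording stand-in so the snippet runs outside a browser.

```javascript
// Hypothetical helper: `gl` is a WebGL2RenderingContext and `ubo` a buffer
// already sized once with gl.bufferData( gl.UNIFORM_BUFFER, totalBytes, gl.DYNAMIC_DRAW ).
function updateAndBindRange( gl, ubo, bindingIndex, byteOffset, data ) {

	gl.bindBuffer( gl.UNIFORM_BUFFER, ubo );

	// Overwrite only this object's slice instead of reallocating the store.
	gl.bufferSubData( gl.UNIFORM_BUFFER, byteOffset, data );

	// Point the uniform block binding at that slice. byteOffset must be a
	// multiple of gl.getParameter( gl.UNIFORM_BUFFER_OFFSET_ALIGNMENT ).
	gl.bindBufferRange( gl.UNIFORM_BUFFER, bindingIndex, ubo, byteOffset, data.byteLength );

}

// Stand-in context that records calls.
const log = [];
const fakeGl = {
	UNIFORM_BUFFER: 0x8A11,
	bindBuffer: ( ...args ) => log.push( [ 'bindBuffer', ...args ] ),
	bufferSubData: ( ...args ) => log.push( [ 'bufferSubData', ...args ] ),
	bindBufferRange: ( ...args ) => log.push( [ 'bindBufferRange', ...args ] )
};

updateAndBindRange( fakeGl, {}, 0, 256, new Float32Array( 16 ) );

console.log( log.map( ( entry ) => entry[ 0 ] ) );
```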

While I'm still investigating the exact cause of the performance drop in WebGL, I'm fairly confident this PR addresses a major bottleneck. The issue seems to stem from overwhelming the GPU with hundreds of buffer uploads, or at least CPU-GPU data transfer, which then causes a drop in the subsequent 5-6 frames every 6 frames in the RAF.
Although this PR is more of a great feature that happens to work as a workaround, it should help significantly. In the long term, a caching system in the UBO logic that prevents unnecessary uploads and uses more precise ranges might be the real solution to the WebGLBackend performance issues.
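
One possible shape for such a cache, as a sketch only: keep a shadow copy of the last uploaded data, skip the upload when nothing changed, and otherwise upload just the smallest changed range. `findDirtyRange` and `CachedUniformUpload` are hypothetical names, and the `upload( byteOffset, view )` callback is assumed to wrap something like gl.bufferSubData.

```javascript
// Hypothetical sketch: report the smallest element range that differs
// between the last uploaded data and the new data.
function findDirtyRange( previous, next ) {

	let start = 0;
	let end = next.length;

	while ( start < end && previous[ start ] === next[ start ] ) start ++;
	while ( end > start && previous[ end - 1 ] === next[ end - 1 ] ) end --;

	return start === end ? null : { start, count: end - start };

}

class CachedUniformUpload {

	constructor( length ) {

		this.shadow = new Float32Array( length ); // last data sent to the GPU

	}

	// `upload( byteOffset, view )` is supplied by the caller.
	update( next, upload ) {

		const range = findDirtyRange( this.shadow, next );

		if ( range === null ) return false; // nothing changed, skip the upload

		upload( range.start * 4, next.subarray( range.start, range.start + range.count ) );
		this.shadow.set( next );

		return true;

	}

}

const uploads = [];
const record = ( byteOffset, view ) => uploads.push( { byteOffset, length: view.length } );

const cache = new CachedUniformUpload( 8 );
const data = new Float32Array( 8 );

const skipped = cache.update( data, record ); // unchanged, no upload

data[ 2 ] = 1;
data[ 5 ] = 2;
const sent = cache.update( data, record ); // uploads elements 2 to 5 only

console.log( skipped, sent, uploads );
```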

/cc @sunag @Mugen87

@sunag
Collaborator

sunag commented Aug 14, 2024

We need to check if the WebGLBackend still has redundant calls. Last time I looked at this, the WebGLRenderer had more state comparators, so it only sends WebGL the commands whose state has actually changed.

I haven't had time to implement UniformGroup on all nodes yet. If we don't do this, we won't be able to achieve optimal performance because the model's matrix groups will be confused with those of the material, causing unnecessary overhead for both backends. I think after this we will be able to implement buffer sharing more safely.

@Spiri0
Contributor

Spiri0 commented Oct 16, 2024

@aardgoose This is better than I imagined. This is an entire ecosystem for managing uniform groups. I have to take a look at it in peace, because I see you've already put a lot of energy into it.

@Spiri0
Contributor

Spiri0 commented Oct 17, 2024

What I don't see yet is the possibility of bundling uniforms in custom uniformsGroups as a user. This would make it possible to separate uniforms that are constantly updated from uniforms that are rarely or never updated.
For the visibility check in compute shaders, it makes sense to combine the camera matrices, frustum, and so on into one uniformsGroup, while the many meshes whose uniforms rarely or never change are in another uniformsGroup.

@Spiri0
Contributor

Spiri0 commented Oct 27, 2024

I think this is a very good thing, but there is a lack of parameterization if users want to create UniformGroups themselves. As far as I can see, the UniformsGroup class is intended for use in the backend.

As an illustrative example, here are some uniforms that I use. At the moment I can pass these all individually to the shader.

const frustumStruct = struct({
    cameraProjectionMatrix: cameraProjectionMatrix,
    cameraProjectionMatrixInverse: cameraProjectionMatrixInverse,
    cameraViewMatrix: cameraViewMatrix,
    frustumPlanes: array(frustum, 'vec4'),  //the six frustumplanes and their constants. struct and array are my custom test nodes
    viewport: vec4( viewportSize, cameraNear, cameraFar ),
    cameraPosition: vec4(cameraPosition, 0)
});
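
To relate a struct like this to the 256-byte blocks discussed earlier, here is a rough layout calculation. It assumes the members map to WGSL `mat4x4<f32>` (64 bytes) and `vec4<f32>` (16 bytes) with 16-byte alignment, and that `frustumPlanes` is an array of six `vec4` values. The custom `struct` and `array` nodes may lay things out differently.

```javascript
// Hypothetical layout calculation under the assumptions stated above.
const MAT4_BYTES = 64;
const VEC4_BYTES = 16;
const BLOCK_BYTES = 256; // assumed block size

const members = [
	[ 'cameraProjectionMatrix', MAT4_BYTES ],
	[ 'cameraProjectionMatrixInverse', MAT4_BYTES ],
	[ 'cameraViewMatrix', MAT4_BYTES ],
	[ 'frustumPlanes', 6 * VEC4_BYTES ], // array of six vec4
	[ 'viewport', VEC4_BYTES ],
	[ 'cameraPosition', VEC4_BYTES ]
];

let offset = 0;
const layout = {};

for ( const [ name, size ] of members ) {

	layout[ name ] = offset; // every size is a multiple of 16, so no padding is needed
	offset += size;

}

const totalBytes = offset;
const blocks = Math.ceil( totalBytes / BLOCK_BYTES );

console.log( layout, totalBytes, blocks );
```

Under these assumptions the struct occupies 320 bytes, so it would need two 256-byte blocks in a block-based allocator.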

But I am thinking a little further ahead about where the journey with WebGPU is going. I replaced the classic attributes in my vertex shader with storage buffers because they give me access to all vertices. With a drawIndirect buffer and an info buffer that contains the visibility information for the instances, I no longer have any attributes at all. The vertex shader is controlled via drawIndirect and my InstanceInfoBuffer, both of which I fill in a compute shader.
It was only afterwards that I realized what this means for the future: being able to completely replace the attributes with drawIndirect, compute shaders and storage buffers. This is Unreal Engine level.
Handing over all the uniforms individually and then having aardgoose's extension bundle everything together later will certainly work. But for clarity in applications, and to separate constantly changing uniforms from those that don't change at all, I think bundling that can be parameterized by the user also makes sense.
So there are two different application areas, backend bundling and frontend bundling, and both are very useful. I imagine that in the future three.js would also like to provide the indirect visibility check as built-in functionality, and it makes sense to combine the necessary elements in structs from the start. What is your opinion on this?
Over time I can certainly dig further into this and work out my experimental struct and array nodes properly, but it will take time since I do a lot on the user side.

@mrdoob mrdoob modified the milestones: r170, r171 Oct 31, 2024

5 participants