BatchedMesh: Update Example, some optimization #27202

gkjohnson · 2023-11-17T11:12:39Z

Description

Updates BatchedMesh demo so it can display up to 20,000 geometries
Updates BatchedMesh demo to provide a toggle for per-object frustum culling
Updates the calculation of the texture dimensions so it uses a smaller, non-power-of-two texture
Updates BatchedMesh so that the multi-draw buffer regeneration in "onBeforeRender" only occurs when visibility changes or a new geometry is added.
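The texture sizing change can be sketched as follows. This assumes each geometry's 4×4 matrix is packed into four RGBA float texels of a square texture. `computeMatricesTextureSize` and `computePowerOfTwoSize` are hypothetical helpers for illustration, not necessarily the code merged in this PR; the power-of-two variant only shows the rounding the description implies was used before.

```javascript
// Hypothetical helper: side length of a square RGBA float texture that
// holds one 4x4 matrix (4 texels) per geometry.
function computeMatricesTextureSize(maxGeometryCount) {
  // 4 texels per matrix, so the texture needs at least count * 4 texels.
  let size = Math.sqrt(maxGeometryCount * 4);
  // Round up to a multiple of 4 so every row holds whole matrices.
  size = Math.ceil(size / 4) * 4;
  // Never go below a single 4x4 block.
  return Math.max(size, 4);
}

// Hypothetical comparison: round the side length up to the next power of two.
function computePowerOfTwoSize(maxGeometryCount) {
  const size = Math.sqrt(maxGeometryCount * 4);
  return Math.max(Math.pow(2, Math.ceil(Math.log2(size))), 4);
}
```

For 20,000 geometries the first helper gives a 284 × 284 texture (80,656 texels), while power-of-two rounding gives 512 × 512 (262,144 texels), roughly three times as many.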

OnBeforeRender CPU Performance Metrics

Both perObjectFrustumCulled and sortObjects have performance implications when a lot of meshes are used. With 20,000 geometries, onBeforeRender takes this long on my 2021 M1 Pro:

	perObjectFrustumCulled = true	perObjectFrustumCulled = false
sortObjects = true	~10ms	~6ms
sortObjects = false	~4.75ms	~0ms
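To illustrate where this per-frame cost comes from, here is a minimal sketch of what per-object culling and sorting involve. It assumes each geometry has a world-space bounding sphere and the frustum is given as inward-facing planes `{ normal, constant }` with unit-length normals. The data layout and the `collectVisible` / `sphereInFrustum` names are hypothetical; the actual BatchedMesh code differs.

```javascript
// Hypothetical sketch: a sphere is outside the frustum if it lies entirely
// behind any plane (assumes unit-length, inward-facing plane normals).
function sphereInFrustum(planes, sphere) {
  for (const { normal, constant } of planes) {
    const c = sphere.center;
    // Signed distance from the sphere center to the plane.
    const d = normal[0] * c[0] + normal[1] * c[1] + normal[2] * c[2] + constant;
    if (d < -sphere.radius) return false;
  }
  return true;
}

// Hypothetical per-frame pass: cull, then optionally sort front to back.
function collectVisible(instances, planes, cameraPos, cameraDir, options) {
  const visible = [];
  for (let i = 0; i < instances.length; i++) {
    const inst = instances[i];
    if (!inst.visible) continue;
    // perObjectFrustumCulled: one sphere-vs-planes test per object per frame.
    if (options.perObjectFrustumCulled && !sphereInFrustum(planes, inst.sphere)) continue;
    // Depth along the view direction, used as the sort key.
    const c = inst.sphere.center;
    const z =
      (c[0] - cameraPos[0]) * cameraDir[0] +
      (c[1] - cameraPos[1]) * cameraDir[1] +
      (c[2] - cameraPos[2]) * cameraDir[2];
    visible.push({ index: i, z });
  }
  // sortObjects: a full O(N log N) comparison sort every frame.
  if (options.sortObjects) visible.sort((a, b) => a.z - b.z);
  return visible.map((v) => v.index);
}
```

Both loops scale with the number of geometries, which is consistent with the costs in the table growing when either flag is enabled.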

If both sortObjects and perObjectFrustumCulled are false and visibility has been toggled (meaning the multi-draw buffer needs to be regenerated), it takes ~0.6ms.
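The skip-when-unchanged behavior amounts to a dirty flag. A minimal sketch with hypothetical names (`MultiDrawState`, `visibilityChanged`, `rebuildCount`); the real BatchedMesh fields and method signatures may differ, and the culling and sorting steps themselves are omitted.

```javascript
// Hypothetical sketch of a dirty flag guarding the multi-draw buffer rebuild.
class MultiDrawState {
  constructor() {
    this.visibility = [];
    this.starts = [];
    this.counts = [];
    this.drawStarts = []; // rebuilt multi-draw arrays
    this.drawCounts = [];
    this.visibilityChanged = true;
    this.rebuildCount = 0; // instrumentation for this example only
  }

  addGeometry(start, count) {
    this.starts.push(start);
    this.counts.push(count);
    this.visibility.push(true);
    this.visibilityChanged = true; // a new geometry invalidates the draw list
    return this.starts.length - 1;
  }

  setVisibleAt(id, visible) {
    if (this.visibility[id] === visible) return; // no change, keep buffers
    this.visibility[id] = visible;
    this.visibilityChanged = true;
  }

  // Called every frame. With culling and sorting disabled it only rebuilds
  // when something changed; with either enabled it must rebuild each frame
  // (the culling/sorting work is not shown here).
  onBeforeRender(perObjectFrustumCulled, sortObjects) {
    if (!this.visibilityChanged && !perObjectFrustumCulled && !sortObjects) return;
    this.drawStarts.length = 0;
    this.drawCounts.length = 0;
    for (let i = 0; i < this.visibility.length; i++) {
      if (!this.visibility[i]) continue;
      this.drawStarts.push(this.starts[i]);
      this.drawCounts.push(this.counts[i]);
    }
    this.visibilityChanged = false;
    this.rebuildCount++;
  }
}
```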

Overdraw Metrics

And here are some frame rates with the same fields toggled, with the camera zoomed in so that there is a lot of overdraw when elements aren't sorted.

	perObjectFrustumCulled = true	perObjectFrustumCulled = false
sortObjects = true	~105fps	~77fps
sortObjects = false	~88fps	~88fps

I was a bit surprised by the dip when only sortObjects === true, but it's possible that the rendering is somewhat vertex-bound and the combined GPU and CPU time pushes us over the frame boundary.

When the camera is zoomed out, though, having both sortObjects and perObjectFrustumCulled enabled takes more time and the framerate is lower. Perhaps we should default these settings to false?

cc @mrdoob


github-actions · 2023-11-17T11:15:37Z

📦 Bundle size

Full ESM build, minified and gzipped.

Filesize `dev`	Filesize PR	Diff
668.2 kB (165.8 kB)	668.4 kB (165.9 kB)	+215 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

Filesize `dev`	Filesize PR	Diff
449.4 kB (108.9 kB)	449.4 kB (108.9 kB)	+0 B

gkjohnson · 2023-11-17T11:16:28Z

In terms of improving the performance of both of these functions - @N8python might have a radix sort implementation that could be used to improve the sort time.

This would be the user's responsibility, but sorting and frustum culling could also be performed asynchronously over multiple frames to amortize the cost. I think we can hold off on enabling something like this until we see a problem, though.
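One way a user could amortize the work, as suggested above, is to re-test a fixed slice of objects each frame and keep the previous results for the rest. A minimal sketch; `AmortizedCuller`, `budget`, and the `isVisible` callback are hypothetical, and results can lag by up to ceil(N / budget) frames.

```javascript
// Hypothetical time-sliced culler: tests at most `budget` objects per frame.
class AmortizedCuller {
  constructor(count, budget, isVisible) {
    this.count = count;
    this.budget = budget;
    this.isVisible = isVisible; // (index) => boolean, e.g. a frustum test
    this.visible = new Array(count).fill(true); // start visible to avoid popping
    this.cursor = 0;
  }

  // Call once per frame; returns true when a full sweep has completed.
  update() {
    const end = Math.min(this.cursor + this.budget, this.count);
    for (let i = this.cursor; i < end; i++) {
      this.visible[i] = this.isVisible(i);
    }
    this.cursor = end === this.count ? 0 : end;
    return this.cursor === 0;
  }
}
```

The trade-off is staleness: objects that enter the view may appear a few frames late, so the budget would need tuning against how fast the camera moves.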

sciecode · 2023-11-18T20:58:16Z

> In terms of improving the performance of both of these functions - @N8python might have a radix sort implementation that could be used to improve the sort time.

I'd strongly discourage using a naive LSD (least significant digit) radix sort implementation. This is one of those cases where the theoretical time complexity of the algorithm doesn't translate well to the real world. LSD radix sort suffers severely from poor cache locality: as soon as the underlying array stops fitting in cache (L2/L3), performance drops drastically. There's also another consideration: it's not an adaptive algorithm, so it performs much worse than even a regular array.sort() on partially ordered data.
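For reference, a textbook LSD radix sort over 32-bit unsigned keys (8-bit digits, four passes) looks like the sketch below. It shows the two issues raised: every pass scatters elements across the whole output array, and it always performs all four full passes regardless of how ordered the input already is. This is a generic illustration, not the radixLSD implementation benchmarked here.

```javascript
// Generic LSD radix sort for uint32 keys, 8 bits per pass (4 passes).
function radixSortLSD(keys) {
  let src = Uint32Array.from(keys);
  let dst = new Uint32Array(src.length);
  const counts = new Uint32Array(256);
  for (let shift = 0; shift < 32; shift += 8) {
    counts.fill(0);
    // Histogram of the current digit.
    for (let i = 0; i < src.length; i++) counts[(src[i] >>> shift) & 0xff]++;
    // Exclusive prefix sum: starting offset of each bucket.
    let sum = 0;
    for (let b = 0; b < 256; b++) {
      const c = counts[b];
      counts[b] = sum;
      sum += c;
    }
    // Stable scatter: writes jump between up to 256 distant regions of dst,
    // which causes cache misses once the array no longer fits in cache.
    for (let i = 0; i < src.length; i++) {
      const v = src[i];
      dst[counts[(v >>> shift) & 0xff]++] = v;
    }
    [src, dst] = [dst, src];
  }
  // After an even number of passes the sorted data is back in `src`.
  return src;
}
```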

A while back I tried to write a 32-bit hybrid sort (radix MSD / insertion) that solves both of these problems. It is adaptive (performs better on partially ordered data), stable (retains the original element order on ties) and cache-efficient (reduces cache misses).
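The general hybrid idea can be sketched as follows: partition by the most significant byte first, recurse into each bucket on the next byte (so each recursion works on a progressively smaller contiguous range), and hand small buckets to insertion sort, which is stable and close to linear on nearly sorted input. This is an illustrative sketch assuming uint32 keys and an arbitrarily chosen cutoff of 32, not sciecode's implementation.

```javascript
// Illustrative hybrid MSD radix / insertion sort for uint32 keys.
const INSERTION_CUTOFF = 32; // arbitrary threshold for this sketch

function insertionSort(a, lo, hi) {
  // Stable: only shifts elements that are strictly greater.
  for (let i = lo + 1; i < hi; i++) {
    const v = a[i];
    let j = i - 1;
    while (j >= lo && a[j] > v) {
      a[j + 1] = a[j];
      j--;
    }
    a[j + 1] = v;
  }
}

function msd(a, aux, lo, hi, shift) {
  if (hi - lo <= INSERTION_CUTOFF) {
    insertionSort(a, lo, hi);
    return;
  }
  // counts[k] ends up as the number of elements whose digit is < k.
  const counts = new Uint32Array(257);
  for (let i = lo; i < hi; i++) counts[((a[i] >>> shift) & 0xff) + 1]++;
  for (let b = 0; b < 256; b++) counts[b + 1] += counts[b];
  // Stable scatter into the auxiliary buffer, then copy back.
  const offsets = counts.slice(0, 256);
  for (let i = lo; i < hi; i++) aux[lo + offsets[(a[i] >>> shift) & 0xff]++] = a[i];
  for (let i = lo; i < hi; i++) a[i] = aux[i];
  if (shift === 0) return; // all 32 bits consumed
  // Recurse into each bucket on the next, less significant byte.
  for (let b = 0; b < 256; b++) {
    const start = lo + counts[b];
    const end = lo + counts[b + 1];
    if (end - start > 1) msd(a, aux, start, end, shift - 8);
  }
}

function hybridSort(keys) {
  const a = Uint32Array.from(keys);
  msd(a, new Uint32Array(a.length), 0, a.length, 24);
  return a;
}
```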

I've rewritten it to accommodate some common use cases. Here is an example of the API:

```javascript
// ...
const tmp = new Array( array.length );
radixSort( array, {
  reverse: true, // sorts in decreasing order - default: false
  aux: tmp, // optional auxiliary array of the same length, avoids internal re-allocation across calls
  get: ( el ) => el.value // optional getter - allows sorting an array of objects
});
```
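Since the sort works on 32-bit keys, depth values stored as floats need a mapping to unsigned integers that preserves their order before they can be radix sorted. One common general-purpose technique, shown here as an assumption about how a `get` getter could produce keys (not necessarily how the linked code handles floats), is to set the sign bit of non-negative floats and invert all bits of negative ones. `floatToSortableKey` is a hypothetical helper.

```javascript
// Hypothetical helper: map a float32 to a uint32 whose unsigned order
// matches the float order.
const f32 = new Float32Array(1);
const u32 = new Uint32Array(f32.buffer);

function floatToSortableKey(x) {
  f32[0] = x;
  const bits = u32[0];
  // Negative floats: invert all bits (reverses their order, places them first).
  // Non-negative floats: set the sign bit so they sort after the negatives.
  return (bits & 0x80000000 ? ~bits : bits | 0x80000000) >>> 0;
}
```

A getter along the lines of `get: ( el ) => floatToSortableKey( el.z )` would then feed depth values into a 32-bit radix sort; if depths are known to be non-negative, simply scaling them into the uint32 range is another option.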

Below are two tables with the performance test results: the mean execution time and, for the radix variants, the speedup ratio against native sort() (values above 1 are faster). I've also included an optimized version of the linked radixLSD implementation.

random data - results (mean time; speedup vs. sort() in parentheses)

N	1K	5K	10K	50K	100K	500K	1M
sort	0.190ms	1.159ms	2.592ms	16.533ms	37.794ms	279.764ms	669.860ms
radixLSD	0.031ms (6.166×)	0.183ms (6.320×)	0.449ms (5.771×)	2.644ms (6.253×)	9.285ms (4.070×)	333.907ms (0.838×)	799.146ms (0.838×)
hybridMSD	0.022ms (8.746×)	0.211ms (5.493×)	0.556ms (4.659×)	2.308ms (7.164×)	5.255ms (7.192×)	97.870ms (2.859×)	263.316ms (2.544×)

partially ordered (99%) - results (mean time; speedup vs. sort() in parentheses)

N	1K	5K	10K	50K	100K	500K	1M
sort	0.026ms	0.125ms	0.254ms	1.468ms	3.044ms	17.422ms	33.489ms
radixLSD	0.033ms (0.767×)	0.199ms (0.630×)	0.449ms (0.566×)	2.484ms (0.591×)	6.468ms (0.471×)	277.133ms (0.063×)	419.517ms (0.080×)
hybridMSD	0.017ms (1.491×)	0.084ms (1.501×)	0.460ms (0.553×)	1.437ms (1.022×)	2.826ms (1.077×)	13.425ms (1.298×)	24.021ms (1.394×)

If it's considered useful, feel free to use and modify it, or let me know so I can make the required changes. I believe a couple of small optimizations can still be made. I'll update the linked code if I make changes.

gkjohnson · 2023-11-19T03:26:19Z

Thanks for the insight!

> random data - results
> sort | 669.860ms
> hybridMSD | 263.316ms
>
> partially ordered ( 99% )
> sort | 33.489ms
> hybridMSD | 24.021ms

I hadn't thought about this, but it makes sense. Right now all the elements are re-sorted starting from the original insertion order, but the most common case is that the sorted order stays mostly the same between frames. If there's a simple way to start from the previously sorted order, it could speed things up quite a bit.
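One simple way to start from the previous frame's order is to keep the index list between frames and repair it in place with an adaptive sort such as insertion sort, which runs in roughly linear time when only a few elements moved. A minimal sketch with a hypothetical helper name (`resortByDepth`):

```javascript
// Hypothetical helper: keep the previous frame's ordering and repair it with
// insertion sort. `order` holds instance indices; `depth[i]` is instance i's
// current depth.
function resortByDepth(order, depth) {
  let moves = 0;
  for (let i = 1; i < order.length; i++) {
    const idx = order[i];
    const d = depth[idx];
    let j = i - 1;
    // Shift deeper entries right; stops immediately if already in order.
    while (j >= 0 && depth[order[j]] > d) {
      order[j + 1] = order[j];
      j--;
      moves++;
    }
    order[j + 1] = idx;
  }
  return moves; // number of shifts, small when the order barely changed
}
```

Insertion sort degrades to O(N²) when the order changes completely (for example after a sharp camera turn), so a practical version would likely fall back to a full sort when `moves` grows too large; that fallback is not shown.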

> If it's considered useful, feel free to use and modify it, or let me know so I can make the required changes. I believe there are a couple of small optimizations that can still be made. I'll update the linked code if changes are made.

I'll give this a try in another PR - I think that for now things are good enough for common cases but there's clearly room for improvement in more complex situations.

* visible -> visibility
* Remove comment
* Improve scale of transform texture
* Skip onBeforeRender if possible
* Set visibility changed to false
* Make sure visibility change flag is toggled on geometry change
* Set on geometry change instead of geometry add

gkjohnson added 6 commits November 17, 2023 19:07

visible -> visibility

ecd4dc3

Remove comment

7387cc2

Improve scale of transform texture

c5adaa1

Skip onBeforeRender if possible

a25aa4f

Merge remote-tracking branch 'mrdoob/dev' into batched-example

e964b90

# Conflicts:
#	src/objects/BatchedMesh.js

Set visibility changed to false

c1c7bb3

gkjohnson requested a review from mrdoob November 17, 2023 11:18

gkjohnson added 2 commits November 17, 2023 21:50

Make sure visibility change flag is toggled on geometry change

2b30d63

Set on geometry change instead of geometry add

ae71d44

gkjohnson added this to the r159 milestone Nov 18, 2023

gkjohnson mentioned this pull request Nov 19, 2023

Skeleton: Compute a smaller texture dimension #27211

Merged

Mugen87 merged commit a90b693 into mrdoob:dev Nov 19, 2023
12 checks passed

gkjohnson deleted the batched-example branch November 19, 2023 11:30

This was referenced Nov 19, 2023

BatchedMesh: Add setCustomSort function, hybrid radix sort implementation to examples #27213

Merged

BatchedMesh: Improve frustum culling performance #27219

Merged

sciecode mentioned this pull request Nov 21, 2023

BatchedMesh: update example, fix depth conversion & auxiliary buffer #27228

Merged
