BatchedMesh: Update Example, some optimization #27202
In terms of improving the performance of both of these functions: @N8python might have a radix sort implementation that could be used to improve the sort time. It would be the user's responsibility, but sorting and frustum culling could also be performed asynchronously over multiple frames to amortize the time. I think we can hold off on enabling something like this until we see a problem, though.
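As a rough illustration of spreading such work over multiple frames, here is a minimal sketch of a chunked task that processes a fixed number of items per frame. `createAmortizedTask` and its parameters are hypothetical names for this example only, not part of three.js or BatchedMesh, and the budget is counted in items rather than milliseconds to keep things simple.

```javascript
// Hypothetical helper (not a three.js API): walks `items` in chunks,
// handling at most `perFrame` items per call to step().
function createAmortizedTask( items, perFrame, processItem ) {

	let cursor = 0;

	return {

		// Call once per frame. Returns true when a full pass has completed.
		step() {

			const end = Math.min( cursor + perFrame, items.length );
			for ( ; cursor < end; cursor ++ ) {

				processItem( items[ cursor ], cursor );

			}

			if ( cursor >= items.length ) {

				cursor = 0; // the next call starts a new pass
				return true;

			}

			return false;

		}

	};

}

// Usage: 10 items at 4 per frame, so a full pass finishes on the third frame.
const visited = [];
const task = createAmortizedTask( [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ], 4, ( item ) => visited.push( item ) );
const done = [ task.step(), task.step(), task.step() ];
```

The trade-off is latency: culling or sort results computed this way can lag the camera by a few frames, which may or may not be acceptable depending on the scene.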
I highly discourage using a naive LSD (least significant digit) radix-sort implementation. This is one of those cases where the theoretical time complexity of the algorithm doesn't translate well to the real world. LSD radix sort suffers severely from poor cache locality: as soon as the underlying array stops fitting in cache (L2/L3), performance tanks drastically. There's another consideration as well: it's not an adaptive algorithm, so it performs much worse than even regular
A while back I tried to come up with a 32-bit hybrid sort (radix MSD / insertion) that solves both of these problems. It is adaptive (performs better on partially ordered data), stable (retains the original element order on ties) and cache efficient (reduces cache misses). I've rewritten it to accommodate some regular use cases. An example of the API follows:
// ...
const tmp = new Array( array.length );
radixSort( array, {
	reverse: true, // sorts in decreasing order - default: false
	aux: tmp, // optional auxiliary array of same length, prevents internal re-allocation over multiple calls
	get: ( el ) => el.value // optional getter - allows sorting array of objects
});
Two tables follow with the performance test results, showing mean execution time and performance ratio against native:
random data - results
partially ordered ( 99% ) - results
If it's considered useful, feel free to use and modify it, or let me know so I can make the required changes. I believe there are a couple of small optimizations that can still be made. I'll update the linked code if changes are made.
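To make the hybrid idea more concrete, here is a minimal sketch of an MSD radix sort that falls back to insertion sort on small ranges. It is not the implementation linked in the comment above: it only handles plain unsigned 32-bit integer keys, has no `reverse`, `aux` reuse policy or `get` option, and the names `hybridRadixSort`, `msdRadixPass` and the threshold of 32 are assumptions made for this example.

```javascript
const INSERTION_THRESHOLD = 32; // assumed cutoff, not tuned

// Insertion sort on arr[ lo, hi ). Adaptive: near-sorted ranges cost close to O(n).
function insertionSort( arr, lo, hi ) {

	for ( let i = lo + 1; i < hi; i ++ ) {

		const v = arr[ i ];
		let j = i - 1;
		while ( j >= lo && arr[ j ] > v ) {

			arr[ j + 1 ] = arr[ j ];
			j --;

		}

		arr[ j + 1 ] = v;

	}

}

// Buckets arr[ lo, hi ) by the byte at `shift` (most significant first),
// then recurses into each bucket with the next byte.
function msdRadixPass( arr, aux, lo, hi, shift ) {

	if ( hi - lo <= INSERTION_THRESHOLD ) {

		insertionSort( arr, lo, hi );
		return;

	}

	// offsets[ b ] becomes the start of bucket b relative to lo
	const offsets = new Array( 257 ).fill( 0 );
	for ( let i = lo; i < hi; i ++ ) offsets[ ( ( arr[ i ] >>> shift ) & 0xff ) + 1 ] ++;
	for ( let b = 0; b < 256; b ++ ) offsets[ b + 1 ] += offsets[ b ];

	// stable scatter into the auxiliary array, then copy back
	const next = offsets.slice( 0, 256 );
	for ( let i = lo; i < hi; i ++ ) {

		const b = ( arr[ i ] >>> shift ) & 0xff;
		aux[ lo + next[ b ] ++ ] = arr[ i ];

	}

	for ( let i = lo; i < hi; i ++ ) arr[ i ] = aux[ i ];

	if ( shift === 0 ) return;

	for ( let b = 0; b < 256; b ++ ) {

		const start = lo + offsets[ b ];
		const end = lo + offsets[ b + 1 ];
		if ( end - start > 1 ) msdRadixPass( arr, aux, start, end, shift - 8 );

	}

}

// Sorts an array of unsigned 32-bit integers in place, in increasing order.
function hybridRadixSort( arr, aux = new Array( arr.length ) ) {

	msdRadixPass( arr, aux, 0, arr.length, 24 );
	return arr;

}

const sorted = hybridRadixSort( [ 0xffffffff, 3, 70000, 0, 256 ] );
```

Because each recursion stays inside one bucket, the working set shrinks with every level, which is where the cache benefit over an LSD pass over the whole array would come from. The per-call `offsets` allocation is kept for readability and would likely be hoisted in a tuned version.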
Thanks for the insight!
I hadn't thought about this but it makes sense. Right now all the elements are re-sorted from the original insertion order, but the most common case would be retaining the sorted order between frames. If there's a simple way to start from the previously sorted order then it could speed things up quite a bit.
I'll give this a try in another PR - I think that for now things are good enough for common cases but there's clearly room for improvement in more complex situations.
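One possible way to start from the previously sorted order is sketched below: keep an index array alive between frames and re-sort it with an adaptive algorithm against the refreshed depth values. This is only an assumption about how it could be done, not what BatchedMesh does; `resortFromPreviousOrder`, `order` and `depths` are hypothetical names.

```javascript
// Hypothetical sketch: `order` persists between frames and holds object ids,
// `depths[ id ]` is recomputed every frame. Insertion sort is used because it
// is adaptive: the cost is proportional to how far elements have to move.
function resortFromPreviousOrder( order, depths ) {

	let shifts = 0;
	for ( let i = 1; i < order.length; i ++ ) {

		const id = order[ i ];
		const z = depths[ id ];
		let j = i - 1;
		while ( j >= 0 && depths[ order[ j ] ] > z ) {

			order[ j + 1 ] = order[ j ];
			j --;
			shifts ++;

		}

		order[ j + 1 ] = id;

	}

	return shifts; // number of element moves, a proxy for the work done

}

// Frame 1 starts from insertion order. In frame 2 only two neighbours swap.
const order = [ 0, 1, 2, 3, 4 ];
const firstFrameShifts = resortFromPreviousOrder( order, [ 5, 1, 4, 2, 3 ] );
const secondFrameShifts = resortFromPreviousOrder( order, [ 5, 1, 4, 3.1, 3 ] );
```

In this toy case the second frame needs far fewer moves than the first. The worst case is still quadratic, so a sudden camera jump that reverses the order would want a fallback to a general-purpose sort.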
* visible -> visibility
* Remove comment
* Improve scale of transform texture
* Skip onBeforeRender if possible
* Set visibility changed to false
* Make sure visibility change flag is toggled on geometry change
* Set on geometry change instead of geometry add
Related issue: #22376, #27168 (comment)
Description
OnBeforeRender CPU Performance Metrics
Both perObjectFrustumCulled and sortObjects have performance implications when a lot of meshes are used. With 20,000 geometries on my 2021 M1 Pro it takes this long:
If both sortObjects and perObjectFrustumCulled are false and visibility has been toggled (meaning the multi draw buffer needs to be regenerated) it takes ~0.6ms.
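For readers unfamiliar with what regenerating the multi draw buffer involves, the sketch below packs the draw ranges of the visible geometries into the two lists that a multi-draw call consumes. It is a hypothetical illustration rather than the BatchedMesh source: `buildMultiDrawLists`, the `ranges` layout and the assumption of a Uint32 index buffer (4 bytes per index) are all made up for this example.

```javascript
// Hypothetical sketch, not the BatchedMesh source. Writes one ( start, count )
// pair per visible geometry and returns how many draws were written.
function buildMultiDrawLists( ranges, visible, starts, counts ) {

	let drawCount = 0;
	for ( let i = 0; i < ranges.length; i ++ ) {

		if ( ! visible[ i ] ) continue;

		starts[ drawCount ] = ranges[ i ].start * 4; // byte offset, assuming Uint32 indices
		counts[ drawCount ] = ranges[ i ].count;
		drawCount ++;

	}

	return drawCount;

}

// Three geometries, the middle one hidden.
const ranges = [ { start: 0, count: 36 }, { start: 36, count: 12 }, { start: 48, count: 6 } ];
const starts = new Int32Array( ranges.length );
const counts = new Int32Array( ranges.length );
const drawCount = buildMultiDrawLists( ranges, [ true, false, true ], starts, counts );
```

A linear pass like this only has to run when visibility changes, which is consistent with it being much cheaper than per-frame culling or sorting.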
Overdraw Metrics
And here are some performance numbers with the same fields toggled when the camera is zoomed and we have a lot of overdraw happening without sorted elements.
I was a bit surprised by the dip when only sortObjects === true but it's possible that the rendering is a bit vertex bound and the GPU time + the CPU time pushes us over the frame boundary.
When the camera is zoomed out, though, enabling both sortObjects and frustumCulled takes more time and the framerate is lower. Perhaps we should default these settings to false?
cc @mrdoob