New Rend3 4d10795 about 3x slower than old Rend3 f2b7df4 on low-end GPU #477
I'm totally mystified by the GPU memory consumption increase. Exactly the same meshes. Exactly the same textures. In the new version, vertices without rigging info are supposed to be smaller. Memory consumption should have decreased. That has to be a bug.
The memory usage difference is very surprising; there aren't significant changes in how memory is handled. The performance delta of about 130% on the new machine is about what is to be expected on the current trunk. This is because the new culling shaders are running but not culling anything. The 3x difference on the 640 is quite surprising. Would need to figure out why that is. I suspect it's due to Kepler having issues with compute shaders and/or vertex pulling.
Something big changed. I'm running the same test on two versions of Rend3. Exactly the same number of triangles and textures. For a while, all vertices were carrying full rigging information, even for non-rigged meshes. Which of the versions of Rend3 listed has that? I thought that was fixed and expected GPU memory usage to drop.
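For illustration, here is a minimal sketch of why dropping rigging data from static vertices should shrink GPU memory. The struct layouts are hypothetical, not rend3's actual internal vertex formats:

```rust
// Hypothetical vertex layouts, only to illustrate the expected size difference;
// rend3's real formats may differ.
#[repr(C)]
struct UnriggedVertex {
    position: [f32; 3], // 12 bytes
    normal: [f32; 3],   // 12 bytes
    tangent: [f32; 3],  // 12 bytes
    uv: [f32; 2],       //  8 bytes
    color: [u8; 4],     //  4 bytes
}                       // 48 bytes per vertex

#[repr(C)]
struct RiggedVertex {
    position: [f32; 3],
    normal: [f32; 3],
    tangent: [f32; 3],
    uv: [f32; 2],
    color: [u8; 4],
    joint_indices: [u16; 4], //  8 bytes
    joint_weights: [f32; 4], // 16 bytes
}                            // 72 bytes per vertex

fn main() {
    // If every vertex carried rigging data, a static mesh would pay ~50% more
    // vertex memory than it needs under this layout.
    println!("unrigged: {} bytes", std::mem::size_of::<UnriggedVertex>());
    println!("rigged:   {} bytes", std::mem::size_of::<RiggedVertex>());
}
```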
The fastest version of Rend3 was 0.2.2. A year ago, my viewer using Rend3 was outperforming all the other Second Life viewers. Two rounds of Rend3 slowdowns later, plus several rounds of speedups in the C++ and Unity-based viewers, and they now get higher frame rates than I do.
I watched your video on occlusion culling. That has potential, if it works. The video seemed to indicate that occlusion groups all had to use the same "material". Does that mean "Material" in the Rend3 sense, or just using the same shader? Each of my objects has its own Rend3 "Material", because every object has its own base color and UV transform. (It's user-created content, so there's no commonality and no uniformity in UV transforms.) Occlusion culling may not be a way out of the performance loss. Some special cases, mostly indoor scenes, may improve, but for big outdoor scenes it may be a net loss. Not sure. It would be useful to be able to turn all that off and go back to fast but dumb mode.
It is still expected to be lower.
In GpuPowered mode, as long as they have the same shader, they can be batched together. In CpuPowered mode, as long as they use the same textures (regardless of the rest of the material) they can be batched together.
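A rough illustration of that batching rule, using hypothetical `ShaderId`/`TextureSet` stand-ins for rend3's internal handles; the batch key depends on the mode:

```rust
use std::collections::HashMap;

// Hypothetical identifiers standing in for rend3's internal handles.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ShaderId(u32);
#[derive(Clone, PartialEq, Eq, Hash)]
struct TextureSet(Vec<u32>);

struct ObjectInfo {
    shader: ShaderId,
    textures: TextureSet,
}

#[derive(PartialEq, Eq, Hash)]
enum BatchKey {
    // GpuPowered: anything sharing a shader can go in one batch,
    // regardless of the rest of the material (base color, UV transform, ...).
    Shader(ShaderId),
    // CpuPowered: objects must also share the same textures to batch.
    Textures(TextureSet),
}

fn batch(objects: &[ObjectInfo], gpu_powered: bool) -> HashMap<BatchKey, Vec<usize>> {
    let mut batches: HashMap<BatchKey, Vec<usize>> = HashMap::new();
    for (i, obj) in objects.iter().enumerate() {
        let key = if gpu_powered {
            BatchKey::Shader(obj.shader)
        } else {
            BatchKey::Textures(obj.textures.clone())
        };
        batches.entry(key).or_default().push(i);
    }
    batches
}

fn main() {
    let objs = vec![
        ObjectInfo { shader: ShaderId(0), textures: TextureSet(vec![1, 2]) },
        ObjectInfo { shader: ShaderId(0), textures: TextureSet(vec![3]) },
    ];
    // One batch in GpuPowered mode (same shader), two in CpuPowered mode.
    assert_eq!(batch(&objs, true).len(), 1);
    assert_eq!(batch(&objs, false).len(), 2);
}
```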
This is a potential option, and I know how I would do it. My first priority is to get the bugs out of the gpu-culling code and merge that in to see how the performance delta looks. There are further options to make the culling faster, which I haven't explored yet.
It's quite possible that the metaverse usage pattern (most instances are unique, GPU objects are constantly being created and removed) is causing problems. There's much more churn than in small-world games with pre-built content.
That might come up with "render-bench", but in that test, it's the same textures and meshes being created and destroyed. So fragmentation shouldn't occur. Memory usage as measured by the NVidia utility goes up to a peak value and stops.
Don't have enough info to evaluate that, so the memory usage increase is still a puzzle. Ref: gfx-rs/wgpu#2447, which is about being able to find out the memory situation from the application level. There are things I can do to cut memory consumption (reduce texture sizes, for example), and I need to know when that's needed.
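Until wgpu exposes allocation statistics to the application (the subject of gfx-rs/wgpu#2447), one way to watch the overall trend from inside the program is to poll the same NVidia utility. A minimal sketch, assuming `nvidia-smi` is on the PATH; note this reports device-wide usage, not a per-process or per-wgpu breakdown:

```rust
use std::process::Command;

/// Query total GPU memory in use (MiB) by shelling out to nvidia-smi.
/// Device-wide only, so it's good for watching the trend between runs,
/// not for attributing memory to specific allocations.
fn gpu_memory_used_mib() -> Option<u64> {
    let output = Command::new("nvidia-smi")
        .args(["--query-gpu=memory.used", "--format=csv,noheader,nounits"])
        .output()
        .ok()?;
    let text = String::from_utf8(output.stdout).ok()?;
    // One line per GPU; take the first.
    text.lines().next()?.trim().parse().ok()
}

fn main() {
    match gpu_memory_used_mib() {
        Some(mib) => println!("GPU memory in use: {} MiB", mib),
        None => println!("nvidia-smi not available"),
    }
}
```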
Oh, good. Almost everything in my code uses the same default shader. Any future shaders will be for water, terrain, environment, sky, etc. which are special cases with very few instances. If it was per Rend3 "Material", all occlusion groups would have size 1.
That makes sense. An NVidia 640 doesn't have compute shader hardware. Does running compute shaders without compute shader hardware fall back to CPU-side emulation in the driver, or do you have to emulate on your side?
Closing this after #593
I just revised my "render-bench" program to work with Rend3 4d10795. This is the city of identical buildings, where, after 10 seconds, half of the buildings are deleted, then re-added 10 seconds later. That's the test for the WGPU locking issue. I updated it to help with testing that.
Only the changes necessary to make it run again were made. The frame rate dropped from around 30 FPS to around 9 FPS on a machine with an old NVidia 640 GPU. On the big machine with an NVidia 3070, both old and new versions get around 60 FPS (except when loading new content, which is the WGPU locking issue.)
Both are ready to clone, build with "cargo build --release", and run.
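For reference, a rough sketch of the delete/re-add cycle render-bench performs, with hypothetical helpers (`make_building`, `BuildingHandle`) standing in for the real code in the repository; the actual program keeps rend3 object handles alive and drops them to delete objects:

```rust
use std::{thread, time::Duration};

struct BuildingHandle; // stands in for the renderer's object handle(s)

fn make_building(_index: usize) -> BuildingHandle {
    // In the real program this adds meshes, materials, and an object to rend3
    // and keeps the returned handles alive.
    BuildingHandle
}

fn main() {
    const CITY_SIZE: usize = 100;

    // Build the full city of identical buildings.
    let mut buildings: Vec<Option<BuildingHandle>> =
        (0..CITY_SIZE).map(|i| Some(make_building(i))).collect();

    loop {
        thread::sleep(Duration::from_secs(10));
        // Delete half the buildings by dropping their handles.
        for slot in buildings.iter_mut().take(CITY_SIZE / 2) {
            *slot = None;
        }

        thread::sleep(Duration::from_secs(10));
        // Re-add them from the same source meshes and textures, so memory
        // should return to its previous peak rather than keep growing.
        for (i, slot) in buildings.iter_mut().take(CITY_SIZE / 2).enumerate() {
            *slot = Some(make_building(i));
        }
    }
}
```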
On the NVidia 640 machine: [performance metrics shown in attached screenshot]
On the NVidia 3070 machine: [performance metrics shown in attached screenshot]
This is unexpected. All the metrics became worse. Am I doing something wrong?