Panic in GPU culler for bind group too large. #541
Comments
Running out of GPU memory in mesh creation is now being properly reported to the application level, and the program continues to run. So that worked. Looks like there are other places where that limit can be hit.
Interesting to note that this is only 8 bytes over the limit; I wonder if this is an off-by-a-smidge error.
I'm operating very close to the limit right now. I create meshes until I hit the bind group limit and get the mesh error. Then I put the failed request on hold. New requests continue to hit the limit, and they, too, get put on hold. There's a background task which manages levels of detail, and it will take steps to reduce the memory pressure and redo the failed items, but that's only partly written and not working yet. Once it's all working, it will only hit the limit occasionally, then back off. So if something in the GPU culler needs some bind group space during rendering, it's likely to hit the limit.

There are two ways to go at this:

1) Bang into the limit, get an error return, and recover. This requires that all components be able to operate right up to the limit. That's the current implementation.

2) Provide info on how much of the resource is left, so the application can back off before hitting the limit.

The current choice is 1). I've figured out how to work with that, and it's going well (a rough sketch of the recovery loop is below). With 2), it's necessary to have reliable info about how much of the resource is left. This is apparently difficult. Fragmentation may be an issue. (Does bind group space get fragmented?) It's extremely difficult to get memory info out of the WGPU and lower levels, as I understand it. For Vulkan it's listed as a proposed enhancement.

So, as I understand it, we're stuck with 1).
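Here is a minimal sketch of approach 1). The renderer, error type, and request queue are hypothetical stand-ins rather than rend3's actual API; the point is only the shape of the recovery loop: attempt the allocation, and on an out-of-memory error park the request so the level-of-detail task can reduce pressure and retry later.

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins; real code would go through rend3's renderer and error types.
#[derive(Debug)]
enum AddMeshError {
    OutOfGpuMemory, // e.g. bind group / buffer space exhausted
}

struct MeshRequest {
    name: String,
    index_count: usize,
}

struct MeshHandle(usize);

// Pretend renderer: fails once the running index total would exceed a cap.
struct FakeRenderer {
    indices_in_use: usize,
    index_capacity: usize,
}

impl FakeRenderer {
    fn add_mesh(&mut self, req: &MeshRequest) -> Result<MeshHandle, AddMeshError> {
        if self.indices_in_use + req.index_count > self.index_capacity {
            return Err(AddMeshError::OutOfGpuMemory);
        }
        self.indices_in_use += req.index_count;
        Ok(MeshHandle(self.indices_in_use))
    }
}

fn main() {
    let mut renderer = FakeRenderer { indices_in_use: 0, index_capacity: 1 << 27 };
    let mut on_hold: VecDeque<MeshRequest> = VecDeque::new();

    let requests = vec![
        MeshRequest { name: "terrain".into(), index_count: 90_000_000 },
        MeshRequest { name: "building".into(), index_count: 60_000_000 },
    ];

    for req in requests {
        match renderer.add_mesh(&req) {
            Ok(_) => println!("{}: created", req.name),
            Err(AddMeshError::OutOfGpuMemory) => {
                // Approach 1): recover instead of panicking. Park the request;
                // the background LOD task would later free space and retry it.
                println!("{}: out of GPU memory, putting request on hold", req.name);
                on_hold.push_back(req);
            }
        }
    }
    println!("{} request(s) waiting for the LOD system to free space", on_hold.len());
}
```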
Somewhat related: at the 2147483648 limit, my own count of vertices is 37098544.
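Back-of-the-envelope arithmetic on those two numbers, assuming the 2147483648 figure is the limit in bytes (that assumption is mine, not stated above):

```rust
fn main() {
    let limit_bytes: u64 = 2_147_483_648; // 2^31, the reported limit
    let vertices: u64 = 37_098_544;       // vertex count reported above
    // ~57.9 bytes of GPU-visible data per vertex at the point the limit is hit.
    println!("{:.1} bytes per vertex", limit_bytes as f64 / vertices as f64);
}
```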
I'm getting this too; it happens randomly, and I don't believe I am ever near the bind group limit.
So this problem is caused by the result index buffer getting too large - if the total number of indices in the scene is greater than 2^27, you'll hit this problem. This is one pretty major disadvantage of the culling system as it stands, and I'm currently scheming on how to remove this limit. I can raise it to 2^28 pretty easily, as there's currently an off-by-8-bytes situation. But I'm generally concerned about the limitations the culling system has, and the minimal performance benefits, so I may remove it in favor of other culling techniques.
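Once the limiting quantity is known to be the total index count, the "back off before hitting the limit" option from the earlier comment becomes straightforward on the application side. A rough sketch: the 2^27 ceiling is taken from this comment, and all names here are hypothetical, not rend3 API.

```rust
// Hypothetical guard, not rend3 API: track the total index count in the scene
// and defer additions that would push it past the culler's output-buffer ceiling.
const CULLER_INDEX_CEILING: u64 = 1 << 27; // per the comment above; may become 1 << 28

struct SceneBudget {
    total_indices: u64,
}

impl SceneBudget {
    // Returns true if the mesh was accepted; false means the caller should
    // defer it or drop to a lower level of detail first.
    fn try_add(&mut self, new_indices: u64) -> bool {
        if self.total_indices + new_indices <= CULLER_INDEX_CEILING {
            self.total_indices += new_indices;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut budget = SceneBudget { total_indices: 0 };
    assert!(budget.try_add(100_000_000));  // 100M indices fit under 2^27 (~134M)
    assert!(!budget.try_add(50_000_000));  // this one would overflow the culler, so defer it
    println!("indices budgeted: {}", budget.total_indices);
}
```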
Sounds good. I've been able to rework things such that hitting the limit is now recoverable. It now tells the level of detail system to cut back on quality. But a higher ceiling would be nice.
I just built a version of Sharpview where this is a hard error that fails at startup every time, even on simple scenes. In addition, the rendered images have random triangles all over the place. These have been rare, intermittent problems for months, but now I have a solid repro.

04:14:15 [ERROR] =========> Panic wgpu error: Validation Error
Caused by: at file /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.1/src/backend/wgpu_core.rs, line 3009 in thread main.

This started failing after I changed some visibility of modules in mod.rs files. I didn't even change any code, so it may depend on memory layout. My own code is 100% safe Rust, so short of a compiler error, that shouldn't matter. I saved the bad executable, did cargo clean, and rebuilt. The rebuilt version still fails in the same way, so it wasn't a transient bad compile.

This is a relatively simple test scene and is nowhere near the bind limit. I've tried logging into different places in Second Life and OSGrid, and all fail the same way.
Fails in both debug and release mode in the same way. Just slower in debug.
Closed by #593
Internal panic in GPU culler when bind group is too large.
Rend3 rev = "9065f1e".