What to do about running out of resources. #533

Open
John-Nagle opened this issue Dec 4, 2023 · 8 comments

Comments

@John-Nagle
Contributor

Everything is going great, and then, suddenly, there's a panic in Rend3 due to hitting the limit on the number of vertices in a bind group or filling GPU memory. This needs to be handled more gracefully.

Suggestions:

  • Error returns for the "add_..." functions. After arcanization, those should be synchronous and callable from any thread, rather than just queuing instructions for the main thread, so they can return an error if resources are not available. These are the key items, the ones that can use large amounts of memory:

pub fn add_mesh(self: &Arc<Self>, mesh: Mesh) -> Result<MeshHandle, Error>

pub fn add_texture_2d(self: &Arc<Self>, texture: Texture) -> Result<Texture2DHandle, Error>

Now that Sharpview can display more than one Second Life region, the bind group limit gets hit when the user goes to a crowded area, even though the GPU is only about half full.
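For illustration, here is a minimal sketch of what the proposed error-returning API could look like from the caller's side. Everything in it is hypothetical: `MeshStore`, `AddError`, `MeshHandle`, and the byte-count argument are stand-ins, not rend3's real types; only the `ExceededMaximumBufferSize { max_buffer_size }` shape is borrowed from the error rend3 reports later in this thread. The point is that a failed add comes back as an `Err` the caller can act on, rather than a panic.

```rust
use std::fmt;
use std::sync::{Arc, Mutex};

/// Hypothetical error type; the variant mirrors the one rend3 reports,
/// but this is not rend3's actual error enum.
#[derive(Debug, Clone, PartialEq)]
pub enum AddError {
    ExceededMaximumBufferSize { max_buffer_size: u64 },
}

impl fmt::Display for AddError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AddError::ExceededMaximumBufferSize { max_buffer_size } => {
                write!(f, "exceeded maximum buffer size of {max_buffer_size} bytes")
            }
        }
    }
}

impl std::error::Error for AddError {}

/// Hypothetical stand-in for a mesh handle.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MeshHandle(pub usize);

/// Hypothetical store whose vertex data lives in one buffer capped at
/// `max_buffer_size` bytes.
pub struct MeshStore {
    max_buffer_size: u64,
    /// (bytes used, next handle id), behind a lock so any thread can add.
    state: Mutex<(u64, usize)>,
}

impl MeshStore {
    pub fn new(max_buffer_size: u64) -> Arc<Self> {
        Arc::new(Self { max_buffer_size, state: Mutex::new((0, 0)) })
    }

    /// Synchronous and callable from any thread: either the mesh fits and a
    /// handle comes back, or the caller gets an error instead of a panic.
    pub fn add_mesh(self: &Arc<Self>, mesh_bytes: u64) -> Result<MeshHandle, AddError> {
        let mut guard = self.state.lock().unwrap();
        let (used, next_id) = &mut *guard;
        // `used` never exceeds the cap, so this subtraction cannot underflow.
        if mesh_bytes > self.max_buffer_size - *used {
            return Err(AddError::ExceededMaximumBufferSize {
                max_buffer_size: self.max_buffer_size,
            });
        }
        *used += mesh_bytes;
        let handle = MeshHandle(*next_id);
        *next_id += 1;
        Ok(handle)
    }
}
```

A caller such as an asset-fetch thread can then match on the result and log, skip, or downgrade the mesh on `Err`, while the renderer keeps running.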

@John-Nagle
Contributor Author

Tried new Rend3, rev "e1cfe1b". Modified my code to detect error returns from mesh and texture creation; it just panics on them for now.
Seems to be basically working.

Many log entries of the form:

05:52:15 [ERROR] AllocationErrorScope dropped without calling end

which is a message not seen before.

As expected, caught an allocation error with:

=========> Panic Rend3 error: ExceededMaximumBufferSize { max_buffer_size: 2147483648 } at file libscene/src/render/rendutils.rs, line 106 in thread Asset fetch #8.
Backtrace:
 libcommon::common::commonutils::catch_panic::{{closure}}
             at /home/john/projects/sl/SL-test-viewer/libcommon/src/common/commonutils.rs:215:25
 libscene::render::rendutils::convert_rend3_error
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/rendutils.rs:106:43
 libscene::render::rendutils::convert_rend3_handle_result
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/rendutils.rs:114:23
 libscene::render::renderregistry::RenderFaceMeshMapping::new
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/renderregistry.rs:122:35
 libscene::render::renderregistry::RenderFaceMeshGroup::build_render_face_mesh_group

indicating that, as expected, I hit the maximum buffer size limit (2147483648 bytes, i.e. 2 GiB) while creating a mesh.

It would be helpful if the Rend3 errors were Send. As it is, they can't be converted into "anyhow" errors (anyhow requires Send + Sync + 'static), and they can't be copied or cloned, which is inconvenient. The trouble is the "inner" field, which holds a WGPU error that is not Send.
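Until the error types become Send, one application-side workaround is to flatten the error to its message on the thread where it occurs. The sketch below is hypothetical: `NonSendError` stands in for a rend3 error carrying a non-`Send` wgpu `inner` value (modeled here with an `Rc`), and `SendableError`, `call_renderer`, and `assert_send_sync` are illustrative helpers, not part of rend3. The resulting `Box<dyn Error + Send + Sync>` can cross threads, and a `Send + Sync + 'static` error like `SendableError` also satisfies what anyhow needs for its `From` conversion.

```rust
use std::error::Error;
use std::fmt;
use std::rc::Rc;

/// Hypothetical error whose `inner` field is not `Send`; the `Rc` stands in
/// for the non-`Send` wgpu error that rend3 wraps.
#[derive(Debug)]
pub struct NonSendError {
    pub inner: Rc<str>,
}

impl fmt::Display for NonSendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "renderer error: {}", self.inner)
    }
}

impl Error for NonSendError {}

/// Illustrative `Send + Sync + 'static` copy of an error that keeps only its text.
#[derive(Debug, Clone, PartialEq)]
pub struct SendableError {
    pub message: String,
}

impl fmt::Display for SendableError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for SendableError {}

impl From<NonSendError> for SendableError {
    fn from(e: NonSendError) -> Self {
        // Render the message now; the non-Send payload is dropped on this thread.
        SendableError { message: e.to_string() }
    }
}

/// Compiles only if `T` can be moved and shared across threads.
pub fn assert_send_sync<T: Send + Sync + 'static>() {}

/// Hypothetical fallible renderer call whose error can be handed to another thread.
pub fn call_renderer(fail: bool) -> Result<u32, Box<dyn Error + Send + Sync>> {
    let result: Result<u32, NonSendError> = if fail {
        Err(NonSendError { inner: Rc::from("ExceededMaximumBufferSize") })
    } else {
        Ok(7)
    };
    // `?` boxes the SendableError via the standard library's
    // `From<E> for Box<dyn Error + Send + Sync>` impl.
    let value = result.map_err(SendableError::from)?;
    Ok(value)
}
```

The trade-off is that the typed error is lost, so callers end up matching on text unless the wrapper also copies out the fields they care about (for example `max_buffer_size`).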

@John-Nagle
Contributor Author

John-Nagle commented Dec 11, 2023

Subscript out of range error:

06:24:13 [ERROR] Asset request  (AssetRequestTimestamped { request: AssetRequest { content: Mesh(MeshRequest { uuid: 7bdbe80d-6d20-65a2-56fe-9d9d1994e78a }), capability: "http://asset-cdn.glb.agni.lindenlab.com" }, timestamp: Instant { tv_sec: 1378365, tv_nsec: 660984636 } }, 1999998) failed: Mesh load, trouble spot: Region (1808,1199) <67.697105,117.189835,3553.6982>

Caused by:
    ExceededMaximumBufferSize { max_buffer_size: 2147483648 }
06:24:13 [ERROR] Asset request  (AssetRequestTimestamped { request: AssetRequest { content: Mesh(MeshRequest { uuid: 4a881d54-4b74-63e1-c46c-98f8d4265bc8 }), capability: "http://asset-cdn.glb.agni.lindenlab.com" }, timestamp: Instant { tv_sec: 1378363, tv_nsec: 706289265 } }, 999999) failed: Mesh load, trouble spot: Region (1808,1199) <114.52486,129.0136,35.77748>

Caused by:
    ExceededMaximumBufferSize { max_buffer_size: 2147483648 }
06:24:13 [ERROR] =========> Panic index out of bounds: the len is 30438 but the index is 30438 at file /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/managers/mesh.rs, line 213 in thread main.
Backtrace:
 libcommon::common::commonutils::catch_panic::{{closure}}
             at /home/john/projects/sl/SL-test-viewer/libcommon/src/common/commonutils.rs:215:25
 rend3::managers::mesh::MeshManager::remove
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/managers/mesh.rs:213:36
 rend3::renderer::eval::evaluate_instructions
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/renderer/eval.rs:134:21
 rend3::renderer::Renderer::evaluate_instructions
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/renderer/mod.rs:451:9
 <sharpview::AppUi as rend3_framework::App>::handle_event
             at /home/john/projects/sl/SL-test-viewer/sharpview/src/main.rs:555:39
 rend3_framework::async_start::{{closure}}::{{closure}}
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3-framework/src/lib.rs:335:9
 winit::platform_impl::platform::sticky_exit_callback
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/mod.rs:884:9
 winit::platform_impl::platform::x11::EventLoop<T>::run_return::single_iteration
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:375:21
 winit::platform_impl::platform::x11::EventLoop<T>::run_return
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:483:27
 winit::platform_impl::platform::x11::EventLoop<T>::run
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:498:25
06:24:13 [WARN] From (462848,306944), message: MsgObjectUpdateCompressed
06:24:13 [WARN] Error applying compressed update: Object #546594690 in [(462848,306944)] at <-0.220425,0.000397,-0.856916>

So here, Sharpview kept loading meshes until it hit the limit, treated that as a non-fatal error, and kept going. After a few more requests, there was a subscript out of range panic in the mesh manager (MeshManager::remove).

@cwfitzgerald
Member

cwfitzgerald commented Dec 11, 2023

The "AllocationErrorScope dropped without calling end" issue has been fixed on trunk.

Will look into the subscript issue.

Interesting, I didn't realize wgpu errors weren't Send. That's an issue in wgpu, but I can fix it here too.

@John-Nagle
Contributor Author

Sounds good. The subscript error seems to indicate that repeatedly banging against the limit breaks something.

Now that I can detect out-of-memory errors, I have to do something about them beyond ignoring them. This will take some work. I can switch meshes to a lower level of detail, which was planned anyway and is partly implemented. Thanks for the quick response.
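The level-of-detail fallback mentioned above could take roughly this shape. This is a sketch under assumptions: `add_with_lod_fallback`, the per-LOD vertex counts, the `try_add` callback, and the local `AddError` enum are hypothetical, not Sharpview or rend3 code. It retries with the next coarser LOD only when the failure is a capacity error, and passes any other error straight through so that real bugs aren't mistaken for "out of memory".

```rust
/// Hypothetical error type: one capacity variant mirroring rend3's, plus a
/// catch-all for failures that a lower LOD would not fix.
#[derive(Debug, Clone, PartialEq)]
pub enum AddError {
    ExceededMaximumBufferSize { max_buffer_size: u64 },
    Other(String),
}

/// Try each LOD from most to least detailed (`lod_vertex_counts[0]` is the
/// finest). Returns `Ok(Some(lod))` for the first LOD that fit, `Ok(None)` if
/// even the coarsest LOD hit the size limit, and `Err` for any other failure.
pub fn add_with_lod_fallback<F>(
    lod_vertex_counts: &[u64],
    mut try_add: F,
) -> Result<Option<usize>, AddError>
where
    F: FnMut(u64) -> Result<(), AddError>,
{
    for (lod, &vertices) in lod_vertex_counts.iter().enumerate() {
        match try_add(vertices) {
            Ok(()) => return Ok(Some(lod)),
            // Out of buffer space: drop to the next, smaller LOD and retry.
            Err(AddError::ExceededMaximumBufferSize { .. }) => continue,
            // Anything else is not a capacity problem; don't mask it.
            Err(other) => return Err(other),
        }
    }
    Ok(None)
}
```

Returning `Ok(None)` rather than an error lets the caller decide whether to drop the object, keep a placeholder, or queue it for later once the budget frees up.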

@John-Nagle
Contributor Author

Question: if Rend3/WGPU reaches the ExceededMaximumBufferSize limit, are there internal allocations which will cause crashes? I can get vertex consumption down via the level-of-detail system, but it may take a second or so for a scan to decide what to remove, and Rend3 is still drawing during that period.

I keep my own vertex count, and my intent is to bring it down to around 90% of the level that triggered ExceededMaximumBufferSize. I don't want to be hitting the limit constantly, but I have to hit it a few times to discover it.
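The "run up to the limit, then back off" bookkeeping described above can be sketched as a small budget tracker. `VertexBudget` and its methods are hypothetical; the 0.9 target fraction comes from the comment, and keeping the lowest count at which the limit was hit is an assumption chosen to stay conservative when the limit is hit several times. A level-of-detail scan would then use `excess()` to decide how many vertices to shed.

```rust
/// Hypothetical tracker for running up to the renderer's limit and backing off.
pub struct VertexBudget {
    /// Lowest vertex count at which ExceededMaximumBufferSize has been seen;
    /// `None` until the limit has been discovered.
    limit_hit_at: Option<u64>,
    /// Fraction of that count to aim for afterward (for example 0.9).
    target_fraction: f64,
}

impl VertexBudget {
    pub fn new(target_fraction: f64) -> Self {
        Self { limit_hit_at: None, target_fraction }
    }

    /// Call when an add fails with ExceededMaximumBufferSize.
    pub fn record_limit_hit(&mut self, current_vertices: u64) {
        // Assumption: keep the lowest observed count, to stay conservative.
        self.limit_hit_at = Some(match self.limit_hit_at {
            Some(prev) => prev.min(current_vertices),
            None => current_vertices,
        });
    }

    /// Vertex count to aim for, once the limit is known.
    pub fn target(&self) -> Option<u64> {
        self.limit_hit_at
            .map(|n| (n as f64 * self.target_fraction).round() as u64)
    }

    /// How many vertices the LOD scan should shed to get back under target;
    /// zero if the limit hasn't been hit yet or the count is already under it.
    pub fn excess(&self, current_vertices: u64) -> u64 {
        self.target()
            .map_or(0, |t| current_vertices.saturating_sub(t))
    }
}
```

Because the target sits below the observed limit, the scan has some headroom during the second or so it takes to decide what to remove, which reduces how often new adds fail in the meantime.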

@cwfitzgerald
Member

> are there internal allocations which will cause crashes?

There shouldn't be. If you encounter any, I'd consider it a bug.

> Sounds good. The subscript error seems to indicate that repeatedly banging against the limit breaks something.

I figured it out; it should be a simple fix. I'll apply it once I'm done with work.

@John-Nagle
Contributor Author

Sounds good. I've sketched out a design for my code that runs right up to the limit and then backs off. It's non-trivial, but it can be done.

@cwfitzgerald mentioned this issue Dec 12, 2023
@cwfitzgerald
Member

Alright, #539 should have fixed that issue.
