How to avoid Timeout errors #1565

Open
sotrh opened this issue Jun 27, 2021 · 17 comments
Labels
help required We need community help to make this happen. type: bug Something isn't working

Comments

@sotrh

sotrh commented Jun 27, 2021

My tutorial will occasionally encounter a SwapChainError::Timeout during normal execution. Currently I'm just logging the error, but I was wondering whether there is a way to avoid it. Is there some way to sync the simulation with the refresh rate of the screen?

This is related to an issue in my tutorial: sotrh/learn-wgpu#195
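
For readers following along, the handling mentioned above (report the error and carry on) can be sketched roughly as follows. This is a minimal sketch, not the tutorial's actual code: the `SwapChainError` enum here is a local stand-in that mirrors the variant names of `wgpu::SwapChainError` so the example compiles without wgpu, and `FrameAction` and `on_acquire_error` are hypothetical names introduced only for illustration.

```rust
/// Local stand-in mirroring the variant names of `wgpu::SwapChainError`
/// (assumption: redefined here so the sketch builds without the wgpu crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SwapChainError {
    Timeout,
    Outdated,
    Lost,
    OutOfMemory,
}

/// Hypothetical description of what the render loop does after a failed acquire.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FrameAction {
    /// Drop this frame and try again on the next redraw.
    SkipFrame,
    /// Rebuild the swap chain before rendering again.
    RecreateSwapChain,
    /// Give up and leave the event loop.
    Exit,
}

/// Maps an acquire error to a recovery action (hypothetical policy).
pub fn on_acquire_error(err: SwapChainError) -> FrameAction {
    match err {
        // A timeout is treated as transient: report it and skip the frame.
        SwapChainError::Timeout => {
            eprintln!("frame acquire failed: {:?}", err);
            FrameAction::SkipFrame
        }
        // The surface changed underneath us; the swap chain must be rebuilt.
        SwapChainError::Outdated | SwapChainError::Lost => FrameAction::RecreateSwapChain,
        // Running out of memory is not something a retry will fix.
        SwapChainError::OutOfMemory => FrameAction::Exit,
    }
}
```

Skipping the frame only hides the symptom. It does not explain why the acquire timed out in the first place, which is what the rest of this thread is about.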

@cwfitzgerald cwfitzgerald added help required We need community help to make this happen. type: bug Something isn't working labels Jun 27, 2021
@kvark
Member

kvark commented Jun 27, 2021

The syncing should happen automatically on the get_current_texture call. You'd only get a timeout if something goes wrong. I wonder what happens if you bump the timeout value?

@eulertour

How do you get or change the timeout value? get_current_texture doesn't seem to exist; do you mean get_current_frame?

@kvark
Member

kvark commented Jun 28, 2021

const FRAME_TIMEOUT_MS: u32 = 1000;

@sotrh
Author

sotrh commented Jun 28, 2021

Some more context:

Timeout
[2021-06-28T15:34:00Z ERROR gfx_backend_vulkan] 
    VALIDATION [UNASSIGNED-CoreValidation-DrawState-InvalidFence (0x130a4370)] : Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidFence ] Object 0: handle = 0x120000000012, type = VK_OBJECT_TYPE_FENCE; | MessageID = 0x130a4370 | VkFence 0x120000000012[] is already in use by another submission.
    object info: (type: FENCE, hndl: 0x120000000012)

EDIT:

This happens after 5 minutes or so of runtime on my swapchain tutorial

@eulertour

I gradually increased the timeout value up to 20000 with no effect. I also get the timeout errors much more often than every 5 minutes: roughly 350 timeout errors for every render that succeeds.

@kvark
Member

kvark commented Jul 1, 2021

What platforms are we talking about here?

@eulertour

I'm using Arch Linux with SwayWM which runs under Wayland, but this also happens on bspwm which runs under X.
glxinfo | grep Device shows
Device: AMD Radeon RX 5700 XT (NAVI10, DRM 3.40.0, 5.12.13-arch1-1, LLVM 12.0.0) (0x731f).

@kvark
Member

kvark commented Jul 1, 2021

The timeout could happen if our back-pressure doesn't work as expected, i.e. get_current_frame doesn't actually make the CPU wait for the GPU to process the old frame.
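
To make the back-pressure idea concrete, here is a toy model using only the standard library. It is a sketch under stated assumptions and not how wgpu or the driver implements acquisition: `ToySwapChain`, `acquire`, and `release` are hypothetical names, and the "GPU" is simply whoever calls `release`. The point is that acquiring blocks while every image is still in use, and reports a timeout if nothing is handed back in time.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::time::Duration;

/// Toy swap chain (hypothetical): a fixed pool of image indices.
/// Free images sit in a channel; acquiring takes one out of it.
pub struct ToySwapChain {
    free_tx: Sender<usize>,
    free_rx: Receiver<usize>,
}

impl ToySwapChain {
    /// Creates a pool with `image_count` images, all initially free.
    pub fn new(image_count: usize) -> Self {
        let (free_tx, free_rx) = channel();
        for index in 0..image_count {
            free_tx.send(index).expect("receiver is alive");
        }
        Self { free_tx, free_rx }
    }

    /// Blocks the caller (the "CPU") until an image is free.
    /// Returns `Err(RecvTimeoutError::Timeout)` if none is released in time.
    pub fn acquire(&self, timeout: Duration) -> Result<usize, RecvTimeoutError> {
        self.free_rx.recv_timeout(timeout)
    }

    /// Called when the "GPU" has finished with an image, making it free again.
    pub fn release(&self, image: usize) {
        self.free_tx.send(image).expect("receiver is alive");
    }
}
```

With two images, two acquires succeed right away, a third waits and times out, and it succeeds again once an image has been released. If images are never released (or released too slowly), every acquire times out regardless of how fast the CPU side runs.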

@eulertour

I can try more things locally if that'd be helpful.

@eulertour

I found that the timeout errors don't occur if I pass PresentMode::Mailbox or PresentMode::Immediate to the SwapChainDescriptor, but after a few seconds the program will crash with

thread 'main' panicked at 'Error in Queue::submit: not enough memory left', /home/devneal/github/wgpu/wgpu/src/backend/direct.rs:114:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Passing RUST_BACKTRACE=1 shows the following backtrace:

stack backtrace:
   0: rust_begin_unwind
             at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
   1: std::panicking::begin_panic_fmt
             at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:435:5
   2: wgpu::backend::direct::Context::handle_error_fatal
             at /home/devneal/github/wgpu/wgpu/src/backend/direct.rs:114:9
   3: <wgpu::backend::direct::Context as wgpu::Context>::queue_submit
             at /home/devneal/github/wgpu/wgpu/src/backend/direct.rs:1951:25
   4: wgpu::Queue::submit
             at /home/devneal/github/wgpu/wgpu/src/lib.rs:2977:9
   5: tutorial3_pipeline::State::render
             at ./src/main.rs:172:9
   6: tutorial3_pipeline::main::{{closure}}
             at ./src/main.rs:218:23
   7: winit::platform_impl::platform::sticky_exit_callback
             at /home/devneal/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/linux/mod.rs:736:5
   8: winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::run_return
             at /home/devneal/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/linux/wayland/event_loop/mod.rs:376:21
   9: winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::run
             at /home/devneal/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/linux/wayland/event_loop/mod.rs:191:9
  10: winit::platform_impl::platform::EventLoop<T>::run
             at /home/devneal/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/linux/mod.rs:652:56
  11: winit::event_loop::EventLoop<T>::run
             at /home/devneal/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/event_loop.rs:154:9
  12: tutorial3_pipeline::main
             at ./src/main.rs:188:5
  13: core::ops::function::FnOnce::call_once
             at /home/devneal/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

If someone could point to where the differences between PresentMode::Fifo and the other options are implemented that would help.
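
As background on the question above, the three present modes differ in what happens to a frame that is submitted while an earlier one is still waiting to be shown. The following is a conceptual sketch based on the usual Vulkan-style semantics only. It does not point at where wgpu implements these modes, the local `PresentMode` enum merely mirrors the wgpu variant names, and `Submit` and `present` are hypothetical names.

```rust
use std::collections::VecDeque;

/// Local stand-in mirroring the variant names of `wgpu::PresentMode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PresentMode {
    Fifo,
    Mailbox,
    Immediate,
}

/// Hypothetical outcome of handing one frame to the presentation engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Submit {
    /// The frame was queued and will be shown at a later vertical blank.
    Queued,
    /// The frame replaced an older pending frame, which is never shown.
    Replaced(u32),
    /// The frame was shown right away, without waiting (tearing possible).
    Shown,
    /// The queue is full; the caller has to wait for a vertical blank.
    MustWait,
}

/// Models one present call against a queue of pending frame ids.
pub fn present(
    mode: PresentMode,
    pending: &mut VecDeque<u32>,
    capacity: usize,
    frame: u32,
) -> Submit {
    match mode {
        // Fifo: strict queue; a full queue pushes back on the caller.
        PresentMode::Fifo => {
            if pending.len() < capacity {
                pending.push_back(frame);
                Submit::Queued
            } else {
                Submit::MustWait
            }
        }
        // Mailbox: single slot; the newest frame overwrites the waiting one.
        PresentMode::Mailbox => {
            let old = pending.pop_front();
            pending.clear();
            pending.push_back(frame);
            match old {
                Some(id) => Submit::Replaced(id),
                None => Submit::Queued,
            }
        }
        // Immediate: nothing is queued at all.
        PresentMode::Immediate => Submit::Shown,
    }
}
```

In this model only Fifo ever makes the caller wait, which is consistent with the observation that the timeouts show up under Fifo but not under Mailbox or Immediate.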

@Imberflur
Contributor

Could this be related? #1218

@eulertour

The symptoms are pretty similar, but that's the most I can say.

@kvark
Member

kvark commented Jul 6, 2021

@eulertour one big OOM bug was fixed in #1598. Do you still get OOM after it?

@eulertour

Running from the latest master (e5142b3) eliminates the OOM errors, so setting PresentMode::Mailbox or PresentMode::Immediate runs without error. PresentMode::Fifo is the same as before with timeout errors but no OOM errors.

@imxood

imxood commented Mar 21, 2022

2022-03-21T13:49:21.082118Z  INFO winit::platform_impl::platform::x11::window: Guessed window scale factor: 1.25    
2022-03-21T13:49:21.143371Z  INFO bevy_render::renderer: AdapterInfo { name: "AMD RADV POLARIS10 (ACO)", vendor: 4098, device: 28639, device_type: DiscreteGpu, backend: Vulkan }
thread 'main' panicked at 'Failed to acquire next swap chain texture!: Timeout', /home/maxu/.cargo/registry/src/github.com-1ecc6299db9ec823/bevy_render-0.6.1/src/view/window.rs:161:24
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

WGPU_BACKEND=gl cargo run
With the GL backend selected this way, it runs successfully.

@imxood

imxood commented Mar 21, 2022

2022-03-21T13:55:59.193873Z  INFO bevy_render::renderer: AdapterInfo { name: "AMD Radeon RX 590 GME (POLARIS10, DRM 3.42.0, 5.15.1-amd64-desktop, LLVM 11.0.1)", vendor: 4098, device: 0, device_type: DiscreteGpu, backend: Gl }

@blaind

blaind commented Nov 12, 2022

I'm having a similar problem, and got here from the bevy issue mentioned above.

Adapter Info: Adapter Vulkan AdapterInfo { name: "AMD RADV RENOIR", vendor: 4098, device: 5686, device_type: IntegratedGpu, driver: "radv", driver_info: "Mesa 22.0.5", backend: Vulkan

I could reproduce it with the hello-triangle.rs example, with a few changes (using winit Poll mode and rendering in the MainEventsCleared event). It is reproducible by locking the screen (Ubuntu 22.04, Wayland).

Increasing FRAME_TIMEOUT_MS to 3000 fixes the problem (I experimented with some values; 2000 still produces the error). See blaind@48463bb. Note that this branch doesn't currently reproduce the bug, since it also contains the FRAME_TIMEOUT_MS fix.
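
For reference when comparing values, the constant is in milliseconds while the debug output in this comment prints nanoseconds. A small sketch of the conversion, where `frame_timeout_ns` is a hypothetical helper and not a wgpu function:

```rust
use std::time::Duration;

/// Hypothetical helper: converts a frame timeout from milliseconds to
/// nanoseconds. A `u32` millisecond count always fits in a `u64` of
/// nanoseconds, so the final cast cannot truncate.
pub fn frame_timeout_ns(timeout_ms: u32) -> u64 {
    Duration::from_millis(u64::from(timeout_ms)).as_nanos() as u64
}
```

So the default of 1000 ms corresponds to 1000000000 ns, and the 3000 ms value that avoided the error corresponds to 3000000000 ns.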

Here's a log of what happens if the timeout is 1000ms:

[2022-11-12T16:33:49Z TRACE wgpu_core::track::texture] 	tex 0: insert start UNINITIALIZED
[2022-11-12T16:33:49Z DEBUG wgpu_core::device] Create view for texture (0, 116, Vulkan) filters usages to COLOR_TARGET
[2022-11-12T16:33:49Z TRACE wgpu_core::command::render] Encoding render pass begin in command buffer (0, 116, Vulkan)
[2022-11-12T16:33:49Z TRACE wgpu_core::command::render] Merging renderpass into cmd_buf (0, 116, Vulkan)
[2022-11-12T16:33:49Z TRACE wgpu_core::track::texture] 	tex 0: insert start COLOR_TARGET
[2022-11-12T16:33:49Z TRACE wgpu_core::track::texture] 	tex 0: insert start COLOR_TARGET
[2022-11-12T16:33:49Z TRACE wgpu_core::command] Command buffer (0, 116, Vulkan)
[2022-11-12T16:33:49Z TRACE wgpu_core::track::texture] 	tex 0: insert start PRESENT
[2022-11-12T16:33:49Z TRACE wgpu_core::device::queue] Stitching command buffer (0, 116, Vulkan) before submission
[2022-11-12T16:33:49Z TRACE wgpu_core::track::texture] 	tex 0: transition simple UNINITIALIZED -> COLOR_TARGET
[2022-11-12T16:33:49Z TRACE wgpu_core::track::texture] 	tex 0: transition simple COLOR_TARGET -> PRESENT
[2022-11-12T16:33:49Z TRACE wgpu_core::device::queue] Device after submission 116
[2022-11-12T16:33:49Z DEBUG wgpu_core::device::life] Texture view Valid((0, 58, Vulkan)) will be destroyed
[2022-11-12T16:33:49Z TRACE wgpu_core::device::life] Active submission 115 is done
[2022-11-12T16:33:49Z DEBUG wgpu_core::present] Removing swapchain texture Valid((0, 116, Vulkan)) from the device tracker
[2022-11-12T16:33:49Z DEBUG wgpu_core::present] Presented. End of Frame
[2022-11-12T16:33:49Z DEBUG wgpu_core::device] texture view (1, 58, Vulkan) is dropped
EVENT: RedrawEventsCleared
EVENT: NewEvents(Poll)
EVENT: MainEventsCleared
Acquiring texture...
ERR: TIMEOUT, timeout_ns was: 1000000000
thread 'main' panicked at 'Failed to acquire next swap chain texture: Timeout', wgpu/examples/hello-triangle/main.rs:104:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[2022-11-12T16:33:50Z INFO  wgpu_core::hub] Dropping Global
[2022-11-12T16:33:50Z TRACE wgpu_core::device::life] Active submission 116 is done
[2022-11-12T16:33:51Z INFO  wgpu_core::device] Destroying 2 command encoders

Maybe the error is in the driver, which somehow doesn't release previous swapchains fast enough when it's doing something internal related to the screen shutting down.

7 participants