Render node GPU timing overlay #8067

JMS55 · 2023-03-13T01:54:51Z

Objective

When developing rendering features, it's useful to know how expensive they are. Tracy already lets us measure CPU time per-node, but we don't currently have a way of measuring how long the GPU spends on each render node.
Developers can have bevy open to one side, and their IDE to another. Using shader hot-reloading, they can modify a shader, and see how that affects performance and quality in real time.

Solution

Use wgpu-profiler to insert timestamp queries when WgpuFeatures::TIMESTAMP_QUERY is enabled.
- Query results are put into a resource shared between the main and render world with a mutex.
Add DebugOverlaysPlugin to display a UI showing the timer query results.
- Every 100ms, the plugin reads from the query result resource and calculates aggregate statistics, then updates the UI to show the results.
In addition to adding the plugin, make sure to run your program using the new gpu locking script (NVIDIA only) in order to ensure you get stable results.

TODO

Support displaying nested spans
Support trace_chrome
Update profiling docs

Changelog

Added support for aggregating GPU pass runtimes based on WgpuFeatures::TIMESTAMP_QUERY.
Added bevy_debug_overlays and DebugOverlaysPlugin for displaying GPU running time of a bevy app.

IceSentry · 2023-03-13T03:29:51Z

Why only refresh every second? It seems a bit slow to me.

There's one issue with this, I'm not sure if it's a blocker, but the fact that it uses bevy_ui means that if a game already has a ui it can't reuse this plugin. We would need to be able to draw debug ui on top of everything including other ui nodes.

One thing that would be cool in the future is to have the ability to draw shaders in a bevy_ui node which would make rendering a frametime graph really easy.

JMS55 · 2023-03-13T03:33:34Z

Why only refresh every second? It seems a bit slow to me.

If I refresh every frame, the numbers would move too fast to read. We can tweak how fast it refreshes, maybe it should be 0.5s. I could also make it configurable.

There's one issue with this, I'm not sure if it's a blocker, but the fact that it uses bevy_ui means that if a game already has a ui it can't reuse this plugin. We would need to be able to draw debug ui on top of everything including other ui nodes.

Is this a problem? I haven't tested it with anything that has UI already. We can probably solve this with Z-layers or camera orders or something.

One thing that would be cool in the future is to have the ability to draw shaders in a bevy_ui node which would make rendering a frametime graph really easy.

Agreed. I've also seen users request this feature.

IceSentry · 2023-03-13T04:12:25Z

Yeah, making the timing configurable does seem like the best option.

Is this a problem?

Not necessarily, but it's possible that people do weird thing with their UIs and that enabling this plugin breaks it. Or just that z-layers won't be enough. Probably not worth blocking on it though. Just something to consider for the future I guess.

Wonder if we could just reserve some z-layers for internal stuff?

james7132 · 2023-03-13T05:35:38Z

I'm also not sure a UI display for this is the most apt approach since this is always compiled in. We may want to instead expose spans or counters to profilers instead.

superdump · 2023-03-13T07:34:35Z

You could display every frame and then a sliding average as well. Or a sliding max is perhaps more interesting?

JMS55 · 2023-03-13T15:06:52Z

I'm also not sure a UI display for this is the most apt approach since this is always compiled in. We may want to instead expose spans or counters to profilers instead.

That's basically what this PR does. It optionally allows recording GPU pass durations (not on by default), and then adds a plugin (not added by default) that lets you display the results.

JMS55 · 2023-03-14T04:30:50Z

You could display every frame and then a sliding average as well. Or a sliding max is perhaps more interesting?

Currently it's an exponential moving average, with the UI updated every 300ms (300/16.6 ~= 18 frames @ 60fps). Maybe I should do something like the average of a 20 frame sliding window, with stddev/variance of the window alongside it? Let me know what you think.

crates/bevy_debug_overlays/src/lib.rs

…e-overlay

JMS55 · 2023-05-26T17:44:53Z

Maybe this should be moved to bevy_diagnostics under an optional feature, if that won't cause cyclic dependencies...

…e-overlay

cart · 2023-06-16T19:17:17Z

crates/bevy_render/Cargo.toml

 wgpu-hal = "0.16.0"
+wgpu-profiler = "0.12"


Should we consider making this an optional dependency / putting it behind a feature flag? Deployed games generally won't need / want this.

Makes sense to me.

cart · 2023-06-16T19:20:10Z

docs/profiling.md

@@ -90,6 +90,9 @@ Install [cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph), [enable
 After closing your app, an interactive `svg` file will be produced:
 ![image](https://user-images.githubusercontent.com/2694663/141657609-0089675d-fb6a-4dc4-9a59-871e95e31c8a.png)

+## DebugOverlaysPlugin
+TODO


This should have a quick blurb before merging.

Yep, I'm waiting to write it until everything else in the PR is done and ready to merge.

cart · 2023-06-16T19:26:35Z

crates/bevy_debug_overlays/src/lib.rs

+}
+
+fn draw_gpu_time_overlay(
+    gpu_timers: Res<GpuTimerScopes>,


bevy_diagnostics was built to cover scenarios like this. It is intended to be a generic interface for collecting and reporting changing values over time. The intent being that it allows us to build "common" visualization tools (things like this overlay, timeline graph visualiztions in the editor, etc). Did you consider using that? Were there gaps that prevented you from using it? If so, we should fill those gaps and port this over.

For the UI part specifically, bevy_render depends on on bevy_diagnostics, but bevy_ui depends on bevy_render, so it would be circular dependencies :/.

There's no reason the aggregating statistics part couldn't be on bevy_diagnostics though. We could do something like bevy_diagnostics aggregates the data produced by bevy_render, and then in another crate (a new one? bevy_core_pipeline?) we do the visual UI.

Yeah the visualization logic definitely shouldn't be in bevy_diagnostics, bevy_render, or bevy_ui.

This should be the dep tree:
bevy_render -> bevy_diagnostics
nice_diagnostic_visualizer -> bevy_diagnostics
nice_diagnostic_visualizer -> bevy_ui -> bevy_render

I think we'll probably also want a "diagnostic category label" system so we can dump a bunch of render pass diagnostics in some "gpu timing" category. Then you could do something like DiagnosticVisualizerPlugin::new().show(GpuTimingDiagnostics)

Not necessarily saying that this needs to be done in this PR, but it would make all of this visualization work generically useful / doesn't introduce quality debt. I can also take a stab if you want.

tools/nvidia-gpu-lock.sh

alice-i-cecile

My concerns are resolved, but this isn't quite complete yet. It's in the milestone though: feel free to bump to 0.12 if you won't have the time.

…e-overlay

DGriffin91 · 2023-07-28T07:48:15Z

crates/bevy_debug_overlays/src/lib.rs

+                .find(|a| a.label == timer.label)
+            {
+                Some(a) => {
+                    a.mean_duration += timer_duration / 20.0;


I think this should at least be a constant that GpuTimerScopes also uses. User configurable would be even better.

DGriffin91 · 2023-07-28T07:49:17Z

tools/nvidia-gpu-lock.sh

@@ -0,0 +1,19 @@
+#!/bin/bash
+
+graphics=$(nvidia-smi --query-supported-clocks=graphics --format=csv | sed -n 2p | tr -d -c 0-9)


imo we should have a windows version of this and also something for AMD if possible since this feature just won't work at all without this?

DGriffin91 · 2023-07-28T07:50:13Z

crates/bevy_debug_overlays/src/lib.rs

+        if timer.max_duration - timer.min_duration > 0.05 {
+            writeln!(
+                durations,
+                "{:.3} ({:.3}, {:.3})",


Seems like too little precision, maybe .4?

JMS55 · 2023-08-05T06:22:40Z

A nice visualization technique I've seen: Each top level profiling group/span is associated with a random(?) color, and all sub-spans are the same color (and indented one).

Visualizations I've seen also tend not to sort by timing, and keep the order stable.

JMS55 · 2023-08-13T05:40:59Z

Instead of locking to max clocks, the script should probably lock to GPU base clocks. This is what NSight does.

# Objective It's useful to have access to render pipeline statistics, since they provide more information than FPS alone. For example, the number of drawn triangles can be used to debug culling and LODs. The number of fragment shader invocations can provide a more stable alternative metric than GPU elapsed time. See also: Render node GPU timing overlay #8067, which doesn't provide pipeline statistics, but adds a nice overlay. ## Solution Add `RenderDiagnosticsPlugin`, which enables collecting pipeline statistics and CPU & GPU timings. --- ## Changelog - Add `RenderDiagnosticsPlugin` - Add `RenderContext::diagnostic_recorder` method --------- Co-authored-by: Alice Cecile <[email protected]>

JMS55 added 8 commits March 12, 2023 00:24

Add gpu timers to render graph

4fc067a

Add gpu timers resource

4326f5a

Misc

132d073

Start bevy_debug_overlays

19741c2

Misc

ab93b13

Fix

91a224d

WIP debug overlay UI

3a2f161

Reverse sort list order

1404c42

JMS55 added A-Rendering Drawing game state to the screen A-Diagnostics Logging, crash handling, error reporting and performance analysis labels Mar 13, 2023

JMS55 added this to the 0.11 milestone Mar 13, 2023

JMS55 added the A-Editor Graphical tools to make Bevy games label Mar 13, 2023

JMS55 added 3 commits March 12, 2023 22:54

Major UI improvements

e3f7e3a

Minor UI tweaks

bcef3b4

Remove node from overlay names

7442de9

Improve UI layout

b7945ad

Increase UI update frequency

3e7dfcc

doup reviewed Mar 15, 2023

View reviewed changes

crates/bevy_debug_overlays/src/lib.rs Outdated Show resolved Hide resolved

JMS55 added 5 commits March 15, 2023 14:13

Manually insert gpu timestamps

17d6acc

Misc refactor

dee684f

Add some documentation

092bfd7

Improve aggregation

e5c4ca7

Merge commit 'ae39b07d2627bb31b74b8d1d08d5c7552344d339' into gpu-usag…

bf2e31b

…e-overlay

JMS55 added 6 commits April 26, 2023 16:00

Merge commit '4d54ce14aaee8d7432380df37c41c03c28594b27' into gpu-usag…

25748d9

…e-overlay

Add TODO

43855ba

Merge commit '86aaad743b6842152f92f910f09a85e7c2f8b689' into gpu-usag…

8aa80de

…e-overlay

Fixes

3b53f83

Use default font

4396514

Removed unused import

cfdeefc

JMS55 added 2 commits June 10, 2023 12:16

Merge commit 'c1fd505f9c5c4d9daf3dad844c1bf5ff8ee6820f' into gpu-usag…

f2acd33

…e-overlay

Rebase fix

bbc414f

cart reviewed Jun 16, 2023

View reviewed changes

tools/nvidia-gpu-lock.sh Show resolved Hide resolved

Improve lock script compatibility

1e05986

alice-i-cecile reviewed Jun 19, 2023

View reviewed changes

JMS55 modified the milestones: 0.11, 0.12 Jun 26, 2023

B-head mentioned this pull request Jul 4, 2023

Rework winit runner #9034

Draft

8 tasks

LeshaInc mentioned this pull request Jul 12, 2023

Add pipeline statistics #9135

Merged

Merge commit '0566e73af460b9d128354a20970b1e1e0cc3b719' into gpu-usag…

6df0575

…e-overlay

DGriffin91 reviewed Jul 28, 2023

View reviewed changes

JMS55 removed this from the 0.12 milestone Aug 13, 2023

JMS55 marked this pull request as draft August 15, 2023 17:46

JMS55 mentioned this pull request Mar 8, 2024

Fps overlay #12382

Merged

pablo-lua added the A-Dev-Tools Tools used to debug Bevy applications. label Mar 18, 2024

pablo-lua mentioned this pull request Mar 18, 2024

Move the already existent tools into bevy_dev_tools and creation of new ones #12351

Open

9 tasks

JMS55 closed this Mar 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Render node GPU timing overlay #8067

Render node GPU timing overlay #8067

JMS55 commented Mar 13, 2023 •

edited

Loading

IceSentry commented Mar 13, 2023

JMS55 commented Mar 13, 2023

IceSentry commented Mar 13, 2023 •

edited

Loading

james7132 commented Mar 13, 2023

superdump commented Mar 13, 2023

JMS55 commented Mar 13, 2023

JMS55 commented Mar 14, 2023 •

edited

Loading

JMS55 commented May 26, 2023

cart Jun 16, 2023

JMS55 Jun 16, 2023

cart Jun 16, 2023

JMS55 Jun 16, 2023

cart Jun 16, 2023

JMS55 Jun 16, 2023

cart Jun 16, 2023

cart Jun 16, 2023

alice-i-cecile left a comment

DGriffin91 Jul 28, 2023

DGriffin91 Jul 28, 2023

DGriffin91 Jul 28, 2023

JMS55 commented Aug 5, 2023 •

edited

Loading

JMS55 commented Aug 13, 2023

		@@ -0,0 +1,19 @@
		#!/bin/bash

		graphics=$(nvidia-smi --query-supported-clocks=graphics --format=csv \| sed -n 2p \| tr -d -c 0-9)

Render node GPU timing overlay #8067

Render node GPU timing overlay #8067

Conversation

JMS55 commented Mar 13, 2023 • edited Loading

Objective

Solution

TODO

Changelog

IceSentry commented Mar 13, 2023

JMS55 commented Mar 13, 2023

IceSentry commented Mar 13, 2023 • edited Loading

james7132 commented Mar 13, 2023

superdump commented Mar 13, 2023

JMS55 commented Mar 13, 2023

JMS55 commented Mar 14, 2023 • edited Loading

JMS55 commented May 26, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alice-i-cecile left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JMS55 commented Aug 5, 2023 • edited Loading

JMS55 commented Aug 13, 2023

JMS55 commented Mar 13, 2023 •

edited

Loading

IceSentry commented Mar 13, 2023 •

edited

Loading

JMS55 commented Mar 14, 2023 •

edited

Loading

JMS55 commented Aug 5, 2023 •

edited

Loading