Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render node GPU timing overlay #8067

Closed
wants to merge 42 commits into from
Closed

Conversation

JMS55
Copy link
Contributor

@JMS55 JMS55 commented Mar 13, 2023

image

Objective

  • When developing rendering features, it's useful to know how expensive they are. Tracy already lets us measure CPU time per-node, but we don't currently have a way of measuring how long the GPU spends on each render node.
  • Developers can have bevy open to one side, and their IDE to another. Using shader hot-reloading, they can modify a shader, and see how that affects performance and quality in real time.

Solution

  • Use wgpu-profiler to insert timestamp queries when WgpuFeatures::TIMESTAMP_QUERY is enabled.
    • Query results are put into a resource shared between the main and render world with a mutex.
  • Add DebugOverlaysPlugin to display a UI showing the timer query results.
    • Every 100ms, the plugin reads from the query result resource and calculates aggregate statistics, then updates the UI to show the results.
  • In addition to adding the plugin, make sure to run your program using the new gpu locking script (NVIDIA only) in order to ensure you get stable results.

TODO

  • Support displaying nested spans
  • Support trace_chrome
  • Update profiling docs

Changelog

  • Added support for aggregating GPU pass runtimes based on WgpuFeatures::TIMESTAMP_QUERY.
  • Added bevy_debug_overlays and DebugOverlaysPlugin for displaying GPU running time of a bevy app.

@JMS55 JMS55 added A-Rendering Drawing game state to the screen A-Diagnostics Logging, crash handling, error reporting and performance analysis labels Mar 13, 2023
@JMS55 JMS55 added this to the 0.11 milestone Mar 13, 2023
@JMS55 JMS55 added the A-Editor Graphical tools to make Bevy games label Mar 13, 2023
@IceSentry
Copy link
Contributor

Why only refresh every second? It seems a bit slow to me.

There's one issue with this, I'm not sure if it's a blocker, but the fact that it uses bevy_ui means that if a game already has a ui it can't reuse this plugin. We would need to be able to draw debug ui on top of everything including other ui nodes.

One thing that would be cool in the future is to have the ability to draw shaders in a bevy_ui node which would make rendering a frametime graph really easy.

@JMS55
Copy link
Contributor Author

JMS55 commented Mar 13, 2023

Why only refresh every second? It seems a bit slow to me.

If I refresh every frame, the numbers would move too fast to read. We can tweak how fast it refreshes, maybe it should be 0.5s. I could also make it configurable.

There's one issue with this, I'm not sure if it's a blocker, but the fact that it uses bevy_ui means that if a game already has a ui it can't reuse this plugin. We would need to be able to draw debug ui on top of everything including other ui nodes.

Is this a problem? I haven't tested it with anything that has UI already. We can probably solve this with Z-layers or camera orders or something.

One thing that would be cool in the future is to have the ability to draw shaders in a bevy_ui node which would make rendering a frametime graph really easy.

Agreed. I've also seen users request this feature.

@IceSentry
Copy link
Contributor

IceSentry commented Mar 13, 2023

Yeah, making the timing configurable does seem like the best option.

Is this a problem?

Not necessarily, but it's possible that people do weird thing with their UIs and that enabling this plugin breaks it. Or just that z-layers won't be enough. Probably not worth blocking on it though. Just something to consider for the future I guess.

Wonder if we could just reserve some z-layers for internal stuff?

@james7132
Copy link
Member

I'm also not sure a UI display for this is the most apt approach since this is always compiled in. We may want to instead expose spans or counters to profilers instead.

@superdump
Copy link
Contributor

You could display every frame and then a sliding average as well. Or a sliding max is perhaps more interesting?

@JMS55
Copy link
Contributor Author

JMS55 commented Mar 13, 2023

I'm also not sure a UI display for this is the most apt approach since this is always compiled in. We may want to instead expose spans or counters to profilers instead.

That's basically what this PR does. It optionally allows recording GPU pass durations (not on by default), and then adds a plugin (not added by default) that lets you display the results.

@JMS55
Copy link
Contributor Author

JMS55 commented Mar 14, 2023

You could display every frame and then a sliding average as well. Or a sliding max is perhaps more interesting?

Currently it's an exponential moving average, with the UI updated every 300ms (300/16.6 ~= 18 frames @ 60fps). Maybe I should do something like the average of a 20 frame sliding window, with stddev/variance of the window alongside it? Let me know what you think.

@JMS55
Copy link
Contributor Author

JMS55 commented May 26, 2023

Maybe this should be moved to bevy_diagnostics under an optional feature, if that won't cause cyclic dependencies...

wgpu-hal = "0.16.0"
wgpu-profiler = "0.12"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider making this an optional dependency / putting it behind a feature flag? Deployed games generally won't need / want this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

@@ -90,6 +90,9 @@ Install [cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph), [enable
After closing your app, an interactive `svg` file will be produced:
![image](https://user-images.githubusercontent.com/2694663/141657609-0089675d-fb6a-4dc4-9a59-871e95e31c8a.png)

## DebugOverlaysPlugin
TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a quick blurb before merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'm waiting to write it until everything else in the PR is done and ready to merge.

}

fn draw_gpu_time_overlay(
gpu_timers: Res<GpuTimerScopes>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bevy_diagnostics was built to cover scenarios like this. It is intended to be a generic interface for collecting and reporting changing values over time. The intent being that it allows us to build "common" visualization tools (things like this overlay, timeline graph visualiztions in the editor, etc). Did you consider using that? Were there gaps that prevented you from using it? If so, we should fill those gaps and port this over.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the UI part specifically, bevy_render depends on on bevy_diagnostics, but bevy_ui depends on bevy_render, so it would be circular dependencies :/.

There's no reason the aggregating statistics part couldn't be on bevy_diagnostics though. We could do something like bevy_diagnostics aggregates the data produced by bevy_render, and then in another crate (a new one? bevy_core_pipeline?) we do the visual UI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the visualization logic definitely shouldn't be in bevy_diagnostics, bevy_render, or bevy_ui.

This should be the dep tree:
bevy_render -> bevy_diagnostics
nice_diagnostic_visualizer -> bevy_diagnostics
nice_diagnostic_visualizer -> bevy_ui -> bevy_render

I think we'll probably also want a "diagnostic category label" system so we can dump a bunch of render pass diagnostics in some "gpu timing" category. Then you could do something like DiagnosticVisualizerPlugin::new().show(GpuTimingDiagnostics)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily saying that this needs to be done in this PR, but it would make all of this visualization work generically useful / doesn't introduce quality debt. I can also take a stab if you want.

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns are resolved, but this isn't quite complete yet. It's in the milestone though: feel free to bump to 0.12 if you won't have the time.

@JMS55 JMS55 modified the milestones: 0.11, 0.12 Jun 26, 2023
@B-head B-head mentioned this pull request Jul 4, 2023
8 tasks
.find(|a| a.label == timer.label)
{
Some(a) => {
a.mean_duration += timer_duration / 20.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should at least be a constant that GpuTimerScopes also uses. User configurable would be even better.

@@ -0,0 +1,19 @@
#!/bin/bash

graphics=$(nvidia-smi --query-supported-clocks=graphics --format=csv | sed -n 2p | tr -d -c 0-9)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo we should have a windows version of this and also something for AMD if possible since this feature just won't work at all without this?

if timer.max_duration - timer.min_duration > 0.05 {
writeln!(
durations,
"{:.3} ({:.3}, {:.3})",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like too little precision, maybe .4?

@JMS55
Copy link
Contributor Author

JMS55 commented Aug 5, 2023

A nice visualization technique I've seen: Each top level profiling group/span is associated with a random(?) color, and all sub-spans are the same color (and indented one).

Visualizations I've seen also tend not to sort by timing, and keep the order stable.

@JMS55
Copy link
Contributor Author

JMS55 commented Aug 13, 2023

Instead of locking to max clocks, the script should probably lock to GPU base clocks. This is what NSight does.

@JMS55 JMS55 removed this from the 0.12 milestone Aug 13, 2023
@JMS55 JMS55 marked this pull request as draft August 15, 2023 17:46
@JMS55 JMS55 mentioned this pull request Mar 8, 2024
github-merge-queue bot pushed a commit that referenced this pull request Mar 17, 2024
# Objective

It's useful to have access to render pipeline statistics, since they
provide more information than FPS alone. For example, the number of
drawn triangles can be used to debug culling and LODs. The number of
fragment shader invocations can provide a more stable alternative metric
than GPU elapsed time.

See also: Render node GPU timing overlay #8067, which doesn't provide
pipeline statistics, but adds a nice overlay.

## Solution

Add `RenderDiagnosticsPlugin`, which enables collecting pipeline
statistics and CPU & GPU timings.

---

## Changelog

- Add `RenderDiagnosticsPlugin`
- Add `RenderContext::diagnostic_recorder` method

---------

Co-authored-by: Alice Cecile <[email protected]>
@pablo-lua pablo-lua added the A-Dev-Tools Tools used to debug Bevy applications. label Mar 18, 2024
@JMS55 JMS55 closed this Mar 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Dev-Tools Tools used to debug Bevy applications. A-Diagnostics Logging, crash handling, error reporting and performance analysis A-Editor Graphical tools to make Bevy games A-Rendering Drawing game state to the screen
Projects
Status: Responded
Status: Done
Development

Successfully merging this pull request may close these issues.

10 participants