Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache System Tracing Spans #9390

Merged
merged 7 commits into from
Sep 13, 2023
Merged

Conversation

hymm
Copy link
Contributor

@hymm hymm commented Aug 9, 2023

Objective

  • Reduce the overhead of tracing by caching the system spans.

Yellow is this pr. Red is main.

image

@hymm hymm requested a review from james7132 August 9, 2023 00:11
@alice-i-cecile alice-i-cecile added C-Performance A change motivated by improving speed, memory usage or compile times A-Diagnostics Logging, crash handling, error reporting and performance analysis labels Aug 9, 2023
Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huzzah for less overhead! The impact is bigger than I expected. LGTM, just a few nits on implementation.

crates/bevy_ecs/src/schedule/executor/multi_threaded.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/schedule/executor/multi_threaded.rs Outdated Show resolved Hide resolved
@james7132
Copy link
Member

Another thought: should we do this for the command application spans too?

@hymm
Copy link
Contributor Author

hymm commented Aug 9, 2023

Another thought: should we do this for the command application spans too?

I'll try it.

@hymm
Copy link
Contributor Author

hymm commented Aug 9, 2023

This is actually getting close to perf with the spans deleted. So it might fix #4892.
image

@superdump
Copy link
Contributor

Oh nice!

@james7132
Copy link
Member

Another, potentially harder, area to check if we can cache spans might be in QueryParIter and friends, but we can leave that to a follow-up PR if need be.

@hymm
Copy link
Contributor Author

hymm commented Aug 10, 2023

caching the commands span helped some more:

image

Another, potentially harder, area to check if we can cache spans might be in QueryParIter and friends, but we can leave that to a follow-up PR if need be.

Probably better to push off to a follow up. I tried to do something quick and reuse the same span for all the tasks, but it ended up running slower. There's probably some contention happening somewhere, so might need to make a separate span for each task.

@github-actions
Copy link
Contributor

Example alien_cake_addict failed to run, please try running it locally and check the result.

@hymm
Copy link
Contributor Author

hymm commented Aug 15, 2023

did something that might be slightly more controversial and moved the system spans into run_unsafe from System instead of in the executor. Felt like it belonged there more with being able to put the span in SystemMeta. perf didnt change.

@github-actions
Copy link
Contributor

Example alien_cake_addict failed to run, please try running it locally and check the result.

Copy link
Member

@aevyrie aevyrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really nice improvement.

@mockersf
Copy link
Member

I worry there's a trap we're missing here, seems almost too good to be true!

Did you check with the tracing dev if there's no issue? That could even be mentioned in their doc.

@hymm
Copy link
Contributor Author

hymm commented Aug 24, 2023

from this comment #4892 (comment), I assume this is avoiding hitting the registry which they say is the slow thing.

@james7132 james7132 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Sep 13, 2023
@alice-i-cecile
Copy link
Member

Really simple and nice!

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Sep 13, 2023
Comment on lines +525 to +529
let task = task.instrument(
self.system_task_metadata[system_index]
.system_task_span
.clone(),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clone makes me wonder, do we really need to instrument the Future as well? Normally this is done because Futures can suspend execution during .await points and that would mess up spans, but here we are never awaiting and moreover we're already measuring the run_unsafe internally, which should account for most of the execution time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instrumentation here does help measure the additional scheduler overhead, which is thankfully very low right now. However, I do see how this both adds even more profiling overhead and how it might not be all that useful to the typical user. Not sure how to best approach toggling this on or off though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My worry is that the time needed for the instrumentation itself here may be in the same order of magniture as the one it is measuring (in addition to the one already measured inside the run_unsafe). That is, it is measuring how much time is needed to run the catch_unwind, doing a non-blocking send on a channel and checking if the catch_unwind returned an error, all of which are pretty fast.

Merged via the queue into bevyengine:main with commit 324c057 Sep 13, 2023
20 of 21 checks passed
Copy link
Contributor

@djeedai djeedai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor doc could be improved.

@@ -62,6 +62,9 @@ struct SystemTaskMetadata {
is_send: bool,
/// Is `true` if the system is exclusive.
is_exclusive: bool,
/// Cached tracing span for system task
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Cached tracing span for system task
/// Tracing span for system task, cached for performance.

github-merge-queue bot pushed a commit that referenced this pull request Sep 30, 2023
# Objective
We cached system spans in #9390, but another common span seen in most
Bevy apps when enabling tracing are Query::par_iter(_mut) related spans.

## Solution
Cache them in QueryState. The one downside to this is that we pay for
the memory for every Query(State) instantiated, not just those that are
used for parallel iteration, but this shouldn't be a significant cost
unless the app is creating hundreds of thousands of Query(State)s
regularly.

## Metrics
Tested against `cargo run --profile stress-test --features trace_tracy
--example many_cubes`. Yellow is this PR, red is main.

`sync_simple_transforms`:


![image](https://github.com/bevyengine/bevy/assets/3137680/d60f6d69-5586-4424-9d78-aac78992aacd)

`check_visibility`:


![image](https://github.com/bevyengine/bevy/assets/3137680/096a58d2-a330-4a32-b806-09cd524e6e15)

Full frame:


![image](https://github.com/bevyengine/bevy/assets/3137680/3b088cf8-9487-4bc7-a308-026e172d6672)
@hymm hymm deleted the cache-system-spans branch October 5, 2023 16:12
@cart cart mentioned this pull request Oct 13, 2023
43 tasks
regnarock pushed a commit to regnarock/bevy that referenced this pull request Oct 13, 2023
# Objective
We cached system spans in bevyengine#9390, but another common span seen in most
Bevy apps when enabling tracing are Query::par_iter(_mut) related spans.

## Solution
Cache them in QueryState. The one downside to this is that we pay for
the memory for every Query(State) instantiated, not just those that are
used for parallel iteration, but this shouldn't be a significant cost
unless the app is creating hundreds of thousands of Query(State)s
regularly.

## Metrics
Tested against `cargo run --profile stress-test --features trace_tracy
--example many_cubes`. Yellow is this PR, red is main.

`sync_simple_transforms`:


![image](https://github.com/bevyengine/bevy/assets/3137680/d60f6d69-5586-4424-9d78-aac78992aacd)

`check_visibility`:


![image](https://github.com/bevyengine/bevy/assets/3137680/096a58d2-a330-4a32-b806-09cd524e6e15)

Full frame:


![image](https://github.com/bevyengine/bevy/assets/3137680/3b088cf8-9487-4bc7-a308-026e172d6672)
rdrpenguin04 pushed a commit to rdrpenguin04/bevy that referenced this pull request Jan 9, 2024
# Objective

- Reduce the overhead of tracing by caching the system spans.

Yellow is this pr. Red is main.


![image](https://github.com/bevyengine/bevy/assets/2180432/fe9bb7c2-ae9a-4522-80a9-75a943a562b6)
rdrpenguin04 pushed a commit to rdrpenguin04/bevy that referenced this pull request Jan 9, 2024
# Objective
We cached system spans in bevyengine#9390, but another common span seen in most
Bevy apps when enabling tracing are Query::par_iter(_mut) related spans.

## Solution
Cache them in QueryState. The one downside to this is that we pay for
the memory for every Query(State) instantiated, not just those that are
used for parallel iteration, but this shouldn't be a significant cost
unless the app is creating hundreds of thousands of Query(State)s
regularly.

## Metrics
Tested against `cargo run --profile stress-test --features trace_tracy
--example many_cubes`. Yellow is this PR, red is main.

`sync_simple_transforms`:


![image](https://github.com/bevyengine/bevy/assets/3137680/d60f6d69-5586-4424-9d78-aac78992aacd)

`check_visibility`:


![image](https://github.com/bevyengine/bevy/assets/3137680/096a58d2-a330-4a32-b806-09cd524e6e15)

Full frame:


![image](https://github.com/bevyengine/bevy/assets/3137680/3b088cf8-9487-4bc7-a308-026e172d6672)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Diagnostics Logging, crash handling, error reporting and performance analysis C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants