Add inline(never) to bench systems #9824

Merged Oct 2, 2023 (1 commit)

Contributor

@nicopap nicopap commented Sep 16, 2023

Objective

It is difficult to inspect the generated assembly of benchmark systems using a tool such as cargo-asm.

Solution

Mark the related functions as #[inline(never)]. This way, you can pass the module name as an argument to cargo-asm to get the generated assembly for the given function.

As a side effect, this may also make the benchmarks a bit more predictable and useful, since it prevents inlining in places where no inlining could possibly take place in Bevy itself.
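A minimal sketch of the idea, assuming a hypothetical `integrate` function standing in for a benchmark system (the name and body are illustrative and not taken from Bevy's benches). It shows where the attribute goes and how `std::hint::black_box` can keep the call from being optimized away:

```rust
use std::hint::black_box;

/// Hypothetical stand-in for a benchmark system; not taken from Bevy's benches.
/// `#[inline(never)]` hints that the compiler should keep this as a separate
/// function, so its assembly can be looked up by name instead of being merged
/// into the caller.
#[inline(never)]
pub fn integrate(positions: &mut [f32], velocities: &[f32]) {
    for (p, v) in positions.iter_mut().zip(velocities) {
        *p += *v;
    }
}

fn main() {
    let mut positions = vec![0.0_f32; 4];
    let velocities = vec![1.5_f32; 4];
    // `black_box` hides the inputs from the optimizer so the call is not
    // constant-folded at compile time.
    integrate(black_box(&mut positions), black_box(&velocities));
    println!("{positions:?}");
}
```

Without the attribute, a small function like this would likely be inlined into `main`, leaving no standalone symbol to inspect.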

Measurements

Following the recommendations in https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux, I

  1. Put my CPU in "AMD ECO" mode, which surprisingly is equivalent to disabling turbo boost and gives more consistent performance
  2. Disabled all hyperthreading cores using echo 0 > /sys/devices/system/cpu/cpu{11,12…}/online
  3. Set the scaling governor to performance
  4. Manually disabled AMD boost with echo 0 > /sys/devices/system/cpu/cpufreq/boost
  5. Set the nice level of the criterion benchmark using cargo bench … & sudo renice -n -5 -p $! ; fg
  6. Ran no programs other than the benchmarks (apart from system daemons and the X11 server)

With this setup, running the same benchmarks multiple times on main still gives me a lot of "regression" and "improvement" messages, which is absurd given that no code changed.

On this branch, spurious performance changes are still detected, but much less frequently.

Of course, this only covers the iter_simple and iter_frag benchmarks.

Why? Because then it becomes easier to inspect generated ASM using a
tool like `cargo-asm`.
@nicopap nicopap added the C-Usability A targeted quality-of-life change that makes Bevy easier to use label Sep 16, 2023
@nicopap nicopap marked this pull request as ready for review September 16, 2023 07:53
Contributor

@atlv24 atlv24 left a comment

This makes a lot of sense and is well-justified

Member

@james7132 james7132 left a comment

LGTM. Do we need #[no_mangle] to make it easier to find the symbols?

@james7132 james7132 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Oct 2, 2023
Contributor Author

nicopap commented Oct 2, 2023

cargo-asm is capable of demangling, so for this specific use case #[no_mangle] is not needed. It might still be useful with other tools, though.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Oct 2, 2023
Merged via the queue into bevyengine:main with commit 47409c8 Oct 2, 2023
24 checks passed
ameknite pushed a commit to ameknite/bevy that referenced this pull request Oct 3, 2023
regnarock pushed a commit to regnarock/bevy that referenced this pull request Oct 13, 2023
rdrpenguin04 pushed a commit to rdrpenguin04/bevy that referenced this pull request Jan 9, 2024