Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate extensions allocations #2840

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

SUPERCILEX
Copy link
Contributor

@SUPERCILEX SUPERCILEX commented Jan 1, 2024

Fixes #428

What does this do?

Pools allocations of extensions (implicitly using the fact that AnyMap is stored in a pool) by reusing Boxes.

Why can't this be safe code?

It'd be nice to do this:

enum BoxedExt {
    Live(dyn std::any::Any + Sync + Send),
    Dead,
}

Sadly you can't do that because the dyn is unsized. Instead, we manually re-implement the vtable in unsafe code.

Miri has been run on the entire repo with this change:

RUSTFLAGS="--cfg nightly" MIRIFLAGS=-Zmiri-backtrace=full cargo miri t --workspace

Performance

Tested on https://github.com/SUPERCILEX/ftzz.

Before:

image

After:

image

The increased fragmentation is a little surprising and a bummer, but that might just be in my application as there were no other allocations to remove.

Random notes

There's no need to implement cache eviction because the HashMap is keyed by types which means the maximum amount of memory used is pre-determined at compile time.

I'm using strict provenance APIs for miri testing as I need to use usize to ptr casts which is no bueno.

@SUPERCILEX
Copy link
Contributor Author

Seems to have failed for an unrelated reason: #2842

@davidbarsky
Copy link
Member

davidbarsky commented Jan 3, 2024

Oh wow, thanks for opening this PR and sorry for the delay in responding. I haven't gone over the benchmarks yet (I'd like to rerun them on a less noisy machine...), but they seem to be very promising:

Benchmarking results
tracing/tracing-subscriber on  ext-alloc [$] is 📦 v0.3.0 via 🦀 v1.74.0
at 01:37:55 PM ❯ cargo +nightly bench --features=env-filter event
   Compiling tracing-subscriber v0.3.0 (/Users/dbarsky/Developer/tracing/tracing-subscriber)
   Compiling tracing-mock v0.2.0 (/Users/dbarsky/Developer/tracing/tracing-mock)
    Finished bench [optimized] target(s) in 5.98s
     Running benches/enter.rs (/Users/dbarsky/Developer/tracing/target/release/deps/enter-2a504e389e1a34aa)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature.
The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
     Running benches/filter.rs (/Users/dbarsky/Developer/tracing/target/release/deps/filter-129f2e31f8caea56)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature.
The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
     Running benches/filter_log.rs (/Users/dbarsky/Developer/tracing/target/release/deps/filter_log-8a039b6535b4594b)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature.
The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
     Running benches/fmt.rs (/Users/dbarsky/Developer/tracing/target/release/deps/fmt-5825a19aa1a8f04b)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature.
The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
event/root/single_threaded/1
                        time:   [504.97 ns 508.09 ns 511.70 ns]
                        thrpt:  [1.9543 Melem/s 1.9681 Melem/s 1.9803 Melem/s]
                 change:
                        time:   [-1.5972% -1.0250% -0.5214%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5242% +1.0356% +1.6232%]
                        Change within noise threshold.
event/root/multithreaded/1
                        time:   [10.656 µs 10.851 µs 11.119 µs]
                        thrpt:  [89.936 Kelem/s 92.157 Kelem/s 93.846 Kelem/s]
                 change:
                        time:   [-53.423% -51.753% -50.042%] (p = 0.00 < 0.05)
                        thrpt:  [+100.17% +107.27% +114.70%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  1 (1.00%) high severe
event/unique_parent/single_threaded/1
                        time:   [799.32 ns 800.50 ns 801.75 ns]
                        thrpt:  [1.2473 Melem/s 1.2492 Melem/s 1.2511 Melem/s]
                 change:
                        time:   [-1.6751% -1.1944% -0.7508%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7565% +1.2089% +1.7036%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
event/unique_parent/multithreaded/1
                        time:   [24.005 µs 24.512 µs 25.009 µs]
                        thrpt:  [39.985 Kelem/s 40.796 Kelem/s 41.657 Kelem/s]
                 change:
                        time:   [+46.586% +51.273% +56.129%] (p = 0.00 < 0.05)
                        thrpt:  [-35.950% -33.894% -31.781%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
event/shared_parent/multithreaded/1
                        time:   [13.038 µs 13.342 µs 13.736 µs]
                        thrpt:  [72.802 Kelem/s 74.954 Kelem/s 76.698 Kelem/s]
                 change:
                        time:   [-40.476% -37.473% -34.227%] (p = 0.00 < 0.05)
                        thrpt:  [+52.039% +59.930% +68.000%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe
event/multi-parent/multithreaded/1
                        time:   [17.437 µs 17.644 µs 17.835 µs]
                        thrpt:  [56.068 Kelem/s 56.677 Kelem/s 57.348 Kelem/s]
                 change:
                        time:   [-23.232% -19.392% -15.405%] (p = 0.00 < 0.05)
                        thrpt:  [+18.210% +24.058% +30.263%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
event/root/single_threaded/10
                        time:   [5.2370 µs 5.2491 µs 5.2614 µs]
                        thrpt:  [1.9006 Melem/s 1.9051 Melem/s 1.9095 Melem/s]
                 change:
                        time:   [+2.1132% +2.5586% +2.9976%] (p = 0.00 < 0.05)
                        thrpt:  [-2.9104% -2.4948% -2.0695%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
event/root/multithreaded/10
                        time:   [20.794 µs 20.930 µs 21.093 µs]
                        thrpt:  [474.09 Kelem/s 477.78 Kelem/s 480.92 Kelem/s]
                 change:
                        time:   [-28.254% -26.113% -23.949%] (p = 0.00 < 0.05)
                        thrpt:  [+31.491% +35.341% +39.380%]
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) low severe
  8 (8.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
event/unique_parent/single_threaded/10
                        time:   [8.1886 µs 8.2119 µs 8.2361 µs]
                        thrpt:  [1.2142 Melem/s 1.2177 Melem/s 1.2212 Melem/s]
                 change:
                        time:   [-1.4775% -1.0266% -0.5883%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5918% +1.0372% +1.4997%]
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
event/unique_parent/multithreaded/10
                        time:   [31.111 µs 31.496 µs 31.931 µs]
                        thrpt:  [313.18 Kelem/s 317.50 Kelem/s 321.43 Kelem/s]
                 change:
                        time:   [+0.6886% +3.6686% +6.9823%] (p = 0.02 < 0.05)
                        thrpt:  [-6.5266% -3.5388% -0.6839%]
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  7 (7.00%) high mild
  1 (1.00%) high severe
event/shared_parent/multithreaded/10
                        time:   [31.020 µs 31.400 µs 31.703 µs]
                        thrpt:  [315.43 Kelem/s 318.48 Kelem/s 322.37 Kelem/s]
                 change:
                        time:   [-19.896% -17.836% -15.984%] (p = 0.00 < 0.05)
                        thrpt:  [+19.025% +21.707% +24.838%]
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe
event/multi-parent/multithreaded/10
                        time:   [55.046 µs 55.661 µs 56.363 µs]
                        thrpt:  [177.42 Kelem/s 179.66 Kelem/s 181.67 Kelem/s]
                 change:
                        time:   [-39.343% -35.478% -31.532%] (p = 0.00 < 0.05)
                        thrpt:  [+46.053% +54.985% +64.860%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
event/root/single_threaded/50
                        time:   [26.178 µs 26.244 µs 26.306 µs]
                        thrpt:  [1.9007 Melem/s 1.9052 Melem/s 1.9100 Melem/s]
                 change:
                        time:   [+2.4375% +3.0112% +3.5159%] (p = 0.00 < 0.05)
                        thrpt:  [-3.3965% -2.9231% -2.3795%]
                        Performance has regressed.
event/root/multithreaded/50
                        time:   [47.489 µs 47.806 µs 48.079 µs]
                        thrpt:  [1.0400 Melem/s 1.0459 Melem/s 1.0529 Melem/s]
                 change:
                        time:   [-2.5381% -0.8321% +0.9569%] (p = 0.38 > 0.05)
                        thrpt:  [-0.9478% +0.8391% +2.6041%]
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
event/unique_parent/single_threaded/50
                        time:   [41.218 µs 41.307 µs 41.402 µs]
                        thrpt:  [1.2077 Melem/s 1.2105 Melem/s 1.2131 Melem/s]
                 change:
                        time:   [-1.1814% -0.6921% -0.2034%] (p = 0.01 < 0.05)
                        thrpt:  [+0.2038% +0.6969% +1.1955%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
event/unique_parent/multithreaded/50
                        time:   [66.945 µs 69.091 µs 70.728 µs]
                        thrpt:  [706.93 Kelem/s 723.68 Kelem/s 746.88 Kelem/s]
                 change:
                        time:   [-7.7719% -3.0993% +2.1468%] (p = 0.21 > 0.05)
                        thrpt:  [-2.1017% +3.1984% +8.4268%]
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  13 (13.00%) low severe
  3 (3.00%) low mild
event/shared_parent/multithreaded/50
                        time:   [92.784 µs 97.080 µs 101.25 µs]
                        thrpt:  [493.82 Kelem/s 515.04 Kelem/s 538.89 Kelem/s]
                 change:
                        time:   [+9.8937% +15.335% +21.256%] (p = 0.00 < 0.05)
                        thrpt:  [-17.530% -13.296% -9.0030%]
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  15 (15.00%) low severe
  6 (6.00%) low mild
  2 (2.00%) high mild
event/multi-parent/multithreaded/50
                        time:   [472.43 µs 473.79 µs 475.08 µs]
                        thrpt:  [105.25 Kelem/s 105.53 Kelem/s 105.84 Kelem/s]
                 change:
                        time:   [-2.3126% -1.9288% -1.5301%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5539% +1.9667% +2.3674%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild

(This was after I ran the same set of benchmarks on the master branch, to be clear.)

@hawkw
Copy link
Member

hawkw commented Jan 6, 2024

Thanks for working on this, this is something I had been planning on doing for a long time but never had the time for. I'm really excited to review this PR!

@SUPERCILEX
Copy link
Contributor Author

Did a bit of cleanup, the PR should be in a nicer state now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

subscriber: Performance Improvements to fmt::Layer
3 participants