Eliminate extensions allocations #2840

SUPERCILEX · 2024-01-01T02:45:46Z

Fixes #428

What does this do?

Pools allocations of extensions (implicitly using the fact that AnyMap is stored in a pool) by reusing Boxes.

Why can't this be safe code?

It'd be nice to do this:

enum BoxedExt {
    Live(dyn std::any::Any + Sync + Send),
    Dead,
}

Sadly you can't do that because the dyn is unsized. Instead, we manually re-implement the vtable in unsafe code.

Miri has been run on the entire repo with this change:

RUSTFLAGS="--cfg nightly" MIRIFLAGS=-Zmiri-backtrace=full cargo miri t --workspace

Performance

Tested on https://github.com/SUPERCILEX/ftzz.

Before:

After:

The increased fragmentation is a little surprising and a bummer, but that might just be in my application as there were no other allocations to remove.

Random notes

There's no need to implement cache eviction because the HashMap is keyed by types which means the maximum amount of memory used is pre-determined at compile time.

I'm using strict provenance APIs for miri testing as I need to use usize to ptr casts which is no bueno.

SUPERCILEX · 2024-01-03T18:39:06Z

Seems to have failed for an unrelated reason: #2842

davidbarsky · 2024-01-03T18:43:45Z

Oh wow, thanks for opening this PR and sorry for the delay in responding. I haven't gone over the benchmarks yet (I'd like to rerun them on a less noisy machine...), but they seem to be very promising:

Benchmarking results

tracing/tracing-subscriber on  ext-alloc [$] is 📦 v0.3.0 via 🦀 v1.74.0
at 01:37:55 PM ❯ cargo +nightly bench --features=env-filter event
   Compiling tracing-subscriber v0.3.0 (/Users/dbarsky/Developer/tracing/tracing-subscriber)
   Compiling tracing-mock v0.2.0 (/Users/dbarsky/Developer/tracing/tracing-mock)
    Finished bench [optimized] target(s) in 5.98s
     Running benches/enter.rs (/Users/dbarsky/Developer/tracing/target/release/deps/enter-2a504e389e1a34aa)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature.
The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
     Running benches/filter.rs (/Users/dbarsky/Developer/tracing/target/release/deps/filter-129f2e31f8caea56)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature.
The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
     Running benches/filter_log.rs (/Users/dbarsky/Developer/tracing/target/release/deps/filter_log-8a039b6535b4594b)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature.
The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
     Running benches/fmt.rs (/Users/dbarsky/Developer/tracing/target/release/deps/fmt-5825a19aa1a8f04b)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature.
The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
event/root/single_threaded/1
                        time:   [504.97 ns 508.09 ns 511.70 ns]
                        thrpt:  [1.9543 Melem/s 1.9681 Melem/s 1.9803 Melem/s]
                 change:
                        time:   [-1.5972% -1.0250% -0.5214%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5242% +1.0356% +1.6232%]
                        Change within noise threshold.
event/root/multithreaded/1
                        time:   [10.656 µs 10.851 µs 11.119 µs]
                        thrpt:  [89.936 Kelem/s 92.157 Kelem/s 93.846 Kelem/s]
                 change:
                        time:   [-53.423% -51.753% -50.042%] (p = 0.00 < 0.05)
                        thrpt:  [+100.17% +107.27% +114.70%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  1 (1.00%) high severe
event/unique_parent/single_threaded/1
                        time:   [799.32 ns 800.50 ns 801.75 ns]
                        thrpt:  [1.2473 Melem/s 1.2492 Melem/s 1.2511 Melem/s]
                 change:
                        time:   [-1.6751% -1.1944% -0.7508%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7565% +1.2089% +1.7036%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
event/unique_parent/multithreaded/1
                        time:   [24.005 µs 24.512 µs 25.009 µs]
                        thrpt:  [39.985 Kelem/s 40.796 Kelem/s 41.657 Kelem/s]
                 change:
                        time:   [+46.586% +51.273% +56.129%] (p = 0.00 < 0.05)
                        thrpt:  [-35.950% -33.894% -31.781%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
event/shared_parent/multithreaded/1
                        time:   [13.038 µs 13.342 µs 13.736 µs]
                        thrpt:  [72.802 Kelem/s 74.954 Kelem/s 76.698 Kelem/s]
                 change:
                        time:   [-40.476% -37.473% -34.227%] (p = 0.00 < 0.05)
                        thrpt:  [+52.039% +59.930% +68.000%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe
event/multi-parent/multithreaded/1
                        time:   [17.437 µs 17.644 µs 17.835 µs]
                        thrpt:  [56.068 Kelem/s 56.677 Kelem/s 57.348 Kelem/s]
                 change:
                        time:   [-23.232% -19.392% -15.405%] (p = 0.00 < 0.05)
                        thrpt:  [+18.210% +24.058% +30.263%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
event/root/single_threaded/10
                        time:   [5.2370 µs 5.2491 µs 5.2614 µs]
                        thrpt:  [1.9006 Melem/s 1.9051 Melem/s 1.9095 Melem/s]
                 change:
                        time:   [+2.1132% +2.5586% +2.9976%] (p = 0.00 < 0.05)
                        thrpt:  [-2.9104% -2.4948% -2.0695%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
event/root/multithreaded/10
                        time:   [20.794 µs 20.930 µs 21.093 µs]
                        thrpt:  [474.09 Kelem/s 477.78 Kelem/s 480.92 Kelem/s]
                 change:
                        time:   [-28.254% -26.113% -23.949%] (p = 0.00 < 0.05)
                        thrpt:  [+31.491% +35.341% +39.380%]
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) low severe
  8 (8.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
event/unique_parent/single_threaded/10
                        time:   [8.1886 µs 8.2119 µs 8.2361 µs]
                        thrpt:  [1.2142 Melem/s 1.2177 Melem/s 1.2212 Melem/s]
                 change:
                        time:   [-1.4775% -1.0266% -0.5883%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5918% +1.0372% +1.4997%]
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
event/unique_parent/multithreaded/10
                        time:   [31.111 µs 31.496 µs 31.931 µs]
                        thrpt:  [313.18 Kelem/s 317.50 Kelem/s 321.43 Kelem/s]
                 change:
                        time:   [+0.6886% +3.6686% +6.9823%] (p = 0.02 < 0.05)
                        thrpt:  [-6.5266% -3.5388% -0.6839%]
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  7 (7.00%) high mild
  1 (1.00%) high severe
event/shared_parent/multithreaded/10
                        time:   [31.020 µs 31.400 µs 31.703 µs]
                        thrpt:  [315.43 Kelem/s 318.48 Kelem/s 322.37 Kelem/s]
                 change:
                        time:   [-19.896% -17.836% -15.984%] (p = 0.00 < 0.05)
                        thrpt:  [+19.025% +21.707% +24.838%]
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe
event/multi-parent/multithreaded/10
                        time:   [55.046 µs 55.661 µs 56.363 µs]
                        thrpt:  [177.42 Kelem/s 179.66 Kelem/s 181.67 Kelem/s]
                 change:
                        time:   [-39.343% -35.478% -31.532%] (p = 0.00 < 0.05)
                        thrpt:  [+46.053% +54.985% +64.860%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
event/root/single_threaded/50
                        time:   [26.178 µs 26.244 µs 26.306 µs]
                        thrpt:  [1.9007 Melem/s 1.9052 Melem/s 1.9100 Melem/s]
                 change:
                        time:   [+2.4375% +3.0112% +3.5159%] (p = 0.00 < 0.05)
                        thrpt:  [-3.3965% -2.9231% -2.3795%]
                        Performance has regressed.
event/root/multithreaded/50
                        time:   [47.489 µs 47.806 µs 48.079 µs]
                        thrpt:  [1.0400 Melem/s 1.0459 Melem/s 1.0529 Melem/s]
                 change:
                        time:   [-2.5381% -0.8321% +0.9569%] (p = 0.38 > 0.05)
                        thrpt:  [-0.9478% +0.8391% +2.6041%]
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
event/unique_parent/single_threaded/50
                        time:   [41.218 µs 41.307 µs 41.402 µs]
                        thrpt:  [1.2077 Melem/s 1.2105 Melem/s 1.2131 Melem/s]
                 change:
                        time:   [-1.1814% -0.6921% -0.2034%] (p = 0.01 < 0.05)
                        thrpt:  [+0.2038% +0.6969% +1.1955%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
event/unique_parent/multithreaded/50
                        time:   [66.945 µs 69.091 µs 70.728 µs]
                        thrpt:  [706.93 Kelem/s 723.68 Kelem/s 746.88 Kelem/s]
                 change:
                        time:   [-7.7719% -3.0993% +2.1468%] (p = 0.21 > 0.05)
                        thrpt:  [-2.1017% +3.1984% +8.4268%]
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  13 (13.00%) low severe
  3 (3.00%) low mild
event/shared_parent/multithreaded/50
                        time:   [92.784 µs 97.080 µs 101.25 µs]
                        thrpt:  [493.82 Kelem/s 515.04 Kelem/s 538.89 Kelem/s]
                 change:
                        time:   [+9.8937% +15.335% +21.256%] (p = 0.00 < 0.05)
                        thrpt:  [-17.530% -13.296% -9.0030%]
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  15 (15.00%) low severe
  6 (6.00%) low mild
  2 (2.00%) high mild
event/multi-parent/multithreaded/50
                        time:   [472.43 µs 473.79 µs 475.08 µs]
                        thrpt:  [105.25 Kelem/s 105.53 Kelem/s 105.84 Kelem/s]
                 change:
                        time:   [-2.3126% -1.9288% -1.5301%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5539% +1.9667% +2.3674%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild

(This was after I ran the same set of benchmarks on the master branch, to be clear.)

Signed-off-by: Alex Saveau <[email protected]>

hawkw · 2024-01-06T19:41:21Z

Thanks for working on this, this is something I had been planning on doing for a long time but never had the time for. I'm really excited to review this PR!

Signed-off-by: Alex Saveau <[email protected]>

SUPERCILEX · 2024-01-06T20:57:59Z

Did a bit of cleanup, the PR should be in a nicer state now.

SUPERCILEX requested review from hawkw, davidbarsky and a team as code owners January 1, 2024 02:45

SUPERCILEX mentioned this pull request Jan 1, 2024

Low/Minimal Allocation version of tracing_tracy nagisa/rust_tracy_client#42

Closed

SUPERCILEX force-pushed the ext-alloc branch from 2c83f67 to 7e60010 Compare January 1, 2024 07:06

Eliminate extensions allocations

42f47b2

Signed-off-by: Alex Saveau <[email protected]>

SUPERCILEX force-pushed the ext-alloc branch from 7e60010 to 42f47b2 Compare January 6, 2024 19:34

SUPERCILEX added 2 commits January 6, 2024 12:51

Tighten boxed_entry abstractions

6f9d75c

Signed-off-by: Alex Saveau <[email protected]>

Make remove actually deallocate

0ba8b1b

Signed-off-by: Alex Saveau <[email protected]>

SUPERCILEX force-pushed the ext-alloc branch from 5fb6840 to 0ba8b1b Compare January 6, 2024 20:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eliminate extensions allocations #2840

Eliminate extensions allocations #2840

SUPERCILEX commented Jan 1, 2024 •

edited

Loading

SUPERCILEX commented Jan 3, 2024

davidbarsky commented Jan 3, 2024 •

edited

Loading

hawkw commented Jan 6, 2024

SUPERCILEX commented Jan 6, 2024

Eliminate extensions allocations #2840

Are you sure you want to change the base?

Eliminate extensions allocations #2840

Conversation

SUPERCILEX commented Jan 1, 2024 • edited Loading

What does this do?

Why can't this be safe code?

Performance

Random notes

SUPERCILEX commented Jan 3, 2024

davidbarsky commented Jan 3, 2024 • edited Loading

hawkw commented Jan 6, 2024

SUPERCILEX commented Jan 6, 2024

SUPERCILEX commented Jan 1, 2024 •

edited

Loading

davidbarsky commented Jan 3, 2024 •

edited

Loading