# perf(turbo-tasks): Call .shrink_to_fit() on common collection types when constructing a cell (#72113)

Cell contents are immutable once constructed, so there's no chance that they'll grow in size again. Common collections can be shrunk to avoid storing empty spare capacity in this case (note: if they're already correctly sized, `shrink_to_fit` bails out early).
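
A minimal standalone sketch of the effect being relied on here (plain standard-library behavior, not the turbo-tasks implementation): shrinking drops spare capacity, and a collection that is already exactly sized is left untouched.

```rust
fn main() {
    // A collection built with generous spare capacity...
    let mut v: Vec<u32> = Vec::with_capacity(100);
    v.extend([1, 2, 3]);
    assert_eq!(v.capacity(), 100);

    // ...can be shrunk once we know it will never grow again, e.g. right
    // before it's stored in an immutable cell.
    v.shrink_to_fit();
    println!("capacity after shrink: {}", v.capacity()); // 3 with the default allocator
    assert!(v.capacity() < 100);

    // If capacity already equals length, `shrink_to_fit` bails out early
    // and the allocation is left untouched.
    let mut exact = vec![1_u32, 2, 3];
    let ptr = exact.as_ptr();
    exact.shrink_to_fit();
    assert_eq!(exact.as_ptr(), ptr);
}
```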

**Result:** This gives a ~1.4% decrease in top-line peak memory consumption, for a theoretical CPU/Wall time cost that's too small to measure.

**Inspiration:** The inspiration for this was vercel/turborepo#2873 (which decreased task storage, though apparently not top-line memory usage, by ~14%), vercel/turborepo#8657, and other similar optimization PRs.

## Additional Opportunities

- There may be more places where a cell's `SharedReference` is constructed (e.g. persistent storage deserialization) that aren't currently covered by this change.
  - Depending on the library used, deserialization may already construct exact-sized collections.
  - As an additional example, manually constructing a `ReadRef` and converting it into a cell skips this optimization, because `ReadRef::cell` internally uses the type-erased shared-reference `raw_cell` API, which is incompatible with it. We could special-case that in the `ReadRef::new_owned` constructor (though not in `ReadRef::new_arc`), but nobody should be manually constructing `ReadRef`s.

- We still don't call `shrink_to_fit` on `RcStr` types. Some of these are extended in place (when they have a refcount of 1) via `RcStr::map`, so we probably don't want to be too aggressive here and risk `O(n^2)` time-complexity blowups (see the sketch below).
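
To make the `RcStr` concern concrete, here is a minimal sketch of the failure mode, using a plain `String` as a stand-in (an assumption for illustration; `RcStr`'s internals aren't shown in this PR):

```rust
fn main() {
    let mut s = String::new();
    for _ in 0..1_000 {
        // Amortized O(1): `push_str` grows capacity geometrically...
        s.push_str("abc");
        // ...but shrinking after every extension discards that spare
        // capacity, so the next `push_str` must reallocate and copy the
        // entire string again: O(n) per step, O(n^2) overall. Cells are
        // safe to shrink because they're immutable once constructed;
        // values that may still be extended in place are not.
        s.shrink_to_fit();
    }
    println!("final length: {}", s.len());
}
```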

## Memory Benchmark Setup

```bash
cd ~/next.js
cargo run -p next-build-test --release -- generate ~/shadcn-ui/apps/www/ > ~/shadcn-ui/apps/www/project_options.json
pnpm pack-next --project ~/shadcn-ui/apps/www/
```

```bash
cd ~/shadcn-ui
pnpm i
cd ~/shadcn-ui/apps/www/
heaptrack ~/next.js/target/release/next-build-test run sequential 1 1 '/sink'
heaptrack --analyze ~/shadcn-ui/apps/www/heaptrack.next-build-test.3604648.zst
```

### Memory Before (canary branch)

First Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

Second Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

### Memory After (this PR)

First Run:

```
peak heap memory consumption: 3.18G
peak RSS (including heaptrack overhead): 4.74G
```

Second Run:

```
peak heap memory consumption: 3.19G
peak RSS (including heaptrack overhead): 4.73G
```

This is about a 1.4% decrease in top-line memory consumption.

## Wall Time with `hyperfine` (Counter-Metric)

This is theoretically a time-memory tradeoff, since we spend some time `memcpy`ing things into smaller allocations. On the other hand, reducing memory usage can sometimes improve cache locality, so the net effect isn't obvious up front.

```
hyperfine --warmup 3 -r 30 --time-unit millisecond '~/next.js/target/release/next-build-test run sequential 1 1 /sink'
```

This benchmark is slow and takes about 30 minutes to run.

Before:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56387.5 ms ± 212.6 ms    [User: 107807.5 ms, System: 9509.8 ms]
  Range (min … max):   55934.4 ms … 56872.9 ms    30 runs
```

After:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56020.9 ms ± 235.4 ms    [User: 107483.8 ms, System: 9371.8 ms]
  Range (min … max):   55478.2 ms … 56563.6 ms    30 runs
```

This is a ~0.65% *reduction* in wall time, which is small enough (less than two per-run standard deviations) that it's likely just noise.
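
A rough back-of-the-envelope check on that claim (informal arithmetic on the numbers above, not a proper significance test):

$$
\Delta = 56387.5 - 56020.9 = 366.6\ \text{ms} \approx 0.65\%, \qquad
\frac{366.6}{212.6} \approx 1.7, \qquad \frac{366.6}{235.4} \approx 1.6,
$$

so the gap is under two standard deviations of either run's distribution.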

## Wall Time with `turbopack-bench` (Counter-Metric)

```
cargo bench -p turbopack-bench -p turbopack-cli -- "hmr_to_eval/Turbopack CSR"
```

Gives:

```
bench_hmr_to_eval/Turbopack CSR/1000 modules
                        time:   [15.123 ms 15.208 ms 15.343 ms]
                        change: [-0.8471% +0.4882% +1.9719%] (p = 0.55 > 0.05)
                        No change in performance detected.
```

These benchmarks were run using the scripts at https://github.com/bgw/benchmark-scripts/

In practice it's not really possible to measure wall-time changes of less than 1%, so this is within "noise" territory (as noted in the Criterion output above).

Closes PACK-3361
bgw authored Nov 1, 2024
1 parent a44a0d9 commit 5d3bc2b
Showing 17 changed files with 357 additions and 54 deletions.
35 changes: 33 additions & 2 deletions turbopack/crates/turbo-tasks-macros-shared/src/primitive_input.rs
@@ -1,16 +1,47 @@
use proc_macro2::Span;
use syn::{
    parse::{Parse, ParseStream},
    Result, Type,
    punctuated::Punctuated,
    spanned::Spanned,
    Meta, Result, Token, Type,
};

#[derive(Debug)]
pub struct PrimitiveInput {
    pub ty: Type,
    pub manual_shrink_to_fit: Option<Span>,
}

impl Parse for PrimitiveInput {
    fn parse(input: ParseStream) -> Result<Self> {
        let ty: Type = input.parse()?;
        Ok(PrimitiveInput { ty })
        let mut parsed_input = PrimitiveInput {
            ty,
            manual_shrink_to_fit: None,
        };
        if input.parse::<Option<Token![,]>>()?.is_some() {
            let punctuated: Punctuated<Meta, Token![,]> = input.parse_terminated(Meta::parse)?;
            for meta in punctuated {
                match (
                    meta.path()
                        .get_ident()
                        .map(ToString::to_string)
                        .as_deref()
                        .unwrap_or_default(),
                    &meta,
                ) {
                    ("manual_shrink_to_fit", Meta::Path(_)) => {
                        parsed_input.manual_shrink_to_fit = Some(meta.span())
                    }
                    (_, meta) => {
                        return Err(syn::Error::new_spanned(
                            meta,
                            "unexpected token, expected: \"manual_shrink_to_fit\"",
                        ))
                    }
                }
            }
        }
        Ok(parsed_input)
    }
}
2 changes: 2 additions & 0 deletions turbopack/crates/turbo-tasks-macros/src/derive/mod.rs
@@ -1,6 +1,7 @@
mod deterministic_hash_macro;
mod key_value_pair_macro;
mod resolved_value_macro;
mod shrink_to_fit_macro;
mod task_input_macro;
mod trace_raw_vcs_macro;
mod value_debug_format_macro;
@@ -9,6 +10,7 @@ mod value_debug_macro;
pub use deterministic_hash_macro::derive_deterministic_hash;
pub use key_value_pair_macro::derive_key_value_pair;
pub use resolved_value_macro::derive_resolved_value;
pub use shrink_to_fit_macro::derive_shrink_to_fit;
use syn::{spanned::Spanned, Attribute, Meta, MetaList, NestedMeta};
pub use task_input_macro::derive_task_input;
pub use trace_raw_vcs_macro::derive_trace_raw_vcs;
@@ -0,0 +1,53 @@
use proc_macro::TokenStream;
use proc_macro2::TokenStream as TokenStream2;
use quote::quote;
use syn::{parse_macro_input, DeriveInput, FieldsNamed, FieldsUnnamed};
use turbo_tasks_macros_shared::{generate_exhaustive_destructuring, match_expansion};

pub fn derive_shrink_to_fit(input: TokenStream) -> TokenStream {
    let derive_input = parse_macro_input!(input as DeriveInput);
    let ident = &derive_input.ident;
    let (impl_generics, ty_generics, where_clause) = derive_input.generics.split_for_impl();

    let shrink_items = match_expansion(&derive_input, &shrink_named, &shrink_unnamed, &shrink_unit);
    quote! {
        impl #impl_generics turbo_tasks::ShrinkToFit for #ident #ty_generics #where_clause {
            fn shrink_to_fit(&mut self) {
                #shrink_items
            }
        }
    }
    .into()
}

fn shrink_named(_ident: TokenStream2, fields: &FieldsNamed) -> (TokenStream2, TokenStream2) {
    let (captures, fields_idents) = generate_exhaustive_destructuring(fields.named.iter());
    (
        captures,
        quote! {
            {#(
                turbo_tasks::macro_helpers::ShrinkToFitDerefSpecialization::new(
                    #fields_idents,
                ).shrink_to_fit();
            )*}
        },
    )
}

fn shrink_unnamed(_ident: TokenStream2, fields: &FieldsUnnamed) -> (TokenStream2, TokenStream2) {
    let (captures, fields_idents) = generate_exhaustive_destructuring(fields.unnamed.iter());
    (
        captures,
        quote! {
            {#(
                turbo_tasks::macro_helpers::ShrinkToFitDerefSpecialization::new(
                    #fields_idents,
                ).shrink_to_fit();
            )*}
        },
    )
}

fn shrink_unit(_ident: TokenStream2) -> TokenStream2 {
    quote! { { } }
}
5 changes: 5 additions & 0 deletions turbopack/crates/turbo-tasks-macros/src/lib.rs
@@ -22,6 +22,11 @@ pub fn derive_trace_raw_vcs_attr(input: TokenStream) -> TokenStream {
    derive::derive_trace_raw_vcs(input)
}

#[proc_macro_derive(ShrinkToFit, attributes(turbo_tasks))]
pub fn derive_shrink_to_fit(input: TokenStream) -> TokenStream {
    derive::derive_shrink_to_fit(input)
}

#[proc_macro_derive(ResolvedValue, attributes(turbo_tasks))]
pub fn derive_resolved_value_attr(input: TokenStream) -> TokenStream {
    derive::derive_resolved_value(input)
12 changes: 11 additions & 1 deletion turbopack/crates/turbo-tasks-macros/src/primitive_macro.rs
@@ -16,6 +16,16 @@ pub fn primitive(input: TokenStream) -> TokenStream {
        .into();
    };

    let value_shrink_to_fit_impl = if input.manual_shrink_to_fit.is_none() {
        Some(quote! {
            impl turbo_tasks::ShrinkToFit for #ty {
                fn shrink_to_fit(&mut self) {}
            }
        })
    } else {
        None
    };

    let value_debug_impl = quote! {
        #[turbo_tasks::value_impl]
        impl turbo_tasks::debug::ValueDebug for #ty {
@@ -62,7 +72,7 @@ pub fn primitive(input: TokenStream) -> TokenStream {
        #value_type_and_register

        #value_debug_impl

        #value_shrink_to_fit_impl
        #value_default_impl
    }
    .into()
62 changes: 28 additions & 34 deletions turbopack/crates/turbo-tasks-macros/src/value_macro.rs
@@ -349,49 +349,46 @@ pub fn value(args: TokenStream, input: TokenStream) -> TokenStream {
quote! {}
};

let derive = match serialization_mode {
let mut struct_attributes = vec![quote! {
#[derive(turbo_tasks::ShrinkToFit, turbo_tasks::trace::TraceRawVcs)]
}];
match serialization_mode {
SerializationMode::Auto | SerializationMode::AutoForInput => {
struct_attributes.push(quote! {
#[derive(
turbo_tasks::macro_helpers::serde::Serialize,
turbo_tasks::macro_helpers::serde::Deserialize,
)]
#[serde(crate = "turbo_tasks::macro_helpers::serde")]
})
}
SerializationMode::None | SerializationMode::Custom | SerializationMode::CustomForInput => {
quote! {
#[derive(turbo_tasks::trace::TraceRawVcs)]
}
}
SerializationMode::Auto | SerializationMode::AutoForInput => quote! {
#[derive(
turbo_tasks::trace::TraceRawVcs,
turbo_tasks::macro_helpers::serde::Serialize,
turbo_tasks::macro_helpers::serde::Deserialize,
)]
#[serde(crate = "turbo_tasks::macro_helpers::serde")]
},
};
let debug_derive = if inner_type.is_some() {
if inner_type.is_some() {
// Transparent structs have their own manual `ValueDebug` implementation.
quote! {
struct_attributes.push(quote! {
#[repr(transparent)]
}
});
} else {
quote! {
struct_attributes.push(quote! {
#[derive(
turbo_tasks::debug::ValueDebugFormat,
turbo_tasks::debug::internal::ValueDebug,
)]
}
};
let eq_derive = if manual_eq {
quote!()
} else {
quote!(
});
}
if !manual_eq {
struct_attributes.push(quote! {
#[derive(PartialEq, Eq)]
)
};
let resolved_derive = if let Some(span) = resolved {
quote_spanned!(
});
}
if let Some(span) = resolved {
struct_attributes.push(quote_spanned! {
span =>
#[derive(turbo_tasks::ResolvedValue)]
)
} else {
quote!()
};
});
}

let new_value_type = match serialization_mode {
SerializationMode::None => quote! {
@@ -449,10 +446,7 @@ );
);

let expanded = quote! {
#derive
#debug_derive
#eq_derive
#resolved_derive
#(#struct_attributes)*
#item

impl #ident {
1 change: 1 addition & 0 deletions turbopack/crates/turbo-tasks-memory/tests/shrink_to_fit.rs
27 changes: 27 additions & 0 deletions turbopack/crates/turbo-tasks-testing/tests/shrink_to_fit.rs
@@ -0,0 +1,27 @@
#![feature(arbitrary_self_types)]
#![feature(arbitrary_self_types_pointers)]
#![allow(clippy::needless_return)] // tokio macro-generated code doesn't respect this

use anyhow::Result;
use turbo_tasks::Vc;
use turbo_tasks_testing::{register, run, Registration};

static REGISTRATION: Registration = register!();

#[turbo_tasks::value(transparent)]
struct Wrapper(Vec<u32>);

#[tokio::test]
async fn test_shrink_to_fit() -> Result<()> {
    run(&REGISTRATION, || async {
        // `Vec::shrink_to_fit` is implicitly called when a cell is constructed.
        let a: Vc<Wrapper> = Vc::cell(Vec::with_capacity(100));
        assert_eq!(a.await?.capacity(), 0);

        let b: Vc<Wrapper> = Vc::local_cell(Vec::with_capacity(100));
        assert_eq!(b.await?.capacity(), 0);

        Ok(())
    })
    .await
}
2 changes: 2 additions & 0 deletions turbopack/crates/turbo-tasks/src/lib.rs
@@ -66,6 +66,7 @@ mod read_ref;
pub mod registry;
mod scope;
mod serialization_invalidation;
mod shrink_to_fit;
pub mod small_duration;
mod state;
pub mod task;
@@ -111,6 +112,7 @@ pub use read_ref::ReadRef;
use rustc_hash::FxHasher;
pub use scope::scope;
pub use serialization_invalidation::SerializationInvalidator;
pub use shrink_to_fit::ShrinkToFit;
pub use state::{State, TransientState};
pub use task::{task_input::TaskInput, SharedReference, TypedSharedReference};
pub use trait_ref::{IntoTraitRef, TraitRef};
63 changes: 61 additions & 2 deletions turbopack/crates/turbo-tasks/src/macro_helpers.rs
@@ -1,4 +1,6 @@
//! Runtime helpers for [turbo-tasks-macro].
use std::ops::{Deref, DerefMut};

pub use async_trait::async_trait;
pub use once_cell::sync::{Lazy, OnceCell};
pub use serde;
@@ -9,8 +11,8 @@ pub use super::{
    manager::{find_cell_by_type, notify_scheduled_tasks, spawn_detached_for_testing},
};
use crate::{
    debug::ValueDebugFormatString, task::TaskOutput, RawVc, ResolvedValue, TaskInput,
    TaskPersistence, Vc,
    debug::ValueDebugFormatString, shrink_to_fit::ShrinkToFit, task::TaskOutput, RawVc,
    ResolvedValue, TaskInput, TaskPersistence, Vc,
};

#[inline(never)]
@@ -56,3 +58,60 @@ macro_rules! stringify_path {
        stringify!($path)
    };
}

/// A wrapper type that uses the [autoderef specialization hack][autoderef] to call
/// [`ShrinkToFit::shrink_to_fit`] on types that implement [`ShrinkToFit`].
///
/// This uses a no-op method [`ShrinkToFitFallbackNoop::shrink_to_fit`] on types that do not
/// implement [`ShrinkToFit`].
///
/// This is used by the derive macro for [`ShrinkToFit`], which is called by the
/// [turbo_tasks::value][crate::value] macro.
///
/// [autoderef]: http://lukaskalbertodt.github.io/2019/12/05/generalized-autoref-based-specialization.html
pub struct ShrinkToFitDerefSpecialization<'a, T> {
    inner: ShrinkToFitFallbackNoop<'a, T>,
}

impl<'a, T> ShrinkToFitDerefSpecialization<'a, T> {
    pub fn new(real: &'a mut T) -> Self {
        Self {
            inner: ShrinkToFitFallbackNoop { real },
        }
    }
}

impl<T> ShrinkToFitDerefSpecialization<'_, T>
where
    T: ShrinkToFit,
{
    pub fn shrink_to_fit(&mut self) {
        // call the real `ShrinkToFit::shrink_to_fit` method
        self.inner.real.shrink_to_fit()
    }
}

impl<'a, T> Deref for ShrinkToFitDerefSpecialization<'a, T> {
    type Target = ShrinkToFitFallbackNoop<'a, T>;

    fn deref(&self) -> &Self::Target {
        &self.inner
    }
}

impl<T> DerefMut for ShrinkToFitDerefSpecialization<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.inner
    }
}

// Implements `ShrinkToFit` using a no-op `ShrinkToFit::shrink_to_fit` method.
pub struct ShrinkToFitFallbackNoop<'a, T> {
    real: &'a mut T,
}

impl<T> ShrinkToFitFallbackNoop<'_, T> {
    /// A no-op function called as part of [`ShrinkToFitDerefSpecialization`] when `T` does not
    /// implement [`ShrinkToFit`].
    pub fn shrink_to_fit(&mut self) {}
}