# perf(turbo-tasks): Call .shrink_to_fit() on common collection types when constructing a cell (#72113)

Cell contents are immutable once constructed, so there's no chance that they'll grow in size again. Common collections can be shrunk to avoid storing empty spare capacity in this case (note: if they're already correctly sized, `shrink_to_fit` bails out early).
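
A minimal standalone sketch of the effect being relied on here (plain standard-library behavior, not the turbo-tasks implementation): shrinking drops spare capacity, and a collection that is already exactly sized is left untouched.

```rust
fn main() {
    // A collection built with generous spare capacity...
    let mut v: Vec<u32> = Vec::with_capacity(100);
    v.extend([1, 2, 3]);
    assert_eq!(v.capacity(), 100);

    // ...can be shrunk once we know it will never grow again, e.g. right
    // before it's stored in an immutable cell.
    v.shrink_to_fit();
    println!("capacity after shrink: {}", v.capacity()); // 3 with the default allocator
    assert!(v.capacity() < 100);

    // If capacity already equals length, `shrink_to_fit` bails out early
    // and the allocation is left untouched.
    let mut exact = vec![1_u32, 2, 3];
    let ptr = exact.as_ptr();
    exact.shrink_to_fit();
    assert_eq!(exact.as_ptr(), ptr);
}
```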

**Result:** This gives a ~1.4% decrease in top-line peak memory consumption, for a theoretical CPU/Wall time cost that's too small to measure.

**Inspiration:** The inspiration for this was vercel/turborepo#2873 (which decreased task storage, though apparently not top-line memory usage, by ~14%), vercel/turborepo#8657, and other similar optimization PRs.

## Additional Opportunities

- There may be more places where a cell's `SharedReference` is constructed (e.g. persistent storage deserialization) that aren't currently covered by this change.
  - Depending on the library used, deserialization may already construct exact-sized collections.
  - As an additional example, manually constructing a `ReadRef` and converting it into a cell skips this optimization, because `ReadRef::cell` internally uses the type-erased shared-reference `raw_cell` API, which is incompatible with it. We could special-case that in the `ReadRef::new_owned` constructor (though not in `ReadRef::new_arc`), but nobody should be manually constructing `ReadRef`s.

- We still don't call `shrink_to_fit` on `RcStr` types. Some of these are extended in place (when they have a refcount of 1) via `RcStr::map`, so we probably don't want to be too aggressive here and risk `O(n^2)` time-complexity blowups (see the sketch below).
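
To make the `RcStr` concern concrete, here is a minimal sketch of the failure mode, using a plain `String` as a stand-in (an assumption for illustration; `RcStr`'s internals aren't shown in this PR):

```rust
fn main() {
    let mut s = String::new();
    for _ in 0..1_000 {
        // Amortized O(1): `push_str` grows capacity geometrically...
        s.push_str("abc");
        // ...but shrinking after every extension discards that spare
        // capacity, so the next `push_str` must reallocate and copy the
        // entire string again: O(n) per step, O(n^2) overall. Cells are
        // safe to shrink because they're immutable once constructed;
        // values that may still be extended in place are not.
        s.shrink_to_fit();
    }
    println!("final length: {}", s.len());
}
```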

## Memory Benchmark Setup

```bash
cd ~/next.js
cargo run -p next-build-test --release -- generate ~/shadcn-ui/apps/www/ > ~/shadcn-ui/apps/www/project_options.json
pnpm pack-next --project ~/shadcn-ui/apps/www/
```

```bash
cd ~/shadcn-ui
pnpm i
cd ~/shadcn-ui/apps/www/
heaptrack ~/next.js/target/release/next-build-test run sequential 1 1 '/sink'
heaptrack --analyze ~/shadcn-ui/apps/www/heaptrack.next-build-test.3604648.zst
```

### Memory Before (canary branch)

First Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

Second Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

### Memory After (this PR)

First Run:

```
peak heap memory consumption: 3.18G
peak RSS (including heaptrack overhead): 4.74G
```

Second Run:

```
peak heap memory consumption: 3.19G
peak RSS (including heaptrack overhead): 4.73G
```

This is about a 1.4% decrease in top-line memory consumption.

## Wall Time with `hyperfine` (Counter-Metric)

This is theoretically a time-memory tradeoff, since we spend some time `memcpy`ing things into smaller allocations. On the other hand, reducing memory usage can sometimes improve cache locality, so the net effect isn't obvious up front.

```
hyperfine --warmup 3 -r 30 --time-unit millisecond '~/next.js/target/release/next-build-test run sequential 1 1 /sink'
```

This benchmark is slow and takes about 30 minutes to run.

Before:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56387.5 ms ± 212.6 ms    [User: 107807.5 ms, System: 9509.8 ms]
  Range (min … max):   55934.4 ms … 56872.9 ms    30 runs
```

After:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56020.9 ms ± 235.4 ms    [User: 107483.8 ms, System: 9371.8 ms]
  Range (min … max):   55478.2 ms … 56563.6 ms    30 runs
```

This is a ~0.65% *reduction* in wall time, which is small enough (less than two per-run standard deviations) that it's likely just noise.
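
A rough back-of-the-envelope check on that claim (informal arithmetic on the numbers above, not a proper significance test):

$$
\Delta = 56387.5 - 56020.9 = 366.6\ \text{ms} \approx 0.65\%, \qquad
\frac{366.6}{212.6} \approx 1.7, \qquad \frac{366.6}{235.4} \approx 1.6,
$$

so the gap is under two standard deviations of either run's distribution.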

## Wall Time with `turbopack-bench` (Counter-Metric)

```
cargo bench -p turbopack-bench -p turbopack-cli -- "hmr_to_eval/Turbopack CSR"
```

Gives:

```
bench_hmr_to_eval/Turbopack CSR/1000 modules
                        time:   [15.123 ms 15.208 ms 15.343 ms]
                        change: [-0.8471% +0.4882% +1.9719%] (p = 0.55 > 0.05)
                        No change in performance detected.
```

These benchmarks were run using the scripts at https://github.com/bgw/benchmark-scripts/

In practice it's not really possible to measure wall-time changes of less than 1%, so this is within "noise" territory (as noted in the Criterion output above).

Closes PACK-3361
bgw authored Nov 1, 2024
1 parent a44a0d9 commit 5d3bc2b
Showing 17 changed files with 357 additions and 54 deletions.
35 changes: 33 additions & 2 deletions turbopack/crates/turbo-tasks-macros-shared/src/primitive_input.rs
@@ -1,16 +1,47 @@
use proc_macro2::Span;
use syn::{
    parse::{Parse, ParseStream},
    Result, Type,
    punctuated::Punctuated,
    spanned::Spanned,
    Meta, Result, Token, Type,
};

#[derive(Debug)]
pub struct PrimitiveInput {
    pub ty: Type,
    pub manual_shrink_to_fit: Option<Span>,
}

impl Parse for PrimitiveInput {
    fn parse(input: ParseStream) -> Result<Self> {
        let ty: Type = input.parse()?;
        Ok(PrimitiveInput { ty })
        let mut parsed_input = PrimitiveInput {
            ty,
            manual_shrink_to_fit: None,
        };
        if input.parse::<Option<Token![,]>>()?.is_some() {
            let punctuated: Punctuated<Meta, Token![,]> = input.parse_terminated(Meta::parse)?;
            for meta in punctuated {
                match (
                    meta.path()
                        .get_ident()
                        .map(ToString::to_string)
                        .as_deref()
                        .unwrap_or_default(),
                    &meta,
                ) {
                    ("manual_shrink_to_fit", Meta::Path(_)) => {
                        parsed_input.manual_shrink_to_fit = Some(meta.span())
                    }
                    (_, meta) => {
                        return Err(syn::Error::new_spanned(
                            meta,
                            "unexpected token, expected: \"manual_shrink_to_fit\"",
                        ))
                    }
                }
            }
        }
        Ok(parsed_input)
    }
}
2 changes: 2 additions & 0 deletions turbopack/crates/turbo-tasks-macros/src/derive/mod.rs
@@ -1,6 +1,7 @@
mod deterministic_hash_macro;
mod key_value_pair_macro;
mod resolved_value_macro;
mod shrink_to_fit_macro;
mod task_input_macro;
mod trace_raw_vcs_macro;
mod value_debug_format_macro;
@@ -9,6 +10,7 @@ mod value_debug_macro;
pub use deterministic_hash_macro::derive_deterministic_hash;
pub use key_value_pair_macro::derive_key_value_pair;
pub use resolved_value_macro::derive_resolved_value;
pub use shrink_to_fit_macro::derive_shrink_to_fit;
use syn::{spanned::Spanned, Attribute, Meta, MetaList, NestedMeta};
pub use task_input_macro::derive_task_input;
pub use trace_raw_vcs_macro::derive_trace_raw_vcs;
@@ -0,0 +1,53 @@
use proc_macro::TokenStream;
use proc_macro2::TokenStream as TokenStream2;
use quote::quote;
use syn::{parse_macro_input, DeriveInput, FieldsNamed, FieldsUnnamed};
use turbo_tasks_macros_shared::{generate_exhaustive_destructuring, match_expansion};

pub fn derive_shrink_to_fit(input: TokenStream) -> TokenStream {
    let derive_input = parse_macro_input!(input as DeriveInput);
    let ident = &derive_input.ident;
    let (impl_generics, ty_generics, where_clause) = derive_input.generics.split_for_impl();

    let shrink_items = match_expansion(&derive_input, &shrink_named, &shrink_unnamed, &shrink_unit);
    quote! {
        impl #impl_generics turbo_tasks::ShrinkToFit for #ident #ty_generics #where_clause {
            fn shrink_to_fit(&mut self) {
                #shrink_items
            }
        }
    }
    .into()
}

fn shrink_named(_ident: TokenStream2, fields: &FieldsNamed) -> (TokenStream2, TokenStream2) {
    let (captures, fields_idents) = generate_exhaustive_destructuring(fields.named.iter());
    (
        captures,
        quote! {
            {#(
                turbo_tasks::macro_helpers::ShrinkToFitDerefSpecialization::new(
                    #fields_idents,
                ).shrink_to_fit();
            )*}
        },
    )
}

fn shrink_unnamed(_ident: TokenStream2, fields: &FieldsUnnamed) -> (TokenStream2, TokenStream2) {
    let (captures, fields_idents) = generate_exhaustive_destructuring(fields.unnamed.iter());
    (
        captures,
        quote! {
            {#(
                turbo_tasks::macro_helpers::ShrinkToFitDerefSpecialization::new(
                    #fields_idents,
                ).shrink_to_fit();
            )*}
        },
    )
}

fn shrink_unit(_ident: TokenStream2) -> TokenStream2 {
    quote! { { } }
}
5 changes: 5 additions & 0 deletions turbopack/crates/turbo-tasks-macros/src/lib.rs
@@ -22,6 +22,11 @@ pub fn derive_trace_raw_vcs_attr(input: TokenStream) -> TokenStream {
    derive::derive_trace_raw_vcs(input)
}

#[proc_macro_derive(ShrinkToFit, attributes(turbo_tasks))]
pub fn derive_shrink_to_fit(input: TokenStream) -> TokenStream {
    derive::derive_shrink_to_fit(input)
}

#[proc_macro_derive(ResolvedValue, attributes(turbo_tasks))]
pub fn derive_resolved_value_attr(input: TokenStream) -> TokenStream {
    derive::derive_resolved_value(input)
12 changes: 11 additions & 1 deletion turbopack/crates/turbo-tasks-macros/src/primitive_macro.rs
@@ -16,6 +16,16 @@ pub fn primitive(input: TokenStream) -> TokenStream {
        .into();
    };

    let value_shrink_to_fit_impl = if input.manual_shrink_to_fit.is_none() {
        Some(quote! {
            impl turbo_tasks::ShrinkToFit for #ty {
                fn shrink_to_fit(&mut self) {}
            }
        })
    } else {
        None
    };

    let value_debug_impl = quote! {
        #[turbo_tasks::value_impl]
        impl turbo_tasks::debug::ValueDebug for #ty {
@@ -62,7 +72,7 @@ pub fn primitive(input: TokenStream) -> TokenStream {
        #value_type_and_register

        #value_debug_impl

        #value_shrink_to_fit_impl
        #value_default_impl
    }
    .into()
62 changes: 28 additions & 34 deletions turbopack/crates/turbo-tasks-macros/src/value_macro.rs
@@ -349,49 +349,46 @@ pub fn value(args: TokenStream, input: TokenStream) -> TokenStream {
quote! {}
};

let derive = match serialization_mode {
let mut struct_attributes = vec![quote! {
#[derive(turbo_tasks::ShrinkToFit, turbo_tasks::trace::TraceRawVcs)]
}];
match serialization_mode {
SerializationMode::Auto | SerializationMode::AutoForInput => {
struct_attributes.push(quote! {
#[derive(
turbo_tasks::macro_helpers::serde::Serialize,
turbo_tasks::macro_helpers::serde::Deserialize,
)]
#[serde(crate = "turbo_tasks::macro_helpers::serde")]
})
}
SerializationMode::None | SerializationMode::Custom | SerializationMode::CustomForInput => {
quote! {
#[derive(turbo_tasks::trace::TraceRawVcs)]
}
}
SerializationMode::Auto | SerializationMode::AutoForInput => quote! {
#[derive(
turbo_tasks::trace::TraceRawVcs,
turbo_tasks::macro_helpers::serde::Serialize,
turbo_tasks::macro_helpers::serde::Deserialize,
)]
#[serde(crate = "turbo_tasks::macro_helpers::serde")]
},
};
let debug_derive = if inner_type.is_some() {
if inner_type.is_some() {
// Transparent structs have their own manual `ValueDebug` implementation.
quote! {
struct_attributes.push(quote! {
#[repr(transparent)]
}
});
} else {
quote! {
struct_attributes.push(quote! {
#[derive(
turbo_tasks::debug::ValueDebugFormat,
turbo_tasks::debug::internal::ValueDebug,
)]
}
};
let eq_derive = if manual_eq {
quote!()
} else {
quote!(
});
}
if !manual_eq {
struct_attributes.push(quote! {
#[derive(PartialEq, Eq)]
)
};
let resolved_derive = if let Some(span) = resolved {
quote_spanned!(
});
}
if let Some(span) = resolved {
struct_attributes.push(quote_spanned! {
span =>
#[derive(turbo_tasks::ResolvedValue)]
)
} else {
quote!()
};
});
}

let new_value_type = match serialization_mode {
SerializationMode::None => quote! {
@@ -449,10 +446,7 @@ );
);

let expanded = quote! {
#derive
#debug_derive
#eq_derive
#resolved_derive
#(#struct_attributes)*
#item

impl #ident {
1 change: 1 addition & 0 deletions turbopack/crates/turbo-tasks-memory/tests/shrink_to_fit.rs
27 changes: 27 additions & 0 deletions turbopack/crates/turbo-tasks-testing/tests/shrink_to_fit.rs
@@ -0,0 +1,27 @@
#![feature(arbitrary_self_types)]
#![feature(arbitrary_self_types_pointers)]
#![allow(clippy::needless_return)] // tokio macro-generated code doesn't respect this

use anyhow::Result;
use turbo_tasks::Vc;
use turbo_tasks_testing::{register, run, Registration};

static REGISTRATION: Registration = register!();

#[turbo_tasks::value(transparent)]
struct Wrapper(Vec<u32>);

#[tokio::test]
async fn test_shrink_to_fit() -> Result<()> {
    run(&REGISTRATION, || async {
        // `Vec::shrink_to_fit` is implicitly called when a cell is constructed.
        let a: Vc<Wrapper> = Vc::cell(Vec::with_capacity(100));
        assert_eq!(a.await?.capacity(), 0);

        let b: Vc<Wrapper> = Vc::local_cell(Vec::with_capacity(100));
        assert_eq!(b.await?.capacity(), 0);

        Ok(())
    })
    .await
}
2 changes: 2 additions & 0 deletions turbopack/crates/turbo-tasks/src/lib.rs
@@ -66,6 +66,7 @@ mod read_ref;
pub mod registry;
mod scope;
mod serialization_invalidation;
mod shrink_to_fit;
pub mod small_duration;
mod state;
pub mod task;
@@ -111,6 +112,7 @@ pub use read_ref::ReadRef;
use rustc_hash::FxHasher;
pub use scope::scope;
pub use serialization_invalidation::SerializationInvalidator;
pub use shrink_to_fit::ShrinkToFit;
pub use state::{State, TransientState};
pub use task::{task_input::TaskInput, SharedReference, TypedSharedReference};
pub use trait_ref::{IntoTraitRef, TraitRef};
63 changes: 61 additions & 2 deletions turbopack/crates/turbo-tasks/src/macro_helpers.rs
@@ -1,4 +1,6 @@
//! Runtime helpers for [turbo-tasks-macro].
use std::ops::{Deref, DerefMut};

pub use async_trait::async_trait;
pub use once_cell::sync::{Lazy, OnceCell};
pub use serde;
@@ -9,8 +11,8 @@ pub use super::{
    manager::{find_cell_by_type, notify_scheduled_tasks, spawn_detached_for_testing},
};
use crate::{
    debug::ValueDebugFormatString, task::TaskOutput, RawVc, ResolvedValue, TaskInput,
    TaskPersistence, Vc,
    debug::ValueDebugFormatString, shrink_to_fit::ShrinkToFit, task::TaskOutput, RawVc,
    ResolvedValue, TaskInput, TaskPersistence, Vc,
};

#[inline(never)]
@@ -56,3 +58,60 @@ macro_rules! stringify_path {
        stringify!($path)
    };
}

/// A wrapper type that uses the [autoderef specialization hack][autoderef] to call
/// [`ShrinkToFit::shrink_to_fit`] on types that implement [`ShrinkToFit`].
///
/// This uses a no-op method [`ShrinkToFitFallbackNoop::shrink_to_fit`] on types that do not
/// implement [`ShrinkToFit`].
///
/// This is used by the derive macro for [`ShrinkToFit`], which is called by the
/// [turbo_tasks::value][crate::value] macro.
///
/// [autoderef]: http://lukaskalbertodt.github.io/2019/12/05/generalized-autoref-based-specialization.html
pub struct ShrinkToFitDerefSpecialization<'a, T> {
    inner: ShrinkToFitFallbackNoop<'a, T>,
}

impl<'a, T> ShrinkToFitDerefSpecialization<'a, T> {
    pub fn new(real: &'a mut T) -> Self {
        Self {
            inner: ShrinkToFitFallbackNoop { real },
        }
    }
}

impl<T> ShrinkToFitDerefSpecialization<'_, T>
where
    T: ShrinkToFit,
{
    pub fn shrink_to_fit(&mut self) {
        // call the real `ShrinkToFit::shrink_to_fit` method
        self.inner.real.shrink_to_fit()
    }
}

impl<'a, T> Deref for ShrinkToFitDerefSpecialization<'a, T> {
    type Target = ShrinkToFitFallbackNoop<'a, T>;

    fn deref(&self) -> &Self::Target {
        &self.inner
    }
}

impl<T> DerefMut for ShrinkToFitDerefSpecialization<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.inner
    }
}

// Implements `ShrinkToFit` using a no-op `ShrinkToFit::shrink_to_fit` method.
pub struct ShrinkToFitFallbackNoop<'a, T> {
    real: &'a mut T,
}

impl<T> ShrinkToFitFallbackNoop<'_, T> {
    /// A no-op function called as part of [`ShrinkToFitDerefSpecialization`] when `T` does not
    /// implement [`ShrinkToFit`].
    pub fn shrink_to_fit(&mut self) {}
}