Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing caching for HRTB projection equality bounds (for<'x> T: Trait<'x, Assoc = ...>). #99188

Open
eddyb opened this issue Jul 12, 2022 · 19 comments
Labels
A-associated-items Area: Associated items (types, constants & functions) A-higher-ranked Area: Higher-ranked things (e.g., lifetimes, types, trait bounds aka HRTBs) A-trait-system Area: Trait system I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-types Relevant to the types team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Jul 12, 2022

Originally reduced from a production application using the tower crate - see @fasterthanlime's reduced repro repo for more background (though note that it exhibits several other failure modes as well)

trait Trait<'a> {
    type A;
    type B;
    type C;
    type D;

    fn method() {}
}

impl<T> Trait<'_> for &'_ T
where
    for<'x> T: Trait<'x, A = (), B = (), C = (), D = ()>,
{
    type A = ();
    type B = ();
    type C = ();
    type D = ();
}

impl Trait<'_> for () {
    type A = ();
    type B = ();
    type C = ();
    type D = ();
}

pub fn main() {
    #[cfg(depth = "7")]
    <&&&&&&&() as Trait>::method();
    #[cfg(depth = "8")]
    <&&&&&&&&() as Trait>::method();
    #[cfg(depth = "9")]
    <&&&&&&&&&() as Trait>::method();
    #[cfg(depth = "10")]
    <&&&&&&&&&&() as Trait>::method();
}

The above example currently takes an exponential amount of time to compile, based on the type depth:

$ curl -O https://gist.githubusercontent.com/eddyb/cd4221f14fff265280d135ddce5c9712/raw/a17cbf00af1894756d2b1bdd2e838cdd4db2bbe2/proj-exp.rs
$ command time -f 'took %Us' rustc proj-exp.rs --emit=metadata --cfg 'depth = "7"'
took 0.74s
$ command time -f 'took %Us' rustc proj-exp.rs --emit=metadata --cfg 'depth = "8"'
took 2.98s
$ command time -f 'took %Us' rustc proj-exp.rs --emit=metadata --cfg 'depth = "9"'
took 11.92s
$ command time -f 'took %Us' rustc proj-exp.rs --emit=metadata --cfg 'depth = "10"'
took 51.10s

With every extra type layer, the time increases by ~4x, and that aligns well with there being 4 associated types.

While this example is a bit silly, it doesn't take more than two associated types (both constrained at once) to cause the issue (although at higher depth or with a larger constant factor from having additional bounds).

And I'm guessing it might be one of the main causes for applications built with the tower crate to experience long compile times (its main trait, Service, has Response and Error as two associated types), in large part because that's the kind of codebase this was originally reduced from (as per the note at the top of this issue).


The reason I suspect caching is a combination of factors:

  • instrumenting evaluate_predicate_recursively found an exponential ramp in terms of duplicates (i.e. the number of times each unique obligation shows up), and many of them were ProjectionPredicates
    • this was done via -Z self-profile, though RUSTC_LOG might also work (and not require compiler changes)
  • many of those ProjectionPredicates' had a bound lifetime listed in their Binder
  • attempting to create a ProjectionCacheKey returns None iff there are bound variables
    • the comment is a bit worrying, given that it e.g. talks about "escaping regions" but checks for bounds ones
    • (as an aside, ProjectionCacheKey not holding a ParamEnv is probably risky long-term)
    • I've talked to @nikomatsakis and we're unsure if this is a historical artifact - my best guess is that it might have something to do with the fact that normalizing under binders didn't use to work (maybe a micro-opt to not even bother trying to cache those cases? were they that common?)
  • replacing the for<'x> T: Trait<'x, ... with T: Trait<'static, ... removes the exponential slowdown

So the next step was to to try always caching (warning: this is actually an unsound quick hack, ProjectionCacheKey should be modified to carry a Binder<ProjectionTy> instead of a ProjectionTy):

diff --git a/compiler/rustc_trait_selection/src/traits/project.rs b/compiler/rustc_trait_selection/src/traits/project.rs
index b3e7fbb3578..5d5f2f67842 100644
--- a/compiler/rustc_trait_selection/src/traits/project.rs
+++ b/compiler/rustc_trait_selection/src/traits/project.rs
@@ -2123,7 +2123,7 @@ fn from_poly_projection_predicate(
         let infcx = selcx.infcx();
         // We don't do cross-snapshot caching of obligations with escaping regions,
         // so there's no cache key to use
-        predicate.no_bound_vars().map(|predicate| {
+        Some(predicate.skip_binder()).map(|predicate| {
             ProjectionCacheKey::new(
                 // We don't attempt to match up with a specific type-variable state
                 // from a specific call to `opt_normalize_projection_type` - if

However, the above hack does not appear to help and we're unsure why - the main other kind of Predicate that evaluate_predicate_recursively appears to process is TraitPredicate, which IIUC is almost always locally cached?

(EDIT: turns out that the caching is more complex than initially assumed and requires further changes - see #99188 (comment) for a success)

Then again, there are other reductions that don't involve ProjectionPredicate and still cause their own exponential curves, so there might be several different caching issues.


I primarily wanted to open this issue on its own because there's definitely something weird with ProjectionCacheKey (even if a proper fix might be way more involved than just it), and there's a clear correlation between the number of associated types (being constrained) and the exponential curve.

cc @rust-lang/types

@eddyb eddyb added A-trait-system Area: Trait system I-compiletime Issue: Problems and improvements with respect to compile times. A-associated-items Area: Associated items (types, constants & functions) labels Jul 12, 2022
@fasterthanlime
Copy link
Contributor

👋 I'm interested in tackling this since I did the original discovery work (with @eddyb and @BoxyUwU's help). I'm gonna need a few pointers going in though 😌

@jackh726
Copy link
Member

This is weird. I think this is one of those cases where I wish we had between logging tools for capturing how traits are solved.

What are the projection predicates that we try to cache? Probably add a debug call to from_poly_projection_predicate and then aggregate all those?

I wonder if this is caused by the for<'x> T: Trait<'x, A = (), B = (), C = (), D = ()> bound. I could see a world where we have to solve for<'x> T: Trait<'x> and for<'a> <T as Trait<'x>>::A = () (which then requires us to prove that for<'a> T: Trait<'x> again), etc.

Then, the caching might not be in projection predicates, but instead around proving for<'x> T: Trait<'x>

@fasterthanlime
Copy link
Contributor

Probably add a debug call to from_poly_projection_predicate and then aggregate all those?

I've noticed a lot of tracing usage in rustc is just "log events" (debug!("something happened");) rather than spans (let span = debug_span!("doing something"); let _guard = span.enter() or the tracing::instrument macro). This makes it hard to track parent/child relationships.

When I started looking into this, I adjusted the rustc_log crate to start sending spans jaeger via tracing-opentelemetry and opentelemetry-jaeger (instead of just to stdout via tracing-tree). It worked relatively well and I could see it being very useful for tracking this, but using spans instead of just log events will make that endeavor a lot more fruitful.

@jackh726
Copy link
Member

Well, I think we're getting better at adding in tracing::instrument annotations. It's slow moving, thought.

I have wanted to try out https://pernos.co/ for myself. Niko has tried it and liked it. Haven't convinced myself it's worth the expense right now.

Anyways, if you'd like to find some time to chat in sync about this, maybe that could work. It's hard to know exactly where to start probing without a RUSTC_LOG=rustc_trait_selection though.

@fasterthanlime
Copy link
Contributor

Anyways, if you'd like to find some time to chat in sync about this, maybe that could work. It's hard to know exactly where to start probing without a RUSTC_LOG=rustc_trait_selection though.

I don't really know for sure when I'll be available for sync chat in the next couple weeks but I did just post a full write-up about looking for that bug, that does include some tracing work, in case that's helpful: https://fasterthanli.me/articles/when-rustc-explodes

Happy to share any of the materials I've collected there anywhere that's helpful.

@jackh726
Copy link
Member

Just quickly skimmed to the RUSTC_LOG section. Those _s are inference variables. I wonder if these are the same or different. Running rustc with -Zverbose should help identify that. (That would explain why caching fails)

@eddyb
Copy link
Member Author

eddyb commented Jul 13, 2022

I was looking into why we get all that messy output, and it turns out that Predicate (which includes a Binder<PredicateKind>) and Binder<{Trait,Projection,...}Predicate> all implement fmt::Display.

So really, the problem is all in Obligation<O>, not the O it contains:

impl<'tcx, O: fmt::Debug> fmt::Debug for traits::Obligation<'tcx, O> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
if ty::tls::with(|tcx| tcx.sess.verbose()) {
write!(
f,
"Obligation(predicate={:?}, cause={:?}, param_env={:?}, depth={})",
self.predicate, self.cause, self.param_env, self.recursion_depth
)
} else {
write!(f, "Obligation(predicate={:?}, depth={})", self.predicate, self.recursion_depth)
}
}
}

An easy thing to try is to just change predicate={:?} to {} (i.e. also remove the prefix, it's just noise), though maybe we want a fmt::Display impl for Obligation<O>, instead?

Also, there have to be better ways to print something than Foo(a={a}, b={b}).

@eddyb
Copy link
Member Author

eddyb commented Jul 13, 2022

Just quickly skimmed to the RUSTC_LOG section. Those _s are inference variables. I wonder if these are the same or different.

I would expect those to be created in the leaves, by the uncached calls.

If you look further down, there's a list of evaluate_predicate_recursively inputs and their multiplicity, of which these are the last dozen lines (i.e. the most frequently duplicated ones):

   1706 Obligation(predicate=Binder(ProjectionPredicate(ProjectionTy { substs: [&(), ReLateBound(DebruijnIndex(0), BoundRegion { var: 0, kind: BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x) })], item_def_id: DefId(0:5 ~ main[b0d6]::Trait::A) }, Ty(())), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   1706 Obligation(predicate=Binder(ProjectionPredicate(ProjectionTy { substs: [&(), ReLateBound(DebruijnIndex(0), BoundRegion { var: 0, kind: BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x) })], item_def_id: DefId(0:6 ~ main[b0d6]::Trait::B) }, Ty(())), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   1706 Obligation(predicate=Binder(ProjectionPredicate(ProjectionTy { substs: [&(), ReLateBound(DebruijnIndex(0), BoundRegion { var: 0, kind: BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x) })], item_def_id: DefId(0:7 ~ main[b0d6]::Trait::C) }, Ty(())), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   1706 Obligation(predicate=Binder(ProjectionPredicate(ProjectionTy { substs: [&(), ReLateBound(DebruijnIndex(0), BoundRegion { var: 0, kind: BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x) })], item_def_id: DefId(0:8 ~ main[b0d6]::Trait::D) }, Ty(())), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   1706 Obligation(predicate=Binder(TraitPredicate(<&() as std::marker::Sized>, polarity:Positive), []), depth=?)
   1706 Obligation(predicate=Binder(TraitPredicate(<&() as Trait<'x>>, polarity:Positive), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   6826 Obligation(predicate=Binder(ProjectionPredicate(ProjectionTy { substs: [(), ReLateBound(DebruijnIndex(0), BoundRegion { var: 0, kind: BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x) })], item_def_id: DefId(0:5 ~ main[b0d6]::Trait::A) }, Ty(())), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   6826 Obligation(predicate=Binder(ProjectionPredicate(ProjectionTy { substs: [(), ReLateBound(DebruijnIndex(0), BoundRegion { var: 0, kind: BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x) })], item_def_id: DefId(0:6 ~ main[b0d6]::Trait::B) }, Ty(())), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   6826 Obligation(predicate=Binder(ProjectionPredicate(ProjectionTy { substs: [(), ReLateBound(DebruijnIndex(0), BoundRegion { var: 0, kind: BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x) })], item_def_id: DefId(0:7 ~ main[b0d6]::Trait::C) }, Ty(())), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   6826 Obligation(predicate=Binder(ProjectionPredicate(ProjectionTy { substs: [(), ReLateBound(DebruijnIndex(0), BoundRegion { var: 0, kind: BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x) })], item_def_id: DefId(0:8 ~ main[b0d6]::Trait::D) }, Ty(())), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   6826 Obligation(predicate=Binder(TraitPredicate(<() as Trait<'x>>, polarity:Positive), [Region(BrNamed(DefId(0:12 ~ main[b0d6]::{impl#0}::'x), 'x))]), depth=?)
   6828 Obligation(predicate=Binder(TraitPredicate(<() as std::marker::Sized>, polarity:Positive), []), depth=?)

My rough understanding is that if evaluate_predicate_recursively were to cache all predicates for some specific level, the next would only show up once, without the 4x factor.

However, what that list is missing is the exact nesting of evaluate_predicate_recursivelys, but the blog post shows some of that too, near the end.

Frankly, we should be able to just use a text-based nesting, and just read off the log without separate tools, once the printing is fixed, none of this has any reason to be as noisy as it is today.

@leijurv
Copy link
Contributor

leijurv commented Jul 14, 2022

Related to #20304?

@fasterthanlime
Copy link
Contributor

An easy thing to try is to just change predicate={:?} to {} (i.e. also remove the prefix, it's just noise), though maybe we want a fmt::Display impl for Obligation<O>, instead?

I applied this suggestion and this made the tracing-tree output more readable.

I generated outputs for levels 1 through 4 with commands like:

RUSTC_LOG_COLOR=never RUSTC_LOG="rustc_trait_selection::traits::select[evaluate_predicate_recursively]=debug" rustc +stage1 src/main.rs --emit=metadata --cfg 'depth = "4"' 2>&1 | tee /shared/depth4.txt

The depth 4 output is 25MB so it wouldn't upload on GitHub Gist.

@eddyb
Copy link
Member Author

eddyb commented Jul 14, 2022

Okay this is a bit confusing, but AFAICT the problem only starts to show up at depth 3.
&T in fn main causes T: Trait<A = (), ...> bounds, so the depths are off by 1.

And "depth 2" is still pretty limited in that it contains &(): Trait<...> bounds, each of which needs (): Trait<...> but since the latter cause no further bounds to be processed, it's indistinguishable from caching.

Depth 3 investigation

(click to open the cleaned up output for depth 3)
$ curl -s https://gist.githubusercontent.com/fasterthanlime/c3f5ab67fd6b12d198ef2dd841c31115/raw/756847298cd40474e6df1c8440805c8e884a0995/depth3.txt | rg 'evaluate_predicate_recursively' | sed -E 's/([0-9].* DEBUG )?rustc_\S*::\S* //;s/obligation=Obligation\(//;s/, depth=.*//;s/[┐│├─]/  /g;s/^  //'
(): Sized
(): Sized
(): Sized
for<'x> <() as Trait<'x>>::A == ()
for<'x> <() as Trait<'x>>::B == ()
for<'x> <() as Trait<'x>>::C == ()
for<'x> <() as Trait<'x>>::D == ()
for<'x> (): Trait<'x>
(): Termination
for<'x> <&&() as Trait<'x>>::A == ()
       for<'x> <&() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::D == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       &(): Sized
       for<'x> &(): Trait<'x>
                      for<'x> <() as Trait<'x>>::A == ()
                      for<'x> <() as Trait<'x>>::B == ()
                      for<'x> <() as Trait<'x>>::C == ()
                      for<'x> <() as Trait<'x>>::D == ()
                      (): Sized
                      for<'x> (): Trait<'x>
for<'x> <&&() as Trait<'x>>::B == ()
       for<'x> <&() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::D == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       &(): Sized
       for<'x> &(): Trait<'x>
for<'x> <&&() as Trait<'x>>::C == ()
       for<'x> <&() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::D == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       &(): Sized
       for<'x> &(): Trait<'x>
for<'x> <&&() as Trait<'x>>::D == ()
       for<'x> <&() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       for<'x> <&() as Trait<'x>>::D == ()
             for<'x> <() as Trait<'x>>::A == ()
             for<'x> <() as Trait<'x>>::B == ()
             for<'x> <() as Trait<'x>>::C == ()
             for<'x> <() as Trait<'x>>::D == ()
             (): Sized
             for<'x> (): Trait<'x>
       &(): Sized
       for<'x> &(): Trait<'x>
&&(): Sized
for<'x> &&(): Trait<'x>
                for<'x> <&() as Trait<'x>>::A == ()
                      for<'x> <() as Trait<'x>>::A == ()
                      for<'x> <() as Trait<'x>>::B == ()
                      for<'x> <() as Trait<'x>>::C == ()
                      for<'x> <() as Trait<'x>>::D == ()
                      (): Sized
                      for<'x> (): Trait<'x>
                for<'x> <&() as Trait<'x>>::B == ()
                      for<'x> <() as Trait<'x>>::A == ()
                      for<'x> <() as Trait<'x>>::B == ()
                      for<'x> <() as Trait<'x>>::C == ()
                      for<'x> <() as Trait<'x>>::D == ()
                      (): Sized
                      for<'x> (): Trait<'x>
                for<'x> <&() as Trait<'x>>::C == ()
                      for<'x> <() as Trait<'x>>::A == ()
                      for<'x> <() as Trait<'x>>::B == ()
                      for<'x> <() as Trait<'x>>::C == ()
                      for<'x> <() as Trait<'x>>::D == ()
                      (): Sized
                      for<'x> (): Trait<'x>
                for<'x> <&() as Trait<'x>>::D == ()
                      for<'x> <() as Trait<'x>>::A == ()
                      for<'x> <() as Trait<'x>>::B == ()
                      for<'x> <() as Trait<'x>>::C == ()
                      for<'x> <() as Trait<'x>>::D == ()
                      (): Sized
                      for<'x> (): Trait<'x>
                &(): Sized
                for<'x> &(): Trait<'x>
for<'x> <&() as Trait<'x>>::A == ()
       for<'x> <() as Trait<'x>>::A == ()
       for<'x> <() as Trait<'x>>::B == ()
       for<'x> <() as Trait<'x>>::C == ()
       for<'x> <() as Trait<'x>>::D == ()
       (): Sized
       for<'x> (): Trait<'x>
for<'x> <&() as Trait<'x>>::B == ()
       for<'x> <() as Trait<'x>>::A == ()
       for<'x> <() as Trait<'x>>::B == ()
       for<'x> <() as Trait<'x>>::C == ()
       for<'x> <() as Trait<'x>>::D == ()
       (): Sized
       for<'x> (): Trait<'x>
for<'x> <&() as Trait<'x>>::C == ()
       for<'x> <() as Trait<'x>>::A == ()
       for<'x> <() as Trait<'x>>::B == ()
       for<'x> <() as Trait<'x>>::C == ()
       for<'x> <() as Trait<'x>>::D == ()
       (): Sized
       for<'x> (): Trait<'x>
for<'x> <&() as Trait<'x>>::D == ()
       for<'x> <() as Trait<'x>>::A == ()
       for<'x> <() as Trait<'x>>::B == ()
       for<'x> <() as Trait<'x>>::C == ()
       for<'x> <() as Trait<'x>>::D == ()
       (): Sized
       for<'x> (): Trait<'x>
&(): Sized
for<'x> &(): Trait<'x>

We can further simplify that output by removing non-projection bounds (and their children), and ::{C,D}:

for<'x> <() as Trait<'x>>::A == ()
for<'x> <() as Trait<'x>>::B == ()
for<'x> <&&() as Trait<'x>>::A == ()
    for<'x> <&() as Trait<'x>>::A == ()
        for<'x> <() as Trait<'x>>::A == ()
        for<'x> <() as Trait<'x>>::B == ()
    for<'x> <&() as Trait<'x>>::B == ()
        for<'x> <() as Trait<'x>>::A == ()
        for<'x> <() as Trait<'x>>::B == ()
for<'x> <&&() as Trait<'x>>::B == ()
    for<'x> <&() as Trait<'x>>::A == ()
        for<'x> <() as Trait<'x>>::A == ()
        for<'x> <() as Trait<'x>>::B == ()
    for<'x> <&() as Trait<'x>>::B == ()
        for<'x> <() as Trait<'x>>::A == ()
        for<'x> <() as Trait<'x>>::B == ()
for<'x> <&() as Trait<'x>>::A == ()
    for<'x> <() as Trait<'x>>::A == ()
    for<'x> <() as Trait<'x>>::B == ()
for<'x> <&() as Trait<'x>>::B == ()
    for<'x> <() as Trait<'x>>::A == ()
    for<'x> <() as Trait<'x>>::B == ()

Now that's still very messy, but mostly because of the syntax, and we can strip most of it out:

<()>::{A,B}
<&&()>::A
    <&()>::A
        <()>::{A,B}
    <&()>::B
        <()>::{A,B}
<&&()>::B
    <&()>::A
        <()>::{A,B}
    <&()>::B
        <()>::{A,B}
<&()>::A
    <()>::{A,B}
<&()>::B
    <()>::{A,B}

By removing all the leaves, we can find the "uncached" set (in the sense of evaluate_predicate_recursively recursing instead of being a leaf, though we can't 100% be sure where it's recursing from):

<&&()>::A
    <&()>::A
    <&()>::B
<&&()>::B
    <&()>::A
    <&()>::B
<&()>::A
<&()>::B

And that's the heart of the problem: there are 3 instances of each of <&()>::{A,B} (which tbc is the shorthand for for<'x> <&() as Trait<'x>>::{A,B} == ()), but once the first one completes, subsequent ones don't stop doing work (i.e. they each go deeper into evaluate_predicate_recursively for (): Trait<...> bounds).


Now this is all very messy and frankly hard to keep track of in the full log. What about 1 associated type?

for<'x> <() as Trait<'x>>::A == ()
for<'x> <&&() as Trait<'x>>::A == ()
    for<'x> <&() as Trait<'x>>::A == ()
        for<'x> <() as Trait<'x>>::A == ()
for<'x> <&() as Trait<'x>>::A == ()
    for<'x> <() as Trait<'x>>::A == ()

Now hang on a minute, that's triangular! (i.e. quadratic)

(As of posting this I'm still trying to demo that the single associated type case is non-linear, but it's tricky)

@eddyb
Copy link
Member Author

eddyb commented Jul 14, 2022

Single associated type example:

#![recursion_limit = "100000"]

trait Trait<'a> {
    type Assoc;

    fn method() {}
}

impl<T> Trait<'_> for (T,)
where
    for<'x> T: Trait<'x, Assoc = ()>,
{
    type Assoc = ();
}

impl Trait<'_> for () {
    type Assoc = ();
}

type _4<T> = ((((T,),),),);
type _16<T> = _4<_4<_4<_4<T>>>>;
type _64<T> = _16<_16<_16<_16<T>>>>;
type _128<T> = _64<_64<T>>;

pub fn main() {
    // Something that takes long enough to be measurable.
    type X<T> = _128<T>;

    #[cfg(factor = "1")]
    <X<()> as Trait>::method();
    #[cfg(factor = "2")]
    <X<X<()>> as Trait>::method();
    #[cfg(factor = "3")]
    <X<X<X<()>>> as Trait>::method();
    #[cfg(factor = "4")]
    <X<X<X<X<()>>>> as Trait>::method();
}
$ curl -O https://gist.githubusercontent.com/eddyb/d17d5303c544c19f78569392b635c813/raw/0aa1c947c91450f3d2e227b9d58d9ea19fe6ebf3/proj-tri.rs
$ command time -f 'took %Us' rustc proj-tri.rs --emit=metadata --cfg 'factor = "1"'
took 0.24s
$ command time -f 'took %Us' rustc proj-tri.rs --emit=metadata --cfg 'factor = "2"'
took 1.02s
$ command time -f 'took %Us' rustc proj-tri.rs --emit=metadata --cfg 'factor = "3"'
took 2.83s
$ command time -f 'took %Us' rustc proj-tri.rs --emit=metadata --cfg 'factor = "4"'
took 6.48s
Depth Time Relative time
X * 1 0.24s 1.00x
X * 2 1.02s 4.25x = 22.08x
X * 3 2.83s 11.79x = 32.25x
X * 4 6.48s 27.00x = 42.38x

So not only is it at least quadratic, but other effects accumulate and it could plausibly reach cubic (though it may take minutes/hours) and higher powers (not sure if this makes it "super-polynomial" - hard to know without figuring out the asymptote, i.e. do the powers converge?).

(Out of curiosity I also tried X * 16 and it comes out to 1809.29x = 162.71x but at 434.23s absolute time that's over 5 minutes of waiting, so I doubt there's much value in plotting the power anyway)

@eddyb
Copy link
Member Author

eddyb commented Jul 17, 2022

Okay I kept missing it but I was using some of the gnarly printing as an example for why we should replace Debug with Display almost everywhere, and @BoxyUwU pointed out RePlaceholder!

image

So for<'x> <T as Trait<'x>>::Assoc (where 'x is ReLateBound referring back to for<'x>) gets turned into <T as Trait<'x!n>>::Assoc (or w/e the pretty placeholder syntax is - where 'x!n is RePlaceholder in universe Un), because it's being normalized under a Binder (i.e. what poly_project_and_unify_type handles around its call to project_and_unify_type).

That's why the hack to remove the restriction does nothing: it's not getting cached in the polymorphic form!

I wonder if this is a fundamental limitation of normalizing under binders (cc @jackh726), or if we can actually have ProjectionCacheEntry::insert_ty get the originally Binder form instead of the placeholder'd one (and maybe have bound vars anonymized, so cache reuse is more likely?).

EDIT: so the code I had been looking at before was this:

// If the result is something that we can cache, then mark this
// entry as 'complete'. This will allow us to skip evaluating the
// subobligations at all the next time we evaluate the projection
// predicate.
self.infcx
.inner
.borrow_mut()
.projection_cache()
.complete(key, eval_rslt);

But that does nothing (other than print an info! message which would be easy to read) unless there is already a cache entry (presumably inserted by insert_ty called from deeper in project):

/// Mark the relevant projection cache key as having its derived obligations
/// complete, so they won't have to be re-computed (this is OK to do in a
/// snapshot - if the snapshot is rolled back, the obligations will be
/// marked as incomplete again).
pub fn complete(&mut self, key: ProjectionCacheKey<'tcx>, result: EvaluationResult) {
let mut map = self.map();
match map.get(&key) {
Some(&ProjectionCacheEntry::NormalizedTy { ref ty, complete: _ }) => {
info!("ProjectionCacheEntry::complete({:?}) - completing {:?}", key, ty);
let mut ty = ty.clone();
if result.must_apply_considering_regions() {
ty.obligations = vec![];
}
map.insert(key, ProjectionCacheEntry::NormalizedTy { ty, complete: Some(result) });
}
ref value => {
// Type inference could "strand behind" old cache entries. Leave
// them alone for now.
info!("ProjectionCacheEntry::complete({:?}) - ignoring {:?}", key, value);
}
};
}

@eddyb
Copy link
Member Author

eddyb commented Jul 17, 2022

So I made a few changes (branch is at eddyb@1ac2dca / eddyb@218932c / eddyb@63f5e68 at the time of posting), and then used RUSTC_LOG=rustc_infer::traits,rustc_trait_selection::traits to re-acquire depth = "3" data.

For viewing it I recommend further filtering, e.g.:
curl -s https://gist.githubusercontent.com/eddyb/0d1e41f2734f617054303927095c49b2/raw/e9ff12c2750f5a26850abb8bbec43721e9960d03/proj-ext-log-depth-3 | rg 'evaluate_predicate_recursively|project_and|rustc_infer' | less

Locally with RUSTC_LOG_COLOR=always I've taken a screenshot of the part I find relevant:
image

You can see how insert and complete take different keys.

Note that because we currently hide RePlaceholder vs ReLateBound and emit for<'x> for either, you won't be able to see that distinction without a patch like mine in eddyb@1ac2dca.

@eddyb
Copy link
Member Author

eddyb commented Jul 17, 2022

Pushed another commit (eddyb@20e2693 - note that if you want to cherry-pick you'll also need eddyb@63f5e68), with a hacky fix attempt (also cache under the original key if the result has no placeholders anywhere).

before:

$ command time -f 'took %Us' rustc +A proj-exp.rs --emit=metadata --cfg 'depth = "8"'
took 1.72s
$ command time -f 'took %Us' rustc +A proj-exp.rs --emit=metadata --cfg 'depth = "9"'
took 7.32s

after:

$ command time -f 'took %Us' rustc +A proj-exp.rs --emit=metadata --cfg 'depth = "8"'
took 0.05s
$ command time -f 'took %Us' rustc +A proj-exp.rs --emit=metadata --cfg 'depth = "9"'
took 0.05s

So finally we have a confirmed hypothesis! (though no idea if we can actually land such a fix)

@fasterthanlime
Copy link
Contributor

fasterthanlime commented Jul 17, 2022

I tried this on the real-world codebase where this bug was originally discovered.

cargo check

Without the fix (nightly 2022-06-08):

$ cargo clean -p fp-http && time cargo check -p fp-http
    Checking fp-http v0.1.0 (/home/amos/work/fly-proxy/crates/fp-http)
    Finished dev [unoptimized + debuginfo] target(s) in 13.32s
cargo check -p fp-http  12.45s user 0.92s system 100% cpu 13.370 total

With @eddyb's "hacky fix":

$ cargo +stage1 clean -p fp-http && time cargo +stage1 check -p fp-http
    Checking fp-http v0.1.0 (/home/amos/work/fly-proxy/crates/fp-http)
    Finished dev [unoptimized + debuginfo] target(s) in 4.28s
cargo +stage1 check -p fp-http  4.17s user 0.13s system 100% cpu 4.300 total

Note: the post-workaround version of that code (just don't have tower services that borrow anything) typechecks in 1.28s.

cargo build

Without the fix (nightly 2022-06-08):

$ cargo clean -p fp-http && time cargo build -p fp-http
   Compiling fp-http v0.1.0 (/home/amos/work/fly-proxy/crates/fp-http)
    Finished dev [unoptimized + debuginfo] target(s) in 1m 03s
cargo build -p fp-http  65.89s user 3.04s system 108% cpu 1:03.71 total

With @eddyb's "hacky fix":

$ cargo +stage1 clean -p fp-http && time cargo +stage1 build -p fp-http
   Compiling fp-http v0.1.0 (/home/amos/work/fly-proxy/crates/fp-http)
    Finished dev [unoptimized + debuginfo] target(s) in 46.68s
cargo +stage1 build -p fp-http  50.31s user 0.85s system 109% cpu 46.694 total

The post-workaround version of the code builds in 4.1s. Maybe other unfortunate cache interactions during codegen?

@eddyb
Copy link
Member Author

eddyb commented Jul 18, 2022

Removing --emit=metadata on the reduction doesn't show any noticeable slowdown between depths either.

So it's not just codegen in general, but something else in your crate that is somehow codegen-only.
Can you get another -Z self-profile? (Maybe re-apply that local patch of yours that collects eval_predicate_recursively?)
Could also try various stages of the reduction, too. But I suspect you might need to start from your proprietary codebase again, using the compiler w/ my hacky fix to filter out what's already fixed by it.

(This may be the first time that e.g. impl Trait actually comes into play, or maybe instead of T: for<'x> Trait<'x, Foo = MyFoo>, you have T: for<'x> Trait<'x, Foo = MyFoo<'x>> which my hacky fix doesn't cover)


Oh and one last thing: unless you build a "release" rustc (I've done this in separate workdirs sometimes for benchmarking), you won't actually get comparable numbers between nightly and a local build.
There's two main approaches I can think of:

  1. always compare between local builds (before/after a patch etc.)
    • not very convenient, though you may be able to use multiple workdirs
      • remember to symlink config.toml between workdirs to avoid accidental desync!
    • one trick that I never looked into, but might be quite handy, is making toolchain copies:
      $ mkdir toolchain-123 # maybe put the git hash in the name?
      $ DESTDIR=toolchain-123 ./x.py install --stage 1 compiler/rustc library/std
      $ # `/usr/local` is the default, can override with `[install] prefix = "..."` in `config.toml`
      $ rustup toolchain link 123 toolchain-123/usr/local
      $ rustc +123 -V
      rustc 1.64.0-dev
    • the "distro experience" is basically DESTDIR=pkg-tmp-dir make install which should pretty much result in DESTDIR=pkg-tmp-dir ./x.py install --stage 2 (and then pkg-tmp-dir's contents go into a deb/rpm/etc.)
  2. produce local builds that are comparable to nightly
    • this can be trickier but sometimes necessary (e.g. if benchmarking against many nightlies)
    • a distro would use ./configure tho the user profile would also work
      • (if you have a config.toml don't also run ./configure, I'm more talking about fresh workdirs)
    • optional: ./configure --release-channel=nightly / [rust] channel = "nightly" in config.toml
      • it can affect some things, but by the looks of it, only stable/beta are significantly different, nightly only seems to differ from dev (the default) by enabling git version info
    • sadly it looks like those options disable LLVM CI, so you'd end up with a whole local build of LLVM, so it might be better to stick with the compiler profile config.toml (likely the and just disable debug-assertions (sadly incremental could also affect performance)
    • overall this is all a nuisance and prevents using RUSTC_LOG, so you should probably go with 1. and just liberally make toolchain copies/backups

@LegionMammal978
Copy link
Contributor

Could this be related to #95402? I worked a while back on minimizing it, but I wasn't able to interpret the tracing logs. To repeat my example code:

pub trait Foo: Circular {}

pub trait Circular {
    type Unit;
}

impl<'a> Foo for &'a () {}

impl<'a> Circular for &'a ()
where
    &'a (): Circular<Unit = ()>,
{
    type Unit = ();
}

@eddyb
Copy link
Member Author

eddyb commented Jul 20, 2022

@LegionMammal978 I don't think so, you're not using for<...> and don't have amplification through traversing a nesting like X<X<X<X<...>>>>, the only outcomes you can have there is it finishing very fast or never.

@Noratrieb Noratrieb added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-types Relevant to the types team, which will review and decide on the PR/issue. labels Apr 5, 2023
@fmease fmease added the A-higher-ranked Area: Higher-ranked things (e.g., lifetimes, types, trait bounds aka HRTBs) label Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-associated-items Area: Associated items (types, constants & functions) A-higher-ranked Area: Higher-ranked things (e.g., lifetimes, types, trait bounds aka HRTBs) A-trait-system Area: Trait system I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-types Relevant to the types team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants