Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incr.comp.: Maybe optimize case of dependency-less anonymous queries. #45408

Closed
michaelwoerister opened this issue Oct 20, 2017 · 8 comments
Closed
Labels
A-incr-comp Area: Incremental compilation C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Copy link
Member

Some kinds of queries can end up with no dependencies. That is a valid case if their computation solely depends on their query key and not on anything in the environment. erase_regions_ty is such a case.

We could optimize this case by not allocating a DepNode for such query invocations, which in turn would also save registering reads of that DepNode.

@nikomatsakis, should we do this? If so, I could write some mentoring instructions.

@michaelwoerister michaelwoerister added the A-incr-comp Area: Incremental compilation label Oct 20, 2017
@TimNN TimNN added the C-cleanup Category: PRs that clean code up or issues documenting cleanup. label Oct 22, 2017
@michaelwoerister michaelwoerister changed the title incr.comp.: Maybe optimize case of dependency-less non-input queries. incr.comp.: Maybe optimize case of dependency-less anonymous queries. Oct 23, 2017
@michaelwoerister
Copy link
Member Author

The current plan is to implement this optimization via not instantiating anonymous nodes at all. Instead their edges should be duplicated to readers.

@nikomatsakis
Copy link
Contributor

I'm a fan of not instantiating anonymous nodes at all. I think that simplifies the overall model and may be a perf win to boot. Seems good.

@cjgillot
Copy link
Contributor

Is this still relevant? If yes, could you explain what you mean by "not instantiating anonymous nodes"?

@nikomatsakis
Copy link
Contributor

I think it's still relevant, but I'd have to dig more into the code to be sure. An "anonymous node" is a node in the dependency graph that doesn't correspond to a query. They primarily occur in trait selection, if I recall.

@wesleywiser
Copy link
Member

The query system has changed quite a bit since the last time I looked but I will try to leave some notes in case this is helpful:

@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 22, 2020
@tgnottingham
Copy link
Contributor

I'm far from an expert in this area, but I wanted to note some potential wrinkles.

As I understand it, anonymous nodes provide two things, which we probably want to preserve:

  1. They enable, or at least interact with (sorry, hazy on the details) caching the querys' results in a single compilation session. If we make multiple calls to the same anonymous query in a single session, we don't have to recompute the result.

    I'm not sure what has to change to preserve this behavior if we remove the anonymous nodes. The caching system does have some dependency on DepNodeIndex, which wouldn't exist for an anonymous query if it wasn't allocated.

  2. They track dependencies. Suppose query A depends on anonymous query B, which depends on queries C and D. If we remove anonymous nodes, we need to make A depend on C and D directly, or we lose that dependency info.

    I'm not sure if this is trivial to handle. As we execute A, we build up its dependency list. We do that by adding the dependency's DepNodeIndex to A's dependency list. But if B isn't ever allocated, it doesn't have an index, so we're already in trouble. We really want to depend on C and D, anyway, so maybe we can get around it. But where are we getting that list of C and D from if it isn't stored somewhere? And if we're storing it somewhere, how much are we gaining by not just storing it in a node for the anonymous query B? There may be a good answer, it's just not clear to me.

Finally, I want to note that this may not actually be an optimization. If we copy B's dependencies into things that depend on B, this could actually result in more storage being used, depending on the queries involved. Multiple queries can depend on B. If B depends on 10 queries, now we've replaced each dependency on B with 10 dependencies. In a way, anonymous nodes are an optimization for this case. They deduplicate a set of common edges.

@michaelwoerister
Copy link
Member Author

@tgnottingham's analysis is pretty spot on, afaict. To give some context (which is unfortunately lacking in the issue description):

Anonymous dep-nodes were conceived in order to represent things in the dep-graph for which we don't have a query key. In particular we used them for things cached internally in the trait selection system. Their main purpose is to enable dependency tracking for things that don't fit in the query system -- and I think we always considered them as a kind of crutch that we wanted to get rid of as soon as possible.

Some time later we discovered that we could use them as a performance optimization for queries that don't get cached on disk, like erase_regions_ty. They are more performant because they don't need to compute the DepNode fingerprint from the query key.

This issue talks about handling dependency tracking differently: Instead of introducing an anonymous node and pointing the dep-edge to it, we just duplicate the anonymous node's outgoing edges. E.g.

A <-------+  +------ C
          |  |
          |  v
         (anon) 
          |  ^
          |  |
B <-------+  +------ D    

becomes

A <---------------- C
^                   |
|                   |
+-----------+       |
            |       |
+---------- | ------+
|           |
v           |
B <---------D 

That way, we would still keep all dependencies intact. However, as @tgnottingham points out, this might not actually be an optimization. The number of edges becomes n * m instead of n + m. In the case of erase_regions_ty where n == 0, this would be beneficial, but for other cases it might make things a lot worse.

I'm wondering if we could omit allocating anon dep-nodes IFF they have zero outgoing edges? I'm not sure if that could lead to complications. Probably yes.

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 2, 2021
Avoid creating anonymous nodes with zero or one dependency.

Anonymous nodes are only useful to encode dependencies, and cannot be replayed from one compilation session to another.
As such, anonymous nodes without dependency are always green.
Anonymous nodes with only one dependency are equivalent to this dependency.

cc rust-lang#45408
cc `@michaelwoerister`
@cjgillot
Copy link
Contributor

cjgillot commented Nov 7, 2021

Marking as fixed by #85337.

@cjgillot cjgillot closed this as completed Nov 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants