-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] figure out story around linear queries #41710
Comments
IMO not requiring declaring inputs up-front is a plus, that's why I didn't even mention the scheme you came up with but rather dismissed it because of the constraints I had. That is, I wanted a system that could maintain the current flexibility in that queries are never declared (not even when setting up the providers, although I like that it is possible), but rather arise from imperative code. Perhaps more than correctness, I was focused on performance and somewhat on ergonomics, assuming we could use no interior mutability in the values themselves (i.e. no The immediate result is the basic linear query idea, where you can only invoke the query once, and you own the value instead of a clone being made. But then if several other queries want to access the same value, you have no way to share it. You can tuple up those other queries into a single query, but not if you only want some of the results to be linear too. So the constraints more or less involve a bunch of related analyses/transformations, where both the inputs and the outputs contain values you don't want to duplicate, and also values you do want to duplicate (e.g. MIR const-qualification results). All of that said, the only element of correctness was reducing the "trusted abstraction base", specifically the use of interior mutability outside of the implementation of the query engine. |
I'm still not sure if we need or should have linear queries at all. They complicate things, like caching things to disk in the background (e.g. unoptimized MIR is queued for serialization in another thread, then another query steals it) and I'm not sure they are worth the trouble. As far as I can tell, the MIR transformation pipeline is the only case this is ever going to be used. Can't the optimization pass just make it's own copy of the unoptimized MIR in the beginning and then modify that in place? |
I don't think background caching is realistic, because of |
What maps are you referring to here? I would imagine the cache to be mainly |
@michaelwoerister I mean that serialization might require concurrent map lookup for some leaves. FWIW, you don't even need |
@eddyb Yeah, that's a good point. I kind of assumed, that anything reachable would already be interned and not need any lookups. That might not be true... |
PR #41625, in an effort to rationalize the MIR pipeline, introduced queries that yield a
Steal
type. Cribbing from a rather long and overwrought comment that I wrote, the idea is to model linear queries as ordinary queries that return a&'tcx Steal<D>
, which supports two operations ("borrow" and "steal"). Once a value is stolen, any further operation is abug!()
. This means that all parts of the compiler which invoke a query with a steal result must be aware of any other parts that might use that same query and coordinate with them. (This is a performance optimization: that is, we could make "stolen" queries simply regenerate the resulting value, and everything would work, but we don't want to be regenerating the results of these queries multiple times, so instead we make this error abug!
.)In all the MIR cases, at least, there really isn't much to intercoordinate. For the most part, each MIR optimization is just used by one other query: the next optimization in the sequence. (In the current PR, each optimization pass has its own query, but if we convert to just having a query-per-suite, then this would apply at the level of suites.) Other parts of the compiler should use one of the queries (e.g.,
optimized_mir()
) that do not return aSteal
result.However, there are a few exceptions. One example is const qualification. This is a lightweight scan of the contents of a
const
item, and it needs to take place relatively early in the pipeline (before optimizations are applied and so forth). The way I handled this is (a) to have itborrow()
from the steal and then (b) to useforce
before we steal the relevant MIR, so that we know that it has executed. If we forgot to add theforce()
call, then the result would be dependent on the order in which queries were issued; that is, if one requestedconst-qualif(D)
first, it would execute successfully, but if one requestedoptimized_mir(D)
first, thenconst-qualif(D)
afterwards, you would get abug!
becauseconst-qualif
would be trying to read stolen data. (However, if compilation succeeds, we are always assured of a consistent result.)The complete set of examples that I am aware of where we have a linear query that also needs to be accessed is as follows:
const fn
, need to save a copy of the IR before validation and optimization that can be used by miri (indeed, optimization may want to execute miri to evaluate constant expressions, perhaps even some appearing within theconst fn
itself).The current solution is simple and may indeed be "good enough" -- it depends a bit (in my mind) on how often we introduce linear queries and how many things depend on them. However, there have been some alternative proposals for how to handle it. The purpose of this issue is to describe those proposals and try to settle on which is best.
Rather than continuing with concrete examples, in the descriptions that follow, I will abstract out the pattern as follows. Consider a "stealable" query A that needs to be read by query B (e.g., borrow checking) but consumed by query C (e.g., optimization).
For each system, I will describe how it works, and then discuss some of the tradeoffs.
Current system: "stealable queries"
How it works: As described in the run-up, you have a query that yields a
&'tcx Steal<T>
. This can be either borrowed or stolen. An attempt to borrow after a steal (or a second attempt to steal) will result in abug!
call.How you can mess it up: You can forget to call
force()
on a potential consumer when stealing. In the case of our example, this would mean that query C fails to force query B.What happens if you mess it up: Compilation may succeed with some query execution orders (e.g., if
B
executes first) but fail with abug!
in others (e.g., ifC
execues first).Other downsides: entanglement It's kind of a drag that the stealer (C) must be aware of the reader (B). It
Proposed alternative: "linear queries" with "tuple providers"
How it works: @eddyb proposed an alternative in which linear queries can only be used once. In this scheme, once you execute a query once, any further attempt to request the same query is a
bug!
. So we can't have a query A that is read by B and consumed by C, as I had, since both B and C must consume, and that would violate linearity. To support this scenario, then, eddyb wanted to introduce "tuple" providers (name is mine). Basically, when setting up theProviders
struct, I can use a bit of magic to knit together a function that processes the result fromA
and produces a(B, C)
tuple. This magic would then divide the tuple and store theB
into theB
map and store theC
into theC
map in the right spots. This might look something like this:How you can mess it up: You could fail to use the tuple system, and instead write queries B and C independently.
What happens if you mess it up: In any execution in which both queries B and C are used, no matter which order they are used in, compilation will ICE. This is thus mildly more robust than the stealable scheme.
Other downsides: entanglement. It's kind of a drag that we must produce B and C from one function.
Other downsides: imprecise dependencies. It's possible that producing B and C use different sets of inputs. But since there is one function that is doing both bits of work, we won't be able to tell them apart in the query system, and hence they will wind up with imprecise dependencies.
Other advantages: splitting. This technique allows you to take the result of query A and split it into pieces without cloning. This is not possible with the other alternatives.
Proposed alternative: "linear queries" with "mapping providers"
In my original comment, I described a variant of of tuple providers. I want to describe a variant here of that idea that I've been thinking since. The idea is to make linear queries part of the query framework, as in the "tuple providers" proposal. However, when you define a linear query, you don't use it in the same way as non-linear queries. That is, if you have a linear query
A
, you can't dotcx.A(def_id)
as I showed earlier. Instead, when creating the provider struct, we use some magic functions to "connect" derived queries to a linear query. This connection can be in one of two modes ("read" or "consume"). For example:(These
read_a()
andto_produce_b_with()
methods would be auto-generatedby the macro.)
The basic idea here is that we are telling the providers struct two things:
This allows the framework to do more intelligent routing of results. In particular, if there is more than one consume query for a linear query, we can detect that at provider creation time -- i.e., before we even process any input. Moreover, we can ensure that before the "consume" query for a given linear value runs, we force all of the "read queries". (In this case, that means that if you request C, we will force the query B, so that it can read before C executes.) (This last part is why the provider functions have to know what query you are producing.)
How you can mess it up: You cannot mess this up, since you can't access linear queries using the normal methods. Even if you tried to register two "consume" queries for the same linear value, that would fail when creating the provider struct (so such a PR could never land, for example, and no test could ever pass).
What happens if you mess it up: You cannot, unless I missed something. =)
Other advantages: no entanglement. This is the only proposal that avoids the need for queries B and C to know about one another. Simply by registering the queries, you help the framework sort things out and ensure that the appropriate
force
calls happen.Other proposals?
I'll try to update this header if more ideas come up.
cc @rust-lang/compiler @matthewhammer
The text was updated successfully, but these errors were encountered: