incr.comp.: Fix Span hashing in various ways #46301
Conversation
The previous method ran into problems because the ICH would treat Spans as (file, line, col) while the cache contained byte offsets, and it's possible for the latter to change while the former stays stable.
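To make the failure mode concrete, here is a minimal sketch with toy types (nothing like rustc's real `CodeMap` and `Span`; file contents and offsets are invented): an edit to one file shifts the absolute byte offsets of spans in every file after it, while the resolved (file, line, column) form, which is what gets hashed, stays the same.

```rust
// Toy illustration only: absolute byte offsets are fragile, (file, line, col) is not.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy "codemap": files concatenated into one address space.
struct CodeMap {
    files: Vec<(String, String)>, // (name, contents)
}

// Toy span: absolute byte offsets into that concatenated address space.
#[derive(Clone, Copy)]
struct Span {
    lo: usize,
    hi: usize,
}

impl CodeMap {
    // Resolve an absolute offset to (file name, line, column).
    fn lookup(&self, mut pos: usize) -> (String, usize, usize) {
        for (name, contents) in &self.files {
            if pos < contents.len() {
                let before = &contents[..pos];
                let line = before.matches('\n').count() + 1;
                let col = pos - before.rfind('\n').map_or(0, |i| i + 1);
                return (name.clone(), line, col);
            }
            pos -= contents.len();
        }
        panic!("offset out of range");
    }

    // Hash the resolved (file, line, col) of both endpoints; this does not
    // depend on how long the *other* files in the codemap are.
    fn stable_hash(&self, span: Span) -> u64 {
        let mut h = DefaultHasher::new();
        self.lookup(span.lo).hash(&mut h);
        self.lookup(span.hi).hash(&mut h);
        h.finish()
    }
}

fn main() {
    let mut cm = CodeMap {
        files: vec![
            ("a.rs".into(), "fn a() {}\n".into()),
            ("b.rs".into(), "fn b() {}\n".into()),
        ],
    };
    // A span covering `b` in b.rs, expressed as absolute byte offsets.
    let span = Span { lo: 13, hi: 14 };
    let hash_before = cm.stable_hash(span);

    // Editing a.rs makes it 15 bytes longer, shifting every offset in b.rs ...
    cm.files[0].1 = "// new comment\nfn a() {}\n".into();
    let shifted = Span { lo: span.lo + 15, hi: span.hi + 15 };

    // ... but the resolved (file, line, col) form, and hence the hash, is
    // unchanged, while raw offsets stored in a cache would now be stale.
    assert_eq!(hash_before, cm.stable_hash(shifted));
    assert_ne!((span.lo, span.hi), (shifted.lo, shifted.hi));
}
```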
Hmm, so this code seems ok to me, but I feel like I may not be the right reviewer. I could try to dig more deeply, but ultimately I don't have a strong insight into whether these changes will violate some sort of invariant.
I suppose that either way they represent progress over the status quo and hence I am in favor of landing.
One question: I didn't see any new tests. Is there a reason it's hard to write targeted tests that show how the new system is better?
```rust
std_hash::Hash::hash(&loc2.1, hasher);
std_hash::Hash::hash(&loc2.2, hasher);
if span.hi < span.lo {
    return std_hash::Hash::hash(&TAG_INVALID_SPAN, hasher);
}
```
I wonder why we generate so many invalid spans... how confident are you that these don't show up in diagnostics etc? I am not that confident. =) Are there asserts testing that hypothesis?
What gives me some confidence is that the output would be unreadable. That is, such spans span multiple files, so displaying one of them in diagnostics would show two partial files and a random number of unrelated files that happen to be located between the two initial files in the codemap. I haven't seen anything like that yet.
I dimly recall adding some asserts of this kind in the code that generates snippets and things, though I can't find that now.
Right, hm. So things only start to break once you actually cache many things that contain spans -- which is only done on a local branch of mine. So the existing tests will cover this, but in the current state of the compiler it might be hard to exercise the relevant code paths.
Those fields are used after lowering to HIR since fields, methods, and everything else that gets resolved after lowering is hygienic with macros 2.0. For example:

```rust
mod foo {
    struct S { x: i32 }

    pub macro m() {
        // Due to hygiene, the following always resolves, and is never a
        // privacy error, wherever `m!()` is invoked.
        let s: S = S { x: 0 };
        s.x;
    }
}

// ...

fn f() { foo::m!(); }
```

In this example, we need the hygiene info for the two …

That being said, we (I) still need to implement serializing hygiene info into the crate metadata anyway to support nested macro definitions across crate boundaries (i.e. a macro that defines a macro, the latter of which gets used from another crate; see the limitations section of #40847), and I don't want to block you on this. I would be OK landing this as is, which would force me to do this :)
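To illustrate the cross-crate nested-macro case mentioned above, here is a hypothetical sketch (crate names and layout are invented, not taken from the PR, and it relies on the unstable `decl_macro` feature); the second crate is shown in comments because invoking the macro there is exactly the step that needs crate_a's hygiene info to be available from its metadata.

```rust
// ---- crate_a/src/lib.rs (hypothetical) ----
#![feature(decl_macro)]

// A macro that defines another macro and immediately uses it in its expansion.
pub macro outer() {
    macro inner() { 42 }
    pub const ANSWER: i32 = inner!();
}

// ---- crate_b/src/lib.rs (hypothetical, shown as comments) ----
// // Expanding `outer!` here defines and then invokes the nested `inner!`
// // macro. Resolving that invocation relies on hygiene data created when
// // crate_a was compiled, which is why that data would have to be serialized
// // into crate_a's metadata (the limitation referenced above, see #40847).
// crate_a::outer!();
```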
Thanks a lot for taking a look, @jseyfried! I'd be OK with landing too since:
☔ The latest upstream changes (presumably #46299) made this pull request unmergeable. Please resolve the merge conflicts.
OK. r=me
@michaelwoerister
Thanks everyone! I'm closing this in favor of #46338, which contains the commits from this PR.
incr.comp.: Load cached diagnostics lazily and allow more things in the cache.

This PR makes two changes:

1. Diagnostics are now loaded lazily from the incr. comp. cache. This turned out to be necessary for correctness because diagnostics contain `Span` values, and deserializing those requires that the source file they point to is still around in the current compilation session. Obviously this isn't always the case. Loading them lazily allows for never touching diagnostics that are no longer valid.
2. The compiler can now deal with there being no cache entry for a given query invocation. Before, all query results of a cacheable query were always expected to be present in the cache. Now, the compiler can fall back to re-computing the result if there is no cache entry found. This allows for caching things that we cannot force from a dep-node (like the `symbol_name` query). In such a case we'll just have a "best effort" caching strategy.

~~This PR is based on #46301 (= first 2 commits), so please don't merge until that has landed. The rest of the commits are ready for review though.~~

r? @nikomatsakis
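To illustrate point 2, a minimal sketch with invented names and types (not rustc's actual query machinery): the compiler first tries the on-disk cache and recomputes on a miss instead of treating the missing entry as a bug.

```rust
use std::collections::HashMap;

// Stand-in for the serialized incr. comp. cache on disk.
struct OnDiskCache {
    entries: HashMap<String, String>, // query key -> serialized result
}

impl OnDiskCache {
    // Returns None when the result was never cached (e.g. for queries that
    // cannot be forced from their dep-node).
    fn try_load(&self, key: &str) -> Option<String> {
        self.entries.get(key).cloned()
    }
}

// Before: a missing entry was an error. Now: fall back to recomputing.
fn get_query_result(
    cache: &OnDiskCache,
    key: &str,
    recompute: impl FnOnce() -> String,
) -> String {
    match cache.try_load(key) {
        Some(cached) => cached, // reuse the cached result
        None => recompute(),    // "best effort" path: recompute it
    }
}

fn main() {
    let cache = OnDiskCache { entries: HashMap::new() };
    let result = get_query_result(&cache, "symbol_name(foo)", || {
        // Hypothetical recomputed value; the name is made up.
        "_ZN3foo17h0123456789abcdefE".to_string()
    });
    println!("{}", result);
}
```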
This PR contains two sets of changes that warrant some discussion:

The first commit makes the compiler persist macro expansion contexts and restores them during deserialization. The latter is done by directly allocating the expansion context without trying to restore the `parent` field in `MarkData` or the `prev_ctxt` field in `SyntaxContextData`. I think this is fine for now since those fields are not used after HIR lowering and incremental compilation only kicks in after that point. But I'd like some feedback from @jseyfried, @nrc, and others who are dealing with macros. Would you consider this a hack? Should I use some kind of poison value for `parent` and `prev_ctxt` so we can do runtime checks?

The second commit makes the compiler store spans as `(file, line, column, length)` in the incr. comp. cache instead of as byte offsets into the previous `FileMap`. It turned out that the previous approach silently breaks when loading something from the cache that is defined in a changed source file. The downside (or, depending on your point of view, upside) of the new approach is that it ICEs on corrupt span values, whereas the old approach would just pass them on, maybe making them a little more corrupted in the process :)

Since the compiler seems to produce tons of invalid span values, we have to deal with them somehow. For now I chose to solve this by detecting invalid span values during serialization and explicitly serializing those as a special value. During deserialization these values get turned into `DUMMY_SP`. In order for that not to break `-Zincremental-verify-ich`, I also adapted the `HashStable` implementation to behave in a semantically equivalent way: all invalid values, including `DUMMY_SP`, produce the same hash value, distinct from all hashes a valid span would produce. I think this is a valid approach under the assumption that invalid spans are never legitimately used to produce a valid result.

r? @nikomatsakis
cc @rust-lang/compiler
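To illustrate the caching and hashing scheme described above, here is a simplified sketch with toy types (the `lookup` mapping and all names are invented, not rustc's real span or codemap machinery): valid spans are stored as (file, line, column, length), invalid spans are written as a tag and come back as a dummy span, and all invalid spans hash to one value distinct from any valid span's hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, PartialEq, Debug)]
struct Span { lo: usize, hi: usize }

const DUMMY_SP: Span = Span { lo: 0, hi: 0 };

// What actually goes into the (toy) incr. comp. cache.
#[derive(Debug, PartialEq)]
enum CachedSpan {
    Valid { file: String, line: usize, col: usize, len: usize },
    Invalid, // special tag; deserializes to DUMMY_SP
}

// Stand-in for resolving a byte offset via the codemap (invented mapping).
fn lookup(pos: usize) -> (String, usize, usize) {
    ("lib.rs".to_string(), 1 + pos / 80, pos % 80)
}

fn serialize(span: Span) -> CachedSpan {
    // Detect invalid spans at serialization time and store only a tag.
    if span.hi < span.lo || span == DUMMY_SP {
        return CachedSpan::Invalid;
    }
    let (file, line, col) = lookup(span.lo);
    CachedSpan::Valid { file, line, col, len: span.hi - span.lo }
}

fn deserialize(cached: &CachedSpan) -> Span {
    match cached {
        // In the real scheme this would re-resolve against the *current*
        // codemap; here we just invert the toy mapping.
        CachedSpan::Valid { line, col, len, .. } => {
            let lo = (line - 1) * 80 + col;
            Span { lo, hi: lo + len }
        }
        CachedSpan::Invalid => DUMMY_SP,
    }
}

// Stable hashing: every invalid span (including DUMMY_SP) gets the same tag,
// so replacing an invalid span with DUMMY_SP does not change its hash.
const TAG_VALID_SPAN: u8 = 0;
const TAG_INVALID_SPAN: u8 = 1;

fn hash_stable<H: Hasher>(span: Span, hasher: &mut H) {
    if span.hi < span.lo || span == DUMMY_SP {
        TAG_INVALID_SPAN.hash(hasher);
        return;
    }
    TAG_VALID_SPAN.hash(hasher);
    lookup(span.lo).hash(hasher);
    lookup(span.hi).hash(hasher);
}

fn main() {
    let invalid = Span { lo: 10, hi: 3 };

    // An invalid span round-trips through the cache as a dummy span.
    assert_eq!(serialize(invalid), CachedSpan::Invalid);
    assert_eq!(deserialize(&serialize(invalid)), DUMMY_SP);

    // All invalid spans, including DUMMY_SP, hash to the same value, so the
    // invalid -> DUMMY_SP replacement is invisible to a verify-ich style check.
    let mut h1 = DefaultHasher::new();
    let mut h2 = DefaultHasher::new();
    hash_stable(invalid, &mut h1);
    hash_stable(DUMMY_SP, &mut h2);
    assert_eq!(h1.finish(), h2.finish());
}
```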