Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incr.comp.: Fix Span hashing in various ways #46301

Conversation

michaelwoerister
Copy link
Member

This PR contains two sets of changes that warrant some discussion:

  1. The first commit makes the compiler persist macro expansion contexts and restores them during deserialization. The latter is done by directly allocating the expansion context without trying to restore the parent field in MarkData or the prev_ctxt field in SyntaxContextData. I think this is fine for now since those fields are not used after HIR lowering and incremental compilation only kicks in after that point. But I'd like some feedback from @jseyfried, @nrc, and others who are dealing with macros. Would you consider this a hack? Should I use some kind of poison value for parent and prev_ctxt so we can do runtime checks?

  2. The second commit makes the compiler store spans as (file, line, column, length) in the incr. comp. cache instead of as byte offsets into the previous FileMap. It turned out that the previous approach silently breaks when loading something from the cache that is defined in a changed source file. The downside (or, depending on your point of view, upside) of the new approach is that it ICEs on corrupt span values; whereas the old approach would just pass them on, maybe making them a little more corrupted in the process :)
    Since the compiler seems to produce tons of invalid span values, we have to deal with them somehow. For now I chose to solve this by detecting invalid span values during serializations and explicitly serialize those as a special value. During deserialization these values get turned into DUMMY_SP. In order for that not to break -Zincremental-verify-ich, I also adapted the HashStable implementation to behave semantically equivalent: All invalid values, including DUMMY_SP produce the same hash value distinct from all hashes a valid span would produce. I think this is a valid approach under the assumption that invalid spans are never legitimately used to produce a valid result.

r? @nikomatsakis
cc @rust-lang/compiler

The previous method ran into problems because ICH would treat Spans
as (file,line,col) but the cache contained byte offsets and its
possible for the latter to change while the former stayed stable.
@carols10cents carols10cents added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 27, 2017
Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so this code seems ok to me, but I feel like I may not be the right reviewer. I could try to dig more deeply, but ultimately I don't have a strong insight into whether these changes will violate some sort of invariant.

I suppose that either way they represent progress over the status quo and hence I am in favor of landing.

One question: I didn't see any form of new tests. Is there a reason this is hard to write targeted tests for that show how the new system is better?

std_hash::Hash::hash(&loc2.1, hasher);
std_hash::Hash::hash(&loc2.2, hasher);
if span.hi < span.lo {
return std_hash::Hash::hash(&TAG_INVALID_SPAN, hasher);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we generate so many invalid spans... how confident are you that these don't show up in diagnostics etc? I am not that confident. =) Are there asserts testing that hypothesis?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What gives me some confidence is that the output would be unreadable. That is, such spans span multiple file so displaying one of them in diagnostics would show two partial files and a random number of unrelated files that happen to be located in between the two initial files in the codemap. I haven't seen something like that yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dimly recall adding some asserts of this kind in the code that generates snippets and things, though I can't find that now.

@michaelwoerister
Copy link
Member Author

One question: I didn't see any form of new tests. Is there a reason this is hard to write targeted tests for that show how the new system is better?

Right, hm. So things only start to break once you actually cache many things that contain spans -- which is only done on a local branch of mine. So the existing tests will cover this but in the current state of the compiler it might be hard to exercise the relevant code paths.

@michaelwoerister
Copy link
Member Author

It would be great if we could make some progress here. Maybe @nrc could have a quick look, especially at the first commit (b28e036)?

@jseyfried
Copy link
Contributor

jseyfried commented Nov 28, 2017

@michaelwoerister

I think this is fine for now since those fields are not used after HIR lowering and incremental compilation only kicks in after that point.

Those fields are used after lowering to HIR since fields, methods, and everything else that gets resolved after lowering is hygienic with macros 2.0. For example,

mod foo {
    struct S { x: i32 }
    pub macro m() {
        // Due to hygiene, the following always resolves, and is never a privacy error, wherever `m!()` is invoked.
        let s: S = S { x: 0 };
        s.x;
    }
}
// ...
fn f() { foo::m!(); }

In this example, we need the hygiene info for the two x's to be able to privacy-check the expansion correctly.

That being said, we (I) still need to implement to serializing hygiene info into the crate metadata anyway to support nested macro definitions across crate boundaries (i.e. a macro that defines a macro, the latter which gets used from another crate, see the limitation section of #40847), and I don't want to block you on this. I would be OK landing this as is, which would force me to do this :)

@michaelwoerister
Copy link
Member Author

Thanks a lot for taking a look, @jseyfried!
What kind of errors would we potentially see due to the changes in this PR? Spurious resolution errors in things expanded from macros?

I'd be OK with landing too since:

  • I've run rather large test suites with a compiler implementing these changes, without running into errors, and
  • there's still a little bit of time before this will be used by default in an extensive way, so @jseyfried has a window of properly implementing this cross-crate functionality.

@bors
Copy link
Contributor

bors commented Nov 30, 2017

☔ The latest upstream changes (presumably #46299) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis
Copy link
Contributor

OK. r=me

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 30, 2017
@jseyfried
Copy link
Contributor

@michaelwoerister
I believe it could regress hygiene 2.0 privacy in limited instances. However, IIRC names from the cache are at least as unhygienic already, so this change should make anything much worse.

@michaelwoerister
Copy link
Member Author

Thanks everyone! I'm closing this in favor of #46338 which contains the commits from this PR.

bors added a commit that referenced this pull request Dec 1, 2017
incr.comp.: Load cached diagnostics lazily and allow more things in the cache.

This PR implements makes two changes:
1. Diagnostics are loaded lazily from the incr. comp. cache now. This turned out to be necessary for correctness because diagnostics contain `Span` values and deserializing those requires that the source file they point to is still around in the current compilation session. Obviously this isn't always the case. Loading them lazily allows for never touching diagnostics that are not valid anymore.
2. The compiler can now deal with there being no cache entry for a given query invocation. Before, all query results of a cacheable query were always expected to be present in the cache. Now, the compiler can fall back to re-computing the result if there is no cache entry found. This allows for caching things that we cannot force from dep-node (like the `symbol_name` query). In such a case we'll just have a "best effort" caching strategy.

~~This PR is based on #46301 (=first 2 commits), so please don't merge until that has landed. The rest of the commits are ready for review though.~~

r? @nikomatsakis
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants