incr.comp.: Use ICH-based DepNodes in order to make them PODs #42294

michaelwoerister · 2017-05-29T14:16:58Z

The current implementation of DepNode has a DefId or something more complex as its identifier. This has a few downsides:

DefIds are bound to a specific compilation session. When the dependency graph of a previous session is loaded, all nodes in it have to be "re-traced", which means that their DefPath has to be mapped to a DefId in the current compilation session. The has some cost at runtime and complicates the implementation (because we have to generate and store the DefPathDirectory).
Re-tracing a DepNode can also fail because the item it refers to does not exist anymore in the current compilation session. This is not a conceptual problem but it is an annoyance that DepNodes from a previous dep-graph cannot easily be represented. At the moment this is solved by making DepNode generic over the type of identifier it uses (either DefId or DefPath) but that is a burden.
Independently of DefId, some DepNode variants are expensive to copy because they contain vectors.

This is a proposal to use a simplified design that uses a generic, globally unique fingerprint as the disambiguator for DepNode:

enum DepNodeKind {
    Krate,
    Hir,
    HirBody,
    MetaData,
    ...
}

struct DepNode {
    kind: DepNodeKind,
    // 128+ bit hash value identifying the DepNode
    fingerprint: Fingerprint,
}

Since this is using a stable hash (like the ICH), a DepNode like this is valid across compilation sessions and does not need any re-tracing. It's also "plain old data" in the sense that it contains no pointers or anything else that is context dependent. As a consequence, it can be easily copied, shared between threads, and memory mapped, something that is not possible with the current design.

The fingerprint-based approach has a few potential downsides but all of them can be addressed adequately, I think:

A truly stable fingerprint/hash value is not trivial to compute.
Solution: We already have the whole ICH infrastructure available which can handle anything we throw at it.
There's a runtime cost to turning a DefId or other identifiers into a fingerprint.
Solution: In the vast majority of cases, it is really just one DefId that needs to be hashed. In this case we already have the hash available (as the DefPathHash) and access it via a simple array-lookup. Additionally, all DepNode are cached in the dependency graph once they are created and can be looked up via a 32-bit DepNodeIndex if the need arises.
Using a hash value introduces the risk of hash collisions.
Solution: This is already a risk for incremental compilation and is mitigated by using a high quality hash function with low enough collision probability. Risk can be adjusted by using fingerprints with more bits.
A fingerprint is opaque while a DefId allows to reconstruct which item it points to. This is bad for debugging output.
Solution: This is can be mitigated by just constructing lookup tables for mapping a Fingerprint back to its ingredients if -Zquery-dep-graph is specified.

cc @nikomatsakis and @eddyb

The text was updated successfully, but these errors were encountered:

michaelwoerister · 2017-05-30T13:25:06Z

I'm in the process of implementing a proof of concept for this and I'm a slightly disenchanted with this approach:

While a DepNode is trivially copyable once it's constructed, the process of construction got more complicated since we now have to have the tcx available at that point. This is not a huge deal, as far as I can tell, but it highlights the fact that this approach does not make the task of "re-tracing" disappear but rather moves it from de-serialization time to construction time.
When checking imported metadata for changes, we need to be able to map from a DepNode::Metadata to the crate the metadata was imported from. In order to do this reliably, we either have to use 160-192 bit hashes instead of 128-bit ones (otherwise collision probability becomes to high) or keep an extra table that maps imported DepNode::Metadata instances to back their crate, which is kind of what we wanted to avoid in the first place with this approach.

So, I'm a bit on the fence if we should do this or not. Just getting rid of the type parameter on DepNode might be reason enough to implement this anyway.

michaelwoerister · 2017-05-31T10:39:24Z

I did some performance measurements and it looks like building a crate-graph-wide DefPathHash -> DefId table (which would solve the second point mentioned above) is slightly less expensive than re-tracing the DefPathDirectory. Both tasks take under a hundred milliseconds on my machine even for Servo's script crate and for smaller crates it's under ten milliseconds, so this should not be something to worry about performance-wise.

I'll go ahead with my proof of concept implementation.

@nikomatsakis

…akis incr.comp.: Use DefPathHash-based DepNodes in the serialized DepGraph and remove obsolete DefIdDirectory With this PR we don't store the dep-graph as a set of `DepNode<IndexIntoDefIdDirectory>` anymore but instead as a set of `DepNode<DefPathHash>`. Since a `DefPathHash` is a global identifier that is valid across compilation sessions, we don't need the `DefIdDirectory` anymore. Since a `DepNode<DefPathHash>` is bigger than a `DepNode<IndexIntoDefIdDirectory>` and our on-disk encoding of the dep-graph is inefficient, this PR will probably increase the amount of space the dep-graph takes up on disk. I'm in the process of gathering some performance data. The changes in here are a step towards implementing ICH-based `DepNodes` (#42294). r? @nikomatsakis

@eddyb

…tsakis Some preparatory refactorings for hash-based DepNodes This PR collects some changes that turn out to be necessary for implementing `DepNodes` based on stable hashes (see #42294). The commits are self-contained and mostly straightforward. The most interesting change here is the introduction of `DefIndices` for things that are not part of the AST: Some pieces of crate metadata now have a `DefIndex` too. cc @eddyb r? @nikomatsakis

@nikomatsakis

incr.comp.: Make DepNode `Copy` and valid across compilation sessions This PR moves `DepNode` to a representation that does not need retracing and thus simplifies comparing dep-graphs from different compilation sessions. The code also gets a lot simpler in many places, since we don't need the generic parameter on `DepNode` anymore. See #42294 for details. ~~NOTE: Only the last commit of this is new, the rest is already reviewed in #42504 This PR is almost done but there are some things I still want to do: - [x] Add some module-level documentation to `dep_node.rs`, explaining especially what the `define_dep_nodes!()` macro is about. - [x] Do another pass over the dep-graph loading logic. I suspect that we can get rid of building the `edges` map and also use arrays instead of hash maps in some places. cc @rust-lang/compiler r? @nikomatsakis

michaelwoerister · 2017-06-19T11:59:13Z

Implemented by #42537.

michaelwoerister added the A-incr-comp Area: Incremental compilation label May 29, 2017

michaelwoerister mentioned this issue May 29, 2017

Tracking issue for "Red/Green" Dependency Tracking #42293

Closed

17 tasks

michaelwoerister mentioned this issue May 31, 2017

incr.comp.: Use DefPathHash-based DepNodes in the serialized DepGraph and remove obsolete DefIdDirectory #42332

Merged

This was referenced Jun 7, 2017

Some preparatory refactorings for hash-based DepNodes #42504

Merged

incr.comp.: Make DepNode Copy and valid across compilation sessions #42537

Merged

michaelwoerister closed this as completed Jun 19, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

incr.comp.: Use ICH-based DepNodes in order to make them PODs #42294

incr.comp.: Use ICH-based DepNodes in order to make them PODs #42294

michaelwoerister commented May 29, 2017

michaelwoerister commented May 30, 2017

michaelwoerister commented May 31, 2017

michaelwoerister commented Jun 19, 2017

incr.comp.: Use ICH-based DepNodes in order to make them PODs #42294

incr.comp.: Use ICH-based DepNodes in order to make them PODs #42294

Comments

michaelwoerister commented May 29, 2017

michaelwoerister commented May 30, 2017

michaelwoerister commented May 31, 2017

michaelwoerister commented Jun 19, 2017