Memory-map the dep-graph instead of reading it up front #95543

cjgillot · 2022-03-31T22:44:39Z

An incremental compilation session starts by reading the dep-graph from disk. This step allocates a lot of memory to store the whole graph. Most of this memory will be used at most once: a node's dependencies are only checked before trying to force it, and its fingerprint is only checked once it has been re-computed.

This PR proposes to skip reading the fingerprints and dependencies at the beginning of the compilation session, and to only load them when necessary. To achieve that, the dep-graph file remains accessible through a memmap throughout the compilation session.

In contrast, the list of dep-nodes, along with the in-file positions of the unread fingerprints, is pushed to the end of the dep-graph file. This list is also accessed through the memmap. It is still read immediately to construct the inverse index DepNode -> SerializedDepNodeIndex, and remains available afterwards.
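The layout described above can be sketched roughly as follows. This is only an illustration, not the PR's code: a `&[u8]` stands in for the memory-mapped file, the names `LazyGraph`, `NodeEntry`, `build_file` and `fingerprint` are hypothetical, and the concrete format (a trailing table of node ids and positions, with the table offset stored in the last 8 bytes, all little-endian) is assumed for the example.

```rust
use std::convert::TryInto;

/// One entry of the trailing node table: a node id plus the position of
/// its (still unread) 16-byte fingerprint inside the file.
struct NodeEntry {
    node_id: u64,
    fingerprint_pos: usize,
}

/// Hypothetical stand-in for the mapped dep-graph file.
struct LazyGraph<'a> {
    bytes: &'a [u8],
    nodes: Vec<NodeEntry>,
}

fn read_u64(bytes: &[u8], pos: usize) -> u64 {
    u64::from_le_bytes(bytes[pos..pos + 8].try_into().unwrap())
}

impl<'a> LazyGraph<'a> {
    /// Eagerly reads only the node table at the end of the file.
    fn open(bytes: &'a [u8]) -> Self {
        // Assumed footer: the last 8 bytes hold the offset of the node table.
        let table_end = bytes.len() - 8;
        let table_start = read_u64(bytes, table_end) as usize;
        let mut nodes = Vec::new();
        let mut pos = table_start;
        while pos < table_end {
            nodes.push(NodeEntry {
                node_id: read_u64(bytes, pos),
                fingerprint_pos: read_u64(bytes, pos + 8) as usize,
            });
            pos += 16;
        }
        LazyGraph { bytes, nodes }
    }

    /// Decodes a fingerprint on demand, straight from the mapped bytes.
    fn fingerprint(&self, index: usize) -> (u64, u64) {
        let pos = self.nodes[index].fingerprint_pos;
        (read_u64(self.bytes, pos), read_u64(self.bytes, pos + 8))
    }
}

/// Builds a tiny file image: fingerprints first, node table last.
fn build_file(nodes: &[(u64, (u64, u64))]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut positions = Vec::new();
    for &(_, (a, b)) in nodes {
        positions.push(out.len() as u64);
        out.extend_from_slice(&a.to_le_bytes());
        out.extend_from_slice(&b.to_le_bytes());
    }
    let table_start = out.len() as u64;
    for (&(id, _), &pos) in nodes.iter().zip(&positions) {
        out.extend_from_slice(&id.to_le_bytes());
        out.extend_from_slice(&pos.to_le_bytes());
    }
    out.extend_from_slice(&table_start.to_le_bytes());
    out
}
```

Calling `LazyGraph::open` on the output of `build_file` touches only the footer and the node table; the fingerprint bytes are not read until `fingerprint(index)` is called for a given node.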

Along the way, the inverse index is refactored to use a hashbrown RawTable. This avoids having to store all the DepNodes twice.
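To illustrate the idea of an inverse index that does not duplicate its keys, here is a rough sketch using only the standard library. It is not the PR's implementation: the PR uses hashbrown's RawTable, whereas this example swaps in a `HashMap` from hash value to candidate positions, and `Node`, `NodeIndex`, `hash_of` and `index_of` are hypothetical names. The point it shows is that the table stores only indices, and equality is checked against the single node list.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a `DepNode`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Node {
    kind: u16,
    hash: u64,
}

/// Inverse index that stores only positions; the nodes live once in `nodes`.
struct NodeIndex {
    nodes: Vec<Node>,
    // Hash of a node -> positions in `nodes` that share this hash.
    buckets: HashMap<u64, Vec<u32>>,
}

fn hash_of(node: &Node) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    hasher.finish()
}

impl NodeIndex {
    fn new(nodes: Vec<Node>) -> Self {
        let mut buckets: HashMap<u64, Vec<u32>> = HashMap::new();
        for (i, node) in nodes.iter().enumerate() {
            buckets.entry(hash_of(node)).or_default().push(i as u32);
        }
        NodeIndex { nodes, buckets }
    }

    /// Looks up a node by hashing it, then comparing against the node list.
    fn index_of(&self, node: &Node) -> Option<u32> {
        let candidates = self.buckets.get(&hash_of(node))?;
        candidates
            .iter()
            .copied()
            .find(|&i| &self.nodes[i as usize] == node)
    }
}
```

A per-bucket `Vec<u32>` is heavier than what a raw table would store, so this sketch only demonstrates the lookup scheme, not the memory savings.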

rust-highfive · 2022-03-31T22:44:43Z

r? @matthewjasper

(rust-highfive has picked a reviewer for you, use r? to override)

cjgillot · 2022-03-31T22:45:52Z

@bors try @rust-timer queue

rust-timer · 2022-03-31T22:45:53Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-03-31T22:46:02Z

⌛ Trying commit e830280c8b3c26f05f2cbef989089caaf59d09ee with merge b4d4e4dac6803298c682aa051f69e224d2b1a41b...

bors · 2022-04-01T00:23:57Z

☀️ Try build successful - checks-actions
Build commit: b4d4e4dac6803298c682aa051f69e224d2b1a41b (b4d4e4dac6803298c682aa051f69e224d2b1a41b)

rust-timer · 2022-04-01T00:23:59Z

Queued b4d4e4dac6803298c682aa051f69e224d2b1a41b with parent 0677edc, future comparison URL.

rust-timer · 2022-04-01T01:43:52Z

Finished benchmarking commit (b4d4e4dac6803298c682aa051f69e224d2b1a41b): comparison url.

Summary: This benchmark run shows 158 relevant improvements 🎉 but 8 relevant regressions 😿 to instruction counts.

Arithmetic mean of relevant regressions: 0.9%
Arithmetic mean of relevant improvements: -3.2%
Arithmetic mean of all relevant changes: -3.0%
Largest improvement in instruction counts: -8.1% on incr-unchanged builds of many-assoc-items debug
Largest regression in instruction counts: 1.3% on full builds of keccak check

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

michaelwoerister · 2022-04-01T08:06:56Z

Thanks for the PR, @cjgillot. This looks very promising! I'll take a close look next week.

cjgillot · 2022-04-01T09:39:37Z

In addition to the perf report: the "dep_graph.bin" file is roughly 30% larger with this PR. This is the same order of magnitude as #83322.

bjorn3 · 2022-04-04T17:03:51Z

compiler/rustc_serialize/src/lib.rs

+impl<'a, T> !MmapSafe for &'a T {}
+impl<'a, T> !MmapSafe for &'a mut T {}
+impl<T> !MmapSafe for *const T {}
+impl<T> !MmapSafe for *mut T {}


Maybe also impl !MmapSafe for UnsafeCell?

michaelwoerister · 2022-04-06T10:10:18Z

compiler/rustc_serialize/src/opaque.rs

+        let mask = align - 1;
+        let extra = pos & mask;
+        let padding = if extra == 0 { 0 } else { align - extra };
+        self.write_all(&ALIGN[..padding])?;


Maybe skip this call altogether when padding is 0?

I'm also wondering if it would be better to just assert that the data will be aligned, instead of adding padding. Then we would just add some padding before the entire array? But I'm not sure if that's actually better.

cjgillot · 2022-04-06

In general, we can't assert alignment. When encoding a Fingerprint (align 8) after SerializedDepNodeIndexes (align 4), there is no reason to expect the encoder position to be aligned.

michaelwoerister · 2022-04-08

Yeah, that's a good point.
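The padding computation discussed in this thread can be tried in isolation with a small sketch. The function names `padding_for` and `write_aligned_u64` are hypothetical, a `Vec<u8>` stands in for the encoder, and alignment is measured relative to the start of the buffer, which is an assumption of this example.

```rust
/// Number of zero bytes to write so that `pos + padding` is a multiple of
/// `align`. `align` must be a power of two for the mask trick to work.
fn padding_for(pos: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    let mask = align - 1;
    let extra = pos & mask;
    if extra == 0 { 0 } else { align - extra }
}

/// Appends padding (only when needed), then an 8-byte value.
/// Returns the offset at which the value starts.
fn write_aligned_u64(buf: &mut Vec<u8>, value: u64) -> usize {
    let padding = padding_for(buf.len(), 8);
    if padding != 0 {
        // Skip the write altogether when no padding is required.
        buf.resize(buf.len() + padding, 0);
    }
    let start = buf.len();
    buf.extend_from_slice(&value.to_le_bytes());
    start
}
```

For instance, after three 4-byte indices the buffer is 12 bytes long, so writing an 8-aligned value first inserts 4 bytes of padding and the value starts at offset 16.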

compiler/rustc_query_system/src/dep_graph/serialized.rs

michaelwoerister · 2022-04-06T12:45:57Z

Here are some thoughts:

I think in general this approach looks very good. I'm definitely in favor of merging something like this.
I'm somewhat skeptical about directly memory-mapping complex data structures, where we have to be careful about alignment and about not accidentally mapping things like pointers. I think we can get the performance benefits without that, since we are reading the data in question only once or twice anyway, right? I'll describe a possible alternative approach below.
If we can't get rid of the increased on-disk size, I think we should just do an MCP (but maybe we can get rid of the size).

Adapted approach to encoding fingerprint & dependencies

I think we might be able to keep on-disk size somewhat in check and avoid adding the unsafe mmap_* methods to encoders and decoders by doing the following:

Entries in the first list are still a pair of Fingerprint and dependency list, but the dependency list starts with a u32 that holds the length, with its lower two bits encoding how many bytes each entry in the list occupies. That way we get a variable-length encoding, but the entry size is decided per list and not per entry. I think we should be able to use only 3 bytes per entry in 99% of the cases:
```
|  Fingerprint  | Length* | idx0 | idx1 | idx2 |
     16 bytes     4 bytes     1-4 bytes each
```

For decoding we do something like the following pseudo implementation:

```rust
impl<K: DepKind> SerializedDepGraph<K> {
    #[inline]
    fn decoder_at(&self, dep_node_index: SerializedDepNodeIndex) -> opaque::Decoder<'_> {
        // Position of this node's encoded data within the mapped bytes.
        let address = self.node_addresses[dep_node_index];
        opaque::Decoder::new(self.node_data, address)
    }

    #[inline]
    pub fn fingerprint_by_index(&self, dep_node_index: SerializedDepNodeIndex) -> Fingerprint {
        Fingerprint::decode(&mut self.decoder_at(dep_node_index))
    }

    #[inline]
    pub fn edge_targets_from(&self, source: SerializedDepNodeIndex) -> SerializedDepNodeIndexIter<'_> {
        let mut decoder = self.decoder_at(source);
        // Skip the fingerprint
        decoder.read_raw_bytes(SIZE_OF_ENCODED_FINGERPRINT);

        SerializedDepNodeIndexIter::new(decoder)
    }
}

struct SerializedDepNodeIndexIter<'a> {
    decoder: opaque::Decoder<'a>,
    entry_size: usize,
    entries_left: usize,
}

impl<'a> SerializedDepNodeIndexIter<'a> {
    fn new(mut decoder: opaque::Decoder<'a>) -> Self {
        // Upper 30 bits: number of entries; lower 2 bits: bytes per entry minus one.
        let entry_count_and_size = decoder.read_u32();
        let entries_left = (entry_count_and_size >> 2) as usize;

        // obviously this would not really need to be implemented with `match` :)
        let entry_size = match entry_count_and_size & 0b11 {
            0 => 1,
            1 => 2,
            2 => 3,
            3 => 4,
            _ => unreachable!(),
        };

        Self { decoder, entry_size, entries_left }
    }
}

impl<'a> Iterator for SerializedDepNodeIndexIter<'a> {
    type Item = SerializedDepNodeIndex;

    fn next(&mut self) -> Option<SerializedDepNodeIndex> {
        if self.entries_left == 0 {
            return None;
        }

        self.entries_left -= 1;

        return match self.entry_size {
            // Read 1, 2, 3, or 4 bytes and convert to SerializedDepNodeIndex
        };
    }
}
```
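For the encoding side, which the pseudo implementation above leaves out, a self-contained sketch could look like the following. It works on plain `u32` indices and a `Vec<u8>` rather than on rustc's encoder types; the names `encode_edges` and `decode_edges` are hypothetical, and the little-endian byte order is an assumption of this example, not something stated in the proposal.

```rust
/// Encodes `edges` as a u32 header followed by fixed-width entries.
/// Header layout: (count << 2) | (bytes_per_entry - 1).
fn encode_edges(edges: &[u32], out: &mut Vec<u8>) {
    // The entry width is chosen once per list, from the largest index.
    let max = edges.iter().copied().max().unwrap_or(0);
    let width: usize = match max {
        0..=0xFF => 1,
        0x100..=0xFFFF => 2,
        0x1_0000..=0xFF_FFFF => 3,
        _ => 4,
    };
    assert!(edges.len() < (1 << 30), "count must fit in 30 bits");
    let header = ((edges.len() as u32) << 2) | (width as u32 - 1);
    out.extend_from_slice(&header.to_le_bytes());
    for &edge in edges {
        // Keep only the low `width` bytes of each index.
        out.extend_from_slice(&edge.to_le_bytes()[..width]);
    }
}

/// Decodes a list written by `encode_edges`, starting at `pos`.
fn decode_edges(bytes: &[u8], pos: usize) -> Vec<u32> {
    let mut header = [0u8; 4];
    header.copy_from_slice(&bytes[pos..pos + 4]);
    let header = u32::from_le_bytes(header);
    let count = (header >> 2) as usize;
    let width = (header & 0b11) as usize + 1;

    let mut edges = Vec::with_capacity(count);
    let mut cursor = pos + 4;
    for _ in 0..count {
        // Zero-extend the `width` stored bytes back to a full u32.
        let mut buf = [0u8; 4];
        buf[..width].copy_from_slice(&bytes[cursor..cursor + width]);
        edges.push(u32::from_le_bytes(buf));
        cursor += width;
    }
    edges
}
```

With this scheme a list whose largest index fits in 24 bits costs 4 bytes of header plus 3 bytes per edge, which matches the "3 bytes per entry" estimate in the comment.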

cjgillot · 2022-04-09T22:33:43Z

Using odht:
@bors try @rust-timer queue

rust-timer · 2022-04-09T22:33:45Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-04-09T22:33:52Z

⌛ Trying commit c65e12234de16a1f4a6083a4ae782601951702ea with merge c3448cac0af5bae4050e40f9c4578d364e24ab99...

bors · 2022-04-09T23:56:45Z

☀️ Try build successful - checks-actions
Build commit: c3448cac0af5bae4050e40f9c4578d364e24ab99 (c3448cac0af5bae4050e40f9c4578d364e24ab99)

rust-timer · 2022-04-09T23:56:46Z

Queued c3448cac0af5bae4050e40f9c4578d364e24ab99 with parent 8c1fb2e, future comparison URL.

bors · 2022-07-03T12:57:04Z

⌛ Trying commit c7d99a215aa7aef065d6475b64a85771abb8384d with merge 7402e377e7bb74b8a133d3b7d9b88184bb0e7039...

bors · 2022-07-03T14:30:20Z

☀️ Try build successful - checks-actions
Build commit: 7402e377e7bb74b8a133d3b7d9b88184bb0e7039 (7402e377e7bb74b8a133d3b7d9b88184bb0e7039)

rust-timer · 2022-07-03T14:30:21Z

Queued 7402e377e7bb74b8a133d3b7d9b88184bb0e7039 with parent f99f9e4, future comparison URL.

rust-timer · 2022-07-03T15:46:58Z

Finished benchmarking commit (7402e377e7bb74b8a133d3b7d9b88184bb0e7039): comparison url.

Instruction count

Primary benchmarks: 😿 relevant regressions found
Secondary benchmarks: 😿 relevant regressions found

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | 4.0% | 10.4% | 108 |
| Regressions 😿 (secondary) | 3.0% | 8.1% | 61 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | -2.0% | -2.0% | 1 |
| All 😿🎉 (primary) | 4.0% | 10.4% | 108 |

Max RSS (memory usage)

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | 3.6% | 7.4% | 32 |
| Regressions 😿 (secondary) | 4.9% | 7.5% | 16 |
| Improvements 🎉 (primary) | -2.7% | -6.5% | 39 |
| Improvements 🎉 (secondary) | -3.1% | -4.8% | 10 |
| All 😿🎉 (primary) | 0.1% | 7.4% | 71 |

Cycles

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | 2.4% | 2.5% | 2 |
| Regressions 😿 (secondary) | 2.0% | 2.6% | 3 |
| Improvements 🎉 (primary) | -2.0% | -2.0% | 1 |
| Improvements 🎉 (secondary) | -5.9% | -5.9% | 1 |
| All 😿🎉 (primary) | 0.9% | 2.5% | 3 |

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

¹ the arithmetic mean of the percent change
² number of relevant changes

pnkfelix · 2022-07-28T14:34:43Z

Hey y'all, we briefly looked at this PR in the T-compiler triage meeting and I'm curious what the overall trajectory is. (zulip chat https://rust-lang.zulipchat.com/#narrow/stream/238009-t-compiler.2Fmeetings/topic/.5Bweekly.5D.202022-07-28/near/291197831)

Namely, it seemed to me like there were really promising performance gains back in March. But then all the changes since then seem like they've decreased those gains, and have culminated in outright regressions now.

Any idea what's best to do here?

cjgillot · 2022-07-29T17:05:13Z

@bors try @rust-timer queue

rust-timer · 2022-07-29T17:05:15Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-29T17:05:21Z

⌛ Trying commit 9b4c844 with merge b9d04ffd4dccc08a93273e66f1a77a0b48597ce6...

bors · 2022-07-29T18:42:33Z

☀️ Try build successful - checks-actions
Build commit: b9d04ffd4dccc08a93273e66f1a77a0b48597ce6 (b9d04ffd4dccc08a93273e66f1a77a0b48597ce6)

rust-timer · 2022-07-29T18:42:35Z

Queued b9d04ffd4dccc08a93273e66f1a77a0b48597ce6 with parent 9fa62f2, future comparison URL.

rust-timer · 2022-07-29T21:09:56Z

Finished benchmarking commit (b9d04ffd4dccc08a93273e66f1a77a0b48597ce6): comparison url.

Instruction count

Primary benchmarks: 😿 relevant regressions found
Secondary benchmarks: 😿 relevant regressions found

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | 3.8% | 12.6% | 120 |
| Regressions 😿 (secondary) | 2.7% | 9.6% | 58 |
| Improvements 🎉 (primary) | -0.2% | -0.2% | 1 |
| Improvements 🎉 (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | 3.8% | 12.6% | 121 |

Max RSS (memory usage)

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | 3.8% | 7.4% | 31 |
| Regressions 😿 (secondary) | 5.4% | 7.6% | 17 |
| Improvements 🎉 (primary) | -2.7% | -6.4% | 41 |
| Improvements 🎉 (secondary) | -3.1% | -5.0% | 8 |
| All 😿🎉 (primary) | 0.1% | 7.4% | 72 |

Cycles

Results

Primary benchmarks: 😿 relevant regressions found
Secondary benchmarks: mixed results

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | 2.7% | 4.3% | 26 |
| Regressions 😿 (secondary) | 2.6% | 3.8% | 12 |
| Improvements 🎉 (primary) | -2.2% | -2.2% | 1 |
| Improvements 🎉 (secondary) | -4.1% | -5.1% | 3 |
| All 😿🎉 (primary) | 2.5% | 4.3% | 27 |

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

¹ the arithmetic mean of the percent change
² number of relevant changes

cjgillot · 2022-07-30T14:00:03Z

Seems it's not worth it any more.

rust-highfive assigned matthewjasper Mar 31, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 31, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 31, 2022

cjgillot assigned michaelwoerister and unassigned matthewjasper Mar 31, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 31, 2022

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 1, 2022

bjorn3 reviewed Apr 4, 2022

michaelwoerister reviewed Apr 6, 2022

cjgillot force-pushed the mmap-dg branch from e830280 to c65e122 Compare April 9, 2022 22:32

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 9, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 3, 2022

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 3, 2022

cjgillot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 28, 2022

cjgillot added 5 commits July 29, 2022 18:52

Mmap the DepGraph instead of reading it.

27732c2

Do not store the DepNode twice.

b136f26

Remove useless bounds.

8b9e38c

Go back to LEB128 for edges.

620d16a

Decode edges using an iterator.

9b4c844

cjgillot force-pushed the mmap-dg branch from c7d99a2 to 9b4c844 Compare July 29, 2022 17:05

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 29, 2022

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 29, 2022

cjgillot closed this Jul 30, 2022

apiraino removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 30, 2022
