Memory-map the dep-graph instead of reading it up front #95543

Closed
wants to merge 5 commits into from

Conversation

cjgillot
Contributor

An incremental compilation session starts by reading the dep-graph from disk. This step allocates a lot of memory to store the whole graph. Most of this memory will be used at most once: a node's dependencies are only checked before trying to force it, and its fingerprint is only checked once it has been re-computed.

This PR proposes to skip reading the fingerprints and dependencies at the beginning of the compilation session, and to only load them when necessary. To achieve that, the dep-graph file remains accessible through a memmap throughout the compilation session.

In contrast, the list of dep-nodes, along with the in-file positions of the unread fingerprints, is moved to the end of the dep-graph file. This list is also accessed through the memmap. It is still read immediately to construct the inverse index DepNode -> SerializedDepNodeIndex, and remains available afterwards.

Along the way, the inverse index is refactored to use a hashbrown RawTable. This avoids having to store all the DepNodes twice.
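
The layout described above can be illustrated with a minimal, self-contained sketch. This is not the PR's code: the names `encode` and `LazyGraph`, the fixed-width little-endian encoding, and the use of `std::collections::HashMap` in place of a hashbrown RawTable are all assumptions made for illustration, and a plain `&[u8]` stands in for the memory-mapped file.

```rust
use std::collections::HashMap;
use std::convert::TryInto;

/// Hypothetical stand-ins for rustc's `DepNode` and `Fingerprint`.
type DepNode = u64;
type Fingerprint = [u8; 16];

/// Writes the per-node records first, then the node list with each record's
/// position, then an 8-byte footer holding the offset where that list starts.
fn encode(nodes: &[(DepNode, Fingerprint, Vec<u32>)]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut positions = Vec::new();
    for (_, fingerprint, deps) in nodes {
        positions.push(out.len() as u64);
        out.extend_from_slice(fingerprint);
        out.extend_from_slice(&(deps.len() as u32).to_le_bytes());
        for dep in deps {
            out.extend_from_slice(&dep.to_le_bytes());
        }
    }
    let index_start = out.len() as u64;
    for ((node, _, _), pos) in nodes.iter().zip(&positions) {
        out.extend_from_slice(&node.to_le_bytes());
        out.extend_from_slice(&pos.to_le_bytes());
    }
    out.extend_from_slice(&index_start.to_le_bytes());
    out
}

fn read_u64(data: &[u8], at: usize) -> u64 {
    u64::from_le_bytes(data[at..at + 8].try_into().unwrap())
}

fn read_u32(data: &[u8], at: usize) -> u32 {
    u32::from_le_bytes(data[at..at + 4].try_into().unwrap())
}

/// Only the inverse index is built eagerly; records stay in `data` until requested.
struct LazyGraph<'a> {
    data: &'a [u8],
    positions: Vec<usize>,
    index: HashMap<DepNode, u32>,
}

impl<'a> LazyGraph<'a> {
    fn open(data: &'a [u8]) -> Self {
        let footer = data.len() - 8;
        let index_start = read_u64(data, footer) as usize;
        let mut positions = Vec::new();
        let mut index = HashMap::new();
        // Each index entry is 16 bytes: the node (8) and its record position (8).
        for (i, entry) in data[index_start..footer].chunks_exact(16).enumerate() {
            index.insert(read_u64(entry, 0), i as u32);
            positions.push(read_u64(entry, 8) as usize);
        }
        LazyGraph { data, positions, index }
    }

    /// Reads the fingerprint straight from the backing bytes on demand.
    fn fingerprint(&self, idx: u32) -> Fingerprint {
        let pos = self.positions[idx as usize];
        self.data[pos..pos + 16].try_into().unwrap()
    }

    /// Reads the dependency list, which follows the 16-byte fingerprint.
    fn dependencies(&self, idx: u32) -> Vec<u32> {
        let pos = self.positions[idx as usize] + 16;
        let len = read_u32(self.data, pos) as usize;
        (0..len).map(|i| read_u32(self.data, pos + 4 + 4 * i)).collect()
    }
}
```

In this sketch, opening the graph touches only the index at the end of the buffer; fingerprints and dependency lists are decoded when `fingerprint` or `dependencies` is called for a given index.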

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 31, 2022
@rust-highfive
Collaborator

r? @matthewjasper

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 31, 2022
@cjgillot
Contributor Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 31, 2022
@bors
Contributor

bors commented Mar 31, 2022

⌛ Trying commit e830280c8b3c26f05f2cbef989089caaf59d09ee with merge b4d4e4dac6803298c682aa051f69e224d2b1a41b...

@bors
Contributor

bors commented Apr 1, 2022

☀️ Try build successful - checks-actions
Build commit: b4d4e4dac6803298c682aa051f69e224d2b1a41b (b4d4e4dac6803298c682aa051f69e224d2b1a41b)

@rust-timer
Collaborator

Queued b4d4e4dac6803298c682aa051f69e224d2b1a41b with parent 0677edc, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (b4d4e4dac6803298c682aa051f69e224d2b1a41b): comparison url.

Summary: This benchmark run shows 158 relevant improvements 🎉 but 8 relevant regressions 😿 to instruction counts.

  • Arithmetic mean of relevant regressions: 0.9%
  • Arithmetic mean of relevant improvements: -3.2%
  • Arithmetic mean of all relevant changes: -3.0%
  • Largest improvement in instruction counts: -8.1% on incr-unchanged builds of many-assoc-items debug
  • Largest regression in instruction counts: 1.3% on full builds of keccak check

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 1, 2022
@michaelwoerister
Member

Thanks for the PR, @cjgillot. This looks very promising! I'll take a close look next week.

@cjgillot
Contributor Author

cjgillot commented Apr 1, 2022

In addition to the perf report: the "dep_graph.bin" file is roughly 30% larger with this PR. This is the same order of magnitude as #83322.

impl<'a, T> !MmapSafe for &'a T {}
impl<'a, T> !MmapSafe for &'a mut T {}
impl<T> !MmapSafe for *const T {}
impl<T> !MmapSafe for *mut T {}
Member

Maybe also impl !MmapSafe for UnsafeCell?

let mask = align - 1;
let extra = pos & mask;
let padding = if extra == 0 { 0 } else { align - extra };
self.write_all(&ALIGN[..padding])?;
Member

Maybe skip this call altogether when padding is 0?

Member

I'm also wondering if it would be better to just assert that the data will be aligned, instead of adding padding. Then we would just add some padding before the entire array? But I'm not sure if that's actually better.

Contributor Author

In general, we can't assert alignment. When encoding a Fingerprint (align 8) after SerializedDepNodeIndexes (align 4), there is no reason we should expect the encoder to be aligned.

Member

Yeah, that's a good point.
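
The padding computation discussed in this thread can be sketched in isolation as follows. This is an illustration under assumptions, not the PR's encoder: `pad_to_alignment` is a hypothetical helper that writes into a `Vec<u8>` instead of a file encoder, and it skips the write when no padding is needed, as suggested in the review.

```rust
/// Appends zero bytes to `buf` until its length is a multiple of `align`.
/// `align` must be a power of two. Returns the number of padding bytes written.
fn pad_to_alignment(buf: &mut Vec<u8>, align: usize) -> usize {
    assert!(align.is_power_of_two());
    let mask = align - 1;
    let extra = buf.len() & mask;
    let padding = if extra == 0 { 0 } else { align - extra };
    // Skip the write entirely when the position is already aligned.
    if padding != 0 {
        buf.resize(buf.len() + padding, 0);
    }
    padding
}
```

For example, after three 4-byte indexes the position is 12, so a value with alignment 8 needs 4 bytes of padding to start at offset 16. This is the situation described above, where alignment cannot simply be asserted.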

@michaelwoerister
Member

Here are some thoughts:

  • I think in general this approach looks very good. I'm definitely in favor of merging something like this.
  • I'm somewhat skeptical about directly memory-mapping complex data structures, where we have to be careful about alignment and about not accidentally mapping things like pointers. I think we can get the performance benefits without that, since we are reading the data in question only once or twice anyway, right? I'll describe a possible alternative approach below.
  • If we can't get rid of the increased on-disk size, I think we should just do an MCP (but maybe we can get rid of the size).

Adapted approach to encoding fingerprint & dependencies

I think we might be able to keep on-disk size somewhat in check and avoid adding the unsafe mmap_* methods to encoders and decoders by doing the following:

  • Entries in the first list are still a pair of Fingerprint and dependency list, but the dependency list is encoded as a u32 for the length where the lower two bits are used to encode how many bytes an entry in the list has. That way we get a variable length encoding, but the length is decided per-list and not per entry. I think we should be able to only use 3 bytes per entry in 99% of the cases:

    |  Fingerprint  | Length* | idx0 | idx1 | idx2 |
         16 bytes     4 bytes     1-4 bytes each
    
  • For decoding we do something like the following pseudo implementation:

    impl<K: DepKind> SerializedDepGraph<K> {
    
        #[inline]
        fn decoder_at(&self, dep_node_index: SerializedDepNodeIndex) -> opaque::Decoder<'_> {
            let address = self.node_addresses[dep_node_index];
            opaque::Decoder::new(self.node_data, address)
        }
    
        #[inline]
        pub fn fingerprint_by_index(&self, dep_node_index: SerializedDepNodeIndex) -> Fingerprint {
            Fingerprint::decode(&mut self.decoder_at(dep_node_index))
        }
    
        #[inline]
        pub fn edge_targets_from(&self, source: SerializedDepNodeIndex) -> SerializedDepNodeIndexIter {
            let mut decoder = self.decoder_at(source);
            // Skip the fingerprint
            decoder.read_raw_bytes(SIZE_OF_ENCODED_FINGERPRINT);
    
            SerializedDepNodeIndexIter::new(decoder)
        }
    }
    
    struct SerializedDepNodeIndexIter<'a> {
        decoder: opaque::Decoder<'a>,
        entry_size: usize,
        entries_left: usize,
    }
    
    impl<'a> SerializedDepNodeIndexIter<'a> {
    
        fn new(mut decoder: opaque::Decoder<'a>) -> Self {
            let entry_count_and_size = decoder.read_u32();
            // The upper 30 bits hold the number of entries.
            let entries_left = (entry_count_and_size >> 2) as usize;
    
            // The lower two bits hold the entry size minus one,
            // so 0..=3 maps to 1..=4 bytes per entry.
            let entry_size = (entry_count_and_size & 0b11) as usize + 1;
    
            Self {
                decoder,
                entry_size,
                entries_left,
            }
        }
    }
    
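    impl<'a> Iterator for SerializedDepNodeIndexIter<'a> {
        type Item = SerializedDepNodeIndex;
    
        fn next(&mut self) -> Option<SerializedDepNodeIndex> {
            if self.entries_left == 0 {
                return None;
            }
    
            self.entries_left -= 1;
    
            // Read 1, 2, 3, or 4 little-endian bytes and widen them to a u32.
            // Assumes `read_raw_bytes` returns the bytes it consumed.
            let mut buf = [0u8; 4];
            buf[..self.entry_size].copy_from_slice(self.decoder.read_raw_bytes(self.entry_size));
            Some(SerializedDepNodeIndex::from_u32(u32::from_le_bytes(buf)))
        }
    }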
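
To make the proposed per-list variable-length encoding concrete, here is a self-contained round-trip sketch. It is an illustration only: `encode_deps` and `decode_deps` are hypothetical names, plain `u32` values stand in for `SerializedDepNodeIndex`, little-endian byte order is an assumption, and the fingerprint that precedes the list in the proposal is left out.

```rust
/// Encodes `deps` as a u32 header followed by fixed-width little-endian entries.
/// The header stores the entry count in its upper 30 bits and
/// (bytes per entry - 1) in its lower two bits.
fn encode_deps(deps: &[u32], out: &mut Vec<u8>) {
    assert!(deps.len() < (1 << 30), "count must fit in 30 bits");
    // The width is chosen once per list, from the largest index in it.
    let max = deps.iter().copied().max().unwrap_or(0);
    let entry_size: usize = match max {
        0..=0xFF => 1,
        0x100..=0xFFFF => 2,
        0x1_0000..=0xFF_FFFF => 3,
        _ => 4,
    };
    let header = ((deps.len() as u32) << 2) | (entry_size as u32 - 1);
    out.extend_from_slice(&header.to_le_bytes());
    for dep in deps {
        // Keep only the low `entry_size` bytes of each index.
        out.extend_from_slice(&dep.to_le_bytes()[..entry_size]);
    }
}

/// Decodes a list written by `encode_deps` from the start of `data`.
fn decode_deps(data: &[u8]) -> Vec<u32> {
    let header = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
    let count = (header >> 2) as usize;
    let entry_size = (header & 0b11) as usize + 1;
    data[4..4 + count * entry_size]
        .chunks_exact(entry_size)
        .map(|chunk| {
            // Widen each entry back to four bytes, zero-filling the high bytes.
            let mut buf = [0u8; 4];
            buf[..entry_size].copy_from_slice(chunk);
            u32::from_le_bytes(buf)
        })
        .collect()
}
```

With this scheme a list whose largest index is below 16,777,216 takes 3 bytes per entry, which matches the estimate above that 3 bytes should be enough in most cases.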

@cjgillot
Contributor Author

cjgillot commented Apr 9, 2022

Using odht:
@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 9, 2022
@bors
Contributor

bors commented Apr 9, 2022

⌛ Trying commit c65e12234de16a1f4a6083a4ae782601951702ea with merge c3448cac0af5bae4050e40f9c4578d364e24ab99...

@bors
Contributor

bors commented Apr 9, 2022

☀️ Try build successful - checks-actions
Build commit: c3448cac0af5bae4050e40f9c4578d364e24ab99 (c3448cac0af5bae4050e40f9c4578d364e24ab99)

@rust-timer
Collaborator

Queued c3448cac0af5bae4050e40f9c4578d364e24ab99 with parent 8c1fb2e, future comparison URL.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 3, 2022
@bors
Contributor

bors commented Jul 3, 2022

⌛ Trying commit c7d99a215aa7aef065d6475b64a85771abb8384d with merge 7402e377e7bb74b8a133d3b7d9b88184bb0e7039...

@bors
Contributor

bors commented Jul 3, 2022

☀️ Try build successful - checks-actions
Build commit: 7402e377e7bb74b8a133d3b7d9b88184bb0e7039 (7402e377e7bb74b8a133d3b7d9b88184bb0e7039)

@rust-timer
Collaborator

Queued 7402e377e7bb74b8a133d3b7d9b88184bb0e7039 with parent f99f9e4, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (7402e377e7bb74b8a133d3b7d9b88184bb0e7039): comparison url.

Instruction count

  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: 😿 relevant regressions found
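|                            | mean¹ | max   | count² |
|----------------------------|-------|-------|--------|
| Regressions 😿 (primary)    | 4.0%  | 10.4% | 108    |
| Regressions 😿 (secondary)  | 3.0%  | 8.1%  | 61     |
| Improvements 🎉 (primary)   | N/A   | N/A   | 0      |
| Improvements 🎉 (secondary) | -2.0% | -2.0% | 1      |
| All 😿🎉 (primary)           | 4.0%  | 10.4% | 108    |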

Max RSS (memory usage)

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
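|                            | mean¹ | max   | count² |
|----------------------------|-------|-------|--------|
| Regressions 😿 (primary)    | 3.6%  | 7.4%  | 32     |
| Regressions 😿 (secondary)  | 4.9%  | 7.5%  | 16     |
| Improvements 🎉 (primary)   | -2.7% | -6.5% | 39     |
| Improvements 🎉 (secondary) | -3.1% | -4.8% | 10     |
| All 😿🎉 (primary)           | 0.1%  | 7.4%  | 71     |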

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
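|                            | mean¹ | max   | count² |
|----------------------------|-------|-------|--------|
| Regressions 😿 (primary)    | 2.4%  | 2.5%  | 2      |
| Regressions 😿 (secondary)  | 2.0%  | 2.6%  | 3      |
| Improvements 🎉 (primary)   | -2.0% | -2.0% | 1      |
| Improvements 🎉 (secondary) | -5.9% | -5.9% | 1      |
| All 😿🎉 (primary)           | 0.9%  | 2.5%  | 3      |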

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 3, 2022
@pnkfelix
Member

Hey y'all, we briefly looked at this PR in the T-compiler triage meeting and I'm curious what the overall trajectory is. (Zulip chat: https://rust-lang.zulipchat.com/#narrow/stream/238009-t-compiler.2Fmeetings/topic/.5Bweekly.5D.202022-07-28/near/291197831)

Namely, it seemed to me like there were really promising performance gains back in March. But then all the changes since then seem like they've decreased those gains, and have culminated in outright regressions now.

Any idea what's best to do here?

@cjgillot cjgillot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 28, 2022
@cjgillot
Contributor Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 29, 2022
@bors
Contributor

bors commented Jul 29, 2022

⌛ Trying commit 9b4c844 with merge b9d04ffd4dccc08a93273e66f1a77a0b48597ce6...

@bors
Contributor

bors commented Jul 29, 2022

☀️ Try build successful - checks-actions
Build commit: b9d04ffd4dccc08a93273e66f1a77a0b48597ce6 (b9d04ffd4dccc08a93273e66f1a77a0b48597ce6)

@rust-timer
Collaborator

Queued b9d04ffd4dccc08a93273e66f1a77a0b48597ce6 with parent 9fa62f2, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (b9d04ffd4dccc08a93273e66f1a77a0b48597ce6): comparison url.

Instruction count

  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: 😿 relevant regressions found
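|                            | mean¹ | max   | count² |
|----------------------------|-------|-------|--------|
| Regressions 😿 (primary)    | 3.8%  | 12.6% | 120    |
| Regressions 😿 (secondary)  | 2.7%  | 9.6%  | 58     |
| Improvements 🎉 (primary)   | -0.2% | -0.2% | 1      |
| Improvements 🎉 (secondary) | N/A   | N/A   | 0      |
| All 😿🎉 (primary)           | 3.8%  | 12.6% | 121    |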

Max RSS (memory usage)

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
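|                            | mean¹ | max   | count² |
|----------------------------|-------|-------|--------|
| Regressions 😿 (primary)    | 3.8%  | 7.4%  | 31     |
| Regressions 😿 (secondary)  | 5.4%  | 7.6%  | 17     |
| Improvements 🎉 (primary)   | -2.7% | -6.4% | 41     |
| Improvements 🎉 (secondary) | -3.1% | -5.0% | 8      |
| All 😿🎉 (primary)           | 0.1%  | 7.4%  | 72     |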

Cycles

Results
  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: mixed results
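|                            | mean¹ | max   | count² |
|----------------------------|-------|-------|--------|
| Regressions 😿 (primary)    | 2.7%  | 4.3%  | 26     |
| Regressions 😿 (secondary)  | 2.6%  | 3.8%  | 12     |
| Improvements 🎉 (primary)   | -2.2% | -2.2% | 1      |
| Improvements 🎉 (secondary) | -4.1% | -5.1% | 3      |
| All 😿🎉 (primary)           | 2.5%  | 4.3%  | 27     |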

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 29, 2022
@cjgillot
Contributor Author

Seems it's not worth it any more.

@cjgillot cjgillot closed this Jul 30, 2022
@apiraino apiraino removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 30, 2022
Labels
perf-regression Performance regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.