Directly save a byte representation of the dep-graph and work-product index #83322
Conversation
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit de6aaf996d41dddc68ba0f1e5702c42ecd316236 with merge 6a12039846406dd55e03e6296ebaaab7cd258012...
☀️ Try build successful - checks-actions
Queued 6a12039846406dd55e03e6296ebaaab7cd258012 with parent 41b315a, future comparison URL.
Finished benchmarking try commit (6a12039846406dd55e03e6296ebaaab7cd258012): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the rollup. @bors rollup=never
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit a1ea84fbd3fd67d65505013e7d031cf30c6f5d10 with merge 13d82bed22285425f21da9852b33173df9108b4e...
☀️ Try build successful - checks-actions
Queued 13d82bed22285425f21da9852b33173df9108b4e with parent 61edfd5, future comparison URL.
Finished benchmarking try commit (13d82bed22285425f21da9852b33173df9108b4e): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the rollup. @bors rollup=never
Thanks for the PR, @cjgillot! What's the difference between "raw" and "opaque"? The opaque encoder/decoder is already meant to be the "raw" serialization framework. Is there anything the "raw" version does that the opaque version could not be made to do?
The opaque serialization uses LEB128 encoding, so as to have an architecture-independent file. The raw encoding dumps the bytes as they are.
Ah, that makes sense. How does that affect the size of the dep-graph file? |
It's difficult to say. LEB128 is variable-length, so encoding small numbers takes less room. In the best case, LEB128 reduces the size by 75%; in the worst case, its overhead is 15%.
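To make the trade-off concrete, here is a minimal sketch of unsigned LEB128 encoding (not rustc's actual implementation; the function name is hypothetical). Each byte carries 7 payload bits plus a continuation bit, so small values shrink to a single byte while large values pay a per-byte overhead:

```rust
// Minimal unsigned LEB128 encoder: 7 payload bits per byte,
// high bit set while more bytes follow.
fn leb128_encode(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // high bit clear: last byte
            break;
        }
        out.push(byte | 0x80); // high bit set: more bytes follow
    }
}

fn main() {
    let mut small = Vec::new();
    leb128_encode(3, &mut small);
    // A small index costs 1 byte instead of 8 fixed-width bytes.
    assert_eq!(small, vec![0x03]);

    let mut large = Vec::new();
    leb128_encode(u64::MAX, &mut large);
    // Worst case for a u64: 10 bytes instead of 8 (25% overhead).
    assert_eq!(large.len(), 10);

    println!("small: {} byte(s), large: {} byte(s)", small.len(), large.len());
}
```

The 75%/15% figures above correspond to the common case of small indices fitting in one byte versus values whose LEB128 form is longer than their fixed-width form.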
|
The snippet under review, from the PR diff:

```rust
macro_rules! write_raw {
    ($enc:expr, $value:expr, $int_ty:ty) => {{
        let bytes = $value.to_ne_bytes();
        // ...
    }};
}
```
Can you please use `to_le_bytes` instead? This is just as fast on little-endian systems, but it makes it possible, for example, to move the incremental cache to a big-endian system and then use `--target`.
Is this an actual use case? This PR makes the file format unportable (mainly because of `isize`/`usize`), with the objective of memmapping part of it. If portability is required, I need to change the implementation to ensure it.
Right, I can't name any. I do remember some talk about maybe changing crate metadata to be an incr. comp. snapshot or something like that in the future; I can't remember the details. In that case, portability is very important for cross-compilation.
@rust-lang/wg-compiler-performance How far are we away from collecting file sizes via self-profiling and perf.rlo? It should be quite easy to add the recording part to
No one is currently working on this as far as I know. However, I'm also not aware of any reason why this would not be possible to add.
⌛ Trying commit bf32e3f8ff612fba7ad7e860a661575a1d91b293 with merge c16d991f9b3d8ee4b2b0f31533933924006b013e...
☀️ Try build successful - checks-actions
Queued c16d991f9b3d8ee4b2b0f31533933924006b013e with parent 93542a8, future comparison URL.
Finished benchmarking commit (c16d991f9b3d8ee4b2b0f31533933924006b013e): comparison url. Summary: This change led to very large relevant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so, since this PR led to changes in compiler perf. Next steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never.
As expected, there is a ~30% hit on file sizes. This is sizeable. What should the decision process be?
Cons:
We can also wait for a preliminary implementation of the memmapped dep-graph to benchmark it.
A few questions:

- Do we have a sense of whether the disk usage is caused by some types/queries in particular? I'm primarily wondering if we can apply some optimizations to shrink the file size impact, since it seems somewhat non-obvious that the impact is strictly necessary to get the mmap-ability.
- Do we know where the instruction count improvements are primarily coming from? Can we get those benefits without the disk space increase, or at least at a smaller cost? (e.g., is it due to avoiding variable-sized integer encoding/decoding?)
- Is the impact limited just to incremental artifacts (I presume so)? If so, then the 30-40% impact is likely to be less for smaller workspaces, since their target directory size is probably dominated by compiled dependencies, not the leaf crate's incremental data. For rustc and other large workspaces (where incremental is applied to many crates), though, a 30-40% increase is going to be rather painful -- I'm worried that it may make it harder for folks to use incremental.
If I'm reading the code correctly, this is only about the DepGraph in particular. So what we encode here are mostly node indices, fingerprints, and DepNodes (which internally are a pair of a discriminant and a fingerprint).
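To illustrate why fixed-width records cost more on disk, here is a hypothetical sketch of the kind of record described above (the names are illustrative, not rustc's actual types): a node index plus a DepNode (kind discriminant and fingerprint) plus the result fingerprint, encoded at full width:

```rust
// Hypothetical record layout, for illustration only.
struct Fingerprint(u64, u64); // 128-bit hash as two u64 halves

struct RawRecord {
    node_index: u32,
    dep_kind: u16, // discriminant of the DepNode kind
    dep_fingerprint: Fingerprint,
    result_fingerprint: Fingerprint,
}

impl RawRecord {
    // Fixed-width little-endian encoding: every field costs its full
    // size, unlike LEB128, where small indices shrink to one byte.
    fn encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.node_index.to_le_bytes());
        out.extend_from_slice(&self.dep_kind.to_le_bytes());
        for half in [
            self.dep_fingerprint.0,
            self.dep_fingerprint.1,
            self.result_fingerprint.0,
            self.result_fingerprint.1,
        ] {
            out.extend_from_slice(&half.to_le_bytes());
        }
    }
}

fn main() {
    let rec = RawRecord {
        node_index: 7,
        dep_kind: 3,
        dep_fingerprint: Fingerprint(0xdead, 0xbeef),
        result_fingerprint: Fingerprint(0x1234, 0x5678),
    };
    let mut buf = Vec::new();
    rec.encode(&mut buf);
    // 4 + 2 + 16 + 16 = 38 bytes per record, regardless of value.
    assert_eq!(buf.len(), 38);
}
```

Fingerprints are already random-looking 128-bit values, so LEB128 saves little on them; the savings it does provide come mostly from the small node indices and discriminants.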
Is this observable somewhere? Some ideas on reducing the file size:
Regarding the decision process: I don't know :) My thoughts:
Having a memmapped implementation that doesn't require any up-front decoding might indeed be helpful for making a decision here.
☔ The latest upstream changes (presumably #94174) made this pull request unmergeable. Please resolve the merge conflicts.
☔ The latest upstream changes (presumably #95418) made this pull request unmergeable. Please resolve the merge conflicts.
Those files are internal to the incremental engine. They are not meant to be portable.
r? @ghost