incr.comp.: Explore delayed read-edge deduplication or getting rid of it entirely. #45873
I'm working on this.
I've collected some data from a few crates. It appears that duplicate reads happen much more frequently than originally thought. Of the crates I tested, the lowest was 19.2% duplicated reads. @michaelwoerister since this doesn't match what you were expecting, would you mind looking at my changes and verifying that I didn't do something wrong? My changes are available in my incr_duplicate_read_stats branch (diff).
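The duplicate-read counting described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual rustc code: node IDs are plain integers here, and `TaskReads`, `record_read`, and `duplicate_ratio` are hypothetical names.

```rust
use std::collections::HashSet;

// Hypothetical sketch: while recording the reads of a single task, count how
// many incoming read-edges duplicate one already seen for that task.
struct TaskReads {
    seen: HashSet<u32>,
    reads: Vec<u32>,
    duplicate_count: usize,
}

impl TaskReads {
    fn new() -> Self {
        TaskReads { seen: HashSet::new(), reads: Vec::new(), duplicate_count: 0 }
    }

    fn record_read(&mut self, node: u32) {
        // HashSet::insert returns false if the value was already present.
        if self.seen.insert(node) {
            self.reads.push(node);
        } else {
            self.duplicate_count += 1;
        }
    }

    // Fraction of all recorded reads that were duplicates.
    fn duplicate_ratio(&self) -> f64 {
        let total = self.reads.len() + self.duplicate_count;
        if total == 0 { 0.0 } else { self.duplicate_count as f64 / total as f64 }
    }
}

fn main() {
    let mut task = TaskReads::new();
    for &n in &[1, 2, 1, 3, 2] {
        task.record_read(n);
    }
    println!("duplicates: {} ({:.1}%)", task.duplicate_count, task.duplicate_ratio() * 100.0);
}
```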
Huh, this is very interesting. Thank you so much for collecting this data. It goes to show again that one should always measure before assuming anything. The data collection in your branch looks correct, except for one thing: for anonymous nodes we can't really make deduplication delayed, so counting them too will skew the assessment of what effect the proposed optimization would have. Sorry for not mentioning this earlier. Would you mind adapting your code so that it uses a separate counter for anonymous nodes? I'm not going to make a prediction on how this will affect the numbers.
Reran the same tests with the updated code. The results aren't significantly different (the above "Full results here" link now points to the updated results).
Thanks, @wesleywiser! OK, let's see, I see two options on how to proceed:
I'll let you decide, @wesleywiser.
I'm game to implement delayed deduplication and see how it performs. Do you mind if I go ahead and push up a PR for the stats collection? Also, what's the best way to measure the performance before and after the change? Just use
Cool, I'm really curious how it will do.
No, please go ahead!
I usually use That's for local testing. Once you have a version that you think is optimized enough, you can open a PR and @Mark-Simulacrum will trigger a perf.rust-lang.org measurement for us. That will give us a good idea of what performance will look like.
[incremental] Collect stats about duplicated edge reads from queries (part of #45873)
@michaelwoerister I've tried delaying deduplication until serialization but I'm not seeing much of a difference in compilation time. I've set a If the issue is reallocations, would preallocating the (For reference purposes, my code is in my incr_delay_dedup branch.)
I think the potential for improvement here is about 2% for a debug build with an empty cache. For a rebuild this might actually slow things down, since we'll be deduplicating already-deduplicated vectors.
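The delayed variant being discussed can be sketched as a single deduplication pass run just before serialization, over a plain `Vec` of recorded reads. This is a hypothetical sketch, not the rustc implementation; note that it preserves the order in which each edge was first recorded, which matters for reproducible output.

```rust
use std::collections::HashSet;

// Sketch of delayed deduplication: reads are recorded into a plain Vec with
// no per-task HashSet, and duplicates are filtered out in one pass at
// serialization time, keeping first-seen order.
fn dedup_preserving_order(edges: Vec<u32>) -> Vec<u32> {
    let mut seen = HashSet::with_capacity(edges.len());
    // HashSet::insert returns true only for the first occurrence of a value,
    // so filter keeps exactly the first copy of each edge.
    edges.into_iter().filter(|e| seen.insert(*e)).collect()
}

fn main() {
    let deduped = dedup_preserving_order(vec![1, 2, 1, 3, 2]);
    println!("{:?}", deduped);
}
```

The trade-off noted above shows up here: on a rebuild, `edges` may already contain no duplicates, yet this pass still pays for building the `HashSet`.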
Pretty much, but you have to be careful to measure the correct thing. A few remarks:
There are a few things in our implementation that can be improved:
Thanks, that's really helpful! I implemented your feedback but the results don't look very good. (The source is available in the branch I linked above.)
Well, that's sad. But your implementation looks correct (except for not preserving edge order in the I hope you still found it interesting to try a few things out! I'll make sure to mention your efforts in the next impl period newsletter.
Thanks! One thing I'm left wondering is if the time saved by delaying the deduplication is getting eaten by resizing the
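The resizing concern can be made concrete with a small experiment. This is an illustrative sketch (the capacities and growth behavior are whatever `Vec` does, not what rustc's dep-graph structures do): a vector grown one push at a time reallocates each time its capacity is exhausted, while one preallocated to the final size never does.

```rust
// Count how many pushes would trigger a reallocation when filling a Vec of
// n elements, starting from the given initial capacity.
fn count_reallocations(n: usize, initial_capacity: usize) -> usize {
    let mut v: Vec<u32> = Vec::with_capacity(initial_capacity);
    let mut reallocs = 0;
    for i in 0..n as u32 {
        // When len has reached capacity, the next push must reallocate.
        if v.len() == v.capacity() {
            reallocs += 1;
        }
        v.push(i);
    }
    reallocs
}

fn main() {
    println!("growing from empty: {} reallocations", count_reallocations(1000, 0));
    println!("preallocated:       {} reallocations", count_reallocations(1000, 1000));
}
```

This also illustrates the counterpoint raised below: with a huge number of empty vectors, preallocating a nontrivial capacity for each would waste memory rather than save time.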
If you are up for that, you can certainly try.
Ok! I'll try to collect some data and report back. |
This graph looks like it's missing some axes. Could you upload the raw data? |
@michaelwoerister Given the huge number of empty vectors, I think preallocating would probably cause a huge memory blowup. Unless you have any ideas, I think we can probably close this issue. |
@wesleywiser Yeah. Also, pre-allocating isn't free either. Thanks again for all your work on this!
At the moment the compiler will always eagerly deduplicate read-edges in `CurrentDepGraph::read_index`. In order to be able to do this, the compiler has to allocate a `HashSet` for each task. My suspicion is that there is not much duplication to begin with and that there's potential for optimization here.

So the first step would be to collect some data on how much read-edge duplication there even is. This is most easily done by modifying the compiler to count duplicates (in `librustc/dep_graph/graph.rs`) and print the number/percentage in `-Zincremental-info`. This modified compiler can then be used to compile a number of crates to get an idea of what's going on.

If duplication is low (e.g. less than 2% of reads got filtered out), then we could just remove de-duplication and test the effect with a try-build. Otherwise, we can move deduplication to `DepGraph::serialize()` and measure the performance impact of that.
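The eager scheme described above (a `HashSet` allocated per task inside `read_index`) can be sketched roughly like this. All names and types are illustrative stand-ins for the structures in `librustc/dep_graph/graph.rs`, not the actual implementation:

```rust
use std::collections::HashSet;

// Sketch of eager deduplication: every task carries its own HashSet, so each
// read-edge is filtered at the moment it is recorded.
struct CurrentTask {
    read_set: HashSet<u32>, // one allocation per task: the cost in question
    reads: Vec<u32>,
}

impl CurrentTask {
    fn new() -> Self {
        CurrentTask { read_set: HashSet::new(), reads: Vec::new() }
    }

    // Roughly what an eager read_index does: record an edge only the first
    // time it is seen for this task.
    fn read_index(&mut self, dep_node: u32) {
        if self.read_set.insert(dep_node) {
            self.reads.push(dep_node);
        }
    }
}

fn main() {
    let mut task = CurrentTask::new();
    for &n in &[5, 7, 5, 9, 7, 5] {
        task.read_index(n);
    }
    println!("recorded reads: {:?}", task.reads);
}
```

Removing deduplication entirely would drop `read_set` and push every edge; delaying it would keep the plain `Vec` here and deduplicate once in `DepGraph::serialize()` instead.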