
Fix performance regression in debuginfo file_metadata. #70803

Merged
merged 1 commit into rust-lang:master on Apr 5, 2020

Conversation

@arlosi (Contributor) commented Apr 5, 2020

Fixes performance regression caused by #69718.

Finding the `SourceFile` associated with a `FileName` required calling `get_source_file` on the `SourceMap`, which does a linear search through all files in the `SourceMap`.

This resolves the issue by passing the `SourceFile` in from the caller (which already had it available) instead of the `FileName`.

Fixes #70785.

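To make the shape of the change concrete, here is a minimal self-contained sketch. The names mirror rustc's `SourceMap`, `SourceFile`, and `file_metadata`, but the types and signatures are simplified stand-ins rather than the real compiler API:

    use std::sync::Arc;

    // Simplified stand-ins for the rustc_span types involved.
    struct SourceFile {
        name: String,
        // src, line tables, ...
    }

    struct SourceMap {
        files: Vec<Arc<SourceFile>>,
    }

    impl SourceMap {
        // Linear scan over every file in the map; doing this once per
        // debuginfo file lookup is what made the old path slow.
        fn get_source_file(&self, name: &str) -> Option<Arc<SourceFile>> {
            self.files.iter().find(|f| f.name == name).cloned()
        }
    }

    // Before (simplified): only the name was passed in, so the SourceFile
    // had to be re-found with an O(n) search on every call.
    fn file_metadata_by_name(map: &SourceMap, file_name: &str) {
        let _source_file = map.get_source_file(file_name);
        // ... build the debuginfo file entry from the SourceFile ...
    }

    // After (simplified): the caller already holds the SourceFile, so it is
    // passed in directly and the search disappears.
    fn file_metadata(_source_file: &SourceFile) {
        // ... build the debuginfo file entry directly ...
    }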
@rust-highfive (Collaborator) commented:

r? @matthewjasper

(rust_highfive has picked a reviewer for you, use r? to override)

rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) on Apr 5, 2020
@arlosi (Contributor, Author) commented Apr 5, 2020

r? @eddyb

Should we do a perf run to see if this fully resolves the regression? It seemed to resolve it in a few benchmarks I ran locally.

rust-highfive assigned eddyb and unassigned matthewjasper on Apr 5, 2020

A Member commented on this line:

    let source_file = cx.sess().source_map().get_source_file(file_name);

Ahh, I missed this during review! This is indeed unnecessarily slow.

A Member commented on this change:

    - file_name: &FileName,
    + source_file: &SourceFile,

Heh, I think I did this (plus removing defining_crate and relying on the more accurate source_file.is_imported()) in one of my PRs that has been sitting around.
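For readers unfamiliar with that distinction, a rough sketch of the idea, using simplified stand-in types rather than rustc's real definitions: a `SourceFile` that was imported from another crate's metadata can identify itself, so an extra `defining_crate` argument is not needed to tell local files from foreign ones.

    // Simplified stand-ins, not rustc's actual definitions. The assumption
    // here is that an imported SourceFile (loaded from another crate's
    // metadata) carries no source text.
    struct SourceFile {
        src: Option<String>,
        // name, line tables, ...
    }

    impl SourceFile {
        fn is_imported(&self) -> bool {
            self.src.is_none()
        }
    }

    fn emit_file_debuginfo(source_file: &SourceFile) {
        if source_file.is_imported() {
            // file came from an upstream crate
        } else {
            // file belongs to the crate being compiled
        }
    }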

@eddyb (Member) commented Apr 5, 2020

Let's land it and see the results there, to reduce the time master spends regressed.

@bors r+ rollup=never p=10

@bors (Contributor) commented Apr 5, 2020

📌 Commit 4cdceda has been approved by eddyb

bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the S-waiting-on-review label on Apr 5, 2020
@bors (Contributor) commented Apr 5, 2020

⌛ Testing commit 4cdceda with merge 607b858...

@bors (Contributor) commented Apr 5, 2020

☀️ Test successful - checks-azure
Approved by: eddyb
Pushing 607b858 to master...

bors added the merged-by-bors label (This PR was explicitly merged by bors.) on Apr 5, 2020
bors merged commit 607b858 into rust-lang:master on Apr 5, 2020
@nnethercote (Contributor) commented:

@arlosi: This change fixes most of the regressions, thanks.

But I notice that the "clap-rs-debug patched incremental: println" benchmark regression is only partly fixed. Any ideas what happened there?

@mati865 (Contributor) commented Apr 8, 2020

@nnethercote (Contributor) commented:

@mati865: is that the right perf link? Those perf results look like not much changed.

@mati865 (Contributor) commented Apr 8, 2020

@nnethercote it's the difference starting right before the original PR and ending after this fix, showing there is no big performance hit.
For clap see these results (it's a different PR): https://perf.rust-lang.org/compare.html?start=74bd074eefcf4915c73d1ab91bc90859664729e6&end=9e55101bb681010c82c3c827305e2665fc8f2aa0&stat=instructions:u and the comment I linked above for the explanation.

@nnethercote (Contributor) commented:

I see now that this caused a large regression in memory usage.

@arlosi: was that expected? Any thoughts on how it could be improved?

@arlosi (Contributor, Author) commented May 7, 2020

@nnethercote It appears that the change that caused the CPU regression also caused an rss improvement, and that fixing the CPU regression removed the rss improvement.

The original change that contained the CPU regression went in on 2020-04-03, and the fix went in on 2020-04-05. Looking at the graph for cargo-debug max-rss over that time period shows an improvement in memory usage, then a matching regression:
https://perf.rust-lang.org/index.html?start=2020-04-01&end=2020-04-08&absolute=true&stat=max-rss

Here's the run that introduced the CPU regression (and rss improvement):
https://perf.rust-lang.org/compare.html?start=9e55101bb681010c82c3c827305e2665fc8f2aa0&end=6050e523bae6de61de4e060facc43dc512adaccd&stat=max-rss

The key difference that caused the CPU regression was adding this line:
let source_file = cx.sess().source_map().get_source_file(file_name);
However, I can't explain how that could have reduced memory usage.

@nnethercote (Contributor) commented:

I investigated the max RSS change. As far as I can tell, the number of allocations, and their sizes, are basically identical in the two versions. But the distribution is different, due to the significant timing differences. My theory is that because the back-end is multi-threaded, how the memory peaks for all the individual threads overlap has a big effect on how the global peak manifests.

Here is what DHAT tells me about memory usage when the speed regression was in place:

    Total:     22,161,188,077 bytes (100%, 71,870.02/Minstr) in 52,214,866 blocks (100%, 169.34/Minstr), avg size 424.42 bytes, avg lifetime 6,620,711,098.71 instrs (2.15% of program duration)
    At t-gmax: 662,297,094 bytes (100%) in 2,100,503 blocks (100%), avg size 315.3 bytes

and here's what it looks like now:

    Total:     22,161,559,231 bytes (100%, 128,445.93/Minstr) in 52,214,794 blocks (100%, 302.63/Minstr), avg size 424.43 bytes, avg lifetime 7,079,327,664.2 instrs (4.1% of program duration)
    At t-gmax: 1,018,333,644 bytes (100%) in 4,138,546 blocks (100%), avg size 246.06 bytes

The "Total" lines describe the allocations done over the entire life of the program, and they are the same. The "At t-gmax" lines describe the live allocations that exist at the global memory peak, and they are very different.

Here is the with-regression memory usage shown by Massif:
[Massif graph: with-regression build]
There are lots of local maxima, spread out evenly, that are close to the global maximum.

And here is the same thing for the current code:
[Massif graph: current code]
The global maximum is larger and more pronounced relative to the rest of the curve.

In short, I don't think there's really anything to be done about this. At least it's good to know that nothing silly is happening.

Labels
merged-by-bors (This PR was explicitly merged by bors.), S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.)

Successfully merging this pull request may close these issues:

A big performance regression in benchmarks (#70785)
7 participants