NLL compile-time performance regression #58178
Comments
Thank you for the detailed steps to reproduce.

@ehuss: Using normal profiling tools on a test case this long-running will be painful. Do you have any suggestions for generating a test case that is roughly 10--100x smaller? I guess I could try just removing things from …
I did some ad hoc profiling on a partial run. Comparing a partial histogram of iteration counts for the inner loop with its weighted version, the ratio of the "counts" totals is 4.4, i.e. the inner loop runs 4.4 times on average; although it occasionally gets above 300 iterations, it generally loops only a handful of times.

In contrast, the same pair of histograms for the outer loop gives a "counts" ratio of 8421, i.e. the outer loop runs 8421 times on average, and it gets as high as 51,566 iterations.
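The relationship between the plain and weighted histograms can be made concrete with a small sketch. The sample data and helper names (`histogram`, `average_iterations`) are illustrative, not rustc code: the weighted total divided by the plain total gives the average iterations per invocation.

```rust
use std::collections::BTreeMap;

// Build a histogram mapping iteration-count -> number of invocations.
fn histogram(samples: &[u64]) -> BTreeMap<u64, u64> {
    let mut hist = BTreeMap::new();
    for &s in samples {
        *hist.entry(s).or_insert(0) += 1;
    }
    hist
}

// The "weighted" histogram weights each bucket by its iteration count,
// so total_weight / total_count is the average iterations per invocation.
fn average_iterations(hist: &BTreeMap<u64, u64>) -> f64 {
    let total_count: u64 = hist.values().sum();
    let total_weight: u64 = hist.iter().map(|(&iters, &n)| iters * n).sum();
    total_weight as f64 / total_count as f64
}

fn main() {
    // Hypothetical samples: mostly small counts, one pathological outlier.
    let hist = histogram(&[1, 1, 2, 4, 300]);
    // (1 + 1 + 2 + 4 + 300) / 5 = 61.6
    assert!((average_iterations(&hist) - 61.6).abs() < 1e-9);
    println!("average = {}", average_iterations(&hist));
}
```

This is exactly why the outer loop dominates: a few huge buckets move the weighted total far more than they move the plain count.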
Another data point: the length of …
Thanks for taking a look. It's tricky to create an isolated reproduction. One approach, as you mentioned, is to just use a smaller grammar: delete all the files in …
I did a Cachegrind run using a cut-down version of foo.lyg. It basically confirmed everything I found above, but the cost of traversing the constraint graph is even higher than I expected: Cachegrind attributes roughly 25% and 13% of all executed instructions to two individual lines in that traversal. (It's extremely rare to see 25% or 13% of all instructions attributed to a single line of code!) I can see one or two opportunities to improve things, but they'll only be constant-factor shavings. An algorithmic change will be necessary to make things reasonable, and I don't understand this code well enough to come up with anything myself.
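To illustrate why this traversal dominates, here is a minimal sketch of a breadth-first walk over an outlives-style constraint graph. The types and names are hypothetical simplifications, not rustc's actual data structures: the point is that each query re-walks the graph in O(V + E), so Q queries cost O(Q · (V + E)), which blows up on a 46,000-line generated file.

```rust
use std::collections::VecDeque;

// Hypothetical simplification: regions are indices, constraints are
// directed "outlives" edges between regions.
struct ConstraintGraph {
    edges: Vec<Vec<usize>>, // edges[r] = regions that r must outlive
}

impl ConstraintGraph {
    // Per-query BFS: O(V + E) each call. Calling this once per error
    // or per (region, point) pair is the slow pattern described above.
    fn reachable_from(&self, start: usize) -> Vec<usize> {
        let mut seen = vec![false; self.edges.len()];
        let mut queue = VecDeque::from([start]);
        seen[start] = true;
        let mut out = vec![start];
        while let Some(r) = queue.pop_front() {
            for &next in &self.edges[r] {
                if !seen[next] {
                    seen[next] = true;
                    out.push(next);
                    queue.push_back(next);
                }
            }
        }
        out
    }
}

fn main() {
    let g = ConstraintGraph {
        edges: vec![vec![1, 2], vec![2], vec![]],
    };
    assert_eq!(g.reachable_from(0), vec![0, 1, 2]);
}
```

The inner loop of such a walk is tiny, which is how two source lines can end up carrying a quarter of all executed instructions.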
#58210 is really simple, but it reduces instruction counts by 22% and wall-time by 8% for compilation of my cut-down version of foo.lyg.
@matthewjasper and I have also been looking at this, and removing the call to find_outlives_blame_span goes a long way. For example, stubbing the call out makes the NLL overhead closer to 10% here (but of course simply removing the call worsens the diagnostics). Matthew mentioned they'll be looking at this very soon.
@lqd: could that expensive search be gated behind some size threshold?
@estebank: seems possible to have such a threshold if need be, yeah. But Matthew (who, unlike me, has worked on this part of the code before, and which Niko wrote, IIRC) also mentioned on Zulip that there are other ways to get this diagnostic information. (Oh, and in the little experiment I mentioned earlier, the "worse diagnostics" weren't terrible or anything.)
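The threshold idea could look roughly like the sketch below. Every name and the cutoff value are made up for illustration; rustc's real diagnostics logic differs. The shape is simply: run the expensive blame-span search only on inputs small enough for it to be affordable, and fall back to a cheaper, less precise answer otherwise.

```rust
// Hypothetical cutoff, chosen for illustration only.
const BLAME_SEARCH_THRESHOLD: usize = 10_000;

#[derive(Debug, PartialEq)]
enum BlameSpan {
    Precise(usize), // index of the constraint chosen by the full search
    Fallback,       // cheap degraded diagnostic for pathological inputs
}

// Stand-in for a find_outlives_blame_span-style entry point: gate the
// expensive search behind the constraint-count threshold.
fn blame_span(num_constraints: usize) -> BlameSpan {
    if num_constraints <= BLAME_SEARCH_THRESHOLD {
        // Pretend this is the full (expensive) search.
        BlameSpan::Precise(num_constraints / 2)
    } else {
        BlameSpan::Fallback
    }
}

fn main() {
    assert_eq!(blame_span(100), BlameSpan::Precise(50));
    assert_eq!(blame_span(1_000_000), BlameSpan::Fallback);
}
```

The trade-off is exactly the one discussed above: compile time stays bounded on generated code, at the cost of a vaguer error span in the rare huge cases.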
I've opened #58347, which gets the performance to a much better level (less than 50% slower for a full clean debug build). Once that has landed, we probably want to add the test to perf.rlo and profile what's causing the remaining slowdown.
… r=pnkfelix Closure bounds fixes

* Ensures that "nice region errors" are buffered so that they are sorted and migrated correctly.
* Propagates fewer constraints for closures (cc rust-lang#58178)
* Propagate constraints from closures more precisely (rust-lang#58127)

Closes rust-lang#58127

r? @nikomatsakis
With #58347 merged, this should be much better now. Could @Mark-Simulacrum add this to perf.rlo? And does anyone want to investigate the remaining perf gap?
I probably won't have a chance for a few days, but it should be fairly straightforward to do so: copy the files into collector/benchmarks/grammar and file a PR.
The input size might need tweaking so that the runtime is reasonable.
NLL triage. Seems to me like we still want to investigate the remaining performance gap here. Tagging as P-high.
Also nominating for discussion at the NLL meeting (mostly to see if I can find someone to continue the investigation of the remaining performance gap).
assigning to @csmoe after they pinged me on Zulip
un-nominating based on assumption that @csmoe is going to continue investigating.
I did some investigation on this; it appears that we are still spending too much time on the closure bounds. Lazily determining the best constraint seems like the best way forward here.
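"Lazily determining the best constraint" might be sketched like this. The types are hypothetical stand-ins (the real change would live in rustc's region-inference code): compute the expensive "best constraint" only when a diagnostic actually asks for it, and cache the result so it is computed at most once.

```rust
use std::cell::OnceCell;

// Hypothetical stand-in for a buffered region error.
struct RegionError {
    constraints: Vec<u32>,   // stand-in for the real constraint set
    best: OnceCell<u32>,     // lazily computed "blame" constraint
}

impl RegionError {
    // The expensive search runs at most once, and only if a diagnostic
    // is actually emitted; errors that get filtered out pay nothing.
    fn best_constraint(&self) -> u32 {
        *self.best.get_or_init(|| {
            // Pretend this max() is the costly best-constraint search.
            self.constraints.iter().copied().max().unwrap_or(0)
        })
    }
}

fn main() {
    let err = RegionError {
        constraints: vec![3, 7, 5],
        best: OnceCell::new(),
    };
    assert_eq!(err.best_constraint(), 7);
    assert_eq!(err.best_constraint(), 7); // cached; no second search
}
```

This moves the cost from "every region that might error" to "every error actually reported", which is the asymmetry the comments above are exploiting.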
Downgrading from P-high to P-medium; the bulk of the truly awful performance regression has been addressed. Some issues may remain, but it's not clear if they are worth tracking in this specific issue, so I may close this in the future. Reassigning from @csmoe to self so that I keep track of its status.
After further discussion with @Mark-Simulacrum and @nnethercote, closing as essentially fixed (at least to the point of not being worth tracking in this issue).
NLL seems to cause a severe regression in compile time for code generated by the gll crate.

Steps to reproduce:

1. Run `cargo build`. For me, this takes about 1 minute.
2. Open `src/lib.rs` and add `#![feature(nll)]`.
3. Run `cargo build` again. For me, this takes 77 minutes.

gll generates a very large `parse.rs` file, about 46,000 lines long. (Located in `target/debug/build/rust-grammar-*/out/parse.rs`.)

From what little debugging I did, I see it spending most of its time in a function called `RegionInferenceContext::find_outlives_blame_span`.

Tested: 1.32.0 (stable) and rustc 1.34.0-nightly (f6fac42 2019-02-03).

Please let me know if there's any other information I can provide.

cc @qmx