Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compile-time: massive regression during 2015 #25069

Closed
nwin opened this issue May 3, 2015 · 23 comments
Closed

compile-time: massive regression during 2015 #25069

nwin opened this issue May 3, 2015 · 23 comments
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nwin
Copy link
Contributor

nwin commented May 3, 2015

I noticed earlier that the compile time of image increased massively during the beginning of 2015. Unfortunately it was hard to quantify this regression since we also changed a lot during this time.

Fortunately I just found a good example that shows a 20-fold increase in compile time. It is a single file with no other dependencies (if somebody wants to try: the original version is even much older). The two revisions are exactly the same code, the newer just has been updated to compile on a recent rustc. Note that this is rustc only, no time is spend in llvm.

$ sudo rustc --version
rustc 0.13.0-nightly (7608dbad6 2014-12-31 10:06:21 -0800)
$ time sudo rustc --crate-type=lib -Z no-trans inflate.rs

real    0m1.176s
user    0m1.059s
sys 0m0.113s

(note that something is wrong with the old .pkg-file I used to install rustc, I need to use sudo to run it)

$ rustc --version
rustc 1.1.0-nightly (c4b23aec4 2015-04-29) (built 2015-04-28)
$ time rustc --crate-type=lib -Z no-trans inflate.rs

real    0m25.208s
user    0m23.919s
sys 0m1.213s

The time profiles show that driver::phase_1_parse_input and driver::phase_2_configure_and_expand almost stayed constant while the time spent in driver::phase_3_run_analysis_passes exploded.

@huonw huonw added the I-compiletime Issue: Problems and improvements with respect to compile times. label May 3, 2015
@bluss
Copy link
Member

bluss commented May 3, 2015

If you can, bench after the merge of aecf3d8 / PR #24965 as well. Merged Fri May 1 08:33:08 2015 +0000 so it should be in the nightlies. Oh and share output of -Z time-passes too!

@nwin
Copy link
Contributor Author

nwin commented May 3, 2015

that improved marginally (edit: I checked that is after the merge)

$ rustc --version
rustc 1.1.0-nightly (c42c1e7a6 2015-05-02) (built 2015-05-01)
$ time rustc --crate-type=lib -Z no-trans inflate.rs

real    0m19.984s
user    0m18.948s
sys 0m1.033s
$ rustc --crate-type=lib -Z no-trans -Z time-passes inflate.rs
time: 0.007     parsing
time: 0.000     recursion limit
time: 0.001     configuration 1
time: 0.000     gated macro checking
time: 0.000     crate injection
time: 0.009     macro loading
time: 0.000     plugin loading
time: 0.000     plugin registration
time: 0.169     expansion
time: 0.001     complete gated feature checking 1
time: 0.018     configuration 2
time: 0.009     maybe building test harness
time: 0.008     prelude injection
time: 0.001     checking that all macro invocations are gone
time: 0.001     complete gated feature checking 2
time: 0.010     assigning node ids and indexing ast
time: 0.001     external crate/lib resolution
time: 0.002     language item collection
time: 0.037     resolution
time: 0.002     lifetime resolution
time: 0.000     looking for entry point
time: 0.000     looking for plugin registrar
time: 0.012     region resolution
time: 0.001     loop checking
time: 0.001     static item recursion checking
time: 0.002     type collecting
time: 0.001     variance inference
time: 0.129     coherence checking
time: 18.775    type checking
time: 0.048     const checking
time: 0.009     privacy checking
time: 0.000     stability index
time: 0.003     intrinsic checking
time: 0.003     effect checking
time: 0.006     match checking
time: 0.194     liveness checking
time: 0.438     borrow checking
time: 0.035     rvalue checking
time: 0.000     reachability checking
time: 0.007     death checking
time: 0.079     stability checking
time: 0.000     unused lib feature checking
time: 0.116     lint checking

@arielb1
Copy link
Contributor

arielb1 commented May 3, 2015

Your code has a >10kLOC method. Couldn't you split it?

@arielb1
Copy link
Contributor

arielb1 commented May 3, 2015

Performance does seem O(n^2) in the number of len! calls.

@nwin
Copy link
Contributor Author

nwin commented May 3, 2015

Maybe it is a duplicate of #23977?

@bluss
Copy link
Member

bluss commented May 4, 2015

Tested current nightly after the PR to inline hashing better. Only reduces run time of the type checking pass by 6%.

@pnkfelix
Copy link
Member

pnkfelix commented May 4, 2015

Did binary search over the nightlies (thanks to multirust); problem was injected between nightly-2015-03-30 and nightly-2015-03-31

In other words, this version is fast:

rustc 1.0.0-nightly (d8be84eb4 2015-03-29) (built 2015-03-29)

and this version is slow:

rustc 1.0.0-nightly (6cf3b0b74 2015-03-30) (built 2015-03-30)

Update: scanning over git log d8be84eb4..6cf3b0b74, I wonder whether #23673 is a likely culprit.

@bluss
Copy link
Member

bluss commented May 4, 2015

@pnkfelix That's what niko is indicating with the other issue too #23977 (comment)

Edit: Awesome job bisecting though! That's a great trick to use.

@Manishearth
Copy link
Member

I have noticed a similar effect in Servo across a particular Rust upgrade (forgot which).

Can we give this priority now that 1.0 is out?

@arielb1
Copy link
Contributor

arielb1 commented Jun 2, 2015

@Manishearth

Are you sure that this is the issue?

@Manishearth
Copy link
Member

No, I'm not, but regardless, we should be fixing this with priority.

@arielb1
Copy link
Contributor

arielb1 commented Jun 2, 2015

The problem is that literal inference creates tons of literal variables (IntVid/FloatVid) and obligations around them, and we uselessly try to resolve these obligations.

I think of 3 ways to solve this
1) Make the operator impls built-in, essentially impl<intvar T> Add<T> for T. This would avoid creating bogus obligations.
2) dually, make check_binop not try to select_obligations_where_possible when it ends up with a float/int var. This would prevent the O(n^2)-ness, but if there are other places that select_obligations_where_possible we would have O(n m)
3) Quickly ignore obligations that didn't make progress during selection. I think this would improve selection performance, but this requires more complexity.
cc @nikomatsakis

@pnkfelix
Copy link
Member

pnkfelix commented Jun 2, 2016

Nominating for prioritization. I suspect at this point its merely P-medium, but best to discuss. (Plus for all I know its actually been resolved, since #25916 was resolved by PR #31680, but then again that same PR notes that the quadratic behavior is still present; it just has less impact...)

@nikomatsakis
Copy link
Contributor

In compiler team meeting: we believe this to have been fixed. Would be good to verify whether the perf gap has been corrected in the meantime!

@nikomatsakis
Copy link
Contributor

triage: P-medium

@rust-highfive rust-highfive added the P-medium Medium priority label Jun 2, 2016
@DemiMarie
Copy link
Contributor

Has anyone tried benchmarking the compiler?

@Mark-Simulacrum
Copy link
Member

I believe @nnethercote has been working on profiling and benchmarking the compiler, and there is also http://perf.rust-lang.org/ which attempts to keeps track of current compile times.

Ignore the 18000% regression on perf.rlo right now, we recently changed what we were benchmarking for syntex which caused that jump.

@nnethercote
Copy link
Contributor

Yes, I have been doing some benchmarking and profiling. So far I've landed #36734 and #36592, and I just opened #36917. I think there is still some low- to middle-hanging fruit to be picked.

I have also done some work on https://github.com/rust-lang-nursery/rustc-benchmarks. I added a script that lets you compare the speed of two different compilers. I also fixed it so that the syntex benchmark now measures the right thing, which explains the huge regression that @Mark-Simulacrum mentioned.

@pnkfelix
Copy link
Member

pnkfelix commented Oct 3, 2016

I just took the gist linked earlier and re-ran it on today's compiler.

(I had to add some extra #![feature] and #![allow] attributes to get it to compile without error, but that process is pretty mechanical and uninteresting.)

The performance of the compiler has not returned to the point where it was at circa March 29th. In particular:

Nightly Commit Time
nightly-2015-03-30 d8be84e real 0m6.208s
nightly-2015-03-31 6cf3b0b real 0m18.546s
nightly-2015-07-30 4d52d7c real 0m13.056s
nightly-2015-08-30 f3f23bf real 0m12.320s
nightly-2015-09-30 65d5c08 real 0m12.639s
nightly-2016-01-30 303892e real 0m11.610s
nightly-2016-02-14 fae5162 real 0m12.203s
nightly-2016-02-17 57c357d real 0m2.163s
nightly-2016-02-28 54fdae3 real 0m2.181s
nightly-2016-09-30 289f3a4 real 0m2.273s

so we've managed to recover from part of the regression, but not all of it.


Update: hold on, I just realized that I was looking at September of 2015 in those runs. Let me do a few more.

Update 2: Okay, now that I actually looked at runs from 2016, it looks like we have completely recovered all of the lost time, at least on the isolated benchmark in the gist.

@pnkfelix
Copy link
Member

pnkfelix commented Oct 3, 2016

I'm going to close this ticket since I believe we have addressed the specific compile-time regression described here.

@pnkfelix pnkfelix closed this as completed Oct 3, 2016
@nnethercote
Copy link
Contributor

I just took the gist linked earlier and re-ran it on today's compiler.

FWIW: that gist contains inflate.rs. A slightly different version of that file is in rustc-benchmarks, which means it's likely to get some attention. In fact, I've already done some profiling for it and found that it is totally dominated by operations involving ObligationForest. (It's the only benchmark I've seen for which this is true.) I will try to get around to looking at that one soon, though I don't know anything about ObligationForest so if someone more knowledgeable wants to look in the meantime, that would be fine with me :)

@nikomatsakis
Copy link
Contributor

@nnethercote feel free to ping me sometime if you want to look at it a bit together

@arielb1
Copy link
Contributor

arielb1 commented Oct 4, 2016

@nnethercote

That would be expected. The test-case triggers an O(n^2) case in ObligationForest. We have mitigated that case to take a reasonable amount of time on reasonable examples.

It is possible that this could be made O(n), but the ways seem to have bad overheads in the common case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests