
[nll] why is inflate so dang slow? #51377

Closed
nikomatsakis opened this issue Jun 5, 2018 · 10 comments
Labels
A-NLL Area: Non-lexical lifetimes (NLL) C-enhancement Category: An issue proposing an enhancement or a PR with one. NLL-performant Working towards the "performance is good" goal T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Contributor

Looking at http://perf.rust-lang.org/nll-dashboard.html, the inflate test is way out of line with the others; what is making it so slow? We need to do some profiling.

I wrote up a guide that describes how I do profiling.

@nikomatsakis nikomatsakis added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-nll NLL-performant Working towards the "performance is good" goal labels Jun 5, 2018
@Mark-Simulacrum
Member

Note that when doing profiling you should be using the in-tree inflate, not the crate off of crates.io, as their code is quite different.

@spastorino spastorino self-assigned this Jun 5, 2018
@spastorino
Member

```
[santiago@archlinux inflate (master)]$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 2290
Not Matches: 331
Percentage : 87%

Tree
| matched `{do_mir_borrowck}` (87% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (72% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (64% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::h5e644cf9693979bb (63% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (63% total, 59% self)
: : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve (6% total, 0% self)
: : : | rustc::util::common::time (6% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve_inner (6% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::compute_region_values (6% total, 0% self)
: : : : : : | <rustc_data_structures::bitvec::SparseBitMatrix<R, C>>::merge (6% total, 0% self)
: : : : : : : | <alloc::btree::map::BTreeMap<K, V>>::entry (5% total, 1% self)
: : : : : : : : | alloc::btree::search::search_tree (3% total, 0% self)
: : : : : : : : : | alloc::btree::search::search_tree (3% total, 3% self)
: | rustc::mir::visit::Visitor::visit_mir (5% total, 3% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (4% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
```
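The `SparseBitMatrix::merge` frames in the profile show `compute_region_values` paying a `BTreeMap::entry` lookup (and the `btree::search` time beneath it) for every row it touches. A minimal sketch of that data structure illustrates where the cost comes from; this is a hypothetical simplification, not the actual `rustc_data_structures` implementation, with rows stored as word-packed `u64` blocks keyed by row index:

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified sparse bit matrix: each populated row is a
// Vec of u64 words stored in a BTreeMap. The `entry` call in `row_mut`
// mirrors the BTreeMap::entry frame visible in the profile above.
struct SparseBitMatrix {
    rows: BTreeMap<usize, Vec<u64>>,
    columns: usize,
}

impl SparseBitMatrix {
    fn new(columns: usize) -> Self {
        SparseBitMatrix { rows: BTreeMap::new(), columns }
    }

    fn row_mut(&mut self, row: usize) -> &mut Vec<u64> {
        let words = (self.columns + 63) / 64;
        // A btree search happens here on every access to a row.
        self.rows.entry(row).or_insert_with(|| vec![0; words])
    }

    fn add(&mut self, row: usize, column: usize) {
        self.row_mut(row)[column / 64] |= 1u64 << (column % 64);
    }

    fn contains(&self, row: usize, column: usize) -> bool {
        self.rows
            .get(&row)
            .map_or(false, |bits| bits[column / 64] & (1u64 << (column % 64)) != 0)
    }

    // Merge row `read` into row `write` (write |= read), returning
    // whether `write` changed -- the operation at the heart of
    // compute_region_values.
    fn merge(&mut self, read: usize, write: usize) -> bool {
        let src = match self.rows.get(&read) {
            Some(bits) => bits.clone(),
            None => return false,
        };
        let dst = self.row_mut(write);
        let mut changed = false;
        for (d, s) in dst.iter_mut().zip(src) {
            let new = *d | s;
            changed |= new != *d;
            *d = new;
        }
        changed
    }
}
```

Because every `add` and `merge` goes through a btree search, a workload that merges many rows repeatedly spends a noticeable fraction of its time inside `search_tree`, which matches the 5-6% attributed to those frames.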

@spastorino spastorino removed their assignment Jun 6, 2018
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jun 6, 2018
NLL performance boost

This makes compilation of the [inflate](https://github.com/rust-lang-nursery/rustc-perf/tree/master/collector/benchmarks/inflate) benchmark compile 2 times faster on my computer when NLL is enabled.
This does not fully fix rust-lang#51377; compilation is still 4-5 times slower than without NLL, but it's a start.
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jun 7, 2018
NLL performance boost
bors added a commit that referenced this issue Jun 7, 2018
NLL performance boost
kennytm added a commit to kennytm/rust that referenced this issue Jun 7, 2018
NLL performance boost
@nnethercote
Contributor

Speaking of the non-NLL case: inflate is really weird. I don't know why, but its profiles look entirely different from those of any of the other rustc-perf benchmarks. It spends an unusual amount of time in obligation_forest code. In particular, it makes a bazillion calls to shallow_resolve, which in turn makes a bazillion calls to get_root_key within ena. From Cachegrind, here is a key hotspot:

```
          .      fn process_obligation(&mut self,
          .                            pending_obligation: &mut Self::Obligation)
          .                            -> ProcessResult<Self::Obligation, Self::Error>
          .      {
          .          // if we were stalled on some unresolved variables, first check
          .          // whether any of them have been resolved; if not, don't bother
          .          // doing more work yet
 55,898,106          if !pending_obligation.stalled_on.is_empty() {
 59,971,481              if pending_obligation.stalled_on.iter().all(|&ty| {
179,914,443                  let resolved_ty = self.selcx.infcx().shallow_resolve(&ty);
          .                  resolved_ty == ty // nothing changed here
          .              }) {
          .                  debug!("process_predicate: pending obligation {:?} still stalled on {:?}",
          .                         self.selcx.infcx()
          .                             .resolve_type_vars_if_possible(&pending_obligation.obligation),
          .                         pending_obligation.stalled_on);
167,524,737                  return ProcessResult::Unchanged;
          .              }
     94,744              pending_obligation.stalled_on = vec![];
          .          }
```
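The pattern in that hotspot can be sketched in isolation: each pending obligation remembers the inference variables it stalled on, and every pass re-resolves each of them just to see whether anything changed, so the fast path still pays one resolution per variable. The sketch below is a hypothetical simplified model, not rustc's code; `TyVar`, `InferTable`, and the `HashMap`-based `shallow_resolve` are stand-ins for the real inference context and ena's union-find:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a type inference variable.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TyVar(u32);

// Hypothetical stand-in for the inference context; real rustc uses a
// union-find table (ena), which is where get_root_key shows up.
struct InferTable {
    bindings: HashMap<TyVar, TyVar>,
}

impl InferTable {
    // Resolve one level: return the binding if the variable has one.
    fn shallow_resolve(&self, ty: TyVar) -> TyVar {
        *self.bindings.get(&ty).unwrap_or(&ty)
    }
}

struct PendingObligation {
    stalled_on: Vec<TyVar>,
}

enum ProcessResult {
    Unchanged,
    NeedsWork,
}

fn process_obligation(table: &InferTable, pending: &mut PendingObligation) -> ProcessResult {
    if !pending.stalled_on.is_empty() {
        // If none of the stalled-on variables resolved to anything new,
        // skip this obligation -- but note each check costs one
        // shallow_resolve call, which is the hotspot in the profile.
        if pending
            .stalled_on
            .iter()
            .all(|&ty| table.shallow_resolve(ty) == ty)
        {
            return ProcessResult::Unchanged;
        }
        pending.stalled_on = vec![];
    }
    ProcessResult::NeedsWork
}
```

With many obligations each stalled on several variables, the `all(...)` check runs on every pass over the forest, which is consistent with the huge shallow_resolve call counts Cachegrind reports.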

#51411 improves some of that.

clap-rs is the benchmark most similar to inflate, but it still spends much less time in those parts of the code.

I don't know if those functions are used with NLL, but hopefully that's useful info.

@jkordish jkordish added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-NLL Area: Non-lexical lifetimes (NLL) labels Jun 13, 2018
@nikomatsakis
Contributor Author

@nnethercote pretty interesting. As of now, NLL time is 300% of normal time, which is still high, though not as high as it was to start. I guess it is worth profiling this separately, particularly once #51411 lands; it seems like #51538 does not particularly help.

@eddyb
Member

eddyb commented Jun 26, 2018

As usual, you really need to keep in mind that the inflate version that's in rustc-perf is ridiculously bloated, as it predates image-rs/inflate#35 (the PR looks small but try expanding the macros).

Could we please rename it to inflate-old-and-bloated (or a similar shorthand)?
Ideally we could apply such a rename retroactively to past data as well.

Also, I'd rather tell people to update their inflate, than block NLL. And we should ask @bvssvni to release a new 0.x.y version for every 0.x of inflate, with the de-bloating applied.

> clap-rs is the benchmark most similar to inflate

Funny you should mention that (it's also a pre-debloat version): rust-lang/rustc-perf#183 (comment)

As an aside, this sort of thing (one function with a lot of code in it) is why I started hacking on rustc many years ago. I was auto-generating a huge machine code decoder function and rustc was OOM-ing on it. IIRC, one of the culprits was borrowck, failing to allocate roughly 700TB for one Vec.

@bvssvni

bvssvni commented Jun 26, 2018

@eddyb can you find somebody more familiar with inflate to make these releases?

@eddyb
Member

eddyb commented Jun 26, 2018

@bvssvni Do you mean making the git commits, or also publishing? I don't know anyone other than you who can publish piston crates (I don't think I can, anyway, but I'm not sure how to check).
cc @RazrFalcon for the former, but I might do it myself if I find time for it.

@bvssvni

bvssvni commented Jun 26, 2018

@eddyb all PistonCollaborators can publish. You're one of them.

@eddyb
Member

eddyb commented Jun 28, 2018

@bvssvni Makes sense; is that automatic via GitHub? I'm not sure what the process is, and I can't find you on IRC. In the meantime, I realized that none of the past version bumps appear to be necessary, and left a comment suggesting we just publish the latest 0.4 patch version under all previous minor versions: image-rs/inflate#37 (comment).

@nikomatsakis
Contributor Author

I'm going to close this issue, as inflate is now at 278.32%. That's still slow, but it's a tiny edge-case test anyway, and it's no longer 10x.
