Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for making incremental compilation the default for Release Builds #57968

Open
1 of 4 tasks
michaelwoerister opened this issue Jan 29, 2019 · 25 comments
Open
1 of 4 tasks
Labels
A-incr-comp Area: Incremental compilation C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-compiletime Issue: Problems and improvements with respect to compile times. S-tracking-design-concerns Status: There are blocking design concerns. T-cargo Relevant to the cargo team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-core Relevant to the core team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance

Comments

@michaelwoerister
Copy link
Member

michaelwoerister commented Jan 29, 2019

Since incremental compilation supports being used in conjunction with ThinLTO the runtime performance of incrementally built artifacts is (presumably) roughly on par with non-incrementally built code. At the same time, building things incrementally often is significantly faster ((1.4-5x according to perf.rlo). As a consequence it might be a good idea to make Cargo default to incremental compilation for release builds.

Possible caveats that need to be resolved:

  • The initial build is slightly slower with incremental compilation, usually around 10%. We need to decide if this is a worthwhile tradeoff. For debug and check builds everybody seems to be fine with this already.
  • Some crates, like style-servo, are always slower to compile with incr. comp., even if there is just a small change. In the case of style-servo that is 62 seconds versus 64-69 seconds on perf.rlo. It is unlikely that this would improve before we make incr. comp. the default. We need to decide if this is a justifiable price to pay for improvements in other projects.
  • Even if incremental compilation becomes the default, one can still always opt out of it via the CARGO_INCREMENTAL flag or a local Cargo config. However, this might not be common knowledge, the same as it isn't common knowledge that one can improve runtime performance by forcing the compiler to use just one codegen unit.
  • It still needs to be verified that runtime performance of compiled artifacts does not suffer too much from switching to incremental compilation (see below).

Data on runtime performance of incrementally compiled release artifacts

Apart from anectodal evidence that runtime performance is "roughly the same" there have been two attempts to measure this in a more reliable way:

  1. PR [experiment] Benchmark incremental ThinLTO'd compiler. #56678 did an experiment where we compiled the compiler itself incrementally and then tested how the compiler's runtime performance was affected by this. The results are twofold:
    1. In general performance drops by 1-2% (compare results for clean builds)
    2. For two of the small test cases (helloworld, unify-linearly) performance drops by 30%. It is known that these test cases are very sensitive to LLVM making the right inlining decisions, which we already saw when switching from single-CGU to non-incremental ThinLTO. This is indicative that microbenchmarks may see performance drops unless the author of the benchmark takes care of marking bottleneck functions with #[inline].
  2. For a limited period of time we made incremental compilation the default in Cargo (Make incremental compilation the default for all profiles. cargo#6564) in order to see how this affected measurements on lolbench.rs. It is not yet clear if the experiment succeeded and how much useful data it collected since we had to cut it short because of a regression (Nightly regression: Can't perform LTO when compiling incrementally #57947). The initial data looks promising: only a handful of the ~600 benchmarks showed performance losses (see https://lolbench.rs/#nightly-2019-01-27). But we need further investigation on how reliable the results are. We might also want to re-run the experiment since the regression can easily be avoided.

One more experiment we should do is compiling Firefox because it is a large Rust codebase with an excellent benchmarking infrastructure (cc @nnethercote).

cc @rust-lang/core @rust-lang/cargo @rust-lang/compiler

@michaelwoerister michaelwoerister added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-incr-comp Area: Incremental compilation T-core Relevant to the core team, which will review and decide on the PR/issue. T-cargo Relevant to the cargo team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC WG-compiler-performance Working group: Compiler Performance labels Jan 29, 2019
@joshtriplett
Copy link
Member

joshtriplett commented Jan 29, 2019 via email

@michaelwoerister
Copy link
Member Author

I don't think we should change the default to something that has any runtime performance cost.

I'm not sure. The current default already has a quite significant runtime performance cost because it's using ThinLTO instead of -Ccodegen-units=1.

@alexcrichton
Copy link
Member

We've had a ton of discussions before about comiple time and runtime tradeoffs, see #45320 and #44941 for just a smattering. We are very intentionally not enabling the fastest compilation mode with cargo build --release by default today, and an issue like this is a continuation of that.

@joshtriplett
Copy link
Member

@alexcrichton To avoid ambiguity, what do you mean by "fastest compilation mode" here?

I certainly think we don't need to worry about compiling as fast as possible, but I don't think our default compile should pay a runtime performance penalty like this.

@alexcrichton
Copy link
Member

Ah by that I mean that producing the fastest code possible. Producing the fastest code by default for --release would mean things like LTO, enabling PGO, customizing the LLVM pass manager to just rerun itself to either a fixed point or until some amount of time lapses, etc.

@Lokathor
Copy link
Contributor

Lokathor commented Feb 2, 2019

So if release is a "best effort at being fast while still finishing the build sometime today", can we just add a different profile for "really the fastest but it'll take a day to build".

@CryZe
Copy link
Contributor

CryZe commented Feb 2, 2019

Yeah I'm honestly thinking that it may be time for a profile between debug and release, such that there is these use cases:

  • Debug: The code is compiled such that you have the best experience trying to remove bugs.
  • "Development" / "Optimized": The code is incrementally compiled with some optimizations, such that it's suitable for fast development cycles and using it for everyday programming.
  • Release: The code is heavily optimized, such that it can be published.

At the moment I'm seeing lots of people either sacrifice the debug profile for that "Development" use case (bumping optimization levels, but reducing the debugability of the project) or sacrifice the release profile by reducing optimizations, both are kind of suboptimal.

@lnicola
Copy link
Member

lnicola commented Feb 2, 2019

rust-lang/cargo#2007
rust-lang/cargo#5326 (comment)
rust-lang/rfcs#2282

This came up a lot of times, but for some reason was never implemented. The discussions about it turned into talk about "workflows" and "profile overrides", although it's not very clear to me why:

I personally think (though I may be wrong) that the primary use-case for this is compiling specific dependencies with opts in debug mode, in which case it's unclear we even need custom profiles, and not just "being able to specify overrides for existing profiles".

alexcrichton added a commit to alexcrichton/rust that referenced this issue Feb 12, 2019
Currently the compiler will produce an error if both incremental
compilation and full fat LTO is requested. With recent changes and the
advent of incremental ThinLTO, however, all the hard work is already
done for us and it's actually not too bad to remove this error!

This commit updates the codegen backend to allow incremental full fat
LTO. The semantics are that the input modules to LTO are all produce
incrementally, but the final LTO step is always done unconditionally
regardless of whether the inputs changed or not. The only real
incremental win we could have here is if zero of the input modules
changed, but that's so rare it's unlikely to be worthwhile to implement
such a code path.

cc rust-lang#57968
cc rust-lang/cargo#6643
@michaelwoerister
Copy link
Member Author

Here is an update regarding the lolbench experiment: The first nightly with incremental release builds is nightly-2019-01-28 and the last one is nightly-2019-02-06. The following is a list of benchmarks that showed a negative regression above 1% in the range between these dates:

I've looked through all of them and didn't find any case that looked like an actual regression. They are just slightly flaky benchmarks that show spikes before, during, and after the experiment.

Unless something went wrong with the experimental setup, it seems that incremental ThinLTO produces code that is just as fast as the one produced by regular ThinLTO. At least for the ~600 benchmarks that lolbench runs.

(@anp lolbench is so awesome!! ❤️)

Centril added a commit to Centril/rust that referenced this issue Feb 14, 2019
…haelwoerister

rustc: Implement incremental "fat" LTO

Currently the compiler will produce an error if both incremental
compilation and full fat LTO is requested. With recent changes and the
advent of incremental ThinLTO, however, all the hard work is already
done for us and it's actually not too bad to remove this error!

This commit updates the codegen backend to allow incremental full fat
LTO. The semantics are that the input modules to LTO are all produce
incrementally, but the final LTO step is always done unconditionally
regardless of whether the inputs changed or not. The only real
incremental win we could have here is if zero of the input modules
changed, but that's so rare it's unlikely to be worthwhile to implement
such a code path.

cc rust-lang#57968
cc rust-lang/cargo#6643
@da-x
Copy link
Member

da-x commented Feb 16, 2019

@lnicola I've updated my custom profiles implementation now, here: rust-lang/cargo#6676 . Maybe it can be useful for this issue.

@anp
Copy link
Member

anp commented Feb 17, 2019

This is so cool 😍.

If any perf regressions do emerge, I'd like to briefly plug sending a quick PR to lolbench so they can be caught in the future.

@alexcrichton
Copy link
Member

@michaelwoerister I was wondering, are there more blockers to this that you know of?

@michaelwoerister
Copy link
Member Author

Just the ones listed in the original post. Although, maybe also Windows performance. I worked on Windows 10 for a few days and got the impression that incr. comp. is very slow there. That should be verified though.

@eddyb
Copy link
Member

eddyb commented Mar 29, 2019

@michaelwoerister Could NTFS' infamous low performance on many small files be at fault here?
Any chance we can coalesce some of those files together?

@mati865
Copy link
Contributor

mati865 commented Mar 29, 2019

I'd rather blame Windows Defender which is enabled by default on Windows 10.
If it's the case you will see it in task manager squishing everything out of single core.

@lnicola
Copy link
Member

lnicola commented Mar 29, 2019

Re Defender, I've seen someone investigating parallel extraction of the installation packages to split the scanning across multiple cores, since it's synchronous.

Does the compiler use multiple threads when writing the files? Or, going the other way, would sticking then in a SQLite database help?

@wesleywiser
Copy link
Member

Visited during the compiler team's backlog bonanza. Reading over the discussion, it seems like we still haven't decided if it makes sense to enable incremental compilation by default for release builds.

@wesleywiser wesleywiser added the S-tracking-design-concerns Status: There are blocking design concerns. label Jun 24, 2022
@michaelwoerister
Copy link
Member Author

At this time, I am strongly against making incr. comp. the default for release builds because that's the mode that's usually being used for production builds -- and incr. comp. is not a good choice for that because of increased code size, less effective optimizations, and the added risks for running into incr. comp. specific compiler bugs.

That being said, optimized incremental builds certainly have their use cases. My recommendation would be for people to use Cargo custom profiles in those cases.

In my opinion the stabilization of Cargo custom profiles makes this issue obsolete.

@eddyb
Copy link
Member

eddyb commented Jun 27, 2022

I think the conflation of "debug" and "dev" in Cargo was a bit of a mistake, especially since dev has more interesting implications.

Having some kind of "devopt" profile by default would've been great, especially if it includes some minimal amount of debuginfo (which would help with e.g. profiling, something one might want to do for local development with optimizations turned on).


I'm personally fine with adding stuff like this to all my workspace-level Cargo.toml, but tbh I'd even forgotten about turning on incremental in release mode:

# `release` but with enough debuginfo for `perf record --call-graph=dwarf`.
[profile.release-profiling]
inherits = "release"
debug = 1

@eddyb
Copy link
Member

eddyb commented Aug 9, 2022

Parts of what I said in my last comment above (#57968 (comment)) keep coming up for me.

Most recently, a coworker was wondering why some backtrace was missing almost any useful information (there's even some frames there that make no sense, so the frame pointer is probably not used consistently either).

You may ask "what does debuginfo have to do with incremental"? But that's not why I bring it up - the connection to this issue is that both "missing incremental while developing" and "missing debuginfo while developing" are side-effects of --release.


"Development" vs "Release"

Cargo came with two profiles (until the recent custom ones):

  • dev (the default)
    • -C opt-level=0, -C debuginfo=2, incremental enabled
  • release (opt-in via --release)
    • -C opt-level=3, -C debuginfo=0, incremental disabled

Three different aspects of compilation, toggled with one convenient switch.

But if we were to treat those aspects as independent, we may get this:

  • development (local source checkout, editing the code, or at least testing it)
    • optimizations: when necessary (e.g. they might be a bottleneck for testing)
    • debuginfo:
      • 0 (disabled): sometimes acceptable (when none of the below usecases apply)
      • 1 ("line info" only):
        • essential to backtraces and profiling (more so when optimizations are enabled!)
        • enough for some debugger usage (but not deep introspection)
      • 2 ("full", incl. types and values, on top of the above):
        • enables debuggers to let you poke at values in variables etc. (YMMV though, can be too much "internal implementation details" and not enough "Rust API")
        • but it can be more work to produce, and larger on disk
        • also, LLVM doesn't always preserve value debuginfo through optimizations, even when theoretically possible, so maximum debuggability is tied to (low) opt-level
    • incremental: almost universally desired (AFAIK not conflicting with the other aspects)
  • release (deployment to end-users by cargo install, distros, companies, etc.)
    • optimizations: as much as can be reasonably done (classic time-quality tradeoff)
      • could offer e.g. a bit more ThinLTO in the future, even by default?
    • debuginfo:
      • today's default (that might make sense for cargo install) is 0
      • however, bigger venues tend to enable it themselves, split it out, and:
        • (many) distros will offer separate debuginfo packages, sometimes even e.g. patch gdb to use build-id to show up a package manager install command, or even have gdb get everything on the fly itself (which might not even require patching nowadays)
        • companies shipping proprietary software often keep some debuginfo (sometimes processed into another form) and combine it with raw crash dumps from users, to debug said crashes without any secrets therein leaking to users
    • incremental: while long-term it may make sense to use it, some good reasons against it were given above
      • if a software publisher wants to try and take advantage of this for e.g. quick patches, they should probably be enabling and configuring it to suit their specific needs

Instead, we fossilized dev = "debug mode" and "optimizations" = "release mode".

Two major "unaddressed usecases" jump out at me right away, from the above presentation:

  • dev-opt: development with more optimizations, less debuginfo, but still incremental
  • rel-dbg: release with (split) debuginfo

However, I'll focus on dev-opt (more relevant to this issue and what to do about rel-dbg is less clear to me), which is also what prompted me to write this analysis after a rant on twitter.

My informal proposal (cargo build -O)

Since custom Cargo profiles were stabilized, people can do this themselves:

[profile.dev-opt]
inherits = "dev" # or "release", ideally everything is overridden below
opt-level = 3
debug = 1
incremental = true # <- only part relevant to this issue

And this is, as per the above analysis, close to the ideal development experience with optimizations (e.g. debug = 2 is more rarely needed and less useful given optimizations).

But --profile dev-opt is still not as convenient as --release (on top of changing your Cargo.toml), and it's more likely that some projects will just redefine --release instead of making a new profile (or they may have done it before custom profiles and don't see a good reason to switch now).

What I want to pass by @rust-lang/cargo, as a "vibe check", is introducing an -O flag for Cargo (vaguely analogous to similar flags in other tooling, indicating "enable optimizations"), that is a shorthand profile selector (like --release) but for the above dev-opt profile.

Off the top of my head, some potentially relevant details:

  • cargo build -O could be like rustc -O in being -C opt-level=2 instead of 3, but if that puts some people off I don't think it's a big deal to "diverge" there
  • if someone has defined their own dev-opt custom profile (with the inherits key), it could be backwards-compatible to not allow -O usage and/or warn about the conflict without impacting --profile dev-opt (i.e. the only way they would be using it today)
  • benchmarking may still want the full cost/benefit tradeoff of --release
  • debug = 1 might limit impact of incremental (relative to debug = 0), but at the same time, most people use incremental today with debug = 2, and either way I'm not sure how big of a deal this is currently

I'm not yet considering writing a full proposal, so if you're reading this in the future and wondering what happened with this idea ("fizzled out", most likely), get in touch.

@ehuss
Copy link
Contributor

ehuss commented Aug 30, 2022

We discussed a little about having a -O flag as a shorter alias, and I don't think it is something we will would like to add at this time. There are some concerns, such as the exact semantics not being clear (if it replaces the profile, or augments the current one), how it is quite a bit different from rustc -O, how it relates to -r (both are "optimized"?), and how common such a workflow would be where people would need at least three different profiles. Aliases provide a workaround for the concern about the command-line length, and I would encourage using that as needed.

@eddyb
Copy link
Member

eddyb commented Aug 31, 2022

and how common such a workflow would be where people would need at least three different profiles

I would argue you basically would never write --release again outside of some CI job, distro packaging, etc.
(cargo install would still default to it but you also don't have to write it today either)

The development workflow isn't "three profiles", it's two (i.e. the development ones, with "release" being "not development") - or just one, really (I almost always end up using debug mode to avoid typing --release - I'll also be honest and say I have never seen anyone suggest using -r, even if it's right there in the --help output).

The summary makes me worried the discussion overfocused on the -O which was my "quick hack" to plausibly get quicker adoption (since it's both shorter than --release - assuming most people don't know about/use -r - and much clearer in the intent of "I want optimizations"), but the real problem IMO is the conflation of "optimized development" w/ "release".

If I had the time and energy I would probably make an RFC, but also this may be one of those things where nothing short of a time machine will really fix it. OTOH we have editions, and the Cargo resolver 2.0 saga is a good precedent - it might be nice to deprecate -r/--release and have cargo run -O¹ vs cargo run --profile release.

¹ (or any other way to spell "optimized development", I'm not that attached to -O speficially)


Aliases provide a workaround for the concern about the command-line length, and I would encourage using that as needed.

Sadly there's no good way to do this for profiles without ending up with cargo run-O, cargo build-O, cargo rustc-O, cargo test-O etc. (also, while I can do all sorts of weird things locally to improve my experience, I can't share a command without manually expanding it first, and it can't really be taught as a workflow)

@Lokathor
Copy link
Contributor

if you want development with optimizations, why not just set an optimization level? Similarly, if you want release with split debug, isn't that something you can set in the profile? (i admit I've never had to set split debug, on windows it's always split).

@eddyb
Copy link
Member

eddyb commented Aug 31, 2022

if you want development with optimizations, why not just set an optimization level?

I go into far more detail above, but the obvious choices today are:

  • RUSTFLAGS (big hammer, won't really work well as it with fight w/ Cargo for many flags)
  • modifying the dev profile (losing access to the unoptimized default)
  • new custom profile (also requires modifying Cargo.toml/.cargo/config.toml, and then you have to pass --profile ..., but at least you get to keep the original profile)

One trick I've never seen used, would be to either set CARGO_PROFILE_dev_OPT_LEVEL=3 or pass --config profile.dev.opt-level=3 - but the semantics of that are that of "modify the dev profile" and will trash the build artifacts any time you forget to pass it (or intentionally don't) - just like messing with RUSTFLAGS.

And for all of those options (unless you start with the release profile and "add back" incremental and maybe a bit of debuginfo, to it) will leave you with other settings (like debuginfo level) that you have to also override to get what you want, so it's not as simple as "optimization level".

The closest thing to what I want is a custom profile, except for it not already existing in everyone's config.

Not to sound like a broken record but my point was that we screwed up with the "dev=debug vs opt=release" split, and that if we want to address that, we have to switch people off --release somehow.


With a -P shorthand for --profile, I could see -Pdev-opt working (same length to --release, too), but then either everyone has to define it in their .cargo/config.toml, or in each workspace Cargo.tomls (that would reduce the effort but we'd likely end up with baroque per-project profiles and more confusion), or we provide it by default.

Of course there's nobody stopping anyone from taking that to -Popt or even -PO kinds of extremes, if they're willing to define their own profiles, and I might personally do that if we get anything like a -P shorthand for --profile (and nothing for the dev-opt usecase).


Similarly, if you want release with split debug, isn't that something you can set in the profile

Not something I've ever done personally, so I can't answer this very well, but AFAIK this is generally handled by the release pipeline you have around the cargo build --release - you don't just run that command and then zip up target to ship it to users, there's several things you need to also do. (I'm not sure this is the best example, but the sentry.io docs have some additional details though they only seem to explicit describe the thing I was looking for, for wasm)

@Lokathor
Copy link
Contributor

Sorry, the post with the cargo build -O proposal didn't seem to explain why a person wouldn't turn on optimizations in the dev profile when I read it.

modifying the dev profile (losing access to the unoptimized default)

I would suggest to turn dev into the optimized + debug and make an extra profile for "ultra slow" for the rare cases when you need the ultra slow build. My own experience is that it's usually not needed. That may be my own style of programming, I don't use the step debugger, I forgot how to even turn it on.

I agree with you that cargo's two-profile system is inadequate. I just set dev to "whatever i want cargo build to do", which is usually full optimization + incremental, and then leave release as the "what should cargo install do?" profile and basically never use it myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-compiletime Issue: Problems and improvements with respect to compile times. S-tracking-design-concerns Status: There are blocking design concerns. T-cargo Relevant to the cargo team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-core Relevant to the core team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance
Projects
None yet
Development

No branches or pull requests