Add more optimizations to the release build profile. #11298

Open
horacehoff opened this issue Oct 27, 2022 · 43 comments
Labels
A-profiles Area: profiles C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage.

Comments

@horacehoff

Problem

The release profile is used to build an optimized binary. Thus, I think it would be logical to optimize it to the fullest, even at the cost of a longer build time; I think that is a good compromise for many people, including me.

Proposed Solution

More optimization flags should be added to the release build profile, notably lto=true and codegen-units=1, as well as optimizing all the packages with opt-level=3.
This greatly enhances runtime performance, although yes, at the cost of a bigger binary size and longer build time.

Notes

I have tested what I am proposing, with the following added to my Cargo.toml:

[profile.release]
lto = true
codegen-units = 1

[profile.release.package."*"]
opt-level = 3

This improved the output binary's runtime performance by orders of magnitude, thus this issue.

@horacehoff horacehoff added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Oct 27, 2022
@epage
Contributor

epage commented Oct 27, 2022

I suspect lto = "thin" is a sufficient compromise that it'd be worth being the default compared to "fat" / true.

@epage epage added the A-profiles Area: profiles label Oct 27, 2022
@Muscraft
Member

Compile times are a sensitive area and increasing them in any way is always a worry.

I ran a test on my machine (M1 Max chip 32GB ram) against a private project that has 395 dependencies (many are very heavy) and the jump in compile time is large for "fat" LTO. The results also appear to agree that lto = "thin" could work.

One thing to consider is if changing the release profile could be breaking in some unintended way. Crater might need to be run to see if it changes anything. Either way, it might be a good idea to add a new profile that contains these changes, instead of changing release.

Test Results

profile                  time
"fat" LTO codegen=1      2m 30s
"fat"                    2m 27s
"thin" LTO codegen=1     1m 44s
"thin"                   1m 08s
No LTO codegen=1         1m 39s
release                  1m 09s

"fat" codegen=1:

[profile.production]
inherits = "release"
lto = true
codegen-units = 1
opt-level = 3

"fat":

[profile.production]
inherits = "release"
lto = true
opt-level = 3

"thin" codegen=1:

[profile.production]
inherits = "release"
lto = "thin"
codegen-units = 1
opt-level = 3

"thin":

[profile.production]
inherits = "release"
lto = "thin"
opt-level = 3

No LTO codegen=1:

[profile.production]
inherits = "release"
codegen-units = 1
opt-level = 3

@horacehoff
Author

I completely agree with you: LTO, whatever the value, gives a great performance increase.

@horacehoff
Author

horacehoff commented Oct 27, 2022

Either way, it might be a good idea to add a new profile that contains these changes, instead of changing release.

That's what I thought at first, adding a "production" profile that would optimize the binary to the maximum and which would use the flags I mentioned in the issue.

@weihanglo
Member

adding a "production" profile that would optimize the binary to the maximum

Just want to make it clear. Are we proposing a new built-in profile here?

@memoryruins

One thing to consider is if changing the release profile could be breaking in some unintended way.

Currently ThinLTO creates issues with at least one common target while linking with lld; a potential fix was opened recently rust-lang/rust#103353

@horacehoff
Author

horacehoff commented Nov 2, 2022

Just want to make it clear. Are we proposing a new built-in profile here?

It depends on what the majority of devs think. It could be a new built-in profile or it could simply be an enhancement to the current release build profile.

elasticdog added a commit to EarthmanMuons/rustops-blueprint that referenced this issue May 23, 2023
This is what we'll use to generate the distributable artifacts. Some of
these things don't exist on the default release profile because there's
an aversion to making compilation times longer.

This also moves `strip` to just the production profile and adds "thin"
LTO to the release profile. It'll be nice to have a faster version of
LTO for local release builds (and benchmarks), but it won't add too much
time for compiling.

See:
- https://doc.rust-lang.org/cargo/reference/profiles.html
- https://nnethercote.github.io/perf-book/build-configuration.html
- rust-lang/cargo#11298
@kornelski
Contributor

I don't think the release profile should be made to optimize harder, because Rust's build times are already slow, and a full LTO can be extremely slow and memory-hungry (I have some projects that I can't build with fat LTO at all because they run out of memory).

It may be difficult to get everyone to agree on what is desired for some "dist"/"production" profile. For example, I care about small executable sizes more than backtraces, so my preferred dist profiles include panic=abort, strip, and even panic_immediate_abort and -Zlocation-detail=none where possible.
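
A size-focused profile along these lines might look like the sketch below. The profile name `dist` and the `opt-level = "z"` setting are assumptions added for illustration, not something stated in the comment. `panic_immediate_abort` and `-Zlocation-detail=none` are left out because, as far as I know, they need a nightly toolchain.

```toml
# Cargo.toml: a size-focused profile (the name "dist" is only an example)
[profile.dist]
inherits = "release"
panic = "abort"    # abort on panic instead of unwinding
strip = true       # strip symbols and debuginfo (equivalent to strip = "symbols")
opt-level = "z"    # assumption: optimize for size; not mentioned in the comment above
```

Such a profile would be selected with `cargo build --profile dist`, and its artifacts would land under `target/dist`.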

@Kobzol
Contributor

Kobzol commented Feb 17, 2024

Agreeing on a dist profile would be near impossible, yeah. I think that a better way would be to add some guidance/templates/profile wizard to cargo, which would help users build the desired profile interactively, with some predefined presets.

@linyihai
Contributor

Each option for a build configuration has pros and cons, depending on the needs of the developer.
But I do need some predefined templates or configurations to make the build accomplish certain goals, such as faster builds, smaller binaries.
It would be nice to have official written guidance. Unofficial plug-ins that implement predefined templates out of the box would also be acceptable.

@Kobzol
Contributor

Kobzol commented Mar 10, 2024

Created a Cargo subcommand for configuring Cargo projects (focused on performance aspects of configuration), might be useful to automate the creation of optimized Cargo profiles: https://github.com/Kobzol/cargo-wizard.

@horacehoff
Author

horacehoff commented Apr 11, 2024

I agree that getting everyone to agree on one new profile is impossible. I think we should then consider adding multiple new build profiles, each fulfilling a goal, or at least mentioning new ones in the docs, giving developers the possibility to modify their 'release' profile to maximize (or minimize) a certain aspect.
For example, I think there could be at least two new ones: 'performance' (or 'speed'), and 'size' (or 'binary-size'). The first one would have the modifications pointed out in my first comment, and the second one would modify parameters to minimize the final binary size.
This would then allow developers to override certain parameters in those profiles as they please while having a build profile aimed at a certain aspect, as pointed out here by @kornelski:

my preferred dist profiles include panic=abort, strip, and even panic_immediate_abort and -Zlocation-detail=none where possible.

@foxtran

foxtran commented Oct 28, 2024

Hi!

I've created #14738, which enables fat Link-Time Optimization (LTO) by default for new packages that are started via the cargo init command. To speed up the development cycle, LTO is enabled only for performance-related build types (release, bench). For library targets, only the bench profile gets LTO enabled.

As far as I can see, the discussion here is mostly about existing packages, which may have a huge number of dependencies, and people are right that using LTO with such packages may lead to trouble with compile time and memory consumption. However, new projects will not have those issues, since not much code has been written yet. Later, when the codebase of a new package grows, these optimizations can be disabled, so I do not see a big issue with suggesting maximal optimization to users from the beginning, with the possibility of disabling it over time.

For me, enabling LTO looks like the next step in optimization level, right after -O3, which is why I would prefer to have fat LTO enabled for projects created from scratch. Users who are worried about their compilation time can later switch to thin LTO or just disable it.

@Kobzol
Contributor

Kobzol commented Oct 28, 2024

Fat LTO can be an incredible compile-time hog, and in my experience often with little gain over thin LTO. I would also rather see a new built-in ultra optimized profile than modifying release, tbh.

You said that LTO is not an issue for new projects, but the problem with that is that Rust compiles everything from scratch. So it's enough to add 2-3 lines to your Cargo.toml [dependencies], and suddenly you're compiling 150 crates with LTO. You don't even need to wait for the codebase to grow, just adding dependencies will make the compilation much slower (and even more so because release builds don't use incremental by default, and incremental with LTO is not so effective anyway, IIRC).
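
For illustration, the incremental default mentioned above can be overridden for local iteration. This is a minimal sketch assuming a hypothetical profile name `release-iter`; whether it actually shortens rebuilds for a given project would need to be measured.

```toml
# Cargo.toml: a release-like profile for local rebuilds (hypothetical name)
[profile.release-iter]
inherits = "release"
incremental = true   # release defaults to false; affects workspace members and path dependencies
lto = false          # the release default, restated: only "thin local" LTO within each crate
```

It would be used as `cargo build --profile release-iter`, leaving the plain release profile untouched for CI.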

But maybe my intuition is wrong. In any case, I think that before making any kind of decision like this, we should get compilation benchmark results across the ecosystem. The Rust compiler benchmark suite could be used for this, I did something similar with it recently (I analyzed different things than the effect of LTO on compilation time, though).

This might also be interesting to you: https://github.com/zamazan4ik/lto-statistics

@horacehoff
Author

I agree with @Kobzol, optimizations that heavily impact the compile time should be used on a new, dedicated, "heavy" profile.
In that case, I think this profile should implement fat LTO, codegen-units=1, and use opt-level=3 on all packages. This profile would thus be used when wanting an "ultra-optimized" binary, at the expense of a potentially very long build time.
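
A minimal sketch of such a dedicated profile, assuming the hypothetical name `heavy` (the thread does not settle on a name):

```toml
# Cargo.toml: an "ultra-optimized" profile (the name "heavy" is only an example)
[profile.heavy]
inherits = "release"
lto = true           # equivalent to "fat"
codegen-units = 1
opt-level = 3        # already the release default, restated for clarity

# Applies to all dependencies (packages that are not workspace members)
[profile.heavy.package."*"]
opt-level = 3
```

It would be selected with `cargo build --profile heavy`. Since release already uses opt-level 3, the per-package override only matters if something else lowers it.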

@foxtran

foxtran commented Oct 28, 2024

@Kobzol,

Fat LTO can be an incredible compile-time hog, and in my experience often with little gain over thin LTO.

How often do Rust developers compile the release config in the modify-build-test cycle? It seems to me that it mostly happens in CI(/CD) rather than on developers' machines. So, I do not see a problem here. And since it is in CI, it will be relatively easy to collect statistics on compilation times during package development and decide whether to use a lighter LTO version or drop it.

I would also rather see a new built-in ultra optimized profile than modifying release, tbh.

Although the idea of a new profile (publish?) with all possible optimizations does not look so bad, in many existing projects the release profile is used in this way. So, the divergence between old-style and new-style projects may become a really big problem in the long run.

You said that LTO is not an issue for new projects, but the problem with that is that Rust compiles everything from scratch. So it's enough to add 2-3 lines to your Cargo.toml [dependencies], and suddenly you're compiling 150 crates with LTO.

I'm looking at #14719 and I do not see a real problem here. As far as I can see, cargo itself has about 350 packages and compiles in about 1 min in CI. With LTO enabled, compilation takes only 4 min in the worst case (writing a PR description takes me much longer). And assuming that developers just call cargo build, they will not see this time in the normal development cycle.

The Rust compiler benchmark suite could be used for this, I did something similar with it recently (I analyzed different things than the effect of LTO on compilation time, though).

I think that to see this properly on these graphs, one needs to enable LTO by default for the whole ecosystem :)

This might also be interesting to you: https://github.com/zamazan4ik/lto-statistics

Thank you! As far as I can see, @zamazan4ik is currently going through all Rust repositories where LTO is not enabled and asking devs to enable it. Tremendous work! And among the devs who answered him, most enabled LTO, so in most cases enabling LTO is not an issue :)

Also, sometimes developers create faster release builds like here where thin LTO is used:
https://github.com/metta-systems/vesper/blob/f96cf8f85c743033c4fac5cab98a3b85257c22d0/Cargo.toml#L51-L54
@berkus said that he found a similar pattern in the bevy game engine and adopted it for his own use.

@horacehoff,

Thanks! Nice flags! I will add them into my branch :)

@weihanglo
Member

Fat LTO can be an incredible compile-time hog, and in my experience often with little gain over thin LTO. I would also rather see a new built-in ultra optimized profile than modifying release, tbh.

Agree with @Kobzol and others on this. Here are some data points from building Cargo itself: #14719

@weihanglo
Member

With LTO enabled, compilation takes only 4 min in the worst case (writing a PR description takes me much longer). And assuming that developers just call cargo build, they will not see this time in the normal development cycle.

As a maintainer of Cargo I do see problems. I sometimes build in release mode for testing some subtle bugs and I don't want to wait for 4 minutes for each change I made. Also, the 4 minutes build was on a machine with more than 100 cores (granted some cases are under codegen-units=1), so it might be slower in a lower-end machine.

I think one of the issues underneath is how to teach and discover optimizations options. We have an awesome The Rust Performance Book, though it is not official and not mentioned in The Cargo Book. The Cargo Book is like a reference and doesn't really provide a guide for optimization.

@berkus

berkus commented Oct 28, 2024

for testing some subtle bugs and I don't want to wait for 4 minutes for each change I made.

[profile.release-fast]
inherits = "release"
lto = "thin"
codegen-units = 16

@berkus

berkus commented Oct 28, 2024

and suddenly you're compiling 150 crates with LTO.

LTO is applied at LINK time, not compile time.

@Kobzol
Contributor

Kobzol commented Oct 28, 2024

How often do Rust developers compile the release config in the modify-build-test cycle? It seems to me that it mostly happens in CI(/CD) rather than on developers' machines. So, I do not see a problem here. And since it is in CI, it will be relatively easy to collect statistics on compilation times during package development and decide whether to use a lighter LTO version or drop it.

Even though it's not as important as debug/incr for modify-build-test, there are definitely use-cases where people use release for local rebuilds, be it e.g. for faster tests or in domains where it is in fact required (e.g. bevy and games). The performance of debug Rust programs is notoriously bad. Having statistics for this would of course be nicer though.

I'm looking at #14719 and I do not see a real problem here. As far as I can see, cargo itself has about 350 packages and compiles in about 1 min in CI. With LTO enabled, compilation takes only 4 min in the worst case (writing a PR description takes me much longer). And assuming that developers just call cargo build, they will not see this time in the normal development cycle.

3 minutes might not seem like much, but we're talking about a 4x slowdown, that's massive.

I would personally love to enable LTO by default, but there's a reason why it hasn't been done so far, and also why it isn't being done in similar toolchains, such as C and C++ compilers. In the current state of affairs, we IMO simply cannot make LTO be the default, as people would eat us alive.

That being said, this discussion is mostly based on vibes and feelings. If we had data from the ecosystem about how much does LTO slow down compilation and how much it improves performance across the board, for various benchmarks and crates, then it would be easier to make decisions based on it.

@Kobzol
Contributor

Kobzol commented Oct 28, 2024

LTO is applied at LINK time, not compile time.

Sure, the compilation might even get a bit faster, since you're just generating bitcode, instead of code (depending on how the compilation pipeline is configured). It's true that if your program is literally a hello world, then LTO won't have much compilation effect even with many dependencies. But once you start using them, the costs will start to appear.

@horacehoff
Author

this discussion is mostly based on vibes and feelings

It doesn't have to be, an entirely new build profile wouldn't impact existing projects and would give developers full control over whether they want to have better performance with worse compile times, or not.
Modifying the release profile, as pointed out by multiple people, would require extensive data from the ecosystem, and would also impact existing projects who already face large compile times with the release profile, hence why I think it's a much better option to add a new build profile with all the "hard"/heavy optimizations.

@Kobzol
Contributor

Kobzol commented Oct 28, 2024

It doesn't have to be, an entirely new build profile wouldn't impact existing projects and would give developers full control over whether they want to have better performance with worse compile times, or not.

Sure, but that's a separate discussion that should be led on a separate issue. FWIW, "would give developers full control" already happens today, people can just create their own profile, and even use tools such as https://github.com/Kobzol/cargo-wizard to prepare it for them. There are I think two main problems with a new profile, first is backwards compatibility, because an existing profile with the same name could have already existed in an existing project (but I think that should be possible to override), and the second is.. bikeshedding :) It's very difficult to say what are the "best options" for runtime performance. Should it have CGU=1? For some projects more CGUs result in better performance (optimizations are a heuristic after all). Should it have debug = 0? Many projects set at least debuginfo = "line-tables-only" for production builds. Should it use LTO=thin or LTO=fat? Etc. So yeah, could be an interesting and.. difficult :) discussion.
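
To make the trade-offs concrete, here are two equally plausible "performance" profiles that answer those questions differently. Both profile names are hypothetical, and neither is claimed to be faster in general. Note that the Cargo profile key for debug info is `debug`, and the `"line-tables-only"` value needs a reasonably recent Cargo.

```toml
# Variant A: thin LTO, default parallelism, minimal debug info kept
[profile.perf-thin]
inherits = "release"
lto = "thin"
codegen-units = 16            # the release default
debug = "line-tables-only"    # enough for backtraces with file and line

# Variant B: fat LTO, a single codegen unit, no debug info
[profile.perf-fat]
inherits = "release"
lto = "fat"
codegen-units = 1
debug = 0
```

Benchmarking both variants on the actual project is probably the only way to tell which one wins.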

@berkus

berkus commented Oct 28, 2024

If we had data from the ecosystem about how much does LTO slow down compilation and how much it improves performance across the board, for various benchmarks and crates

A crater run perhaps?

@horacehoff
Author

It's very difficult to say what are the "best options" for runtime performance.

That's true and that's a very good point :)
On the other hand, yes, some options could have different effects depending on the project, but IMO there are some for which performance improvements are almost guaranteed, like opt-level=3 on all packages of a project. That said, I'm not 100% sure LTO makes all binaries faster (even if it's said to): I've done some research and there are cases where it slows down execution. @berkus's proposition could help decide.

"would give developers full control" already happens today, people can just create their own profile

What I meant by this is modifying the release profile would impact developers who are currently fine with its optimizations and who face an already large compile time, forcing them to create a new profile with the optimizations they desire, whereas a new build profile would impact no-one and leave the choice.

@foxtran

foxtran commented Oct 28, 2024

@weihanglo,

As a maintainer of Cargo I do see problems. I sometimes build in release mode for testing some subtle bugs and I don't want to wait for 4 minutes for each change I made. Also, the 4 minutes build was on a machine with more than 100 cores (granted some cases are under codegen-units=1), so it might be slower in a lower-end machine.

For such cases Cargo might have a special release-fast build profile like:

[profile.release-fast]
inherits = "release"
lto = "thin"
codegen-units = 16

as was suggested by @berkus a bit later. With this profile, compilation takes one minute according to your measurements in #14719. Moreover, enabling LTO in CI allows you and other cargo devs to detect issues with LTO when they happen in a PR's CI, not when somebody comes with a bug report that LTO does not work with Cargo (not good news, do you agree?).

I think one of the issues underneath is how to teach and discover optimizations options. We have an awesome The Rust Performance Book, though it is not official and not mentioned in The Cargo Book. The Cargo Book is like a reference and doesn't really provide a guide for optimization.

Wow! Nice book! Probably, The Cargo Book needs to split profile flags into logical sections such as debug-info + symbols / optimization / compilation process flags.

@Kobzol,

Even though it's not as important as debug/incr for modify-build-test, there are definitely use-cases where people use release for local rebuilds, be it e.g. for faster tests or in domains where it is in fact required (e.g. bevy and games). The performance of debug Rust programs is notoriously bad. Having statistics for this would of course be nicer though.

For simply running cargo build, cargo users modify the dev profile like in:

and enable LTO for release builds. So, enabling LTO by default in new projects just simplifies the life of Rust developers.

3 minutes might not seem like much, but we're talking about a 4x slowdown, that's massive.

4x only for the build process. Testing takes some time too. So, the current 14 min (like this for Linux) turns into 17 min, which is only a 1.2x slowdown. For macOS, it will be only a 1.1x slowdown. So, it is an exchange of a 1.1x-1.2x slowdown for a 1.05x speedup. Sounds cool!

I will measure our massive code (1.5M LoC) with and without LTO tonight, it has funny results.

I would personally love to enable LTO by default,

Anyway, @zamazan4ik goes through Rust repos and does it by hand, and many folks accept the use of LTO for release builds. So, again, enabling LTO in most Rust repos is just a question of time; let's bring that time closer. And let's have it done not by a single active person, but by the whole community.

but there's a reason why it hasn't been done so far, and also why it isn't being done in similar toolchains

Yep, there are a lot of reasons why C/C++ applications prefer not to be compiled with LTO; sometimes even -O3 breaks the code. For example, today I fixed one bug which happens only with newer GCC versions (grimme-lab/xtb#1121). Fortunately, in Rust, we do not have such a large number of undefined behaviours, which allows Rust users to get a higher level of optimization.

such as C and C++ compilers.

According to this issue in CMake, MSVC uses LTO for Release starting from Visual Studio 2008. You may find more info with Google.
IBM XL compilers (C/C++/Fortran) enable (thin?) LTO (named interprocedural analysis, IPA) at the -O4 level and fat LTO at the -O5 level. This was the case at least for IBM XL 13.1, which was released in 2014 or a bit earlier.

if your program is literally a hello world

At the starting point, each Rust package is a hello world, so I still do not see a problem with enabling LTO by default for new projects via extra info in Cargo.toml. When the project's status changes, users might start to think about disabling LTO, and they will see the effect of enabling/disabling it.

@horacehoff,

Modifying the release profile, as pointed out by multiple people, would require extensive data from the ecosystem, and would also impact existing projects who already face large compile times with the release profile

How? Only new packages will be affected. Changing Cargo's default behaviour for new projects, as proposed in #14738, does not affect existing ones.

I suppose that we started to misunderstand each other at some point. My suggestion is not to enable LTO by default for release profile in Cargo for any project, both existing and new. My suggestion is to create Cargo.toml with enabled LTO by default only for new projects. That is the point.

@Kobzol
Contributor

Kobzol commented Oct 29, 2024

Yep, there are a lot of reasons why C/C++ applications prefer not to be compiled with LTO; sometimes even -O3 breaks the code. For example, today I fixed one bug which happens only with newer GCC versions (grimme-lab/xtb#1121). Fortunately, in Rust, we do not have such a large number of undefined behaviours, which allows Rust users to get a higher level of optimization.

Ah, you just reminded me that we disable using LTO for rustc on Windows, because it was producing miscompilations. Also, LTO + PGO has been broken (rust-lang/rust#115344) for several years. So sadly, yes, we do in fact have various LTO bugs. I'm not personally comfortable with enabling it by default across the board at the moment.

I suppose that we started to misunderstand each other at some point. My suggestion is not to enable LTO by default for release profile in Cargo for any project, both existing and new. My suggestion is to create Cargo.toml with enabled LTO by default only for new projects. That is the point.

Oh, that's definitely not what I understood from your earlier messages :) In that case, please create a new issue, that's something different than what this issue is about.

@foxtran

foxtran commented Oct 29, 2024

I will measure our massive code (1.5M LoC) with and without LTO tonight, it has funny results.

16 core machines build our app (a mixed app with 1.5M LoC of heavy-mathematical Fortran code + a small part in C++) with different setups with GCC 14.1 (the same behaviour holds at least for GCC 10-14, but I did not test that this time).

# cores    noLTO timings    LTO timings    LTO speedup
4          2h 3m            1h 20m         ~1.5x
16         1h 54m           46m            ~2.5x

Without LTO, binary size is 422 Mb; with LTO, binary size is 341 Mb.

So, LTO may not only increase compilation time ;-)

@Kobzol

Ah, you just reminded me that we disable using LTO for rustc on Windows, because it was producing miscompilations.

Does this miscompilation happen in Rust source code or somewhere in LLVM?

Also, LTO + PGO has been broken (rust-lang/rust#115344) for several years.

That is sad.

So sadly, yes, we do in fact have various LTO bugs. I'm not personally comfortable with enabling it by default across the board at the moment.

The increasing popularity of LTO may draw the LLVM community's attention to this optimization, so there will be fewer bugs in this technology.

In that case, please create a new issue, that's something different than what this issue is about.

Done! See #14741

@Kobzol
Contributor

Kobzol commented Oct 29, 2024

Does this miscompilation happen in Rust source code or somewhere in LLVM?

No idea, it was discovered at the beginning of 2023, since then we don't use LTO for the compiler on Windows.

@horacehoff
Author

I suppose that we started to misunderstand each other at some point

Yes I think so :)
As @Kobzol said, this issue is about modifying release/adding a new build profile with extra optimization flags. However, since creating this issue, I have come to realize that it would be much simpler and overall better to just create a new ultra-optimized build profile.

So, LTO may not only increase compilation time ;-)

So sadly, yes, we do in fact have various LTO bugs

I'm pretty sure in most cases it increases compilation time, plus I don't think pushing an optimization which faces many bugs in a current build profile would make people happy.

@Kobzol Do you have any concerns about adding a new ultra-optimized build profile?

@Kobzol
Contributor

Kobzol commented Oct 29, 2024

I personally don't have concerns about that, per se (I'm also not on the Cargo team, though, just a random onlooker). I'm not sure if it's needed though, would it really help discoverability if there was a built-in profile with a special name (that we probably couldn't even ever change due to backwards compatibility)? People would still need to figure out that something like that exists. And as already mentioned, it would have trade-offs, there is no single profile that is best for all use-cases. In that case it seems to me that it might be better to just make people more aware of the Performance book or https://github.com/Kobzol/cargo-wizard, e.g. by linking to them somewhere in the Cargo docs. Because there they can find the various trade-offs explained.

@foxtran

foxtran commented Oct 29, 2024

Hoho! I suppose we can enable LTO and these optimizations for all packages that use the Rust 2024 edition since it is not stable yet! Easy solution! :) Since it is a breaking change, we can do it :)

@horacehoff
Author

I disagree. As I said before, yes no single profile is the best for all use-cases but there are some optimization flags that are almost guaranteed to improve runtime performance for most projects.

People would still need to figure out that something like that exists

For "new" people, I think they would figure out it exists the same way they would figure out a "release" or a "bench" profile exists (if shown in the same way, of course). Plus, a new built-in profile which guarantees better runtime performance for most projects would be easier for developers than having to install an extra subcommand, although I do agree that developers need to be made aware of the performance trade-offs/Performance Book.

@Kobzol
Contributor

Kobzol commented Oct 29, 2024

Well, we are already frequently encountering the issue that people are not aware even of the release profile, it has become a running joke :) So even that is not very discoverable on its own.

But yes, I suppose that being a builtin profile would be slightly more discoverable than being a third-party solution.

@horacehoff
Author

I'm curious what the cargo team thinks about adding a new built-in profile.

@epage
Contributor

epage commented Oct 29, 2024

I'm curious what the cargo team thinks about adding a new built-in profile.

Speaking only for myself, I'm not a fan

  • Built-in profiles have a property that technically makes them a breaking change to add
    • I have wondered about moving these into a reserved cargo:: namespace and making the existing profile names just defaults that inherit those
  • We already have enough of a problem of people discovering --release and adding a new profile seems even less likely for people to know about
  • If it doesn't have special integration with Cargo in some way, I question whether it should be in Cargo.

@epage
Contributor

epage commented Oct 29, 2024

@Kobzol

Sure, but that's a separate discussion that should be led on a separate issue. FWIW, "would give developers full control" already happens today, people can just create their own profile, and even use tools such as https://github.com/Kobzol/cargo-wizard to prepare it for them.
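
For readers unfamiliar with the custom-profile route mentioned in the quote above, here is a minimal sketch. The profile name `release-max` is hypothetical, and the settings are simply the ones proposed at the top of this issue, not a recommendation from the Cargo team.

```toml
# Cargo.toml of the project being built
# Hypothetical custom profile; the name "release-max" is an assumption for illustration.
[profile.release-max]
inherits = "release"   # start from the built-in release settings
lto = true             # "fat" LTO across all crates
codegen-units = 1      # one codegen unit per crate: slower builds, more optimization opportunities
```

It would be selected with `cargo build --profile release-max`, leaving the behavior of `--release` unchanged.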

As I mentioned in #14741, I tend to find it best to have Issues focus on the underlying need, rather than being overly fixated on one specific way of solving the problem. This makes it easier to fully weigh the solutions rather than splitting the conversation up between several issues, which makes it harder to track, weigh against each other, and coordinate across interested parties. If we come to the point that we have settled on a solution but it isn't sufficient for some reason, we can split the issues up then.

@epage
Contributor

epage commented Oct 29, 2024

Hoho! I suppose we can enable LTO and these optimizations for all packages that use the Rust 2024 edition, since it is not stable yet! Easy solution! :) Since it is a breaking change, we can do it :)

I believe we are past the deadline for locking in everything that will be in the 2024 edition. The RFC cut off for it was a while ago.

@foxtran

foxtran commented Oct 29, 2024

@epage,

Speaking only for myself, I'm not a fan

  • Built-in profiles have a property that technically makes them a breaking change to add
    • I have wondered about moving these into a reserved cargo:: namespace and making the existing profile names just defaults that inherit those
  • We already have enough of a problem of people discovering --release and adding a new profile seems even less likely for people to know about
  • If it doesn't have special integration with Cargo in some way, I question whether it should be in Cargo.

Totally agreed with these arguments against new profiles.

As I mentioned in #14741, I tend to find it best to have Issues focus on the underlying need, rather than being overly fixated on one specific way of solving the problem.

Ok. Let's continue discussion here.

Hoho! I suppose we can enable LTO and these optimizations for all packages that use the Rust 2024 edition, since it is not stable yet! Easy solution! :) Since it is a breaking change, we can do it :)

I believe we are past the deadline for locking in everything that will be in the 2024 edition. The RFC cut off for it was a while ago.

That is a pity. But we can still start generating slightly more optimized profiles at the cargo init stage for new projects, then spend the next two years analysing how often LTO gets disabled in those projects and resolving the issues that are raised, so that LTO can become the default in the release and bench profiles in the next Rust edition, where breaking changes will again be possible.
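
As a rough illustration of the cargo init idea, a generated manifest might look something like the sketch below. This is hypothetical: cargo init does not emit a profile section today, and the package name and the choice of `lto = "thin"` are assumptions made for the example.

```toml
[package]
name = "my-new-project"   # hypothetical package name
version = "0.1.0"
edition = "2021"

[dependencies]

# Hypothetical addition by `cargo init`; not generated by Cargo today.
[profile.release]
lto = "thin"
```

The built-in bench profile inherits from release, so it would pick up the same setting. Because the setting lives in the project's own manifest, opting out is a one-line deletion, and how often that happens is the signal this proposal would measure.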

@horacehoff
Author

horacehoff commented Oct 29, 2024

Speaking only for myself, I'm not a fan

I understand. If it's a definitive no, how about modifying release to further optimize it without significantly increasing compilation time, as I initially proposed in this issue?
For example, thin LTO offers a non-negligible runtime performance increase while minimally impacting compilation time, although, if I understood correctly, there are some bugs. The same could be said for opt-level=3 on all packages, though I agree that this point does not hold for projects with many dependencies.
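
A minimal sketch of the thin LTO part of that suggestion, assuming it is written as an override in a project's own Cargo.toml rather than as a change to Cargo's built-in defaults:

```toml
[profile.release]
lto = "thin"   # ThinLTO; the built-in release profile uses lto = false
```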

@kornelski
Contributor

Perhaps an equivalent of this:

[profile.release.package."*"]
opt-level = 3

could be set by the packages themselves? There are certain crates, like compressors and encoders, that know they're slow and need to be optimized to be usable, so they could have something in their manifest to tell Cargo to optimize them harder. This would be even more useful in debug builds.
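
Cargo currently reads profile settings only from the root manifest of the workspace being built, so the closest equivalent today is an override written by the consuming project. A minimal sketch, where the crate name `some-encoder` is hypothetical:

```toml
# Cargo.toml of the consuming project (not of the dependency)
[profile.dev.package.some-encoder]
opt-level = 3   # the dev profile defaults to opt-level = 0
```

This optimizes only the named dependency in debug builds; the suggestion above would let the dependency request this itself.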

@epage
Contributor

epage commented Oct 30, 2024

Perhaps an equivalent of this: could be set by the packages themselves?

I can't find it in the meeting notes to recall the context, but I think the idea of a package providing default profile settings came up in yesterday's Cargo team meeting. It also came up previously when discussing mitigations for the downsides of mir-only rlibs.

@epage epage added the S-triage Status: This issue is waiting on initial triage. label Nov 20, 2024