./configure --enable-optimizations should enable LTO #89536
Comments
When Python is configured with ./configure --enable-optimizations, PGO is enabled but not LTO. I recall that a few years ago, GCC with LTO had bugs. But now, GCC with LTO is reliable. I suggest enabling it by default in Python 3.11. Or did I miss a reason not to do that? |
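To see which of these switches a given interpreter was actually built with, the recorded configure command line can be inspected. The following is a minimal sketch, not part of the issue: the helper name `build_optimizations` is hypothetical, and it assumes the build recorded its arguments in the `CONFIG_ARGS` variable exposed by `sysconfig` (which is not the case for builds that never ran ./configure).

```python
import shlex
import sysconfig


def build_optimizations(config_args):
    """Report which optimization switches appear in a configure command line.

    Hypothetical helper for illustration; it only parses the argument
    string and does not inspect the compiled binary itself.
    """
    args = shlex.split(config_args or "")
    pgo = "--enable-optimizations" in args
    lto = None  # None means no LTO option was passed at all
    for arg in args:
        if arg == "--with-lto":
            lto = "yes"
        elif arg.startswith("--with-lto="):
            lto = arg.split("=", 1)[1]
        elif arg == "--without-lto":
            lto = "no"
    return {"pgo": pgo, "lto": lto}


if __name__ == "__main__":
    # CONFIG_ARGS is None on builds that did not go through ./configure.
    print(build_optimizations(sysconfig.get_config_var("CONFIG_ARGS")))
```

With the behaviour described in this comment, a build configured with only --enable-optimizations would report PGO as enabled and no LTO option.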
IIRC activating LTO is especially annoying on macOS due to the need for some LLVM components that are a bit hard to get. Let me dig a bit to see if I can reproduce the problem |
Pablo:
Ah, I guess that you are referring to this requirement: Maybe configure can enable LTO on all platforms but macOS. |
Can you say more? We are currently using --with-lto with a vanilla Apple Command Line Tools (or Xcode) for macOS installer builds when building on macOS 10.15 Catalina or higher. Perhaps this was just an issue on older versions. |
Yeah, I had problems in the past getting llvm-ar or some other component. I still need time to reproduce this and to see if it still happens on new versions. |
I don't think you need llvm-ar anymore with the Apple Tool Chain but let me look into it as I have all the relevant previous macOS releases as VMs to test with. |
FWIW I agree that we should try adding LTO to --enable-optimizations now. |
Sorry, this slipped off my radar and I haven't gone back and checked older versions of macOS. But it certainly is true that at least with the current versions of the Apple Developer Tools (either the Command Line Tools or Xcode) for macOS 11 (Big Sur) and macOS 12 (Monterey), things just work. |
Can we use --lto=thin when available? |
One data point: when I enabled |
@methane san, cc @ned-deily: I would like to switch the default LTO policy to ThinLTO, if available, starting with Python 3.12.
AS-IS
TO-BE
|
I like faster LTO by default, because the default LTO is slow. Random comments: I don't want to use LTO during the profiling build (the first build in PGO). What about GCC? GCC supports |
I don't have an opinion yet. I plan to compare both options on macOS. |
+1
We may need to add the option for |
So. Can we enable LTO in Python 3.12? Is there a way to opt-out from LTO when |
Is it possible to disable LTO if it's known to not work properly or if the toolchain is too old? I'm thinking of Ned's toolchain for building Python on macOS for x86 CPUs. |
What's the status of LTO in 2023? Can it be now enabled by default in Python? cc @corona10 |
@ned-deily
One problem is on the GCC side: full LTO is too slow, so we should pass -flto=$(number of cores) or -flto=auto.
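A minimal sketch of how that flag choice could look, assuming (this is not stated in the thread) that -flto=auto is only accepted from GCC 10 onward and that older versions need an explicit job count. The function name `gcc_lto_flag` and its parameters are hypothetical and do not correspond to anything in CPython's configure script.

```python
import os


def gcc_lto_flag(gcc_major, jobs=None):
    """Pick a parallel LTO flag for GCC.

    Assumption: -flto=auto is available starting with GCC 10; for older
    versions fall back to an explicit number of parallel LTO jobs.
    """
    if gcc_major >= 10:
        return "-flto=auto"
    if jobs is None:
        # os.cpu_count() may return None, so fall back to a single job.
        jobs = os.cpu_count() or 1
    return f"-flto={jobs}"
```

For example, `gcc_lto_flag(9, jobs=4)` yields `-flto=4`, the "number of cores" form mentioned above.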
Does build performance really matter when the user explicitly asks to enable optimizations? I don't think so. |
If a user really cares about performance and prefers to disable LTO, would it work to specify |
Yeah, if users are okay with that, I am fine. |
Most popular Linux distributions, like Ubuntu or Fedora, provide binary packages: users don't build Python themselves (hopefully!), and packages are built once on fast build servers. Source-based distros like ArchLinux are less popular. macOS users have a .dmg binary installer, even if Homebrew seems to be popular. Windows users have a .msi binary installer. FreeBSD is source based. For source-based distributions like ArchLinux and FreeBSD, Python packages can be tuned to disable LTO if they prefer to save some minutes during the build but waste CPU time later. |
I'm a builder of Python who would be affected by turning on LTO by default. I would likely turn it off, but having it on by default and turning it off with a command-line option is okay with me. Last time I tried it, I decided it takes too long and too much RAM during the build for too little benefit. However, my guess is for most people using |
Would you mind elaborating? |
We deploy our software using Linux containers. I like to use the most up-to-date Python release and so I build my own Python from source when building the container images. The VM host that builds container images is somewhat limited on RAM and CPU and so enabling LTO stresses it quite a bit. We could just use a higher spec container build host. However, last I checked, the gain from using LTO is fairly small so I didn't think it worth it. Again, I think LTO on by default, with |
That sounds like a nice enhancement. Is it required to enable LTO by default when optimizations are enabled? About RHEL, do you know which GCC version added the |
I don't see any real blocker issue. Would you be ok to enable LTO optimization with |
It looks like The overall runtime performance gain looks to be about 3% using gcc One thing I noticed: It is doing the profile guided build with LTO enabled as well, so the double compilation includes the super long final LTO link steps in both. I'm not sure an LTO enabled build is necessary to generate profile data for PGO. If the LTO flags could be skipped for that first PGO build it'd significantly reduce build time. I'm in favor of |
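The idea of skipping LTO for the first (instrumented) PGO build can be sketched as a choice of compiler flags per stage. This is an illustration under stated assumptions, not how CPython's Makefile is written: the function name `pgo_stage_flags` and the stage names are hypothetical, and the flags are the generic GCC-style -fprofile-generate, -fprofile-use and -flto.

```python
def pgo_stage_flags(stage, lto_on_profile_build=False):
    """Return illustrative compiler flags for one stage of a PGO build.

    "generate" is the instrumented build used to collect profile data;
    "use" is the final build that consumes the profile. By default LTO is
    applied only to the final build, so the slow LTO link runs once.
    """
    if stage == "generate":
        flags = ["-fprofile-generate"]
        if lto_on_profile_build:
            # The behaviour observed in the comment above: LTO in both builds.
            flags.append("-flto")
        return flags
    if stage == "use":
        return ["-fprofile-use", "-flto"]
    raise ValueError(f"unknown PGO stage: {stage!r}")
```

Whether profile data collected without LTO is equally useful for the final LTO build is exactly the open question raised in the comment; the sketch only shows where the flag would be dropped.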
Yes. As I wrote, it should be possible to explicitly disable LTO using
I'm not comfortable with enabling LTO or not depending on the C compiler :-( If you consider that "longer build time" is a blocker issue, I would prefer to stick to the status quo, don't enable LTO with I don't think that good defaults exist, only good documentation exists :-) Currently, the performance options to tune the Python build are documented at: https://docs.python.org/dev/using/configure.html#performance-options
|
Read my last paragraph.
|
Oh, it seems like I misunderstood your comment. You are OK with switching it on by default for all compilers, but using ThinLTO when available. |
Yep, and if logic to "use thin" is painful, keep it simple and just always use full. (I'm running some benchmarks to compare full vs thin build performance as I haven't looked at that in ages) Everyone building for big distros likely rolls their own setups and flags anyways based on their own reasoning and history of having done that long before we added the collective flag. |
My benchmark results came back. lto=full (+3%) performs significantly better than lto=thin (1%) on clang pyperformance so keeping it simple and always using full makes sense. (thin does build a lot faster though, but that isn't the point of this configure flag) |
@corona10: By the way, GCC has https://gcc.gnu.org/wiki/LinkTimeOptimization says:
Design document: WHOPR - Fast and Scalable Whole Program Optimizations in GCC (2007). I never heard of it before now! |
Description of the feature in 2020: https://stackoverflow.com/questions/64954525/does-gcc-have-thin-lto
|
Apparently, the status quo is to have a separate option to opt in to LTO. It's already recommended in the https://docs.python.org/dev/using/configure.html#performance-options documentation. So I'm closing the issue. |