./configure --enable-optimizations should enable LTO #89536

vstinner · 2021-10-05T09:32:26Z

BPO	45373
Nosy	@gpshead, @vstinner, @ned-deily, @methane, @corona10, @pablogsal, @erlend-aasland

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-10-05.09:32:25.894>
labels = ['build', '3.11']
title = './configure --enable-optimizations should enable LTO'
updated_at = <Date 2022-03-03.09:51:27.150>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2022-03-03.09:51:27.150>
actor = 'erlendaasland'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Build']
creation = <Date 2021-10-05.09:32:25.894>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 45373
keywords = []
message_count = 9.0
messages = ['403209', '403210', '403211', '403244', '403248', '403249', '414312', '414313', '414316']
nosy_count = 7.0
nosy_names = ['gregory.p.smith', 'vstinner', 'ned.deily', 'methane', 'corona10', 'pablogsal', 'erlendaasland']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue45373'
versions = ['Python 3.11']

When Python is configured with:

./configure --enable-optimizations

PGO is enabled but not LTO.

I recall that a few years ago, GCC with LTO had bugs. But now, GCC with LTO is reliable. I suggest enabling it by default in Python 3.11.

Or did I miss a reason to not do that?

pablogsal · 2021-10-05T09:34:55Z

IIRC activating LTO is especially annoying on macOS due to the need for some LLVM components that are a bit hard to get. Let me dig a bit to see if I can reproduce the problem.

vstinner · 2021-10-05T09:38:33Z

Pablo:

> IIRC activating LTO is especially annoying on macOS due to the need for some LLVM components that are a bit hard to get. Let me dig a bit to see if I can reproduce the problem.

Ah, I guess that you are referring to this requirement:
"The C compiler Clang requires llvm-ar for LTO (ar on macOS), as well as an LTO-aware linker (ld.gold or lld)."
https://docs.python.org/dev/using/configure.html#cmdoption-with-lto

Maybe configure can enable LTO on all platforms but macOS.

ned-deily · 2021-10-05T15:49:28Z

> IIRC activating LTO is especially annoying on macOS due to the need for some LLVM components that are a bit hard to get

Can you say more? We are currently using --with-lto with a vanilla Apple Command Line Tools (or Xcode) for macOS installer builds when building on macOS 10.15 High Sierra or higher. Perhaps this was just an issue on older versions.

pablogsal · 2021-10-05T16:12:12Z

Yeah, I had problems in the past getting llvm-ar or some other component. I still need time to reproduce this and to see if it still happens on new versions.

ned-deily · 2021-10-05T16:15:01Z

I don't think you need llvm-ar anymore with the Apple Tool Chain but let me look into it as I have all the relevant previous macOS releases as VMs to test with.

gpshead · 2022-03-01T22:29:58Z

FWIW I agree that we should try adding LTO to --enable-optimizations now.

ned-deily · 2022-03-01T22:34:08Z

Sorry, this slipped off my radar and I haven't gone back and checked older versions of macOS. But it certainly is true that at least with the current versions of the Apple Developer Tools (either the Command Line Tools or Xcode) for macOS 11 (Big Sur) and macOS 12 (Monterey), things just work.

methane · 2022-03-01T22:56:10Z

Can we use --with-lto=thin when available?
And can we skip LTO when building the profiling Python?

nascheme · 2022-04-11T19:32:51Z

One data point: when I enabled --with-lto, the CPython build task failed on my container build host due to an out-of-memory condition. It seems that GCC uses a lot more RAM in that case (the host has 2 GB, I think). I fixed it by adding a swap file, but I suspect that if we make it default to on, more people will run into the issue.

corona10 · 2022-04-13T01:51:16Z

@methane san cc @ned-deily

I would like to switch the default LTO policy to thinLTO, if available, starting with Python 3.12.
The thinLTO option was only introduced in Python 3.11, so let's leave it as an experimental choice for Python 3.11 :)

AS-IS

./configure --with-lto  # fullLTO
./configure --with-lto=full  # fullLTO
./configure --with-lto=thin  # thinLTO

TO-BE

./configure --with-lto  # thinLTO if possible, otherwise fullLTO
./configure --with-lto=full  # fullLTO
./configure --with-lto=thin  # thinLTO
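
The TO-BE behaviour can be read as a small decision rule. The sketch below only illustrates that rule and is not the logic in CPython's configure script: `resolve_lto_mode` and its `THIN_AVAILABLE` argument are hypothetical, and how ThinLTO availability would actually be detected is left out.

```shell
# Hypothetical helper: map a --with-lto value to the LTO mode that would be
# used under the TO-BE proposal. Not taken from CPython's configure script.
# Usage: resolve_lto_mode REQUESTED THIN_AVAILABLE
#   REQUESTED       "yes" (bare --with-lto), "full", "thin" or "no"
#   THIN_AVAILABLE  "yes" if the toolchain supports ThinLTO (assumed known)
resolve_lto_mode() {
    requested=$1
    thin_available=$2
    case $requested in
        full|thin|no)
            # Explicit choices are passed through unchanged.
            echo "$requested"
            ;;
        yes)
            # Bare --with-lto: prefer ThinLTO, fall back to full LTO.
            if [ "$thin_available" = yes ]; then
                echo thin
            else
                echo full
            fi
            ;;
        *)
            echo "unknown --with-lto value: $requested" >&2
            return 1
            ;;
    esac
}

resolve_lto_mode yes yes   # bare --with-lto, ThinLTO available
resolve_lto_mode yes no    # bare --with-lto, ThinLTO not available
```

Only the bare `--with-lto` case changes between AS-IS and TO-BE; explicit `full` and `thin` requests keep their meaning.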

methane · 2022-04-13T01:59:46Z

I like having a faster LTO by default, because the default LTO is slow.

Random comments:

I don't want to use LTO during the profiling build (the first build in PGO).

What about GCC? GCC supports -flto=auto option since GCC 10.
https://www.phoronix.com/scan.php?page=news_item&px=GCC-flto-auto-option

ned-deily · 2022-04-13T02:04:57Z

I don't have an opinion yet. I plan to compare both options on macOS.

corona10 · 2022-04-13T02:07:22Z

> I don't want to use LTO during the profiling build (the first build in PGO).

+1

> What about GCC? GCC supports -flto=auto option since GCC 10

We may need to add a ./configure --with-lto=auto option, and I think that we have a chance to add it in Python 3.11.
Also, in that case, from Python 3.12 on, ./configure --with-lto should use the auto option if available, and otherwise the default LTO policy for GCC.

vstinner · 2022-09-22T15:04:20Z

So. Can we enable LTO in Python 3.12? Is there a way to opt-out from LTO when --enable-optimizations is used? For example, does ./configure --enable-optimizations --without-lto work as expected?
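
One way to answer this empirically is to configure a build tree and look for `-flto` in the generated Makefile. This is a minimal sketch, assuming that LTO shows up as a `-flto` compiler or linker flag in that file (an assumption that should be checked for the toolchain in use); `makefile_uses_lto` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given Makefile mentions an -flto flag.
makefile_uses_lto() {
    # "--" stops grep from parsing the pattern as an option.
    grep -q -- '-flto' "$1"
}

# Intended use from a CPython source tree (not executed here):
#   ./configure --enable-optimizations --without-lto
#   if makefile_uses_lto Makefile; then echo "LTO enabled"; else echo "LTO disabled"; fi
```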

vstinner · 2022-09-22T15:07:33Z

Is it possible to disable LTO if it's known not to work properly, or if the toolchain is too old? I'm thinking of Ned's toolchain for building Python on macOS for x86 CPUs.

vstinner · 2023-05-31T12:28:53Z

What's the status of LTO in 2023? Can it now be enabled by default in Python?

cc @corona10

corona10 · 2023-05-31T16:26:35Z

@ned-deily
Do you have any concerns, from the macOS side, about unifying into a single optimization option?

@vstinner

> What's the status of LTO in 2023? Can it now be enabled by default in Python?

One problem is on the GCC side: full LTO is too slow, so we should pass -flto=$(number of cores) or -flto=auto.
I am not sure that it will be an acceptable situation, even for Red Hat.
WDYT?

vstinner · 2023-05-31T16:39:41Z

> The FullLTO is too slow

Does build performance really matter when the user explicitly asks to enable optimizations? I don't think so.

vstinner · 2023-05-31T16:42:11Z

If a user really cares about build performance and prefers to disable LTO, would it work to specify ./configure --enable-optimizations --disable-lto?

corona10 · 2023-05-31T16:54:59Z

> Is the build performance really a matter when the user explicitly asks for enable optimizations? I don't think so.

Yeah, if the customers are okay with it, I am fine.
But I am not familiar with customers of Linux.
So I asked you :)

vstinner · 2023-05-31T17:12:13Z

> But I am not familiar with customers of Linux.

Most popular Linux distributions, like Ubuntu or Fedora, provide binary packages: users don't build Python themselves (hopefully!), and packages are built once on fast build servers. Source-based distros like ArchLinux are less popular.

macOS users have a .dmg binary installer, even if HomeBrew seems to be popular.

Windows users have .msi binary installer.

FreeBSD is source based.

For source based distributions like ArchLinux and FreeBSD, Python packages can be tuned to disable LTO if they prefer to save some minutes during build but waste CPU time later.

nascheme · 2023-05-31T17:17:27Z

I'm a builder of Python that would be affected by turning on LTO by default. I would likely turn it off but having it on by default and turning off by command line option is okay with me. Last time I tried it, I decided it takes too long and too much RAM for the build for too little benefit. However, my guess is for most people using --enable-optimizations, LTO on by default would be better. Turning it on would likely help it mature more as well (e.g. improve compilers).

vstinner · 2023-05-31T17:22:26Z

> I'm a builder of Python that would be affected by turning on LTO by default

Would you mind elaborating?

nascheme · 2023-05-31T17:59:22Z

We deploy our software using Linux containers. I like to use the most up-to-date Python release and so I build my own Python from source when building the container images. The VM host that builds container images is somewhat limited on RAM and CPU and so enabling LTO stresses it quite a bit. We could just use a higher spec container build host. However, last I checked, the gain from using LTO is fairly small so I didn't think it worth it. Again, I think LTO on by default, with --enable-optimizations, would be an okay default.

vstinner · 2023-06-01T11:05:26Z

> One problem is the GCC side, The FullLTO is too slow, we should pass the -flto=$(number of cores) or -flto=auto
> I am not sure that it will be an acceptable situation even for the RedHat.

That sounds like a nice enhancement. Is it required to enable LTO by default when optimizations are enabled?

About RHEL, do you know which GCC version added the -flto=auto option? Maybe configure can check whether it's supported or not.
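
A check of this kind usually amounts to compiling a trivial program with the flag and seeing whether the compiler accepts it. The sketch below illustrates the idea in plain shell. It is not CPython's actual configure logic, and `cc_supports_flag` and `pick_lto_flag` are hypothetical helper names; `-flto=auto` and plain `-flto` are real GCC options.

```shell
# Hypothetical helper: succeed if compiler $1 accepts flag $2 when compiling
# and linking a trivial C program.
cc_supports_flag() {
    cc=$1
    flag=$2
    tmpdir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$tmpdir/probe.c"
    if "$cc" "$flag" -o "$tmpdir/probe" "$tmpdir/probe.c" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return $status
}

# Hypothetical helper: prefer -flto=auto, fall back to plain -flto.
pick_lto_flag() {
    if cc_supports_flag "$1" -flto=auto; then
        echo "-flto=auto"
    else
        echo "-flto"
    fi
}

# Probe the compiler named by $CC, or "cc" if CC is unset.
pick_lto_flag "${CC:-cc}"
```

A real check might also need to treat warnings as errors, since some compilers only warn about options they do not recognize.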

vstinner · 2023-09-13T04:23:59Z

I don't see any real blocker issue. Would you be ok to enable LTO optimization with ./configure --enable-optimizations in Python 3.13?

gpshead · 2023-09-13T15:14:27Z

It looks like --enable-optimizations --with-lto=full increases build time by 3x-10x vs --enable-optimizations --with-lto=no using gcc, due to a couple of long linking steps. When using clang, --with-lto=thin mode is available, which cuts the LTO build time vs =full in half for me.

The overall runtime performance gain looks to be about 3% using gcc --with-lto=full.

One thing I noticed: It is doing the profile guided build with LTO enabled as well, so the double compilation includes the super long final LTO link steps in both. I'm not sure an LTO enabled build is necessary to generate profile data for PGO. If the LTO flags could be skipped for that first PGO build it'd significantly reduce build time.
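
The idea of keeping LTO out of the instrumented build can be illustrated with the flags each PGO stage would get. This is a sketch of the suggestion only, not a description of how CPython's Makefile drives PGO; `pgo_stage_cflags` is a hypothetical helper. `-fprofile-generate`, `-fprofile-use` and `-flto` are the GCC spellings of the relevant options.

```shell
# Hypothetical helper: print the extra compiler flags for one PGO stage,
# applying LTO only to the final, profile-guided build.
pgo_stage_cflags() {
    case $1 in
        generate)
            # Stage 1: instrumented build, used only to collect profile data.
            echo "-fprofile-generate"
            ;;
        use)
            # Stage 2: final build, optimized with the profile and with LTO.
            echo "-fprofile-use -flto"
            ;;
        *)
            echo "unknown PGO stage: $1" >&2
            return 1
            ;;
    esac
}

pgo_stage_cflags generate
pgo_stage_cflags use
```

Whether profile data collected without LTO applies cleanly to an LTO build would need to be verified for each compiler.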

I'm in favor of --enable-optimizations including LTO by default in 3.13. I'd default to --with-lto=thin when the compiler is clang. If this winds up causing problems we can undo it. It's just a configure flag.

vstinner · 2023-09-13T16:48:50Z

> It looks like --enable-optimizations --with-lto=full increases build time by 3x-10x vs --enable-optimizations --with-lto=no using gcc due to a couple of long linking steps.

Yes, --enable-optimizations makes the Python build slower. At the opposite extreme, I always use CFLAGS="-O0" for development. But do you consider "longer build time" a blocker issue?

As I wrote, it should be possible to explicitly disable LTO using --enable-optimizations --without-lto, for people who have good reasons to limit the build time.

> I'd default to --with-lto=thin when the compiler is clang.

I'm not comfortable to enable LTO or not depending on the C compiler :-( If you consider that "longer build time" is a blocker issue, I would prefer to stick to the status quo, don't enable LTO with --enable-optimizations and just close this old issue.

I don't think that good defaults exist, only good documentation exists :-) Currently, Performance Options to tune Python build are documented at: https://docs.python.org/dev/using/configure.html#performance-options

> Configuring Python using --enable-optimizations --with-lto (PGO + LTO) is recommended for best performance.

gpshead · 2023-09-13T17:20:11Z

Read my last paragraph.

> I'm in favor of --enable-optimizations including LTO by default in 3.13. ... If this winds up causing problems we can undo it. It's just a configure flag.

vstinner · 2023-09-13T17:32:06Z

Oh, it seems like I misunderstood your comment. You are OK with switching it on by default for all compilers, but using thinLTO when available.

gpshead · 2023-09-13T17:56:04Z

Yep, and if logic to "use thin" is painful, keep it simple and just always use full. (I'm running some benchmarks to compare full vs thin build performance as I haven't looked at that in ages)

Everyone building for big distros likely rolls their own setups and flags anyways based on their own reasoning and history of having done that long before we added the collective flag. --enable-optimizations is more meant for "everybody else".

gpshead · 2023-09-13T19:57:16Z

My benchmark results came back. lto=full (+3%) performs significantly better than lto=thin (1%) on clang pyperformance so keeping it simple and always using full makes sense. (thin does build a lot faster though, but that isn't the point of this configure flag)

vstinner · 2023-09-14T12:37:23Z

@corona10: By the way, GCC has -fwhopr which seems unrelated to Burger King whoppers, but "like clang thinLTO".

https://gcc.gnu.org/wiki/LinkTimeOptimization says:

> -fwhopr: This is similar to -flto but it splits compilation to achieve scalability. It is intended to handle extremely large programs whose call graphs do not fit in memory. See the design document for details.

Design document: WHOPR - Fast and Scalable Whole Program Optimizations in GCC (2007).

I never heard of it before now!

vstinner · 2023-09-14T12:39:11Z

Description of the feature in 2020: https://stackoverflow.com/questions/64954525/does-gcc-have-thin-lto

> GCC has an equivalent to Thin LTO: WHOle Program optimizeR (WHOPR)
>
> WHOPR is an extension of the LTO feature of GCC. You can enable it with -fwhopr (added to the standard LTO options).
>
> The standard LTO is fully monolithic (like standard LTO in clang)
>
> The WHOPR is a two-stage LTO (like clang Thin LTO)
>
> The two stages are
>
> WPA: The serial part that does some global optimizations and partitions the IR
>
> LTRANS: Parallel backends to do the optimizations in each partition
>
> Now, in practice, GCC WHOPR needs significantly more memory and time than Clang Thin LTO, but the numbers have been improving recently.

vstinner · 2023-12-20T12:53:10Z

Apparently, the status quo is to have a separate option to opt in to LTO. It's already recommended in the https://docs.python.org/dev/using/configure.html#performance-options documentation. So I'm closing the issue.

vstinner added the 3.11 (only security fixes) and build (The build process and cross-build) labels Oct 5, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

corona10 added a commit to corona10/cpython that referenced this issue Sep 12, 2022

pythongh-89536: Use thinLTO policy if possible

029f8e6

bedevere-bot mentioned this issue Sep 12, 2022

gh-89536: Use ThinLTO policy if possible #96766

Merged

corona10 added a commit that referenced this issue Sep 16, 2022

gh-89536: Use ThinLTO policy if possible (gh-96766)

e47b96c

vstinner closed this as not planned Dec 20, 2023

ned-deily mentioned this issue Sep 3, 2024

Python3.13 performance Issue with python.org macOS installers on ARM Macs #122580

Closed
