./configure --enable-optimizations should enable LTO #89536

Closed
vstinner opened this issue Oct 5, 2021 · 36 comments
Labels
3.11 (only security fixes) · build (The build process and cross-build)

Comments

@vstinner
Member

vstinner commented Oct 5, 2021

BPO 45373
Nosy @gpshead, @vstinner, @ned-deily, @methane, @corona10, @pablogsal, @erlend-aasland

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-10-05.09:32:25.894>
labels = ['build', '3.11']
title = './configure --enable-optimizations should enable LTO'
updated_at = <Date 2022-03-03.09:51:27.150>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2022-03-03.09:51:27.150>
actor = 'erlendaasland'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Build']
creation = <Date 2021-10-05.09:32:25.894>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 45373
keywords = []
message_count = 9.0
messages = ['403209', '403210', '403211', '403244', '403248', '403249', '414312', '414313', '414316']
nosy_count = 7.0
nosy_names = ['gregory.p.smith', 'vstinner', 'ned.deily', 'methane', 'corona10', 'pablogsal', 'erlendaasland']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue45373'
versions = ['Python 3.11']

@vstinner
Member Author

vstinner commented Oct 5, 2021

When Python is configured with:

./configure --enable-optimizations

PGO is enabled but not LTO.

I recall that a few years ago, GCC with LTO had bugs. But now, GCC with LTO is reliable. I suggest enabling it by default in Python 3.11.

Or did I miss a reason to not do that?
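
To see which of the two optimizations a given build was configured with, the configure arguments can be inspected. The sketch below is only an illustration: `describe_build` is a hypothetical helper, and it assumes the arguments are available as a single string (on a Unix build they can usually be read with `sysconfig.get_config_var("CONFIG_ARGS")`).

```shell
# Hypothetical helper: report whether PGO and LTO were requested
# in a string of configure arguments.
describe_build() {
    args=$1
    pgo=no
    lto=no
    case $args in
        *--enable-optimizations*) pgo=yes ;;
    esac
    case $args in
        *--with-lto=no*) lto=no ;;   # explicit opt-out
        *--with-lto*)    lto=yes ;;  # bare --with-lto or --with-lto=<mode>
    esac
    echo "pgo=$pgo lto=$lto"
}
```

For example, `describe_build "$(python3 -c 'import sysconfig; print(sysconfig.get_config_var("CONFIG_ARGS"))')"` would describe the interpreter found as `python3`, assuming that variable is set for the build.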

@vstinner added the 3.11 (only security fixes) and build (The build process and cross-build) labels Oct 5, 2021
@pablogsal
Member

IIRC activating LTO is especially annoying on macOS because it needs some LLVM components that are a bit hard to get. Let me dig a bit to see if I can reproduce the problem.

@vstinner
Member Author

vstinner commented Oct 5, 2021

Pablo:

IIRC activating LTO is especially annoying on macOS because it needs some LLVM components that are a bit hard to get. Let me dig a bit to see if I can reproduce the problem.

Ah, I guess that you are referring to this requirement:
"The C compiler Clang requires llvm-ar for LTO (ar on macOS), as well as an LTO-aware linker (ld.gold or lld)."
https://docs.python.org/dev/using/configure.html#cmdoption-with-lto

Maybe configure can enable LTO on all platforms but macOS.

@ned-deily
Member

IIRC activating LTO is especially annoying on macOS because it needs some LLVM components that are a bit hard to get

Can you say more? We are currently using --with-lto with a vanilla Apple Command Line Tools (or Xcode) for macOS installer builds when building on macOS 10.15 High Sierra or higher. Perhaps this was just an issue on older versions.

@pablogsal
Member

Yeah, I had problems in the past getting llvm-ar or some other component. I still need time to reproduce this and to see whether it still happens on newer versions.

@ned-deily
Member

I don't think you need llvm-ar anymore with the Apple Tool Chain but let me look into it as I have all the relevant previous macOS releases as VMs to test with.

@gpshead
Member

gpshead commented Mar 1, 2022

FWIW I agree that we should try adding LTO to --enable-optimizations now.

@ned-deily
Member

Sorry, this slipped off my radar and I haven't gone back and checked older versions of macOS. But it certainly is true that at least with the current versions of the Apple Developer Tools (either the Command Line Tools or Xcode) for macOS 11 (Big Sur) and macOS 12 (Monterey), things just work.

@methane
Member

methane commented Mar 1, 2022

Can we use --with-lto=thin when available?
And can we avoid --with-lto when building the profiling Python?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@nascheme
Member

nascheme commented Apr 11, 2022

One data point: when I enabled --with-lto, the cpython build task failed on my container build host due to an out of memory condition. It seems that gcc uses a lot more RAM in that case (host has 2 GB, I think). I fixed it by adding a swap file, but I suspect that if we turn it on by default, more people will run into the issue.

@corona10
Member

corona10 commented Apr 13, 2022

@methane san cc @ned-deily

I would like to switch the default LTO policy to thinLTO, if available, starting with Python 3.12.
The thinLTO option was only introduced in Python 3.11, so let's leave it as an experimental choice for Python 3.11 :)

AS-IS

./configure --with-lto  # fullLTO
./configure --with-lto=full  # fullLTO
./configure --with-lto=thin  # thinLTO

TO-BE

./configure --with-lto  # thinLTO if possible, otherwise fullLTO
./configure --with-lto=full  # fullLTO
./configure --with-lto=thin  # thinLTO
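
The TO-BE policy could be sketched as a small decision function. This is an illustration rather than configure's actual logic: `resolve_lto_mode` is a hypothetical name, `yes` stands for a bare `--with-lto`, and "thinLTO is possible" is reduced to a yes/no input that the caller has to determine.

```shell
# Hypothetical sketch of the proposed default.
#   $1: requested mode ("yes" for bare --with-lto, or "full" / "thin")
#   $2: whether the toolchain supports thinLTO ("yes" / "no")
resolve_lto_mode() {
    case $1 in
        full|thin)
            # An explicit mode is always honored.
            echo "$1"
            ;;
        yes)
            # Bare --with-lto: prefer thinLTO, fall back to fullLTO.
            if [ "$2" = yes ]; then echo thin; else echo full; fi
            ;;
        *)
            echo "unsupported LTO mode: $1" >&2
            return 1
            ;;
    esac
}
```

Under the AS-IS behavior, the `yes` branch would simply print `full`.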

@methane
Member

methane commented Apr 13, 2022

I like faster LTO by default, because the default LTO is slow.

Random comments:

I don't want to use LTO during the profiling build (1st build in PGO).

What about GCC? GCC has supported the -flto=auto option since GCC 10.
https://www.phoronix.com/scan.php?page=news_item&px=GCC-flto-auto-option

@ned-deily
Member

I don't have an opinion yet. I plan to compare both options on macOS.

@corona10
Member

I don't want to use LTO during the profiling build (1st build in PGO).

+1

What about GCC? GCC supports -flto=auto option since GCC 10

We may need to add a ./configure --with-lto=auto option, and I think we have a chance to add it in Python 3.11.
In that case, from Python 3.12 on, ./configure --with-lto should use the auto option if available, and otherwise fall back to the default LTO policy for GCC.

@vstinner
Member Author

So. Can we enable LTO in Python 3.12? Is there a way to opt-out from LTO when --enable-optimizations is used? For example, does ./configure --enable-optimizations --without-lto work as expected?
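
The opt-out semantics asked about here can be sketched as follows. This models the proposal under discussion, not what configure is confirmed to do: `effective_lto` is a hypothetical function, and it assumes that an explicit LTO option always wins over the default implied by `--enable-optimizations`, and that a bare `--with-lto` means full LTO.

```shell
# Hypothetical model: print the LTO mode that would result from a
# list of configure arguments if --enable-optimizations implied LTO.
effective_lto() {
    explicit=
    optimizations=no
    for arg in "$@"; do
        case $arg in
            --enable-optimizations)     optimizations=yes ;;
            --without-lto|--with-lto=no) explicit=no ;;
            --with-lto|--with-lto=yes)  explicit=full ;;
            --with-lto=*)               explicit=${arg#--with-lto=} ;;
        esac
    done
    if [ -n "$explicit" ]; then
        echo "$explicit"        # explicit choice overrides the default
    elif [ "$optimizations" = yes ]; then
        echo full               # proposed implied default
    else
        echo no
    fi
}
```

With this model, `effective_lto --enable-optimizations --without-lto` prints `no`, which is the opt-out the question is about.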

@vstinner
Member Author

Is it possible to disable LTO if it's known to not work properly / if the toolchain is too old? I'm thinking of Ned's toolchain to build Python on macOS for x86 CPU.

@vstinner
Member Author

What's the status of LTO in 2023? Can it now be enabled by default in Python?

cc @corona10

@corona10
Member

corona10 commented May 31, 2023

@ned-deily
Do you have any concerns, from the macOS side, about unifying these into a single optimization option?

@vstinner

What's the status of LTO in 2023? Can it now be enabled by default in Python?

One problem is on the GCC side: full LTO is too slow, so we should pass -flto=$(number of cores) or -flto=auto.
I am not sure that will be an acceptable situation, even for Red Hat.
WDYT?

@vstinner
Member Author

The FullLTO is too slow

Does build performance really matter when the user explicitly asks to enable optimizations? I don't think so.

@vstinner
Member Author

If a user really cares about build performance and prefers to disable LTO, would it work to specify ./configure --enable-optimizations --disable-lto?

@corona10
Member

Does build performance really matter when the user explicitly asks to enable optimizations? I don't think so.

Yeah, if customers are okay with it, I am fine.
But I am not familiar with customers of Linux.
So I asked you :)

@vstinner
Member Author

But I am not familiar with customers of Linux.

Most popular Linux distributions provide binary packages, like Ubuntu or Fedora: users don't build Python themselves (hopefully!), packages are built once on fast build servers. Source-based distros like ArchLinux are less popular.

macOS users have a .dmg binary installer, even if HomeBrew seems to be popular.

Windows users have .msi binary installer.

FreeBSD is source based.

For source based distributions like ArchLinux and FreeBSD, Python packages can be tuned to disable LTO if they prefer to save some minutes during build but waste CPU time later.

@nascheme
Member

I'm a builder of Python that would be affected by turning on LTO by default. I would likely turn it off but having it on by default and turning off by command line option is okay with me. Last time I tried it, I decided it takes too long and too much RAM for the build for too little benefit. However, my guess is for most people using --enable-optimizations, LTO on by default would be better. Turning it on would likely help it mature more as well (e.g. improve compilers).

@vstinner
Member Author

I'm a builder of Python that would be affected by turning on LTO by default

Would you mind elaborating?

@nascheme
Member

We deploy our software using Linux containers. I like to use the most up-to-date Python release and so I build my own Python from source when building the container images. The VM host that builds container images is somewhat limited on RAM and CPU and so enabling LTO stresses it quite a bit. We could just use a higher spec container build host. However, last I checked, the gain from using LTO is fairly small so I didn't think it worth it. Again, I think LTO on by default, with --enable-optimizations, would be an okay default.

@vstinner
Member Author

vstinner commented Jun 1, 2023

One problem is on the GCC side: full LTO is too slow, so we should pass -flto=$(number of cores) or -flto=auto.
I am not sure that will be an acceptable situation, even for Red Hat.

That sounds like a nice enhancement. Is it required to enable LTO by default when optimizations are enabled?

About RHEL, do you know which GCC version added the -flto=auto option? Maybe configure can check if it's supported or not.
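
Such a check could be a feature probe rather than a version test. The following is a minimal sketch, not taken from configure: `cc_accepts_flag` and `pick_lto_flag` are hypothetical names, the probe assumes that a compiler rejecting the flag exits with a non-zero status when compiling and linking a trivial program, and the fallback assumes `getconf _NPROCESSORS_ONLN` is available to report the number of online CPUs.

```shell
# Return 0 if compiler "$1" can compile and link a trivial program
# when given flag "$2", non-zero otherwise.
cc_accepts_flag() {
    probe_dir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$probe_dir/probe.c"
    "$1" "$2" "$probe_dir/probe.c" -o "$probe_dir/probe" >/dev/null 2>&1
    probe_status=$?
    rm -rf "$probe_dir"
    return "$probe_status"
}

# Print -flto=auto when the compiler accepts it; otherwise fall back
# to one LTO job per online CPU.
pick_lto_flag() {
    if cc_accepts_flag "$1" -flto=auto; then
        echo "-flto=auto"
    else
        echo "-flto=$(getconf _NPROCESSORS_ONLN)"
    fi
}
```

A caller would run something like `pick_lto_flag gcc` and append the result to the compiler and linker flags.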

@vstinner
Member Author

I don't see any real blocker issue. Would you be ok to enable LTO optimization with ./configure --enable-optimizations in Python 3.13?

@gpshead
Member

gpshead commented Sep 13, 2023

It looks like --enable-optimizations --with-lto=full increases build time by 3x-10x vs --enable-optimizations --with-lto=no using gcc, due to a couple of long linking steps. When using clang, --with-lto=thin mode is available; that cuts the LTO build time vs =full in half for me.

The overall runtime performance gain looks to be about 3% using gcc --with-lto=full.

One thing I noticed: It is doing the profile guided build with LTO enabled as well, so the double compilation includes the super long final LTO link steps in both. I'm not sure an LTO enabled build is necessary to generate profile data for PGO. If the LTO flags could be skipped for that first PGO build it'd significantly reduce build time.
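
A minimal sketch of that idea, assuming GCC-style flags: LTO is left out of the instrumented build and added only to the final one. `pgo_stage_flags` is a hypothetical helper and does not reflect how CPython's Makefile arranges its PGO stages; it also assumes, as the comment above only suspects, that profile data gathered without LTO is adequate for the LTO-enabled final build.

```shell
# Hypothetical helper: print the extra compiler flags for each PGO stage.
#   generate: instrumented build used to collect profile data (no LTO)
#   use:      final build that consumes the profile, with LTO enabled
pgo_stage_flags() {
    case $1 in
        generate) echo "-fprofile-generate" ;;
        use)      echo "-fprofile-use -flto" ;;
        *)        echo "unknown stage: $1" >&2; return 1 ;;
    esac
}
```

The long LTO link steps would then run once, in the `use` stage, instead of twice.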

I'm in favor of --enable-optimizations including LTO by default in 3.13. I'd default to --with-lto=thin when the compiler is clang. If this winds up causing problems we can undo it. It's just a configure flag.

@vstinner
Member Author

It looks like --enable-optimizations --with-lto=full increases build time by 3x-10x vs --enable-optimizations --with-lto=no using gcc, due to a couple of long linking steps.

Yes, --enable-optimizations makes the Python build slower. At the opposite extreme, I always use CFLAGS="-O0" for development. But do you consider "longer build time" to be a blocker issue?

As I wrote, it should be possible to explicitly disable LTO using --enable-optimizations --without-lto, for people who have good reasons to limit the build time.

I'd default to --with-lto=thin when the compiler is clang.

I'm not comfortable with enabling LTO or not depending on the C compiler :-( If you consider "longer build time" to be a blocker issue, I would prefer to stick to the status quo: don't enable LTO with --enable-optimizations, and just close this old issue.

I don't think that good defaults exist, only good documentation exists :-) Currently, Performance Options to tune Python build are documented at: https://docs.python.org/dev/using/configure.html#performance-options

Configuring Python using --enable-optimizations --with-lto (PGO + LTO) is recommended for best performance.

@gpshead
Member

gpshead commented Sep 13, 2023

Read my last paragraph.

I'm in favor of --enable-optimizations including LTO by default in 3.13. ... If this winds up causing problems we can undo it. It's just a configure flag.

@vstinner
Member Author

Oh, it seems like I misunderstood your comment. You are OK with switching it on by default for all compilers, but using thinLTO when available.

@gpshead
Member

gpshead commented Sep 13, 2023

Yep, and if logic to "use thin" is painful, keep it simple and just always use full. (I'm running some benchmarks to compare full vs thin build performance as I haven't looked at that in ages)

Everyone building for big distros likely rolls their own setups and flags anyways based on their own reasoning and history of having done that long before we added the collective flag. --enable-optimizations is more meant for "everybody else".

@gpshead
Member

gpshead commented Sep 13, 2023

My benchmark results came back. lto=full (+3%) performs significantly better than lto=thin (1%) on clang pyperformance so keeping it simple and always using full makes sense. (thin does build a lot faster though, but that isn't the point of this configure flag)

@vstinner
Member Author

vstinner commented Sep 14, 2023

@corona10: By the way, GCC has -fwhopr which seems unrelated to Burger King whoppers, but "like clang thinLTO".

https://gcc.gnu.org/wiki/LinkTimeOptimization says:

-fwhopr: This is similar to -flto but it splits compilation to achieve scalability. It is intended to handle extremely large programs whose call graphs do not fit in memory. See the design document for details.

Design document: WHOPR - Fast and Scalable Whole Program Optimizations in GCC (2007).

I never heard of it before now!

@vstinner
Member Author

vstinner commented Sep 14, 2023

Description of the feature in 2020: https://stackoverflow.com/questions/64954525/does-gcc-have-thin-lto

GCC has an equivalent to Thin LTO: WHOle Program optimizeR (WHOPR)

WHOPR is an extension of the LTO feature of GCC. You can enable it with -fwhopr (added to the standard LTO options).

  • The standard LTO is fully monolithic (like standard LTO in clang)
  • The WHOPR is a two-stage LTO (like clang Thin LTO)

The two stages are

  • WPA: The serial part that does some global optimizations and partitions the IR
  • LTRANS: Parallel backends to do the optimizations in each partition

Now, in practice, GCC WHOPR needs significantly more memory and time than Clang Thin LTO, but the numbers have been improving recently.

@vstinner
Member Author

Apparently, the status quo is to have a separate option to opt in to LTO. It's already recommended in the https://docs.python.org/dev/using/configure.html#performance-options documentation. So I close the issue.
