Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combining LTO + PGO + lib/cdylib crashes on LLVM assertion #117220

Open
Kobzol opened this issue Oct 26, 2023 · 15 comments · May be fixed by #133250
Open

Combining LTO + PGO + lib/cdylib crashes on LLVM assertion #117220

Kobzol opened this issue Oct 26, 2023 · 15 comments · May be fixed by #133250
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Kobzol
Copy link
Contributor

Kobzol commented Oct 26, 2023

When compiling a cargo project with:

  • LTO enabled (LTO=thin/fat)
  • PGO (-Cprofile-use)
  • Both lib and dylib targets for its library

Reproducer

// src/main.rs
use clap::Command;

fn main() {
    Command::new("a").about("a");
}
# Cargo.toml
[package]
name = "hello"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
clap = { version = "4.3.24" , default-features = false, features = ["std"]}

[profile.release]
lto = "thin"

[lib]
crate-type = ["lib", "cdylib"]

There also needs to be an empty src/lib.rs file. Here is a reproducer script for PGO:

#!/bin/bash

set -e

VERSION=${1:-1.71}

rustup install ${VERSION}
rustup component add llvm-tools-preview --toolchain ${VERSION}
rm -rf target
rm -rf /tmp/pgo
RUSTFLAGS="-Cprofile-generate=/tmp/pgo" cargo +${VERSION} build --release
./target/release/hello /tmp
`rustc +${VERSION} --print=sysroot`/lib/rustlib/x86_64-unknown-linux-gnu/bin/llvm-profdata merge -o merged-${VERSION}.profdata /tmp/pgo/*
RUSTFLAGS="-Cprofile-use=${PWD}/merged-${VERSION}.profdata" cargo +${VERSION} build --release

I expected to see this happen: the project compiles.

Instead, compilation fails with the following error:

module flag identifiers must be unique (or of 'require' type)
!"CG Profile"
LLVM ERROR: Broken module found, compilation aborted!

This was originally found in #115344, which contains many examples of this problem in the wild. Here I sent a minimal example, I wasn't able to minimize it further (removing LTO, PGO, clap or lib/dylib) fixes the problem.

Meta

rustc --version --verbose:

rustc 1.73.0 (cc66ad468 2023-10-03)
binary: rustc
commit-hash: cc66ad468955717ab92600c770da8c1601a4ff33
commit-date: 2023-10-03
host: x86_64-unknown-linux-gnu
release: 1.73.0
LLVM version: 17.0.2

I managed to reproduce this all the way back to 1.60.0 (although this specific example requires at least 1.70.0 due to clap).

@Kobzol Kobzol added the C-bug Category: This is a bug. label Oct 26, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 26, 2023
@Jules-Bertholet
Copy link
Contributor

Jules-Bertholet commented Oct 26, 2023

@rustbot label T-compiler A-lto A-llvm I-crash

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 26, 2023
@lqd lqd removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 26, 2023
@rustbot rustbot added the I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. label Oct 28, 2023
@DianQK
Copy link
Member

DianQK commented Jun 26, 2024

I can reproduce the issue either with the Cargo.toml or using the lock file below.

# Cargo.toml
[package]
name = "hello"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
clap = { version = "=4.3.24" , default-features = false, features = ["std"]}

[profile.release]
lto = "thin"

[lib]
crate-type = ["lib", "cdylib"]
Cargo.lock

# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3

[[package]]
name = "anstyle"
version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "038dfcf04a5feb68e9c60b21c9625a54c2c0616e79b72b0fd87075a056ae1d1b"

[[package]]
name = "clap"
version = "4.3.24"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fb690e81c7840c0d7aade59f242ea3b41b9bc27bcd5997890e7702ae4b32e487"
dependencies = [
 "clap_builder",
]

[[package]]
name = "clap_builder"
version = "4.3.24"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5ed2e96bc16d8d740f6f48d663eddf4b8a0983e79210fd55479b7bcd0a69860e"
dependencies = [
 "anstyle",
 "clap_lex",
]

[[package]]
name = "clap_lex"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cd7cc57abe963c6d3b9d8be5b06ba7c8957a930305ca90304f24ef040aa6f961"

[[package]]
name = "hello"
version = "0.1.0"
dependencies = [
 "clap",
]

@DianQK
Copy link
Member

DianQK commented Jun 27, 2024

Minimal Reproduction

First, let me provide a minimal reproduction:

# Cargo.toml
[package]
name = "hello"
version = "0.1.0"
edition = "2021"

[profile.release]
lto = "fat"  # "thin" is also ok.

[lib]
crate-type = ["lib", "cdylib"]
// src/lib.rs
pub fn foo() {
    let mut v = Vec::new();
    v.push(1);
}
// src/main.rs
fn main() {
    hello::foo();
}

Next, run the PGO script similar to the one described in the issue.

Why

What LLVM Did

The LLVM error indicates that we created two !"CG Profile".

This happens because every time we effectively run cg-profile, it creates a flag:
https://github.com/llvm/llvm-project/blob/602634d70cba2c51f6177740c4a98a377d10ab6a/llvm/lib/Transforms/Instrumentation/CGProfile.cpp#L42-L44

A simple example: https://llvm.godbolt.org/z/5bzeh6e77.

For me, running it multiple times might lead to unexpected behavior, and I'm not sure if the data can be merged or ignored here. cc @nikic

What rustc/cargo Did

The key point now is understanding why we are running CGProfile twice. The key parameters for the commands executed by cargo are as follows:

rustc src/lib.rs --crate-type lib --crate-type cdylib -Cprofile-use=merged.profdata
rustc src/main.rs --crate-type bin -C lto=fat --extern hello=/path/libhello.rlib -Cprofile-use=merged.profdata

We can remove --crate-type cdylib.

Because we did not pass -Clinker-plugin-lto, PGO data was applied when building libhello.rlib.
Finally, when building the bin, LTO added libhello.rlib that has the "CG Profile" flag.

How to Fix

I believe the key is that crate-type = ["lib", "cdylib"] is causing cargo to omit the -Clinker-plugin-lto and -Clto=fat.

Yes, there's also the issue that cdylib doesn't apply -Clto=fat.

Perhaps we should split this into two separate rustc commands? Or find a way to correctly pass these two parameters to different crate-type? cc @weihanglo

Discovery Process

Some may be interested in this.

A brief description:

  1. Use -Csave-temps to get the LTO input IR and reproduce the issue using opt -O2.
  2. (At this point, I wasn't yet aware of CGProfile) Use llvm-reduce to minimize this IR, and read the CGProfile code. I believe we executed this pass multiple times.
  3. Use -Csave-temps and llvm-dis to find that the clap_builder IR includes the !"CG Profile".
  4. I tried removing some code in clap_builder to reproduce the issue and realized that PGO only handles runtime path. I guessed this was the key, so I remained the calling code, and finally arrived at a minimal reproduction.

@insilications
Copy link

It seems to be happening with sccache

Using: cargo pgo optimize build -- --release --release --no-default-features

module flag identifiers must be unique (or of 'require' type)
!"CG Profile"
LLVM ERROR: Broken module found, compilation aborted!
error: could not compile `sccache` (bin "sccache");
PGO optimized build has failed.
Cargo finished with an error (101)

Rust version: 1.81.0-nightly (6b0f4b5 2024-06-24)
sccache version: main branch d5a1787a8582bd99d3f2fc27f4c3191bd8cc76d9

@weihanglo
Copy link
Member

I believe the key is that crate-type = ["lib", "cdylib"] is causing cargo to omit the -Clinker-plugin-lto and -Clto=fat.

Yes. That is rust-lang/cargo#9672.

@DianQK
Copy link
Member

DianQK commented Jun 28, 2024

I believe the key is that crate-type = ["lib", "cdylib"] is causing cargo to omit the -Clinker-plugin-lto and -Clto=fat.

Yes. That is rust-lang/cargo#9672.

So do we have to execute multiple crate types in the one command? What about compile time? Can we split it into multiple commands when encountering unsupported parameter scenarios?

@weihanglo
Copy link
Member

That might require non-trivial works in Cargo internals (e.g. change Unit-to-CrateType relationship from 1:N to 1:1). For now, a workaround is removing the lib.crate-type from Cargo.toml and instead running cargo rustc --crate-type cdylib to override crate-type on-the-fly via CLI option.

@DianQK
Copy link
Member

DianQK commented Jun 28, 2024

It seems to be happening with sccache

Using: cargo pgo optimize build -- --release --release --no-default-features

module flag identifiers must be unique (or of 'require' type)
!"CG Profile"
LLVM ERROR: Broken module found, compilation aborted!
error: could not compile `sccache` (bin "sccache");
PGO optimized build has failed.
Cargo finished with an error (101)

Rust version: 1.81.0-nightly (6b0f4b5 2024-06-24) sccache version: main branch d5a1787a8582bd99d3f2fc27f4c3191bd8cc76d9

I can't reproduce it with the following command. Do you have more detailed reproduction steps?

rustup install nightly-2024-06-25
rustup +nightly-2024-06-25 component add llvm-tools
cargo +nightly-2024-06-25 pgo instrument build -- --no-default-features
./target/x86_64-unknown-linux-gnu/release/sccache --show-stats
cargo +nightly-2024-06-25 pgo optimize build -- --no-default-features

@DianQK
Copy link
Member

DianQK commented Jun 28, 2024

That might require non-trivial works in Cargo internals (e.g. change Unit-to-CrateType relationship from 1:N to 1:1). For now, a workaround is removing the lib.crate-type from Cargo.toml and instead running cargo rustc --crate-type cdylib to override crate-type on-the-fly via CLI option.

Thanks, this seems feasible. However, I've reconsidered the issue. Since rustc allows passing multiple crate types, it should support passing the corresponding parameters.

@insilications
Copy link

I can't reproduce it with the following command. Do you have more detailed reproduction steps?

@DianQK I'm sorry for the delay. I got angry trying to make LTO + PGO work and ended up deleting the folder with the scripts to reproduce it. But I have found that the problem with finding duplicated !"CG Profile" module flag identifier can sometimes be avoided by using thin LTO. See my post below.

@insilications
Copy link

Ruff LTO + PGO + CODEGEN UNITS 1 works only if LTO is Thin LTO.

FULL Linker-plugin-based LTO + PGO + CODEGEN UNITS 1. LLVM error indicating duplicated !"CG Profile" module flag identifier:

# git clone https://github.com/astral-sh/ruff.git

# Build the instrumented binary.
export CRATE_CC_NO_DEFAULTS=1
export CC=clang
export CXX=clang++
export AR=/usr/bin/llvm-ar
export RANLIB=/usr/bin/llvm-ranlib
export NM=/usr/bin/llvm-nm
export LLVM_PROFILE_FILE=/var/tmp/pgo/code-%p-%m.profraw

export PGO_GEN="-fprofile-generate=/var/tmp/pgo -fprofile-update=atomic"
export CFLAGS="-Ofast -Wl,--as-needed -Wl,--build-id=sha1 -Wl,--emit-relocs -Wl,--enable-new-dtags -Wl,--gc-sections -Wl,--lto-O3 -Wl,--lto-whole-program-visibility -Wl,-plugin-opt=O3 -Wl,-sort-common -Wl,-z,now,-z,relro,-z,max-page-size=0x4000,-z,separate-code -Wp,-D_REENTRANT -ffunction-sections -flto=full -fno-plt -fno-semantic-interposition -fno-trapping-math -fomit-frame-pointer -fuse-ld=lld -fwhole-program-vtables -lpthread -m64 -march=native -mtune=native -pipe -pthread -static-libgcc -static-libstdc++ ${PGO_GEN}"
export LDFLAGS="${CFLAGS}"
export CXXFLAGS="-fvisibility-inlines-hidden ${CFLAGS}"
export RUSTFLAGS="-Ccodegen-units=1 -Cembed-bitcode=yes -Cforce-frame-pointers=no -Cforce-unwind-tables=yes -Clink-arg=-Wl,--lto-pgo-warn-mismatch -Clink-arg=-flto=full -Clink-arg=-fuse-ld=lld -Clink-args=-Wl,--emit-relocs -Clinker-plugin-lto=true -Clinker=clang -Cllvm-args=--pgo-warn-missing-function -Clto=fat -Copt-level=3 -Cprefer-dynamic=no -Cprofile-generate=/var/tmp/pgo -Crelro-level=full -Cstrip=none -Ctarget-cpu=native -Zdylib-lto=yes -Zfunction-sections=yes -Zplt=no -Ztune-cpu=native -Zunstable-options"

cargo build -Zunstable-options -Ztarget-applies-to-host --jobs 20 --target=x86_64-unknown-linux-gnu --release -vv --bin ruff


# Run ruff to generate some useful profiles and then build the optimized binary:
export CRATE_CC_NO_DEFAULTS=1
export CC=clang
export CXX=clang++
export AR=/usr/bin/llvm-ar
export RANLIB=/usr/bin/llvm-ranlib
export NM=/usr/bin/llvm-nm
export LLVM_PROFILE_FILE=/var/tmp/pgo/code-%p-%m.profraw

export PGO_USE="-Wbackend-plugin -Wl,--lto-pgo-warn-mismatch -fprofile-update=atomic -fprofile-use=/var/tmp/pgo/merged.profdata -mllvm -pgo-warn-missing-function"
export CFLAGS="-Ofast -Wl,--as-needed -Wl,--build-id=sha1 -Wl,--emit-relocs -Wl,--enable-new-dtags -Wl,--gc-sections -Wl,--lto-O3 -Wl,--lto-whole-program-visibility -Wl,-plugin-opt=O3 -Wl,-sort-common -Wl,-z,now,-z,relro,-z,max-page-size=0x4000,-z,separate-code -Wp,-D_REENTRANT -ffunction-sections -flto=full -fno-plt -fno-semantic-interposition -fno-trapping-math -fomit-frame-pointer -fuse-ld=lld -fwhole-program-vtables -lpthread -m64 -march=native -mtune=native -pipe -pthread -static-libgcc -static-libstdc++ ${PGO_USE}"
export LDFLAGS="${CFLAGS}"
export CXXFLAGS="-fvisibility-inlines-hidden ${CFLAGS}"
export RUSTFLAGS="-Ccodegen-units=1 -Cembed-bitcode=yes -Cforce-frame-pointers=no -Cforce-unwind-tables=yes -Clink-arg=-Wl,--lto-pgo-warn-mismatch -Clink-arg=-flto=full -Clink-arg=-fuse-ld=lld -Clink-args=-Wl,--emit-relocs -Clinker-plugin-lto=true -Clinker=clang -Cllvm-args=--pgo-warn-missing-function -Clto=fat -Copt-level=3 -Cprefer-dynamic=no -Cprofile-use=/var/tmp/pgo/merged.profdata -Crelro-level=full -Cstrip=none -Ctarget-cpu=native -Zdylib-lto=yes -Zfunction-sections=yes -Zplt=no -Ztune-cpu=native -Zunstable-options"

llvm-profdata merge -o /var/tmp/pgo/merged.profdata /var/tmp/pgo/
cargo build -Ztarget-applies-to-host --jobs 20 --target=x86_64-unknown-linux-gnu --release -vv --bin ruff

THIN Linker-plugin-based LTO + PGO + CODEGEN UNITS 1. Successful build:

# Build the instrumented binary.
export CRATE_CC_NO_DEFAULTS=1
export CC=clang
export CXX=clang++
export AR=/usr/bin/llvm-ar
export RANLIB=/usr/bin/llvm-ranlib
export NM=/usr/bin/llvm-nm
export LLVM_PROFILE_FILE=/var/tmp/pgo/code-%p-%m.profraw

export PGO_GEN="-fprofile-generate=/var/tmp/pgo -fprofile-update=atomic"
export CFLAGS="-fsplit-lto-unit -Ofast -Wl,--as-needed -Wl,--build-id=sha1 -Wl,--emit-relocs -Wl,--enable-new-dtags -Wl,--gc-sections -Wl,--lto-O3 -Wl,--lto-whole-program-visibility -Wl,-plugin-opt=O3 -Wl,-sort-common -Wl,-z,now,-z,relro,-z,max-page-size=0x4000,-z,separate-code -Wp,-D_REENTRANT -ffunction-sections -flto=thin -fno-plt -fno-semantic-interposition -fno-trapping-math -fomit-frame-pointer -fuse-ld=lld -fwhole-program-vtables -lpthread -m64 -march=native -mtune=native -pipe -pthread -static-libgcc -static-libstdc++ ${PGO_GEN}"
export LDFLAGS="${CFLAGS}"
export CXXFLAGS="-fvisibility-inlines-hidden ${CFLAGS}"
export RUSTFLAGS="-Zsplit-lto-unit=yes -Zthinlto=yes -Ccodegen-units=1 -Cembed-bitcode=yes -Cforce-frame-pointers=no -Cforce-unwind-tables=yes -Clink-arg=-Wl,--lto-pgo-warn-mismatch -Clink-arg=-flto=thin -Clink-arg=-fuse-ld=lld -Clink-args=-Wl,--emit-relocs -Clinker-plugin-lto=true -Clinker=clang -Cllvm-args=--pgo-warn-missing-function -Clto=thin -Copt-level=3 -Cprefer-dynamic=no -Cprofile-generate=/var/tmp/pgo -Crelro-level=full -Cstrip=none -Ctarget-cpu=native -Zdylib-lto=yes -Zfunction-sections=yes -Zplt=no -Ztune-cpu=native -Zunstable-options"

cargo build -Zunstable-options -Ztarget-applies-to-host --jobs 20 --target=x86_64-unknown-linux-gnu --release -vv --bin ruff

# Run ruff to generate some useful profiles and then build the optimized binary:
export CRATE_CC_NO_DEFAULTS=1
export CC=clang
export CXX=clang++
export AR=/usr/bin/llvm-ar
export RANLIB=/usr/bin/llvm-ranlib
export NM=/usr/bin/llvm-nm
export LLVM_PROFILE_FILE=/var/tmp/pgo/code-%p-%m.profraw

export PGO_USE="-Wbackend-plugin -Wl,--lto-pgo-warn-mismatch -fprofile-update=atomic -fprofile-use=/var/tmp/pgo/merged.profdata -mllvm -pgo-warn-missing-function"
export CFLAGS="-fsplit-lto-unit -Ofast -Wl,--as-needed -Wl,--build-id=sha1 -Wl,--emit-relocs -Wl,--enable-new-dtags -Wl,--gc-sections -Wl,--lto-O3 -Wl,--lto-whole-program-visibility -Wl,-plugin-opt=O3 -Wl,-sort-common -Wl,-z,now,-z,relro,-z,max-page-size=0x4000,-z,separate-code -Wp,-D_REENTRANT -ffunction-sections -flto=thin -fno-plt -fno-semantic-interposition -fno-trapping-math -fomit-frame-pointer -fuse-ld=lld -fwhole-program-vtables -lpthread -m64 -march=native -mtune=native -pipe -pthread -static-libgcc -static-libstdc++ ${PGO_USE}"
export LDFLAGS="${CFLAGS}"
export CXXFLAGS="-fvisibility-inlines-hidden ${CFLAGS}"
export RUSTFLAGS="-Zsplit-lto-unit=yes -Zthinlto=yes -Ccodegen-units=1 -Cembed-bitcode=yes -Cforce-frame-pointers=no -Cforce-unwind-tables=yes -Clink-arg=-Wl,--lto-pgo-warn-mismatch -Clink-arg=-flto=thin -Clink-arg=-fuse-ld=lld -Clink-args=-Wl,--emit-relocs -Clinker-plugin-lto=true -Clinker=clang -Cllvm-args=--pgo-warn-missing-function -Clto=thin -Copt-level=3 -Cprefer-dynamic=no -Cprofile-use=/var/tmp/pgo/merged.profdata -Crelro-level=full -Cstrip=none -Ctarget-cpu=native -Zdylib-lto=yes -Zfunction-sections=yes -Zplt=no -Ztune-cpu=native -Zunstable-options"

llvm-profdata merge -o /var/tmp/pgo/merged.profdata /var/tmp/pgo/
cargo build -Ztarget-applies-to-host --jobs 20 --target=x86_64-unknown-linux-gnu --release -vv --bin ruff

@insilications
Copy link

insilications commented Jul 9, 2024

Using -Csave-temps and unpacking everything to disassemble and investigate the IR from each dep, including the final LTO IR from the binary, shows that in the THIN LTO case the problem of facing the duplicated !"CG Profile" module flag identifier can sometimes be avoided. I have seen this in many cases since this problem started to be reported last year. It seems that the CG optimization pass is inevitably applied before the final LTO stage in the full LTO case. I remember avoiding using cargo and compiling a small test case using full LTO by hand, including specifying and running all optimization passes manually, just to avoid generating a final link-phase LTO with duplicated "CG Profile" module flag. And it compiled perfectly in that case.

@DianQK
Copy link
Member

DianQK commented Jul 10, 2024

Thank you for your response. Based on https://github.com/astral-sh/ruff/blob/4cc7bc9d32047b4f47b944636af449a9d24cbaac/crates/ruff_wasm/Cargo.toml#L15, I believe this is the same issue.

@insilications
Copy link

Thank you for your response. Based on https://github.com/astral-sh/ruff/blob/4cc7bc9d32047b4f47b944636af449a9d24cbaac/crates/ruff_wasm/Cargo.toml#L15, I believe this is the same issue.

Could be... I'm going to test later, but I'm almost sure that ruff_wasm is the library for exposing Ruff functionality to the web (as a wasm module) (https://play.ruff.rs/) and does not get built when cargo build --release --bin ruff

@FilipAndersson245
Copy link

@insilications did you manage to investigate any future about this error?

@DianQK DianQK self-assigned this Nov 14, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 20, 2024
[WIP] The embedded bitcode should always be prepared for LTO/ThinLTO

Fixes rust-lang#115344. Fixes rust-lang#117220.

r? ghost
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 21, 2024
[WIP] The embedded bitcode should always be prepared for LTO/ThinLTO

Fixes rust-lang#115344. Fixes rust-lang#117220.

r? ghost
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants