Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: systemd is failing to compile with clang-{10,11,12} with --optimization=3 -Db_lto=true #8347

Closed
evverx opened this issue Feb 15, 2021 · 21 comments · Fixed by #8368
Assignees
Milestone

Comments

@evverx
Copy link

evverx commented Feb 15, 2021

Describe the bug
Since meson-0.57.0 was released systemd has been failing to compile with --optimization=3 -Db_lto=true with

FAILED: src/basic/libbasic-gcrypt.a.p/gcrypt-util.c.o 
clang-10 -Isrc/basic/libbasic-gcrypt.a.p -Isrc/basic -I../src/basic -Isrc/fundamental -I../src/fundamental -Isrc/systemd -I../src/systemd -I. -I.. -flto -flto-jobs=0 -Xclang -fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -Werror -std=gnu99 -O3 -g -Wno-unused-parameter -Wno-missing-field-initializers -Wno-unused-result -Werror=undef -Wmissing-include-dirs -Wold-style-definition -Wpointer-arith -Winit-self -Wfloat-equal -Werror=missing-prototypes -Werror=implicit-function-declaration -Werror=missing-declarations -Werror=return-type -Werror=incompatible-pointer-types -Werror=format=2 -Wstrict-prototypes -Wredundant-decls -Wmissing-noreturn -Wshadow -Wendif-labels -Wstrict-aliasing=2 -Wwrite-strings -Werror=overflow -Werror=shift-count-overflow -Wdate-time -Wnested-externs '-Wno-error=#warnings' -Wno-string-plus-int -ffast-math -fno-common -fdiagnostics-show-option -fno-strict-aliasing -fvisibility=hidden -fstack-protector -fstack-protector-strong --param=ssp-buffer-size=4 -Wno-typedef-redefinition -Wno-gnu-variable-sized-type-not-at-end -Werror=shadow -include config.h -Werror -fPIC -fvisibility=default -MD -MQ src/basic/libbasic-gcrypt.a.p/gcrypt-util.c.o -MF src/basic/libbasic-gcrypt.a.p/gcrypt-util.c.o.d -o src/basic/libbasic-gcrypt.a.p/gcrypt-util.c.o -c ../src/basic/gcrypt-util.c
clang: error: argument unused during compilation: '-flto-jobs=0' [-Werror,-Wunused-command-line-argument]
[29/1801] Generating errno-from-name.h with a custom command (wrapped by meson to capture output)
[30/1801] Generating errno-to-name.h with a custom command (wrapped by meson to capture output)
ninja: build stopped: subcommand failed.
+ fatal ''\''meson compile'\'' failed with --optimization=3 -Db_lto=true'
+ echo -e '\033[31;1m'\''meson compile'\'' failed with --optimization=3 -Db_lto=true\033[0m'
'meson compile' failed with --optimization=3 -Db_lto=true
+ exit 1

To Reproduce
The script building systemd can be found at https://github.com/systemd/systemd/blob/main/.github/workflows/build_test.sh. It runs something like

git clone https://github.com/systemd/systemd
cd systemd
AR=llvm-ar-10 CFLAGS=-Werror CXXFLAGS=-Werror CC=clang-10 CXX=clang++-10  meson -Dtests=unsafe -Dslow-tests=true -Dfuzz-tests=true --werror --optimization=3 -Db_lto=true build
ninja -C ./build

I've opened systemd/systemd#18590 to "fix" it.

@eli-schwartz
Copy link
Member

eli-schwartz commented Feb 15, 2021

The cause of the error is pretty obvious:

clang: error: argument unused during compilation: '-flto-jobs=0' [-Werror,-Wunused-command-line-argument]

It's listed as supported at https://clang.llvm.org/docs/ClangCommandLineReference.html though... apparently it is "unused" if you use the same arguments during compile and link?

The release notes for meson 0.57 specifically mentioned this change, and referenced -Db_lto_threads=X to control it. If you don't want parallel lto, set this to 0 actually -1.

@dcbaker should this be moved to .get_lto_link_args()?

@evverx
Copy link
Author

evverx commented Feb 15, 2021

@eli-schwartz thanks for looking into this!

The release notes for meson 0.57 specifically mentioned this change, and referenced -Db_lto_threads=X to control it. If you don't want parallel lto, set this to 0 actually -1.

I opened the issue because I think -Db_lto= should keep working without setting -Db_lto_threads= explicitly (mostly because I wouldn't expect anybody building their software with --werror and -Db_lto=true to go through all their scripts and add another setting just to make it work with the latest meson release).

Regarding -Wunused-command-line-argument, it seems to be used by default when -Werror -Wall is passed to clang.

@dcbaker dcbaker self-assigned this Feb 15, 2021
@dcbaker
Copy link
Member

dcbaker commented Feb 15, 2021

I'll get a fix our for this tomorrow PST, should be an easy one

@evverx
Copy link
Author

evverx commented Feb 15, 2021

@dcbaker I wonder if it would be possible to update the regression tests that, as far as I know, are run every time meson is released to make sure systemd is buildable with clang and gcc with

    "--optimization=0"
    "--optimization=2"
    "--optimization=s"
    "--optimization=3 -Db_lto=true"
    "--optimization=3 -Db_lto=false"
    "-Db_ndebug=true"

?

@evverx
Copy link
Author

evverx commented Feb 15, 2021

Forgot to mention that -Werror is passed by the systemd CI but due to #7360 it's a little more complicated than it should probably be so I'm not sure how to incorporate the "Werror" part into the meson regression tests.

@dcbaker
Copy link
Member

dcbaker commented Feb 16, 2021

We probably should be running our CI with the "no unused arguments" check on, for obvious reasons, which is what failed here. without that we wouldn't have caught this at all anyway.

@xclaesse
Copy link
Member

@evverx We are trying to catch those regressions earlier by doing RC releases, we would appreciate if more people could test future Meson releases in advance to avoid this kind of issue. If you wish to help us, please follow @mesonbuild on twitter to be notified about RC releases, test your projects, and report issues before final release. Thanks.

dcbaker added a commit to dcbaker/meson that referenced this issue Feb 16, 2021
Clang has a hand `-Wunused-command-line-argument` switch, which when
turned to an error, gets very grump about `-flto-jobs=0` being set in
the compiler arguments (although `-flto=` belongs there). We'll refactor
a bit to put that only in the link arguments.

GCC doesn't have this probably because, a) it doesn't have an equivalent
warning, and b) it uses `-flto=<$numthreads.

Fixes: mesonbuild#8347
@dcbaker dcbaker added this to the 0.57.1 milestone Feb 16, 2021
dcbaker added a commit to dcbaker/meson that referenced this issue Feb 16, 2021
Clang has a hand `-Wunused-command-line-argument` switch, which when
turned to an error, gets very grump about `-flto-jobs=0` being set in
the compiler arguments (although `-flto=` belongs there). We'll refactor
a bit to put that only in the link arguments.

GCC doesn't have this probably because, a) it doesn't have an equivalent
warning, and b) it uses `-flto=<$numthreads.

Fixes: mesonbuild#8347
@evverx
Copy link
Author

evverx commented Feb 16, 2021

We are trying to catch those regressions earlier by doing RC releases, we would appreciate if more people could test future Meson releases in advance to avoid this kind of issue.

@xclaesse While I think it's somewhat helpful in terms of catching bugs earlier I think it would be much better if the meson project could try building large projects like systemd in real-world environments automatically on a regular basis even before RC were released (so as not to ask people to spend their time on testing broken RC releases manually). Given that the systemd CI seems to have caught all the C regressions in about 5 minutes I think those scripts would be really helpful here.

@eli-schwartz
Copy link
Member

https://github.com/jon-turney/meson-corpus-test

systemd is being run here, but this will probably only tend to find issues that intersect with your meson.build files... the systemd CI is testing various intersections with meson options so I'm not sure merely trying to compile systemd would have helped here.

@evverx
Copy link
Author

evverx commented Feb 16, 2021

@eli-schwartz I'm not sure how exactly systemd is built there but it's probably just a "smoke" test in a "hermetic" environment without CFLAGS, CXXFLAGS, -dc_args, --werror, sanitizers and so on. I'm pretty sure that environment is far from what is usually used to build packages for example.

@xclaesse
Copy link
Member

@evverx it's difficult to smoke test a variety of projects we have no knowledge about. Each project requires different dependencies, env, platforms, etc... I've been thinking about this a bit, what I realize is most regressions are found when CI of some projects does pip install meson and start using the latest release. It would be great if we had a way to trigger systemd CI using meson RC release (or any given HEAD) and get a result. With gitlab we could trigger a pipeline and set a variable like USE_MESON_RC, and the project could then use pip install --pre meson instead. Just thinking out loud...

@evverx
Copy link
Author

evverx commented Feb 16, 2021

It would be great if we had a way to trigger systemd CI using meson RC release (or any given HEAD) and get a result.

I think it should be doable. I don't think the systemd CI (which contains CentOS CI, packit, Ubuntu/Debian Autopkgtests, CIFuzz, various GH actions and so on and so forth) can be triggered easily that way but I guess it should be possible to gather all systemd use cases and put them into a pipeline of some kind. Could you tell me what the meson project usually uses to run long (pre) release tests?

cc @mrc0mmand.

@evverx
Copy link
Author

evverx commented Feb 17, 2021

Another option would be for a meson bot to open PRs against the systemd repository on GitHub with USE_MESON_RC set in a couple of files. This way the systemd CI would be fully unleashed on meson. Though I'm not sure when and how that bot should be triggered, who should be CCed, who would look at the results and so on.

@dcbaker
Copy link
Member

dcbaker commented Feb 17, 2021

We should open a dedicated issue for discussing ci changes, so it doesn't get lost when this issue is closed

@mrc0mmand
Copy link

Maybe adding something similar to our build test as another PR GH action here would make sense as well. Just use the default gcc/clang versions instead of the custom version shenanigans and slightly change the build scenarios to cover the "problematic" cases, i.e.:

ARGS=(
    "--optimization=0"
    "--optimization=3 -Db_lto=true"
    "--optimization=3 -Db_lto=false"
    "-Dc_args='-fno-omit-frame-pointer -ftrapv'"
...
)

The builds are fairly quick, and to cover the "who should be CCed" case, adding a note somewhere to ping either me or @evverx (if he agrees) should help.

@evverx
Copy link
Author

evverx commented Feb 17, 2021

@mrc0mmand I think another GH Action should be good enough to more or less cover systemd (though I'd also add OSS-Fuzz builds that always trigger all sorts of issues in meson) but to judge from #8351 (comment) it seems opening PRs against various projects to trigger their CI would be more helpful in general.

jpakkane pushed a commit that referenced this issue Feb 17, 2021
Clang has a hand `-Wunused-command-line-argument` switch, which when
turned to an error, gets very grump about `-flto-jobs=0` being set in
the compiler arguments (although `-flto=` belongs there). We'll refactor
a bit to put that only in the link arguments.

GCC doesn't have this probably because, a) it doesn't have an equivalent
warning, and b) it uses `-flto=<$numthreads.

Fixes: #8347
@jpakkane
Copy link
Member

though I'd also add OSS-Fuzz builds that always trigger all sorts of issues in meson

One of the reasons for this is that OSS-Fuzz refuses to use Meson's builtin options for sanitizers but instead spams CPPFLAGS, LDFLAGS et al with tens of random flags without any rhyme or reason. It's not surprising that it fails, it's surprising that it works at all.

@evverx
Copy link
Author

evverx commented Feb 17, 2021

One of the reasons for this is that OSS-Fuzz refuses to use Meson's builtin options for sanitizers but instead spams CPPFLAGS, LDFLAGS et al with tens of random flags without any rhyme or reason

@jpakkane according to #4542 (comment), the reason OSS-Fuzz uses all those environment variables is to be compatible with different build systems.

@jpakkane
Copy link
Member

jpakkane commented Feb 17, 2021

Specifically they want to be compatible with the largest amount of build systems by having just one thing that works the same everywhere. The downside of this is that they offload all work to downstream maintainers like us by saying "it works for other build systems" and have us fix the thing by trying to undo their environment butchering. A proper solution would be to add functionality like this to their end:

if building_with_meson:
    add_b_sanitize_option_to_meson_invocation()
else:
    do_the_common_envvar_thing()

This might require us to add some functionality to Meson (I think they use a combination of asan + ubsan, which is not currently supported) and we'd be happy to fix that rather than play the current continuous compiler flag whack-a-mole.

@eli-schwartz
Copy link
Member

OSS-Fuzz is not the only massively heterogeneous environment relying on this standard mechanism, and we shouldn't be shaming them for detecting real bugs in something that meson does specifically support for entirely valid reasons.

@evverx
Copy link
Author

evverx commented Feb 18, 2021

@jpakkane given that only 15 projects (out of 422) use meson there I think if I were an OSS-Fuzz maintainer I wouldn't want to add and maintain that "if" clause either. But, to be fair, when it makes sense the OSS-Fuzz project either fixes or helps to fix incompatibilities of various tools with meson. Thanks to them all the projects using meson now can be built with AFL++/afl-clang-fast: google/oss-fuzz#5084

nirbheek pushed a commit that referenced this issue Feb 20, 2021
Clang has a hand `-Wunused-command-line-argument` switch, which when
turned to an error, gets very grump about `-flto-jobs=0` being set in
the compiler arguments (although `-flto=` belongs there). We'll refactor
a bit to put that only in the link arguments.

GCC doesn't have this probably because, a) it doesn't have an equivalent
warning, and b) it uses `-flto=<$numthreads.

Fixes: #8347
evverx added a commit to evverx/systemd that referenced this issue Feb 20, 2021
This reverts commit c39e362.

Now that meson-0.57.1 (where mesonbuild/meson#8347
is fixed) is out it should be safe to keep rolling forward.
keszybz pushed a commit to systemd/systemd that referenced this issue Feb 20, 2021
This reverts commit c39e362.

Now that meson-0.57.1 (where mesonbuild/meson#8347
is fixed) is out it should be safe to keep rolling forward.
tristan957 pushed a commit to tristan957/meson that referenced this issue Mar 2, 2021
Clang has a hand `-Wunused-command-line-argument` switch, which when
turned to an error, gets very grump about `-flto-jobs=0` being set in
the compiler arguments (although `-flto=` belongs there). We'll refactor
a bit to put that only in the link arguments.

GCC doesn't have this probably because, a) it doesn't have an equivalent
warning, and b) it uses `-flto=<$numthreads.

Fixes: mesonbuild#8347
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants