Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundle clang+lld+libc++ for Linux #733

Closed
wants to merge 27 commits into from
Closed

Conversation

Kha
Copy link
Member

@Kha Kha commented Oct 20, 2021

Alternative approach to #659: instead of relying on Zig, we bundle everything necessary for standalone compilation ourselves. This way we're kept in control and can use a uniform, mature toolchain across all platforms. And hopefully avoid unexplainable errors that we could not reproduce without Zig.

Specifically, what we bundle in the case of x86-64 Linux is

stage1
├── [237M]  bin
│   ├── [141M]  clang
│   ├── [   3]  ld.lld -> lld
│   └── [ 96M]  lld
├── [ 47K]  include
│   └── [ 43K]  clang
│       ├── [2.8K]  limits.h
│       ├── [ 583]  stdalign.h
│       ├── [1.1K]  stdarg.h
│       ├── [7.1K]  stdatomic.h
│       ├── [ 897]  stdbool.h
│       ├── [3.5K]  stddef.h
│       ├── [ 857]  __stddef_max_align_t.h
│       ├── [ 22K]  stdint.h
│       └── [ 510]  stdnoreturn.h
└── [ 11M]  lib
    ├── [8.5K]  crt1.o
    ├── [2.3K]  crtbegin.o
    ├── [2.7K]  crtbeginS.o
    ├── [1.1K]  crtend.o
    ├── [1.1K]  crtendS.o
    ├── [2.7K]  crti.o
    ├── [2.5K]  crtn.o
    ├── [3.9M]  glibc
    │   ├── [ 23K]  libc_nonshared.a
    │   ├── [1.9M]  libc.so
    │   ├── [ 18K]  libdl.so
    │   ├── [  13]  libgcc_s.so -> libgcc_s.so.1
    │   ├── [ 98K]  libgcc_s.so.1
    │   ├── [1.7M]  libm.so
    │   ├── [143K]  libpthread.so
    │   └── [ 45K]  librt.so
    ├── [  18]  libatomic.so -> libatomic.so.1.2.0
    ├── [  18]  libatomic.so.1 -> libatomic.so.1.2.0
    ├── [ 30K]  libatomic.so.1.2.0
    ├── [1.8M]  libc++.a
    ├── [662K]  libc++abi.a
    ├── [  14]  libc++abi.so -> libc++abi.so.1
    ├── [  16]  libc++abi.so.1 -> libc++abi.so.1.0
    ├── [357K]  libc++abi.so.1.0
    ├── [ 12K]  libc++experimental.a
    ├── [  28]  libc++.so
    ├── [  13]  libc++.so.1 -> libc++.so.1.0
    ├── [1022K]  libc++.so.1.0
    ├── [2.9M]  libgcc.a
    ├── [ 948]  libgmp.la
    ├── [  16]  libgmp.so -> libgmp.so.10.3.2
    ├── [  16]  libgmp.so.10 -> libgmp.so.10.3.2
    └── [636K]  libgmp.so.10.3.2

As can be seen, the bundle size is dominated completely by the clang and lld executables. That Zig manages to combine both of them and their own compiler in an executable smaller than clang is an impressive feat and demonstrates that this bundle could still be optimized, though I don't think it's a dealbreaker for us.

A special issue on Linux is the glibc. We bundle a relatively old version of it, glibc 2.27 from Nixpkgs 19.03, so that Lean programs linked against it work on most somewhat recent distros. For running we always use the system glibc, which is why we put the bundled one in a separate directory.

Apart from the glibc, a Lean program built with this bundle does have more runtime dependencies, namely the libc++ and GMP DSOs:

$ ldd tests/compiler/534.lean.out
	linux-vdso.so.1 (0x00007ffd9dde4000)
	libc++.so.1 => not found
	libc++abi.so.1 => not found
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5b35e59000)
	libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f5b35dd5000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5b35dcf000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5b35db2000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5b35d8f000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5b35b9d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f5b35fd2000)

For the tests, I simply set LD_LIBRARY_PATH to the bundle's lib/, but that isn't really an option in general. libc++ (BSD-like/MIT) we should probably link statically (like it seems to be the case with libstdc++ right now), but that is problematic with GMP because of its LGPL license. The best approach I can come up with is to use the "Windows approach": put $ORIGIN (the executable's own directory) in its rpath and copy libgmp.so next to it (which is basically free with hardlinks). The second step should probably be done by Lake, not leanc.

@Kha Kha marked this pull request as draft October 20, 2021 17:15
src/include/lean/lean.h Outdated Show resolved Hide resolved
@gebner
Copy link
Member

gebner commented Oct 21, 2021

Do you have any idea how this would work on NixOS? We need an (OS-provided) compiler wrapper to find the ELF interpreter, include files, etc.

The bundled clang requires some additional libraries:

        libz.so.1 => not found
        libtinfo.so.5 => not found
        libstdc++.so.6 => not found

The libtinfo.so.5 is a library from ncurses 5, I'm not sure if this is installed by default anywhere. It also complains about the ncurses version in nixos:

./bin/clang: /nix/store/dm41nzrymziclrm28q9kv35550ryw6iz-ncurses-6.2-abi5-compat/lib/libtinfo.so.5: no version information available (required by ./bin/clang)
clang: error: no input files

After adding enough stuff to LD_LIBRARY_PATH to actually get this to run, as expected, clang fails to build an executable:

./bin/clang: /nix/store/dm41nzrymziclrm28q9kv35550ryw6iz-ncurses-6.2-abi5-compat/lib/libtinfo.so.5: no version information available (required by ./bin/clang)
/home/gebner/Downloads/lean-4.0.0-linux/bin/clang: /nix/store/dm41nzrymziclrm28q9kv35550ryw6iz-ncurses-6.2-abi5-compat/lib/libtinfo.so.5: no version information available (required by /home/gebner/Downloads/lean-4.0.0-linux/bin/clang)
test.c:1:10: fatal error: 'stdio.h' file not found
#include <stdio.h>
         ^~~~~~~~~
1 error generated.

@Kha
Copy link
Member Author

Kha commented Oct 21, 2021

Btw, the foreign test still uses the system compiler and thus demonstrates that mixing C++ code linked against libstdc++ with Lean code linked against libc++ is A-OK - at least as long as one of the stdlibs is dynamically linked. Which reminds me that libstdc++ being statically linked is actually a Nixpkgs bug, so apart from that linking libc++ statically by default should be fine.

@Kha
Copy link
Member Author

Kha commented Oct 21, 2021

@gebner It looks like we should really compile LLVM from source after all... or we do the same thing as with the Lean executable and just copy those deps as well.

stdio.h is not a header required by Lean programs. I'm assuming that FFI components will often require additional dependencies and thus will be compiled using external toolchains anyway.

@gebner
Copy link
Member

gebner commented Oct 21, 2021

stdio.h is not a header required by Lean programs.

Ok that's correct, but at least stddef.h is required:

> LEAN_PATH=./build/lib /home/gebner/Downloads/lean-4.0.0-linux/bin/lean -R ./. -o ./build/lib/Main.olean -c ./build/ir/Main.c ././Main.lean
> /home/gebner/Downloads/lean-4.0.0-linux/bin/leanc -c -o ./build/ir/Foo.o ./build/ir/Foo.c -O3 -DNDEBUG
/home/gebner/Downloads/lean-4.0.0-linux/bin/clang: /nix/store/dm41nzrymziclrm28q9kv35550ryw6iz-ncurses-6.2-abi5-compat/lib/libtinfo.so.5: no version information available (required by /home/gebner/Downloads/lean-4.0.0-linux/bin/clang)
In file included from ./build/ir/Foo.c:4:
/home/gebner/Downloads/lean-4.0.0-linux/include/lean/lean.h:8:10: fatal error: 'stddef.h' file not found
#include <stddef.h>
         ^~~~~~~~~~
1 error generated.
external command /home/gebner/Downloads/lean-4.0.0-linux/bin/leanc exited with status 1
build error: external command /home/gebner/Downloads/lean-4.0.0-linux/bin/leanc exited with status 1
error: build failed

BTW, I also had to add libc_nonshared.a before I got this far (which is not present in the download).

And even if all include headers are present, the resulting binaries won't run:

$ echo 'int main() { return 0; }' >test.c
$ clang test.c
clang: /nix/store/dm41nzrymziclrm28q9kv35550ryw6iz-ncurses-6.2-abi5-compat/lib/libtinfo.so.5: no version information available (required by clang)
/home/gebner/Downloads/lean-4.0.0-linux/bin/clang: /nix/store/dm41nzrymziclrm28q9kv35550ryw6iz-ncurses-6.2-abi5-compat/lib/libtinfo.so.5: no version information available (required by /home/gebner/Downloads/lean-4.0.0-linux/bin/clang)
$ ./a.out
Failed to execute process './a.out'. Reason:
The file './a.out' does not exist or could not be executed.
$ file ./a.out
./a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, with debug_info, not stripped

@gebner
Copy link
Member

gebner commented Oct 21, 2021

@gebner It looks like we should really compile LLVM from source after all... or we do the same thing as with the Lean executable and just copy those deps as well.
I'm assuming that FFI components will often require additional dependencies and thus will be compiled using external toolchains anyway.

I'm not sure I understand the intent behind this PR then. If users are supposed to use their own toolchain anyhow, why do we bundle it? Wouldn't it be easier to add an elan install-clang command that downloads a known good clang release for users on old linux distributions?

@Kha
Copy link
Member Author

Kha commented Oct 21, 2021

Ah, the cpack configuration must be wrong, all the files listed above should be included! I'll take a look.

Regarding NixOS, it might be best to completely ignore the bundle and keep using stdenv? We can make it so that when LEAN_CC is defined, none of the new flags (like --sysroot) are applied.

@gebner
Copy link
Member

gebner commented Oct 21, 2021

Regarding NixOS, it might be best to completely ignore the bundle and keep using stdenv? We can make it so that when LEAN_CC is defined, none of the new flags (like --sysroot) are applied.

That's the only solution I can see. But if we do rm -rf clang, then I'm not sure what the point of the bundle is.

@Kha
Copy link
Member Author

Kha commented Oct 21, 2021

The purpose of this PR is to enable building of pure Lean programs (including precompiling modules) without system dependencies. That a C compiler is bundled & used for this should be regarded as an implementation detail. As soon as additional C code comes into play, it's likely that so do additional headers from /usr/include, at which point I don't expect the bundle to behave well (so use your system compiler for foreign code). But maybe that is too pessimistic and we can add at least the libc headers in the future.

All this work is also not super important on Linux to begin with because usually a C toolchain is already present there. It's mostly a PoC before attempting the same on macOS and Windows, where it's more important.

@Kha
Copy link
Member Author

Kha commented Oct 21, 2021

And that none of this would be necessary if only everyone used Nix goes without saying :) .

@gebner
Copy link
Member

gebner commented Oct 21, 2021

The purpose of this PR is to enable building of pure Lean programs (including precompiling modules) without system dependencies.

I see, but I'm not sure if this is such a big issue (on Linux). Even rustup requires a system compiler.

Once we start bundling half a linux distribution, we might as well distribute docker images...

It's mostly a PoC before attempting the same on macOS and Windows, where it's more important.

Completely agreed here, it's much harder to install a working C compiler on windows.

@Kha
Copy link
Member Author

Kha commented Oct 21, 2021

Even rustup requires a system compiler.

That's true, I was actually surprised by this and had to double-check! But it is worth keeping in mind that while Rust is a programming language, that is only part of the story for Lean. If efficiently working on mathlib requires a C compiler for precompilation, it should work out of the box.

@gebner
Copy link
Member

gebner commented Oct 22, 2021

Before merging this, could you please make a tag (so that binary releases get pushed) and give me a few days to fix and test the elan package in nixpkgs?

@Kha
Copy link
Member Author

Kha commented Oct 22, 2021

@gebner Can do. Could you try if everything "just works" now if LEAN_CC is set?

@gebner
Copy link
Member

gebner commented Oct 22, 2021

Can do. Could you try if everything "just works" now if LEAN_CC is set?

I've built lake successfully with env LEAN_CC=gcc. Is there anything else that I should try?

@Kha
Copy link
Member Author

Kha commented Oct 22, 2021

Thanks, that makes me fairly confident that this PR introduces no observable difference in behavior as long as LEAN_CC is set

@Kha
Copy link
Member Author

Kha commented Oct 23, 2021

The best approach I can come up with is to use the "Windows approach": put $ORIGIN (the executable's own directory) in its rpath and copy libgmp.so next to it (which is basically free with hardlinks). The second step should probably be done by Lake, not leanc.

@tydeu What do you think, should Lake copy libgmp next to each executable? I'm assuming this is already an issue on master, e.g. a Lean program built in a mingw shell should fail to find libgmp when run outside of the shell.

@tydeu
Copy link
Member

tydeu commented Oct 23, 2021

@Kha you are correct, running the executable outside of mingw does crash due to not being able to find libgmp. I am suprised I never stumbled onto that before!

More to your point, the copying should probably occur during the creation of a "distribution" of package/application, which Lake does not yet provide a means of doing, However, it is certainly something on the TODO list. Such a dist folder would also want to be free of intermediate or development build artifacts like trace files and give users the ability to hook in additional shared libraries to copy.

However, this does raise the question of whether it might be worth it in the medium to long term to move from libgmp to a different multi-precision library that can be linked statically to make Lean's core self-contained.

@Kha
Copy link
Member Author

Kha commented Oct 24, 2021

the copying should probably occur during the creation of a "distribution" of package/application

If libgmp is bundled with Lean and not installed in the system, which can already be the case today, you won't even be able to run the binary yourself unless you set LD_LIBRARY_PATH/... to the specific Lean toolchain's lib/ directory (recall that elan does not change the environment, so it can't do it for you). So I think copying should happen immediately.

However, this does raise the question of whether it might be worth it in the medium to long term to move from libgmp to a different multi-precision library that can be linked statically to make Lean's core self-contained.

That would be nice, but I'm not aware of a serious alternative. Writing (a verified) one in Lean with at least reasonable performance might be a nice student project though...

@melted
Copy link

melted commented Oct 24, 2021

That would be nice, but I'm not aware of a serious alternative. Writing (a verified) one in Lean with at least reasonable performance might be a nice student project though...

Openssl has a bignum library that is performance-oriented https://github.com/openssl/openssl/blob/master/crypto/bn. I guess the problem there is that it would have to be copied as it doesn't have its own git. (The API: https://github.com/openssl/openssl/blob/master/include/openssl/bn.h)

Though personally, I would be happy with just a simple one (which there are a lot of), I just want unbounded ints, not good performance with millions of digits. It would be nice with a GMP API compatible one, so both would be an option. I looked around a bit and came across whyMP https://gitlab.inria.fr/why3/whymp (Oops, that one is GPLv3 too, so that doesn't work)

@tydeu
Copy link
Member

tydeu commented Oct 24, 2021

@Kha

If libgmp is bundled with Lean and not installed in the system, which can already be the case today, you won't even be able to run the binary yourself unless you set LD_LIBRARY_PATH/... to the specific Lean toolchain's lib/ directory (recall that elan does not change the environment, so it can't do it for you). So I think copying should happen immediately.

Okay, I will get to work on that soon. Do you mind creating an issue for it over on leanprover/lake? Also, is libgmp the only library that needs copying -- what about libleanshared, etc.?

@gebner
Copy link
Member

gebner commented Oct 25, 2021

Last time I checked, https://github.com/libtom/libtommath was another reasonable non-copyleft licensed alternative to GMP.

My impression is that nobody in the Lean ecosystem makes "serious" use of big integers at the moment (let's say >100 digits). Using an asymptotically slower but easier to distribute implementation is a reasonable choice, certainly as a default (keeping GMP as a build-time option).

@Kha
Copy link
Member Author

Kha commented Oct 25, 2021

Packaged in Ubuntu, MSYS2, and Homebrew, looks pretty good to me.

@Kha
Copy link
Member Author

Kha commented Nov 15, 2021

Closing in favor of #795

@Kha Kha closed this Nov 15, 2021
ChrisHughes24 pushed a commit to ChrisHughes24/lean4 that referenced this pull request Dec 2, 2022
…eanprover#733)

`linarith` needs various things that haven't been ported yet. This PR adds various sorried stubs in ad-hoc ported files ahead of where we're currently up to.

Co-authored-by: Scott Morrison <[email protected]>
ChrisHughes24 pushed a commit to ChrisHughes24/lean4 that referenced this pull request Dec 2, 2022
TODO:
* Before merge
- [x] fix a bug in linarith in mathlib3 I just found ...
- [x] depends on: leanprover#519 
- [x] style lint
- [x] docs
- [X] move theory stubs to a separate PR, for easier tracking: leanprover#733 
- [x] failing to parse the `LinarithConfig` option

* Before or after merge?
- [ ] Implement the `removeNe` preprocessor.
- [ ] Add support for restricting to a single type. How to store a `Type` in `LinarithConfig`?

* After merge
- [ ] Teach `norm_num` to solve `example [LinearOrderedRing α] : (0 : α) < 37 := by norm_num`.
- [ ] Port `zify_proof` (plumbing for `zify`), and add the `natToInt` preprocessor.
  Mostly done now, but see leanprover#741 before all the tests will work.
- [ ] Port `cancel_denoms` tactic, and add the `cancelDenoms` preprocessor.
- [ ] Add the `nlinarith` preprocessor and front-end syntax.


Co-authored-by: Mario Carneiro <[email protected]>
Co-authored-by: Scott Morrison <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants