mpifort 5.0.1 fails with undefined refs to memcpy@GLIBC_2.14 and clock_gettime@GLIBC_2.17 #143
Comments
Thanks for the report Charles! 🙏 This is likely because this feedstock is now building with a CentOS 7 image on That said, the packages should carry this constraint to ensure that they are only installed on systems with a new enough GLIBC. However, the GLIBC constraint wasn't being included before. PR (#145) should fix that. An open question is whether older GLIBC versions are still of interest to maintain support for here. Will defer to the feedstock maintainers on that question. |
Actually sorry I misspoke, looking at the info provided above can see It's possible adding |
Thanks Jack. Yes, this was observed on an up-to-date Gentoo Linux system with current glibc 2.38. I have not previously had any issues with conda-forge packages on this system. Adding sysroot to the build deps sounds like the right solution (probably should be used by default for all packages, IMO) |
The Packages are building and should be uploaded soon. Please look for a |
I'm afraid that the problem persists in both OpenMPI 5.0.1 and 5.0.2 as packaged by conda-forge. The last usable version is 5.0.0. The same failure is observed with both a very recent system (Gentoo Linux with glibc 2.39) and an older system (CentOS 7 with glibc 2.17). Both complain about 'memcpy@GLIBC_2.14' and 'clock_gettime@GLIBC_2.17'. (These versioned symbols sure are a pain.) |
That last comment was from me, I was logged in under a different GitHub account without realizing. One question I have is - what changed between OpenMPI 5.0.0 and 5.0.1? Can we just go back to the way 5.0.0 was getting built? That version does not suffer from the portability problems. |
The most likely problem is that the 5.0.1 image was built in a different, newer docker image. You should probably ask the core conda-forge team. This is ultimately not an Open MPI issue or the fault of this feedstock (although I could be partially wrong). |
Thanks dalcinl. How do I bring this to the attention of the core conda-forge team? |
I usually contact the team via Gitter https://conda-forge.org/community/getting-in-touch/#gitter-and-element |
@jakirkham Do you think we are somehow messing things up in this feedstock? |
The issue they are seeing is that they are on newer systems (GLIBC 2.28+) and are having trouble resolving symbols that should be available on their systems, as we built on (GLIBC 2.17+). IOW the symbols should be available in their cases, but for some reason they are not. The bug may very well be in our build, but am a little fuzzy on how it is occurring. |
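For anyone wanting to check which glibc symbol versions a library actually asks for, scanning `objdump -T` output for `GLIBC_x.y` tags is one option. A minimal sketch, assuming GNU `sort -V` is available; `max_glibc_required` is a hypothetical helper name, not part of any conda-forge tooling:

```shell
# Hypothetical helper: read `objdump -T <lib>` output on stdin and print
# the highest GLIBC_x.y symbol version mentioned in it.
max_glibc_required() {
    # Extract every GLIBC_<digits.and.dots> tag, de-duplicate and
    # version-sort them, then keep the last (highest) one.
    grep -o 'GLIBC_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Example usage (library name taken from the error output in this thread):
#   objdump -T "$CONDA_PREFIX/lib/libmpi_mpifh.so" | max_glibc_required
```

If the printed version is at or below the glibc of the machine, the library should load there; the link-time failure in this thread is then about which glibc the linker sees, not about the system one.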
Could one of you seeing this error please try installing |
Tried adding the GLIBC constraint to those packages directly ( #147 ). Maybe that helps? |
@jakirkham |
It also works with |
Thanks Charles! 🙏 Yeah, that's what I was wondering about. Then I think we should try PR: #147 |
New packages are building. Will probably be a bit before they upload and mirror to CDN. Please test out tomorrow and let us know how it goes. |
It's still broken, or else I'm not seeing new packages. Testing this should be very easy. Just do
bash$ mamba create -n test; mamba activate test
(test)$ mamba install openmpi gfortran
  Package                     Version    Build                  Channel      Size
──────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────
  + mpi                       1.0        openmpi                conda-forge  Cached
  + _libgcc_mutex             0.1        conda_forge            conda-forge  Cached
  + libstdcxx-ng              13.2.0     h7e041cc_5             conda-forge  Cached
  + ld_impl_linux-64          2.40       h41732ed_0             conda-forge  Cached
  + ca-certificates           2024.2.2   hbcca054_0             conda-forge  Cached
  + libgomp                   13.2.0     h807b86a_5             conda-forge  Cached
  + _openmp_mutex             4.5        2_gnu                  conda-forge  Cached
  + libgcc-ng                 13.2.0     h807b86a_5             conda-forge  Cached
  + libiconv                  1.17       hd590300_2             conda-forge  Cached
  + libsanitizer              13.2.0     h7e041cc_5             conda-forge  Cached
  + openssl                   3.2.1      hd590300_1             conda-forge  Cached
  + icu                       73.2       h59595ed_0             conda-forge  Cached
  + xz                        5.2.6      h166bdaf_0             conda-forge  Cached
  + libzlib                   1.2.13     hd590300_5             conda-forge  Cached
  + libgfortran5              13.2.0     ha4646dd_5             conda-forge  Cached
  + libnl                     3.9.0      hd590300_0             conda-forge  Cached
  + libevent                  2.1.12     hf998b51_1             conda-forge  Cached
  + libxml2                   2.12.6     h232c23b_1             conda-forge  Cached
  + libgfortran-ng            13.2.0     h69a702a_5             conda-forge  Cached
  + libhwloc                  2.9.3      default_h554bfaf_1009  conda-forge  Cached
  + openmpi                   5.0.3      h7fc1de5_100           conda-forge  15MB
  + libgcc-devel_linux-64     13.2.0     ha9c7c90_105           conda-forge  Cached
  + kernel-headers_linux-64   2.6.32     he073ed8_17            conda-forge  Cached
  + sysroot_linux-64          2.12       he073ed8_17            conda-forge  Cached
  + binutils_impl_linux-64    2.40       hf600244_0             conda-forge  Cached
  + gcc_impl_linux-64         13.2.0     h338b0a0_5             conda-forge  Cached
  + gfortran_impl_linux-64    13.2.0     h76e1118_5             conda-forge  Cached
  + gcc                       13.2.0     hd6cf55c_3             conda-forge  Cached
  + gfortran                  13.2.0     h98b45c4_3             conda-forge  Cached
....
(test)$ mpifort /tmp/test_mpi.f90
/home/cgw/miniforge3/envs/test/bin/../lib/gcc/x86_64-conda-linux-gnu/13.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/cgw/miniforge3/envs/test/lib/libmpi_mpifh.so: undefined reference to `memcpy@GLIBC_2.14'
/home/cgw/miniforge3/envs/test/bin/../lib/gcc/x86_64-conda-linux-gnu/13.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/cgw/miniforge3/envs/test/lib/./libpmix.so.2: undefined reference to `clock_gettime@GLIBC_2.17'
collect2: error: ld returned 1 exit status |
|
Could you show the full output? I don't see the |
That is the full package list, the package is called |
|
Sorry I meant |
What?
bash$ which mpifort
which: no mpifort in (/home/cgw/Applications/.bin:/home/cgw/miniforge3/condabin:/home/cgw/bin:/home/cgw/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/lib/llvm/18/bin:/usr/lib/llvm/17/bin)
bash$ mamba activate test
(test)$ which mpifort
/home/cgw/miniforge3/envs/test/bin/mpifort
I have been using |
@leofang What is the purpose of the |
The compiler wrapper packages It appears that when using CUDA 11 to build (which is what this feedstock does) we're pinned at gfortran 11: What would be the reason that you want to ignore the ABI compatibility issue and use gfortran 13? |
I have upgraded my
and then in my
I have yet to get a Mac to try the osx build on, so I'm not sure that that works yet, but it fixes the Linux build at least. What this does is to specify that we want at least glibc 2.17 for the package we're building, which ensures that glibc 2.17 is available so that we can link to it properly. Of course this is strictly speaking incorrect if the present package works with 2.12, but since 2.17 will soon be the standard for everything anyway, it doesn't matter much. |
@LourensVeen, thank you! I'll give that a shot. |
I'm not sure why building with sysroot 2.17 creates a package that claims to be compatible with sysroot 2.12, or if it's something particular about openmpi that isn't universal. This comes up in this package's own tests, where sysroot 2.12 is installed in the test environment. I don't know why that is. FWIW, compiling with $LDFLAGS avoids this problem, at least in openmpi's own tests (#159):
mpif77 test.f # fails to find versioned glibc symbols
mpif77 $LDFLAGS test.f # succeeds
I'm not actually sure which LDFLAG is responsible (maybe Does it make sense to put sysroot 2.17 in run_constrained for this package? And if so, do we need repodata patches? And does it make sense to do that in general in sysroot itself, or just here? |
From this comment, it appears the missing flag is:
maybe we should try to get this into the compiler wrapper's default flags |
#159 appears to confirm that what was missing was I think setting:
export OMPI_LDFLAGS="-L$PREFIX/lib -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib"
will fix anything using the compiler wrapper and building with sysroot 2.12, and #159 will make sure that's the default, I think actually resolving this issue. In general, anything using $LDFLAGS (everything should) should not encounter this problem, but notably I believe that CMake's |
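In a user environment, `$PREFIX` from conda-build is not set, so the same idea could be sketched with `$CONDA_PREFIX` instead. This is an assumption for illustration, not what #159 does; `ompi_ldflags_for` is a hypothetical helper name, and the flags are copied from the `export OMPI_LDFLAGS` line quoted above:

```shell
# Hypothetical helper: print the linker flags quoted above for a given prefix.
ompi_ldflags_for() {
    printf '%s\n' "-L$1/lib -Wl,--allow-shlib-undefined -Wl,-rpath,$1/lib"
}

# Only export when a conda environment is active
# (CONDA_PREFIX is set by `conda activate` / `mamba activate`).
if [ -n "${CONDA_PREFIX:-}" ]; then
    OMPI_LDFLAGS=$(ompi_ldflags_for "$CONDA_PREFIX")
    export OMPI_LDFLAGS
fi
```

With this sourced in an activated environment, `mpifort test.f90` should pick the flags up through the Open MPI wrapper's `OMPI_LDFLAGS` override, without passing `$LDFLAGS` by hand.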
I agree that that would probably (I didn't check it) avoid the build failure, but it wouldn't actually solve the problem, just disable the error at that particular point and kick the can down the road. The built package would still have a dependency on |
I don't want to speak with too much confidence because I'm a bit out of my depth, but I don't think that's the case. For example, in #159, the test environments for |
Seems like a big hammer to me. I feel we need alignment with @conda-forge/core regarding the discussion conda-forge/linux-sysroot-feedstock#63 started by @h-vetinari (EDIT: I see @minrk had also raised the discussion here conda-forge/conda-forge.github.io#2102 (comment)). I am not comfortable adding this flag unconditionally. It only solves our problem but I do not believe Open MPI is the only project affected by glibc symbol issues. |
It's the only package that compiles other things and does not respect LDFLAGS. |
@minrk: Good point (and a working example is hard to argue with 😄). I failed to consider that the dynamic linker would come from Conda and find the sysroot-installed glibc, but that the loader comes from the system and wouldn't be bothered by it. Just disabling the check still feels wrong to me though, and I'm worried that it could cause other problems. What if you tried to compile some MPI code that itself needs a newer glibc than you have available on your system? I guess it would fail when you try to run the program, but it might be hard to debug, and you would expect it to fail at link time. This could affect Conda too if an MPI-using package got an update that introduced a dependency on a newer glibc than Conda uses (say 2.25). You probably wouldn't notice outside of Conda (because you have a much newer glibc everywhere), and the package build would succeed (because the check is disabled), but a user trying to run the package on an older system with glibc>=2.17<2.25 would still get an error. I guess we'd have to hope there's a test that actually runs things and fails the build that way. |
FWIW, it's cmake's FindMPI, not openmpi, that is ignoring compiler flags. I think the mistake I made in #158 is to assume that the mpi compiler wrappers should be responsible for respecting $CFLAGS/$LDFLAGS, but that's not right - they should be minimal extensions of $CC/$FC etc. with just enough to find/link mpi.h/libmpi. $CFLAGS/$LDFLAGS should be passed to the compiler wrappers, just like regular compilers, which happens as expected after FindMPI succeeds, but not during for some reason. Sounds plausibly like a CMake bug to me. Within the context of conda-forge where the linked glibc may be older than that of your dependencies,
@LourensVeen I think the dependency on Also, note that disabling the check in the openmpi compiler wrapper is just applying a subset of what's already applied to all conda-forge-built shared libraries via $LDFLAGS. It's just that for some reason FindMPI checks that mpicc and friends work without $LDFLAGS, and this is the only flag where that really matters. |
I was thinking of a scenario where the packager wouldn't notice the new dependency, and fail to update the sysroot dependency. But that's arguably a bug in that package then, and anyway it probably wouldn't build on conda-forge to begin with, because the Docker container doesn't have the new glibc available. Okay, I'm out of arguments. I still don't like it conceptually (I'd prefer the solution in conda-forge/linux-sysroot-feedstock#63) but I can't see it breaking anything. And I have to test, but this probably fixes my problem, so thanks! |
But I don't think that can happen either, because openmpi itself still carries the newer glibc dependency, so you won't be able to install it due to an unsolvable dependency. I don't think the downstream package gets a glibc dependency that's not represented in the requirements of the package or its dependencies, though I could have misunderstood something. |
@leofang I'm trying to understand your objection to including the flag by default. This flag is in default $LDFLAGS, so it is already used on all shared libraries compiled on conda-forge. This is conda-forge-wide, not specific to openmpi. This change is only making the compiler wrapper more consistent with every link command called on conda-forge.
Including it in the wrapper is also only setting a default, not unconditional, and overridable with |
Thanks all for the helpful discussion here and weighing potential solutions! 🙏 Wanted to follow up on one point...
If we are able to construct a simple example of this behavior and include it in a new CMake issue, we can work to address it |
I think I might be conflating different issues (#158 fixed two issues, one of which fixed FindMPI and it was FCFLAGS-related, not link-related) and getting things wrong. When I run tests, FindMPI definitely does use LDFLAGS. I've re-read, and the original post by @charlesgwaldman I think is compiling in a user environment with
confirming cmake works with $LDFLAGS
cmake_minimum_required(VERSION 3.28)
project(test LANGUAGES Fortran)
find_package(MPI)
# docker run --rm -v $PWD:/io -w /io --platform linux/amd64 -it condaforge/miniforge3
conda create -n testbuild cmake make gfortran openmpi=5.0.3=*_104
conda activate testbuild
cmake -B build .
gives
but after adding
gives:
I think all of the cases where this error is coming up are attributable to $LDFLAGS not being passed, either in package build systems or user environments, and not cmake itself, e.g.:
Since runtime compilation in user environments is a common and reasonable thing to do, I think it's a valid question to ask: should the mpi compilers work in user environments without the conda-build compiler activation scripts? If so, the answer is either:
(or both) Note that 2. fixes this issue in all situations, because I think we have learned that building downstream packages with an older sysroot is actually fine, whereas 1. fixes it for user environments, test environments, and cross compiled packages (because run_constrained on sysroot will end up preventing downstream packages from using older sysroot in the build, which I think we've learned actually works fine). In practice, though, once openmpi is on newer glibc, all downstream dependencies might as well update, since runtime will require newer glibc due to the dependency, so there's no benefit to holding back. |
I'm running into this currently even without MPI. If you try to compile against a shared library that depends on another shared library, then that other shared library needs to be available on the linker's search path, for which you need to |
Yes, that's standard for any prefix install, you need |
I see. I guess this is what I missed. Thanks for the explanation. I mainly wanted to ensure we don't come up with a solution specific to this feedstock. Infrastructure-wide consistency is important. |
closing this, since I think we can now safely say that the cause of all known cases here is not passing $LDFLAGS, not any problem in the openmpi package. It is a generic issue for packages requiring newer-than-default sysroot (conda-forge/linux-sysroot-feedstock#68). |
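Since the conclusion is that `$LDFLAGS` is not reaching the link step, a quick sanity check in a user environment might look like the sketch below. The helper name `has_allow_shlib_undefined` is hypothetical, and checking for `--allow-shlib-undefined` specifically is an assumption based on the `OMPI_LDFLAGS` line quoted earlier in this thread:

```shell
# Hypothetical check: succeed if the given flags string contains
# --allow-shlib-undefined anywhere (e.g. as -Wl,--allow-shlib-undefined).
has_allow_shlib_undefined() {
    case " $1 " in
        *"--allow-shlib-undefined"*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_allow_shlib_undefined "${LDFLAGS:-}"; then
    printf '%s\n' 'LDFLAGS carries the flag; compile with: mpifort $LDFLAGS file.f90'
else
    printf '%s\n' 'LDFLAGS is unset or lacks the flag; compiler activation may be missing'
fi
```

The second message is the situation described above: the wrapper is found on `PATH`, but nothing passes the conda-forge linker flags to it.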
Solution to issue cannot be found in the documentation.
Issue
I am using cmake, gfortran, and openmpi from conda-forge to compile a Fortran package. With cmake 3.28.3, gfortran 13.2.0 and openmpi 5.0.0 everything works. When openmpi upgraded to the latest 5.0.1 version I started getting this error:
Installed packages
Environment info