
shared library naming and versioning #5

Open
jeffhammond opened this issue Nov 18, 2022 · 14 comments

@jeffhammond (Member) commented Nov 18, 2022

Problem

We have to standardize the name of the MPI shared library.

Proposal

libmpi.$SUFFIX.$VERS is widely used already.

We could also pick a new convention, such as libmpi5.$SUFFIX.$VERS, which would let implementations ship their existing ABI and the new standard ABI in separate libraries.

Changes to the Text

Impact on Implementations

Impact on Users

References and Pull Requests

@jeffhammond (Member Author)

Related: pkg-config.

@jedbrown

Note that a typical soname is libmpi.so.5, which is usually a symlink to a more specifically versioned file such as libmpi.so.5.123. I'm not sure a libmpi5.so.$VERS is needed, since we can have libmpich.so and libopenmpi.so if implementations want to keep an internal ABI in a separate library.

ELF symbol versioning is a powerful tool, but maybe out of scope here because not all systems are ELF.
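For context, here is a minimal sketch of what ELF symbol versioning looks like with the GNU toolchain; every function name and version tag below is an illustrative assumption, not anything this proposal defines.

```c
/* Sketch of ELF symbol versioning (GNU toolchain). All names and
   version tags below are hypothetical, not from the MPI standard. */

/* Two internal implementations of the "same" public function. */
int send_impl_v1(void *buf) { (void)buf; return 1; /* legacy behavior */ }
int send_impl_v2(void *buf) { (void)buf; return 2; /* current behavior */ }

/* Binaries linked against VERS_1 keep resolving to the old code;
   new links bind to VERS_2, the default (@@) version. */
__asm__(".symver send_impl_v1, example_send@VERS_1");
__asm__(".symver send_impl_v2, example_send@@VERS_2");

/* Requires a matching linker version script, e.g.
     VERS_1 { };
     VERS_2 { } VERS_1;
   passed as: gcc -shared -fPIC -Wl,--version-script=vers.map ... */
```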

@gonzalobg commented Dec 18, 2022

> We have to standardize the name of the MPI shared library.

Why?

I don't think we have to do this. In particular, on macOS (.dylib), Linux (.so), and Windows (.dll), these shared libraries need to have different names by necessity.

As long as the MPI compiler wrapper links against an MPI ABI stub that provides the platform symbols, it does not matter which library ultimately provides those symbols, AFAICT. An implementation could even split its symbols across multiple .so's on Linux, requiring the user to LD_PRELOAD several libraries to get all of them, and that would be fine.

I'd rather solve this problem after having an MPI ABI, if this actually turns out to be a problem in practice.

@jeffhammond (Member Author)

  1. Because some applications might want to dlopen it (see the sketch after this comment).
  2. Because that makes it easy for users to build in one place and know that the application will find the SO on another system without any extra information.

We already have libmpi.so as a de facto standard right now.
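For what it's worth, here is a minimal sketch of that dlopen use case in C, assuming a standardized name; the soname version "5" and the candidate list are hypothetical, not anything the ABI effort has settled on.

```c
/* Sketch: loading MPI at run time via dlopen(). With one standardized
   name, a single candidate suffices; without it, applications must
   probe implementation-specific names. Link with -ldl on older glibc. */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    const char *candidates[] = {"libmpi.so.5", "libmpi.so", NULL};
    void *handle = NULL;
    for (int i = 0; candidates[i] != NULL && handle == NULL; i++)
        handle = dlopen(candidates[i], RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "no MPI library found: %s\n", dlerror());
        return 1;
    }
    /* ... resolve symbols with dlsym(handle, "MPI_Init"), etc. ... */
    dlclose(handle);
    return 0;
}
```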

@gonzalobg

> Because some applications might want to dlopen it.

Applications that want to use dlopen today have to handle incompatible MPI ABIs, which is a significantly harder problem than just having some logic to pick one of many .so names...

If this turns out to be a common thing people want to do, and it turns out to be a problem, we could always sit down again then and solve it; it's kind of orthogonal to the rest of the MPI ABI problem.

> Because that makes it easy for users to build in one place and know that the application will find the SO on another system without any extra information.

I can imagine that an HPC center might want to configure all of its compiler toolchains to link against some libmpi_stub.so that contains the symbols without any implementation, and have srun --mpi=openmpi dynamically link a different MPI library, so the user does not have to care which path it is on, what name it has (is it libmpi_instrumented42.so?), etc.
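A minimal sketch of what such a stub could look like, assuming the hypothetical libmpi_stub.so scheme described above; the trap-and-abort behavior is just one possible design choice.

```c
/* Hypothetical libmpi_stub.so: defines the ABI symbols with no real
   implementation so toolchains can link against it, while the launcher
   binds a real MPI library at run time. Only two symbols are shown;
   a real stub would cover the whole standardized ABI. */
#include <stdio.h>
#include <stdlib.h>

static void stub_trap(const char *sym) {
    fprintf(stderr, "MPI stub called: %s (no real MPI library bound)\n", sym);
    abort();
}

int MPI_Init(int *argc, char ***argv) { (void)argc; (void)argv; stub_trap("MPI_Init"); return -1; }
int MPI_Finalize(void) { stub_trap("MPI_Finalize"); return -1; }

/* Built, e.g., as: gcc -shared -fPIC -Wl,-soname,libmpi_stub.so.5 ...
   (the soname here is a hypothetical choice; see the soname discussion
   later in this thread). */
```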

If we wanted to solve this problem, it wouldn't just be "library names", but also the different kinds of library files on different platforms (dylib, so, dll, etc.), link flags (like RPATH!), the environment required to find the library, maybe LD_PRELOAD support, etc.

Right now I at least don't have this problem, because I don't have a stable MPI ABI that could cause it. I think that if this turns out to be a real problem in the future, we can always sit down again with more data about the problem and try to fix it, but I am a bit worried about spending too much time "today" on this.

@jedbrown

It takes a bit of hackery to link libmpi_stub.so (which puts libmpi_stub.so.5 in as the soname) and get it to resolve to a different name (libmpi_instrumented42.so.12). In practice, I expect it would be done with a single soname resolving to different paths. And surely this is something we want to support, because it would drastically simplify HPC administration and use compared to the present.

However, I think it'll be a confusing mess to have libmpi.so with various vendor-specific ABIs and libmpi5.so as a standard ABI all in the same filesystem, and possibly in the same directories. If we call it libmpi5.so, then vendors should be discouraged from creating anything called libmpi.so.

@bwbarrett

If the goal of the ABI work is to allow an application to be compiled against (say) MPICH and run against (say) Open MPI without relinking and without major pain (other than setting an LD_LIBRARY_PATH, potentially), then the name of the library absolutely needs to be standardized. Not standardizing the name of the ABI-compliant library means that users would have to figure out how to modify the library dependency table for the application in question, which will be beyond the target audience.

We also should not prohibit libmpi.$SUFFIX because, practically speaking, existing MPIs will need to keep supporting their existing ABIs for the foreseeable future, for all the same reasons.

@jeffhammond (Member Author)

Yeah, I think we say that libmpi_abi.so (or whatever it will be) must support the standard ABI, and that libmpi.so may exist but does not necessarily support the standard ABI. This seems to meet the backwards-compatibility requirements folks have.

@gonzalobg commented Jan 12, 2023 via email

@bwbarrett
What platforms are going to define anything about MPI? No one outside of a small HPC community cares about MPI, so we need to define what we need.

I agree that pulling in a definition of linking is problematic. But providing no guidance is also problematic; it's not that anyone will go rogue, but more that everyone will be working in parallel and not talking to each other. If it's not written in the spec, then there's flexibility in implementation. So at some point, whether it is part of the standard or a more flexible definition outside of the standard document itself, we have to write these things down.

@gonzalobg commented Jan 12, 2023 via email

@hzhou commented Feb 6, 2023

My suggestion is to use libmpi_abi.so, and the .so versions should also be standardized.

libmpi.so is currently used by the native ABIs of various implementations, and I don't think that will ever get cleaned up. Letting the new ABI version of the library use the same name would therefore cause confusion and sabotage the ABI goal. With libmpi_abi.so -- hopefully no existing implementation uses it -- we ensure it is the ABI version of the library we are linking with.

MPICH's plan is to produce both libmpi.so and libmpi_abi.so. The former is the current MPICH ABI (shared with Intel MPI, Cray MPI, etc.); the latter is the MPI ABI this working group will produce.

@jeffhammond (Member Author)

This plan is fine with me. The name of the SO is one of the smaller problems we have here 😄

@qkoziol commented Feb 7, 2023

Yes, agree also
