documentation: need to update how-to-build section of Open MPI for the v5.0.0 release #10657
@hppritcha @awlauria - Can you progress this? |
I will update https://docs.open-mpi.org/en/v5.0.x/launching-apps/slurm.html to be correct. It's inaccurate for the 5.0.x and newer releases. |
related to open-mpi#10657 Signed-off-by: Howard Pritchard <[email protected]>
I've tried building the 5.0.x branch with: hwloc: 1.11.0 as the table in the docs indicates, and the build fails with:
That error code is from the PMIx 4.2 branch, so I'm going to try building with the PMIx 4.2.4 release candidate next. INPUT: Is there any reason to try earlier releases from the openpmix 4.2 branch? I would be inclined to tell people to use the latest release from that branch. |
Configure should have errored out, as OMPI v5 requires a minimum of PMIx v4.2.4 and PRRTE v3.0.1 (when released). |
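The check described in this comment amounts to comparing dotted version strings against fixed minimums. The following is a minimal sketch of that idea in Python, not Open MPI's actual configure logic (which lives in shell/m4). The names `MINIMUMS`, `parse_version`, and `check_minimums` are hypothetical, and the minimum values are simply the ones quoted above. It assumes purely numeric versions, so a release-candidate string such as "4.2.4rc1" would need extra handling.

```python
# Hypothetical sketch of a blanket minimum-version check.
# The minimums are the values quoted in this thread; they are not read
# from Open MPI's VERSION file.
MINIMUMS = {"pmix": "4.2.4", "prrte": "3.0.1"}


def parse_version(text):
    """Turn '4.2.4' into (4, 2, 4) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def check_minimums(found):
    """Return one error message per package that is older than its minimum."""
    errors = []
    for name, minimum in MINIMUMS.items():
        version = found.get(name)
        if version is None:
            continue  # package not supplied, so there is nothing to check
        if parse_version(version) < parse_version(minimum):
            errors.append(f"{name} {version} is older than the required {minimum}")
    return errors


if __name__ == "__main__":
    # The combination that was tried earlier in this thread.
    for message in check_minimums({"pmix": "4.1.2", "prrte": "2.0.2"}):
        print("configure error:", message)
```

Comparing tuples of integers rather than raw strings matters here: as strings, "4.10.0" would sort before "4.2.4".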
Agree. Is there any reason to allow earlier versions of PRRTE 4.2.x? |
FYI - I've verified that the 5.0.x branch currently works with PMIx 4.2.4 and PRRTE 2.0.2. I'll check again when we have frozen the branch for code changes before the 5.0 release. |
Errr...there is no way it should work with PRRTE 2.0.2 - nor would you ever want to advise someone to use that combination. You really need PRRTE 3.0.1 as your minimum supported version. |
@jsquyres @gpaulsen @awlauria @bwbarrett - Is PRRTE 3.0.1 the best minimum for the OMPI 5.0 release? |
If Ralph says that PRRTE 3.0.1 is the right minimum to support, that's what we should do. There's not a big install base of PRRTE outside of Open MPI today, so I don't think there's a huge impact to changing the minimum supported PRRTE version. It looks like we need a CR to change the value in the VERSION file. We've tried (failed, but tried) to be more conservative bumping up the minimum supported PMIx version, since things like SLURM already are being built with PMIx and systems will have PMIx libraries already installed. I see you tested with PMIx 4.2.4 but our minimum requirement is 4.1.2, and it would be good to understand why we'd have to update. |
PRRTE 3.0.1 is required to support OMPI's v5.0 feature list and your desired cmd line options. PMIx is a little more complex. You'll need to embed v4.2.4 in order to build the embedded PRRTE, so that is pretty much required. If you are not building PRRTE, then you could possibly get by with some lesser version - you'd have to check and see, depends on the definitions in the OMPI code and how you have protected them. If you are working with external PRRTE and PMIx, then you could connect OMPI to a lesser PMIx since PMIx is cross-version compatible (and PRRTE would be pointing at PMIx v4.2.4 or above). Not sure how you would modify configure to handle all those cases, but I imagine it could be done? 🤷♂️ |
I posted the errors from building w/PMIx 4.1.2 earlier in this thread^^ |
Once PRRTE 3.0.1 is out the door, I'll start updating the docs, submodules, and autoconf things. |
We need to have a discussion about whether we should update the PMIx required version or not. I'm getting really nervous about our continued bumping of the required PMIx version. When we started 5.0.x, the goal was the 3.x series (same as OMPI 4.x) and we've long since lost that. |
I don't recall that discussion - it is simply impossible given the OMPI v5 feature list. You need something in the v4.x series at the minimum. If you want to cover Sessions and ULFM, then you are talking v4.2. |
At the 7/11 developer meeting we discussed this issue and thought that having a telecon to discuss how to proceed might be the most productive way to reach agreement. Can anyone who is interested in attending to discuss please look at my post on the Open MPI slack #general channel to schedule a time? FYI - Slack link https://open-mpi.slack.com/archives/CDGMNGZDY/p1689719829317329 PS. - closing the meeting time poll @ 5pm CT on 7/19 @rhc54 @bwbarrett @hppritcha @jsquyres @awlauria @gpaulsen @janjust @wenduwan @lrbison |
I will make some test builds to see which pieces of OMPI require newer PRRTE. |
I suspect you mean PMIx - OMPI doesn't build against PRRTE and therefore is insensitive (at the build level) to the PRRTE version. OMPI has, however, a functional dependency on PRRTE version for the Sessions and ULFM features. |
Since this issue appears to specifically target the PRRTE support, I doubt there is anything further I can contribute in terms of a meeting. If you want the Sessions and ULFM features to work in OMPI v5, then you need PRRTE v3.0.1. It truly is that simple. Note that those features will not be available when using a direct launch environment. |
Oh, and do keep in mind that the cmd line options, default placement behaviors, |
Yes, PMIx, apologies for mixing them |
After reviewing the above plus some notes, and pondering a bit, I believe there is confusion over what is being decided here. Let's try to clear things up a bit.

There are TWO modes for OMPI v5 build and operation:

Mode 1: Direct-launch only

However, if you are willing to disable those features, then you can extend backward to PMIx v3. I have created a somewhat hacky patch that gets you there (see https://gist.github.com/rhc54/9c615b86f2d43db1c15911390667ab10). You probably would want to clean this up a bunch by adding configure-level logic that disables Sessions and ULFM if the PMIx version is less than 4.0 instead of all the

Important note: I don't believe that the OMPI you get this way will support debuggers such as DDT or TotalView as it lacks MPIR integration. I suppose you could add that back in if you want - up to you.

Mode 2: OMPI with internal PRRTE

If you want to allow users to configure with an external PMIx and the internal PRRTE, then the PRRTE configure will error out if the external PMIx version is less than v4.2.4.

How you want to capture all that in your table is up to you! |
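The two modes above can be sketched as a small decision function. This is only an illustration in Python of the logic as described in this comment, not actual Open MPI or PRRTE configure code. `BuildPlan` and `plan_build` are hypothetical names. The v4.2.4 and v4.0 thresholds come from the comment, while treating anything older than PMIx v3 as unsupported is an assumption inferred from "extend backward to PMIx v3". The flag only says whether the Sessions and ULFM code would be compiled in; elsewhere in this thread it is noted that those features are not available at run time under direct launch.

```python
from typing import NamedTuple


class BuildPlan(NamedTuple):
    ok: bool                   # would this configuration be accepted?
    build_sessions_ulfm: bool  # would Sessions and ULFM code be compiled in?
    note: str


def parse_version(text):
    """Turn '4.2.4' into (4, 2, 4); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def plan_build(pmix_version, internal_prrte):
    """Hypothetical mode-dependent check of the PMIx version."""
    pmix = parse_version(pmix_version)
    if internal_prrte:
        # Mode 2: the PRRTE configure errors out if PMIx is older than v4.2.4.
        if pmix < (4, 2, 4):
            return BuildPlan(False, False, "internal PRRTE needs PMIx >= 4.2.4")
        return BuildPlan(True, True, "internal PRRTE with PMIx >= 4.2.4")
    # Mode 1: direct launch only.
    if pmix < (3,):
        # Assumption: nothing older than PMIx v3 is covered by the patch.
        return BuildPlan(False, False, "PMIx older than v3 is not covered")
    if pmix < (4, 0):
        return BuildPlan(True, False, "direct launch only; Sessions and ULFM disabled")
    return BuildPlan(True, True, "direct launch only; Sessions and ULFM compiled in")


if __name__ == "__main__":
    for version, internal in [("4.2.4", True), ("4.1.2", True), ("3.1.6", False)]:
        print(version, internal, plan_build(version, internal))
```

Returning a plan instead of raising on the first failure keeps the sketch easy to map onto a documentation table with one row per combination.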
Next Wednesday (7/26) at 10-11am CT is the best time, I’ll get an invite out to those who responded. For others, here’s the dial in info: You have been invited to an online meeting, powered by Amazon Chime. Click to join the meeting: https://chime.aws/6617440541 Call in using your phone: To connect from an in-room video system, use one of the following Amazon Chime bridges: Download Amazon Chime at https://aws.amazon.com/chime/download |
More data for this issue: if I disable fault tolerance (with --with-ft=no), I can successfully build and run test programs (with mpirun) on my laptop that use Sessions calls with this set of dependencies:
Here's my configure line:

./configure --disable-mpi-fortran --disable-oshmem-fortran --enable-sphinx --with-libevent=/Users/qkoziol/dev/subspace/OpenMPI/ompi_build_dependencies/install --with-hwloc=/Users/qkoziol/dev/subspace/OpenMPI/ompi_build_dependencies/install --with-pmix=/Users/qkoziol/dev/subspace/OpenMPI/ompi_build_dependencies/install --with-prrte=/Users/qkoziol/dev/subspace/OpenMPI/ompi_build_dependencies/install --prefix=/Users/qkoziol/dev/subspace/OpenMPI/ompi_build_dependencies/install --with-ft=no

(disabling FORTRAN probably doesn't have anything to do with this issue :-) )

I'll give it a try on an actual cluster now, and add more info when I have it.

Update: pasting my simple test program that uses Sessions calls:
|
I'll only reiterate - do not use PRRTE v2.0.2. It is not supported, it has known issues, and you will come to regret it. You can do it if you like - just don't come to me with any issues you encounter. |
Understood |
Further testing bears out @rhc54's recommendation here - PRRTE v2.0.2 does not work for me on AWS clusters. Therefore: we should be using PRRTE 3.0.x as the minimum for OMPI 5.0, which means a hard requirement of PMIx 4.2.x, since PRRTE 3.0.x requires at least that version. |
I'm not certain there's much point in meeting to discuss the options, with this current situation. OMPI 5.0.x needs PRRTE 3.0.x and therefore must also require PMIx 4.2.x. |
As I noted earlier, this isn't technically a correct statement. It reflects what should be embedded in OMPI v5, but that isn't the same as what needs to be required during configure. The problem is that we never had a split in that logic for prior releases, but now we do. How that gets reflected in the configure script and/or the documentation is something you folks will need to decide. Maybe the logic is too complex and you just put a blanket minimum requirement on OMPI v5. Or maybe you add configure logic such that the minimum PMIx requirement changes based on other configure options. Or maybe you come up with some other scheme. Regardless, it isn't as simple as just defining what should be embedded. |
Results from our discussion today:
|
Note that the description of this behavior is part of #11734 |
Not quite accurate. It has nothing to do with the version of PMIx. The limitation stems from Slurm itself not having implemented the host-level support for Sessions and FT. Ditto for Cray environment.
Again, this has nothing to do with the version of PMIx used by Slurm. The problem is that OMPI v5 only supports PMIx-based attachment methods. So even if Bottom line: if you want to use a debugger on OMPI v5 applications, you need to get a PMIx-enabled debugger and use |
The issues around debugging are covered by the https://docs.open-mpi.org/en/v5.0.x/app-debug/parallel-debug.html and https://docs.open-mpi.org/en/v5.0.x/app-debug/mpir-tools.html sections in the docs. I'll submit a PR for updating the version #'s in the "required support libraries" table today. |
PR to bump the minimum VERSION #'s is up: #11875 |
related to open-mpi#10657 Signed-off-by: Howard Pritchard <[email protected]> (cherry picked from commit a524bd9)
We should make sure this table is up to date before releasing 5.0.0: https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/required-support-libraries.html?highlight=prrte
We should also add a section about how to support native launch (esp. via srun) with the 5.0.x release stream.