Redesign the command line system to provide isolation #1176
bot:ibm:retest |
As the number of organizations using PRRTE grows, we are beginning to see conflicts surrounding the command line interface. This is the primary interaction point for the user, so it is natural that organizations want to customize it - i.e., to "rebrand" PRRTE to match their needs.

The original code utilized a custom command line parser that was rigidly tied to the needs of OpenMPI. Modifying this to create the desired independence would be challenging. Instead, we chose to replace that code with use of the standard "getopt_long" function. This reduced the amount of custom code, though it still requires one to properly deal with the input/output to/from that parser.

This represents an initial working prototype of the revised system. Note that it does change the schizo interface. Some minor configuration changes have also crept into this PR and will be separated out prior to commit.

The main purpose of this PR is to provide the community an opportunity to look over the changes and provide feedback. Updates to the code will continue. In particular, we need to separate out the map/rank/bind policy assignments so that the individual organizations can each choose their own defaults.

Signed-off-by: Ralph Castain <[email protected]>
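For readers unfamiliar with "getopt_long", here is a minimal sketch of what handling the input/output to/from that parser can look like. It is not the code in this PR: the `demo_cli_t` type, the `demo_parse` function, and the `--np` / `--map-by` option table are all hypothetical names chosen for illustration. The leading `+` in the option string is a GNU extension that stops scanning at the first non-option argument.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result structure; not a PRRTE type. */
typedef struct {
    int         np;        /* value given to --np, or -1 if absent */
    const char *map_by;    /* value given to --map-by, or NULL if absent */
    int         first_arg; /* argv index of the first non-option argument */
} demo_cli_t;

/* Hypothetical option table for an imaginary tool personality. */
static const struct option demo_options[] = {
    {"np",     required_argument, NULL, 'n'},
    {"map-by", required_argument, NULL, 'm'},
    {NULL,     0,                 NULL, 0}
};

/* Parse argv once. Returns 0 on success, -1 on an unknown option
 * or a missing option argument. */
int demo_parse(int argc, char **argv, demo_cli_t *out)
{
    int opt;

    assert(NULL != out);
    out->np = -1;
    out->map_by = NULL;
    out->first_arg = argc;
    opterr = 0; /* let the caller report errors instead of getopt */

    /* '+' (GNU extension, an assumption about the platform): stop at the
     * first non-option so everything after it is left untouched. */
    while (-1 != (opt = getopt_long(argc, argv, "+n:m:", demo_options, NULL))) {
        switch (opt) {
        case 'n':
            out->np = atoi(optarg);
            break;
        case 'm':
            out->map_by = optarg;
            break;
        default: /* '?' for unknown option or missing argument */
            return -1;
        }
    }
    out->first_arg = optind;
    return 0;
}
```

With a command line such as `demo --np 4 --map-by core ./app --np 99`, scanning stops at `./app`, so the trailing `--np 99` is not interpreted by the tool.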
Allow each schizo personality to define its own default mapping, ranking, and binding policies. Any that it chooses not to define will fall back to the PRRTE defaults. Add an MCA param to each schizo component allowing it to silence deprecation warnings. Default the PRRTE component to output them. Default the OMPI component to silence them. Update the OMPI component to set its own ranking policy for the PPR mapping option - leave all else to the defaults. Signed-off-by: Ralph Castain <[email protected]>
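The fallback rule described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the schizo interface itself: `demo_policies_t`, `demo_base_defaults`, and `demo_resolve` are hypothetical names, and the `BASE_*` strings are placeholders rather than PRRTE's actual default policies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy triple; a NULL field means
 * "this personality did not define a default". */
typedef struct {
    const char *map;
    const char *rank;
    const char *bind;
} demo_policies_t;

/* Placeholder base defaults; these strings are NOT PRRTE's real values. */
static const demo_policies_t demo_base_defaults = {
    "BASE_MAP", "BASE_RANK", "BASE_BIND"
};

/* Start from the base defaults and override only the fields
 * that the personality chose to define. */
demo_policies_t demo_resolve(const demo_policies_t *personality)
{
    demo_policies_t resolved = demo_base_defaults;

    assert(NULL != personality);
    if (NULL != personality->map) {
        resolved.map = personality->map;
    }
    if (NULL != personality->rank) {
        resolved.rank = personality->rank;
    }
    if (NULL != personality->bind) {
        resolved.bind = personality->bind;
    }
    return resolved;
}
```

A personality that sets only a ranking policy, for example, would get the base mapping and binding values back unchanged.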
Only output the warnings once, when we first parse the command line. Applications submitted via PMIx_Spawn should not generate warnings. Signed-off-by: Ralph Castain <[email protected]>
Signed-off-by: Ralph Castain <[email protected]>
application name Don't look at the app's arguments Signed-off-by: Ralph Castain <[email protected]>
weird - getting error about some nvidia library not loading. let's try again bot:ibm:retest |
@jjhursey Looks like the containers for IBM's CI are missing something? Could you take a look? |
Ah I think this is the same hwloc issue we hit on the OMPI side. Let me try to change the configure arguments. 1 min |
bot:ibm:retest |
@jjhursey I think the problem is that you needed to add those configure options to the HWLOC build when constructing the container. I'm not sure where you have that, or if you can simply have it go back to pre-2.5 HWLOC to avoid the problem. |
The container is rebuilding now. Should be an hour or so until it is ready. I'll check back on it tonight. |
bot:ibm:retest |
The hwloc problem seems to be solved, but it looks like there is a problem with the mpir-shim now. |
Yeah, I tried on the other PRs and they all hit the same problem. I'm going to try and locally reproduce, but it may be another symptom of something in the container. This PR was passing last time it was modified - all I did was resolve a minor conflict in the copyrights. |
@jjhursey I'm afraid it is still something in those containers - I tested locally with my containers and the mpir-shim tests pass. Sorry to bother, but can you see what else might have changed? I would suggest backing down the HWLOC version as a starting point as that might have been what triggered the problem. |
bot:ibm:xl:retest |
I turned off the mpir-shim test to see how much further we would get - we failed when the debug tests tried to do an indirect version. It therefore appears that we fail whenever PMIx does a fork/exec of the launcher. I'll try with HWLOC 2.7 on my container - it was using v2.2 before - just in case that change remains the source of the trouble, even with the modified configure line. |
Let's follow up on Slack with the Container issue since it's not directly related to this ticket. I can backlevel it, but I'd also like to know what's going on here. The only thing that should have changed is the PGI compiler and HWLOC levels. |
bot:ibm:retest |
Some comments/questions. Overall looks good to me.
Signed-off-by: Ralph Castain <[email protected]>
Thanks - did you see my other comments? I had a few Q's on the ompi command line options. |
@rhc54 minor item that may have always been this way, but I noticed that some of the |
Yeah, there is some scrub work to be done on the help output. I just copied it as-is for now and figured we would iterate from there. |
Signed-off-by: Ralph Castain <[email protected]>
Made a quick pass through things, adding a new dummy tool/personality, and it worked as expected (minus a few goofs on my part). Overall seems to look good. |
Okay, we'll go with this for now - any issues can be dealt with as they surface. |