Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using btls always consider the self component #1482

Closed
wants to merge 3 commits into from

Conversation

hjelmn
Copy link
Member

@hjelmn hjelmn commented Mar 21, 2016

This PR also contains the necessary infrastructure to support this new feature. This PR reflects the reality that btl/self is always needed when using btls for communication.

@hjelmn
Copy link
Member Author

hjelmn commented Mar 21, 2016

@jsquyres, @bosilca Comments?

@jsquyres jsquyres added this to the v2.1.0 milestone Mar 21, 2016
enum {
/** Always consider this component for selection. For this flag to
* work properly the component must always be built statically. */
MCA_BASE_COMPONENT_FLAG_ALLWAYS_CONSIDER = 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALWAYS has one L, not 2. 🙊

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah, misspelled it once and emacs was happy to complete it. fixing.

This commit adds a new flag to the mca_base_component_t structure that
specifies component flags. The only flag currently support is
MCA_BASE_COMPONENT_FLAG_ALLWAYS_CONSIDER which forces the selection
logic in the MCA base to always consider a component whether or not it
was specified. In order to use this support a component must be built
statically and not dynamically. For example, if btl:self is built
statically and sets this flag in its component then -mca btl
vader,openib would provide vader, openib, and self.

Signed-off-by: Nathan Hjelm <[email protected]>
This commit makes it so the btl/self component will always be
considered for selection in the btl framework.

Signed-off-by: Nathan Hjelm <[email protected]>
@jjhursey
Copy link
Member

Just to clarify, with this commit the user need only:
-mca btl tcp
to get the current tcp,self. Which would be the same if they say:
-mca btl tcp,self

What happens if the user does:
-mca btl ^self

From my reading of the PR it look like this will be silently ignored by the MCA system. Correct?
I think we should throw an error in that last case. Something like: You tried to exclude a component that is flagged as always to be considered

@hjelmn
Copy link
Member Author

hjelmn commented Mar 21, 2016

@jjhursey That is correct and I agree we should probably print an error is they try to exclude a component that can not be excluded. Will update the PR shortly.

This commit adds a show_help() message when a user explicitly attempts
to disable a component that can not be disabled. After printing the
show_help message the component is enabled and execution continues.

Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn
Copy link
Member Author

hjelmn commented Mar 21, 2016

Ok, the show help is now in there as a warning:

➜  examples git:(always_self) ✗ mpirun -n 2 -mca btl ^self ./ring_c
--------------------------------------------------------------------------
A request was made to exclude a component from consideration that
must always be considered.

Framework: btl
Component: self
--------------------------------------------------------------------------
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
Process 1 exiting
[mordor:85215] 3 more processes have sent help message help-mca-base.txt / framework-param:exclude-always-component
[mordor:85215] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

@bosilca
Copy link
Member

bosilca commented Mar 22, 2016

As no other BTL takes care of local communications, how it is possible that the initialization stage was successfully completed ?

@hjelmn
Copy link
Member Author

hjelmn commented Mar 22, 2016

Initialization completed because the self btl is forced. That is the idea behind the PR. Allow components to be forced on.

[framework-param:exclude-always-component]
A request was made to exclude a component from consideration that
must always be considered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this super clear for the user (per George's comment) that this component will be considered anyway.

Maybe change to:

Warning: A request was made to exclude a component from consideration that
must always be considered. This component (noted below) will -not- be excluded
from consideration. The program will continue running with this component being
made available for selection.

Framework: %s
Component: %s

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My $0.02: if someone tries to disable it and it can't be disabled, we should abort (in line with the philosophy of "a human asked for X, and OMPI can't provide X, so OMPI needs to abort and let a human figure it out").

@bosilca
Copy link
Member

bosilca commented Mar 22, 2016

I understand that having to add self might became an annoyance. However, it is unexpected, that despite the user explicit request not to select self, we forced it's inclusion.

We are limiting the flexibility of our MCA components. With this patch, even if someone implements a different self component, she will not be able to test/run it on an installed Open MPI. If instead of forcing self indiscriminately, we strictly abide by the meaning of the inclusion/exclusion parameters, we do not prevent such developments.

Would it be enough to add "self" to the mca_btl_base_include variable to obtain a similar effect ?

@hjelmn
Copy link
Member Author

hjelmn commented Mar 22, 2016

@bosilca I agree that it might limit the usefulness of the MCA system for developers. The idea is to make it so users do not have to worry about including a component that should always be there. I can add yet another MCA parameter to disable this behavior so developers who know what they are doing can disable forced selection.

@jjhursey
Copy link
Member

@hjelmn Having a parameter to override the forced loading seems a bit odd to me in terms of semantics. It also causes a problem for my eventual usecase - where I want to force a component to be there always with no means for the user to exclude it. However, I could propose to add that back end when the time comes.

@hjelmn
Copy link
Member Author

hjelmn commented Mar 22, 2016

@jjhursey Good point. Also, right now there is a way to override self if a replacement is being tested. A developer can simply set btl_self_exclusivity to 0.

@bosilca
Copy link
Member

bosilca commented Mar 23, 2016

I just noticed this is breaking the backward compatibility with the other version by decreasing the length of the component name. Not sure the expected benefit outweights the drawback.

So let me reask my question: Would it be enough to add "self" to the mca_btl_base_include variable (during the MCA parameter registration) to obtain a similar effect (improving our user friendliness), but without breaking backward compatibility nor limiting the scope of the MCA ?

@hjelmn
Copy link
Member Author

hjelmn commented Mar 23, 2016

The component name has not been shortened. I took 4 bytes out of the reserved to handle the flags so this is a backwards compatible change as reserved is always 0. As for adding self to the mca_btl_base_include parameter, there is currently no such parameter. The btl parameter is the only selection parameter and if we set it before registration then it can get overridden by the environment, a file, or the command line. That is what we have now. We can add it after registration which will have more or less the same effect as this PR but moves the responsibility down to the base. I have no objections to handling this functionality in that way but I think this PR provides a cleaner solution. If a component is always required by not also require it to be static.

@hjelmn
Copy link
Member Author

hjelmn commented Mar 23, 2016

Also, if we use the selection parameter it will need to be added in both the framework register an open functions because of MPI_T.

@jsquyres
Copy link
Member

@hjelmn Does this summarize the current status of this PR:

  • @bosilca is worried about the loss of flexibility in the MCA system
  • @jjhursey would like to see this because he has a different use case for the same functionality (a licensing/DRM component)
  • @jsquyres asked for the system to error (not warn) when a user asks to exclude a component that cannot be excluded (I think @jjhursey will need this, too)

@ibm-ompi
Copy link

Test passed.

@jsquyres jsquyres modified the milestones: Future, v2.1.0 Sep 26, 2016
Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a Signed-off-by line to this PR's commit.

@hjelmn hjelmn closed this Aug 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants