Show MPI connectivity map during MPI_INIT #30
Comments
Imported from trac issue 1207. Created by jsquyres on 2008-02-06T13:06:04, last modified: 2014-04-22T16:23:42 |
Trac comment by bosilca on 2008-02-06 13:20:24: At some point we should start thinking about how to trim down the size of the MPI shared library. While I agree that such information is useful for the user, I don't think it needs to go deep inside the library. I see it more as an additional tool/utility bundled with Open MPI. |
Trac comment by jjhursey on 2008-02-06 15:15:15: I agree that conceptually this could be a useful tool (or really an addition to the […]). Actually, when I was originally designing […]. The limitation is that tools connect through the HNP, which is an ORTE-layer application, so it has no (or extremely limited) knowledge of OMPI-layer constructs. So a tool is unable to access information about OMPI-level collective and point-to-point constructs, for example. In the short term, the easiest way to implement this is to have the Rank 0 process dump this information. In the long term we may want to consider looking again at how tools interact with the MPI job, and think about how we can create a […]. |
Trac comment by jjhursey on 2008-02-07 07:21:41: As another idea for a compressed representation: it would be useful to only display the unique set of parameters used in the job. This is really what MTT is going to want in the short term, since capturing and querying a 2D space of connectivity information is difficult. So the following: […]
would be represented as: […]
|
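For illustration only, a minimal sketch of that kind of collapse, with made-up ranks, hosts, and BTL names (not the original example):

```
Full 2D connectivity (one entry per process pair):
  rank 0 <-> rank 1 : sm    (same node)
  rank 0 <-> rank 2 : tcp   (odin001 <-> odin002)
  rank 0 <-> rank 3 : tcp   (odin001 <-> odin002)
  rank 1 <-> rank 2 : tcp   (odin001 <-> odin002)
  ...

Unique parameter sets actually used in the job:
  sm   (all intra-node pairs)
  tcp  (all inter-node pairs)
```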
Trac comment by tdd on 2008-02-07 08:08:59: Replying to [comment:1 bosilca]:
I think I disagree here: you really want this information coming from the actual code, so one can detect issues with the actual (B/M)TL picking algorithm; a separate utility risks diverging from the actual code. I also think you lose a nice quick way for a user to confirm that their run really is using the appropriate TLs without having to run 2 programs. I know that last point may seem silly. Note, Sun's original CT base had this feature as part of an env var named MPI_SHOW_INTERFACES, which showed different amounts of verbosity at each level, thus giving one anything from a broad idea of how things are connected to a really detailed view of all the decisions the library considers when choosing BTLs and interfaces. This proved incredibly helpful in debugging complicated customer networks. |
Trac comment by jsquyres on 2008-02-07 08:24:12: Replying to [comment:4 tdd]:
...I think George is talking about a different issue (just overall reducing the size of the MPI library). I agree that this is a good thing to do, and perhaps we can modularize features like this (e.g., make the display map functionality be a DSO plugin that is loaded on demand), but I think that that is outside the scope of this ticket. Please create a new ticket for that kind of functionality; the display map functionality can easily be fit into a plugin framework someday if desired. Thanks. |
Trac comment by jsquyres on 2008-02-07 08:25:11: Replying to [comment:4 tdd]:
Terry: can you include some samples of what the output looked like from when users invoked MPI_SHOW_INTERFACES? |
Trac comment by tdd on 2008-02-07 09:04:38: Replying to [comment:6 jsquyres]:
Ok, first, here is a description of the option, which doesn't completely jibe with what you are proposing: […]
The following are some examples of its usage: […]
|
Trac comment by jjhursey on 2008-02-07 10:28:30: (In [17398]) A quick try at ticket refs https://svn.open-mpi.org/trac/ompi/ticket/1207. Here we are processing the BML structure attached to ompi_proc_t well after […]. Currently only Rank 0 displays data, and it makes no attempt to gather information […]. Examples from this commit on 2 nodes of IU's Odin machine: […]
In this example you can see that we have 2 tcp connections to odin002 for each […]
The above also occurs when passing no MCA arguments. But here you can see that […]
Trac comment by jsquyres on 2008-02-09 08:07:37: We had a long discussion about this on the phone (Terry, George, […]).
= Print Connectivity Map =
[…]
= New "preconnect all" functionality =
[…]
|
Trac comment by jsquyres on 2008-02-25 21:47:03: In taking a first pass at the "print the map" functionality, I'm running into two problems: […]
I wanted to run this by everyone before doing it, since it would be a bit bigger change than we thought... |
Trac comment by jsquyres on 2008-03-19 16:10:25: George and I talked about this...
|
Trac comment by jsquyres on 2008-03-19 16:10:47: (In [17881]) Playground for implementing the "print the MPI connection map" |
Trac comment by jsquyres on 2008-03-19 16:16:36: Split the "new MPI preconnect" functionality out into its own ticket: https://svn.open-mpi.org/trac/ompi/ticket/1249. |
Trac comment by jsquyres on 2008-05-29 20:12:06: This unfortunately didn't make the cut for v1.3. |
Trac comment by jsquyres on 2008-07-24 18:59:23: See the SVN tree source:/tmp-public/connect-map; Josh did some initial work in there. |
Trac comment by rhc on 2008-08-22 11:52:17: Jeff asked that I add this here - it represents a request from some power-users at LANL, but I suspect others may want it too:
|
Trac comment by jsquyres on 2009-01-12 13:14:44: This really needs to get done for v1.4. Bumping up to critical. |
Trac comment by jsquyres on 2009-05-07 07:44:16: With the change in release methodology, what we used to call "v1.4" is now called "v1.5". |
Trac comment by jsquyres on 2011-07-12 10:31:01: Bumping to v1.5.5. |
Trac comment by brbarret on 2013-01-09 12:13:40: This isn't a critical issue for 1.7. |
@gpaulsen @markalle We had a lengthy discussion about this connectivity map yesterday during the 2016 Feb Dallas Open MPI dev meeting, and then a further lengthy conversation about this at dinner last night. Main points:
|
IBM is taking on this feature enhancement. |
See PR #2825 |
Moving to 3.x as it probably will not get into 2.1. |
@jjhursey I'm going to punt this off any milestone, since it looks like it has died out on your side. |
That's fine. I'll add this to the face-to-face meeting to see where we are at again. @markalle maybe we can chat about this again before the meeting sometime. |
Some discussion at the March 2018 Face-to-Face meeting.
Step 1: Display basic table for pt2pt connections (output only)
Step 2: Future
|
This was implemented in […]
|
It has long been discussed, and I swear there was a ticket about this
at some point but I can't find it now. So I'm filing a new one --
close this as a dupe if someone can find an older one.
OMPI currently uses a negative ACK system to indicate if high-speed
networks are not used for MPI communications. For example, if you
have the openib BTL available but it can't find any active ports in a
given MPI process, it'll display a warning message.
But some users want a ''positive'' acknowledgement of what networks
are being used for MPI communications (this can also help with
regression testing, per a thread on the MTT mailing list). HP MPI
offers this feature, for example. It would be nice to have a simple
MCA parameter that will cause MCW rank 0 to output a connectivity map
during MPI_INIT.
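As a sketch of how this might look on the command line (the parameter name below is invented for illustration; it is not an actual Open MPI MCA parameter):

```sh
# Hypothetical: ask MCW rank 0 to print a connectivity map during MPI_INIT.
# "mpi_show_connectivity_map" is a made-up parameter name, used only to
# illustrate the requested behavior.
mpirun --mca mpi_show_connectivity_map 1 -np 6 ./my_mpi_app
```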
Complications:
* We don't know ahead of time exactly which BTLs will be used for communications with each MPI process peer; we only know which ones we'll try to use when connections are actually established (per OMPI's lazy connection model for the OB1 PML). But I think that even outputting this information will be useful.
* E.g., MCW rank 0 may use the sm btl to communicate with some MPI processes, but a different btl to communicate with others. This is almost certainly a different view than other processes have. The connectivity information needs to be conveyed on a process-pair basis (e.g., a 2D chart).
* […] to the PML API.
A first cut could display a simple 2D chart of how OMPI thinks it may
send MPI traffic from each process to each process. Perhaps something
like (OB1 6 process job, 2 processes on each of 3 hosts):
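A purely hypothetical rendering of such a chart (host names and BTL selections invented for illustration):

```
            0:hostA  1:hostA  2:hostB  3:hostB  4:hostC  5:hostC
  0:hostA   self     sm       tcp      tcp      tcp      tcp
  1:hostA   sm       self     tcp      tcp      tcp      tcp
  2:hostB   tcp      tcp      self     sm       tcp      tcp
  3:hostB   tcp      tcp      sm       self     tcp      tcp
  4:hostC   tcp      tcp      tcp      tcp      self     sm
  5:hostC   tcp      tcp      tcp      tcp      sm       self
```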
Note that the upper and lower triangular portions of the map are the
same, but it's probably more human-readable if both are output.
However, multiple built-in output formats could be useful, such as:
It may also be worthwhile to investigate a few heuristics to compress
the graph where possible. Some random ideas in this direction:
* E.g., highlight when the openib BTL is being used for inter-node connectivity ''except''
for one node (where IB is malfunctioning, and OMPI fell back to
TCP) -- this is a common case that users/sysadmins want to detect (a sketch follows below).
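A hypothetical compressed rendering of that exception case (host names invented):

```
Inter-node connectivity: openib (all host pairs)
  except: odin003 <-> all other hosts : tcp
Intra-node connectivity: sm (all hosts)
```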
Another useful concept might be to show some information about each
endpoint in the connectivity map. E.g., show a list of TCP endpoints
on each process, by interface name and/or IP address. Similar for
other transports. This kind of information can show when/if
multi-rail scenarios are active, etc. For example:
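A hypothetical per-process endpoint listing (interface names and addresses invented):

```
rank 0 (hostA): tcp endpoints: eth0 (10.0.0.1), eth1 (10.0.1.1)
rank 1 (hostA): tcp endpoints: eth0 (10.0.0.1), eth1 (10.0.1.1)
rank 2 (hostB): tcp endpoints: eth0 (10.0.0.2)
rank 3 (hostB): tcp endpoints: eth0 (10.0.0.2)
```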
With more information such as interface names, compression of the
output becomes much more important, such as:
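For instance (again purely illustrative):

```
All ranks: tcp, multi-rail over eth0 + eth1
  except ranks 2-3 (hostB): tcp over eth0 only
```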
Note that these ideas can certainly be implemented in stages; there's
no need to do everything at once.