coll/base: add support for component name in output #12504

Merged (1 commit) on May 1, 2024

edgargabriel
Member

Add the ability to output which component provides each collective operation. The feature is controlled by the mca_coll_base_verbose variable. Specifically:

mca_coll_base_verbose > 0 and < 20:

  • output will be provided for MPI_COMM_WORLD only, and only for the blocking and non-blocking collectives

mca_coll_base_verbose = 20:

  • output will be provided for all communicators, but only for blocking and non-blocking collectives

mca_coll_base_verbose > 20:

  • output will be provided for all communicators and all collectives (including persistent and ft)
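The three verbosity ranges above can be sketched as a single predicate. This is only an illustration of the rules as described in this PR, not the code from the commit: the `coll_kind_t` enum, the `should_report` function, and the `is_comm_world` flag are hypothetical names introduced for the example.

```c
#include <stdbool.h>

/* Hypothetical classification of collectives, for illustration only. */
typedef enum {
    COLL_KIND_BLOCKING,
    COLL_KIND_NONBLOCKING,
    COLL_KIND_PERSISTENT,
    COLL_KIND_FT
} coll_kind_t;

/*
 * Decide whether a "collective -> component" line would be printed,
 * following the thresholds stated in the PR description:
 *   0 < verbose < 20 : MPI_COMM_WORLD only, blocking/non-blocking only
 *   verbose == 20    : all communicators, blocking/non-blocking only
 *   verbose > 20     : all communicators, all collectives
 */
static bool should_report(int verbose, bool is_comm_world, coll_kind_t kind)
{
    bool basic_kind = (kind == COLL_KIND_BLOCKING ||
                       kind == COLL_KIND_NONBLOCKING);

    if (verbose <= 0) {
        return false;                 /* feature disabled */
    }
    if (verbose < 20) {
        return is_comm_world && basic_kind;
    }
    if (verbose == 20) {
        return basic_kind;
    }
    return true;                      /* verbose > 20 */
}
```

For example, under these assumptions `should_report(10, false, COLL_KIND_BLOCKING)` is false because the communicator is not MPI_COMM_WORLD, while `should_report(21, false, COLL_KIND_PERSISTENT)` is true.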

Note that the values are up for negotiation. I am also open to using an entirely new MCA parameter that would allow for a more natural specification of which communicator/operation we want the output for.

The output looks as follows (small sample):

coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 alltoallv -> tuned
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 alltoallw -> basic
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 barrier -> tuned
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 bcast -> tuned
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 exscan -> accelerator
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 gather -> tuned
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 gatherv -> basic
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 reduce -> accelerator
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 reduce_scatter -> tuned
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 scan -> accelerator
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 scatter -> tuned
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 scatterv -> basic
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 neighbor_allgather -> basic
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 reduce_local -> basic
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 iallgather -> libnbc
coll:base:comm_select: communicator MPI_COMM_WORLD rank 0 iallgatherv -> libnbc
...
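For post-processing, a line in the format shown above could be split into its fields roughly as follows. This is a minimal sketch that assumes the line layout matches the sample exactly and that the communicator name contains no whitespace; `coll_select_line_t`, `parse_coll_select_line`, and `line_uses_component` are hypothetical helpers, not part of Open MPI.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one "coll:base:comm_select" output line (hypothetical type). */
typedef struct {
    char comm[64];       /* communicator name, e.g. MPI_COMM_WORLD */
    int  rank;           /* rank that printed the line */
    char coll[64];       /* collective name, e.g. alltoallv */
    char component[64];  /* providing component, e.g. tuned */
} coll_select_line_t;

/* Returns 0 on success, -1 if the line does not match the sample format. */
static int parse_coll_select_line(const char *line, coll_select_line_t *out)
{
    int n = sscanf(line,
                   "coll:base:comm_select: communicator %63s rank %d %63s -> %63s",
                   out->comm, &out->rank, out->coll, out->component);
    return (n == 4) ? 0 : -1;
}

/* True (nonzero) if the parsed line names the given component. */
static int line_uses_component(const coll_select_line_t *e, const char *name)
{
    return strcmp(e->component, name) == 0;
}
```

A script or test harness could use this to check, for instance, that a given collective is provided by the expected component on rank 0.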

Signed-off-by: Edgar Gabriel <[email protected]>
@edgargabriel
Member Author

@bosilca convinced me that the hash table is not required :-), so this second attempt no longer uses an internal hash table to map module pointers to component names. Thank you!

Member

@bosilca bosilca left a comment

Looks good to me.

We should mention that this prints only the highest-priority module for each collective. In some cases that might not be the module that executes the collective, because the selected module can, for whatever reason, use the fallback mechanism to delegate the collective to another module. Unfortunately, there is no way to track that down, so this is really the best we can do right now.
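The caveat can be illustrated with a toy model. This is not how Open MPI implements its fallback mechanism; it is a simplified sketch in which every name (`toy_module_t`, `selected_name`, `executing_name`, and the two example bcast functions) is hypothetical. It shows only why the name reported at selection time can differ from the module that ends up doing the work.

```c
#include <stddef.h>

/* Toy module: may decline a call and defer to a lower-priority module. */
typedef struct toy_module {
    const char *name;
    int (*bcast)(size_t nbytes);         /* 0 = handled, nonzero = declined */
    const struct toy_module *fallback;   /* next module to try, or NULL */
} toy_module_t;

/* Example behavior: handles only large messages, declines small ones. */
static int large_only_bcast(size_t nbytes) { return nbytes >= 4096 ? 0 : -1; }

/* Example behavior: handles every message size. */
static int any_size_bcast(size_t nbytes) { (void)nbytes; return 0; }

/* What selection-time output can see: the top-priority module only. */
static const char *selected_name(const toy_module_t *top)
{
    return top->name;
}

/* What actually runs: the first module in the chain that accepts the call. */
static const char *executing_name(const toy_module_t *top, size_t nbytes)
{
    for (const toy_module_t *m = top; m != NULL; m = m->fallback) {
        if (m->bcast(nbytes) == 0) {
            return m->name;
        }
    }
    return NULL;  /* no module handled the call */
}
```

With a chain such as `{"component_a", large_only_bcast}` falling back to `{"component_b", any_size_bcast}`, `selected_name` always reports `component_a`, while `executing_name` reports `component_b` for an 8-byte message. The decision depends on call-time arguments, which is why output produced at selection time cannot capture it.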

@bosilca bosilca merged commit 8b4237c into open-mpi:main May 1, 2024
13 checks passed