Use MPI_Allreduce for error code checking #595

Open
wants to merge 2 commits into
base: master

wkliao
Member

@wkliao wkliao commented Jul 3, 2024

This is to fix #520

@jayeshkrishna
Contributor

@dqwu : Let us discuss this change before merging it into master (performance impact, should this be tied to the user error handler etc)

@wkliao
Member Author

wkliao commented Jul 18, 2024

Actually, when PnetCDF is used, all the MPI communications (bcast or
allreduce) used for the inq APIs' error checking are unnecessary, because
PnetCDF guarantees metadata consistency among all MPI processes.

I can make more changes to this PR.

@wkliao
Member Author

wkliao commented Jul 19, 2024

I think using MPI_Bcast for inquired arguments is necessary when
the number of I/O tasks is less than the total number of MPI tasks.

However, when all MPI tasks are I/O tasks, those MPI_Bcast calls used
for error codes and inquiring arguments are not necessary for PnetCDF.
These MPI_Bcast calls can become expensive when the number of variables
to be "inquired" is large.
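
To make the two cases above concrete, here is a minimal sketch of the decision in plain C. The type `inq_sync_plan` and the function `plan_inq_sync` are hypothetical names for illustration, not part of Scorpio or PnetCDF. It also assumes that when only a subset of tasks does I/O, the error code would travel together with the broadcast values, which the comment above does not spell out.

```c
#include <stdbool.h>

/* Hypothetical plan type; names are illustrative, not Scorpio's API. */
typedef struct {
    bool bcast_args;  /* MPI_Bcast inquired values to non-I/O tasks */
    bool sync_err;    /* extra collective just to agree on the error code */
} inq_sync_plan;

inq_sync_plan plan_inq_sync(int num_iotasks, int num_tasks, bool uses_pnetcdf)
{
    inq_sync_plan plan = { false, false };

    /* Tasks that never opened the file cannot inquire it themselves. */
    plan.bcast_args = (num_iotasks < num_tasks);

    /* Assumption: with PnetCDF every I/O task sees the same metadata and
     * hence the same error code, so no separate error-code collective. */
    plan.sync_err = !uses_pnetcdf;

    return plan;
}
```

With PnetCDF and every task acting as an I/O task, both fields come out false, which is the case where the collectives could be skipped.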

When other I/O methods are used, I think Allreduce is the right way
to check the error codes, especially for libraries that do not
guarantee metadata consistency.
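
As a rough illustration of why a reduction may be safer than a broadcast here, the sketch below simulates both in plain C over an array of per-rank error codes, with no MPI calls. The function names are hypothetical, and the use of a minimum reduction (MPI_MIN) is an assumption that relies on netCDF-style codes, where 0 means success and library errors are negative. The actual reduction operator used in this PR may differ.

```c
#include <stddef.h>

/* What MPI_Bcast from `root` would leave on every rank: the root's code only. */
int code_after_bcast(const int *codes, size_t nranks, size_t root)
{
    (void)nranks;  /* unused; kept for a symmetric signature */
    return codes[root];
}

/* What MPI_Allreduce with MPI_MIN would leave on every rank: the smallest
 * code, so a single failing rank (negative code) is enough to flag the call.
 * Requires nranks >= 1. */
int code_after_allreduce_min(const int *codes, size_t nranks)
{
    int result = codes[0];
    for (size_t i = 1; i < nranks; i++) {
        if (codes[i] < result)
            result = codes[i];
    }
    return result;
}
```

If rank 2 fails while rank 0 succeeds, a broadcast from rank 0 would report success everywhere, whereas the minimum reduction would report the failure on all ranks.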

@jayeshkrishna
Contributor

@wkliao : Is it possible to get metadata consistency with PnetCDF across all MPI processes while using only a subset of those processes for read/write operations? For example, the metadata/file header would be available to all processes in MPI_COMM_WORLD (or a parent MPI communicator), while a subset MPI communicator is used for read/write.

@wkliao
Member Author

wkliao commented Jul 21, 2024

Metadata consistency applies to the MPI processes in the
MPI communicator used in the file create/open call. The file create/open
call returns a file handle, ncid, which a process uses
to do I/O and retrieve metadata.

If Scorpio created a sub-communicator and used it to call file create/open,
then those MPI processes in MPI_COMM_WORLD but not in the
sub-communicator will not obtain an ncid and therefore not be able to
retrieve the metadata.

Development

Successfully merging this pull request may close these issues.

error code checking
3 participants