ncmpi_wait_all called in a loop #599

Open
wkliao opened this issue Jul 19, 2024 · 2 comments

@wkliao
Member

wkliao commented Jul 19, 2024

    for(int k = 0; k < nreq_blocks; k++)
    {
        assert(req_block_ends[k] >= req_block_starts[k]);
        rcnt = req_block_ends[k] - req_block_starts[k] + 1;
        LOG((1, "ncmpi_wait_all(file=%s, ncid=%d, request range = [%d, %d], num pending requests = %d)", pio_get_fname_from_file(file), file->pio_ncid, req_block_starts[k], req_block_ends[k], nreqs));

The code fragment above calls ncmpi_wait_all once per request block, i.e.
nreq_blocks times in total, which may be less efficient than a single call, for example,

ierr = ncmpi_wait_all(file->fh, NC_REQ_ALL, NULL, NULL);

This flushes all pending nonblocking requests.
FYI, NC_REQ_ALL was first introduced in PnetCDF 1.7.0.

@jayeshkrishna
Contributor

@dqwu: Some items to consider before we make any changes:

Please verify whether performance is actually affected by the code above (I am not sure). Also, if we remove this code, ensure that large requests (requests > 2GB/INT_MAX etc.) still work with newer versions of PnetCDF. See the PR that introduced the change for the likely reason behind it, and check whether simplifying the code helps with the latest versions of PnetCDF.

Also, if we offload waiting on all requests to PnetCDF, do we lose detailed error messages tied to specific requests? Would we end up giving the user non-specific error messages when writes fail?

@wkliao
Member Author

wkliao commented Jul 21, 2024

If the ncmpi_wait_all call was broken into multiple calls
because of the >2GB limitation, then it makes sense.

FYI, this limitation has been lifted in PnetCDF 1.13.0.
However, I think MPI-IO does not fully support > 2GB yet.
ROMIO was recently updated to support this feature,
but the update has yet to propagate to all MPI vendors.
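
To make the >2GB reasoning concrete, here is a minimal sketch of how pending requests could be grouped into blocks so that each wait call stays under a byte cap. It assumes the blocks are chosen by cumulative request size, which this thread does not confirm. The function split_into_blocks and its req_sizes input are hypothetical and are not taken from Scorpio or PnetCDF.

```c
#include <assert.h>

/* Hypothetical helper, not Scorpio's actual code: partition nreqs pending
 * requests into contiguous blocks whose total size stays within max_bytes.
 * req_sizes[i] is the byte count of request i (an assumed input).
 * Writes inclusive index ranges to starts[] and ends[] (each must hold
 * nreqs entries) and returns the number of blocks. A single request that
 * is larger than max_bytes still gets a block of its own. */
int split_into_blocks(const long long *req_sizes, int nreqs,
                      long long max_bytes, int *starts, int *ends)
{
    int nblocks = 0;
    long long acc = 0; /* bytes accumulated in the current block */

    assert(nreqs >= 0 && max_bytes > 0);
    for (int i = 0; i < nreqs; i++) {
        assert(req_sizes[i] >= 0);
        /* Open a new block for the first request, or when adding this
         * request would exceed the cap. The comparison is written as a
         * subtraction so that acc + req_sizes[i] cannot overflow. */
        if (nblocks == 0 || acc > max_bytes - req_sizes[i]) {
            starts[nblocks] = i;
            nblocks++;
            acc = 0;
        }
        acc += req_sizes[i];
        ends[nblocks - 1] = i;
    }
    return nblocks;
}
```

With a cap such as INT_MAX, each resulting [starts[k], ends[k]] range would correspond to one ncmpi_wait_all call in the loop quoted at the top of this issue. If the limit no longer applies, a single block covering all requests is the degenerate case.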

As for the error code, when NC_REQ_ALL is used,
ncmpi_wait_all returns the first error encountered.

I am not sure whether per-request error codes are helpful
for Scorpio, as most errors are caught when posting
the nonblocking API calls, e.g. type errors, boundary errors, etc.
The errors that occur in ncmpi_wait_all are related to MPI-IO.
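
As a rough illustration of the trade-off, the sketch below shows what a per-request status array offers beyond a single return code: the index of the failing request. It assumes the caller passed an explicit request list and a status array to ncmpi_wait_all rather than NC_REQ_ALL with NULL arguments. The name first_error and the SKETCH_NOERR constant (a stand-in for NC_NOERR, which is 0) are hypothetical.

```c
#include <stddef.h>

/* Stand-in for PnetCDF's NC_NOERR so the sketch compiles without pnetcdf.h. */
#define SKETCH_NOERR 0

/* Hypothetical helper: scan per-request statuses and return the first
 * non-zero error code, mirroring the "first error encountered" behaviour
 * described for NC_REQ_ALL. If failed_idx is not NULL, it receives the
 * index of the failing request, or -1 when every request succeeded. */
int first_error(const int *statuses, int nreqs, int *failed_idx)
{
    for (int i = 0; i < nreqs; i++) {
        if (statuses[i] != SKETCH_NOERR) {
            if (failed_idx != NULL)
                *failed_idx = i;
            return statuses[i];
        }
    }
    if (failed_idx != NULL)
        *failed_idx = -1;
    return SKETCH_NOERR;
}
```

The return value carries the same information as the NC_REQ_ALL call. The failing index is the only extra detail, and it matters only if Scorpio can map it back to something meaningful for the user.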
