ncmpi_wait_all called in a loop #599

wkliao · 2024-07-19T19:14:20Z

Lines 2180 to 2185 in b5393ec

    for(int k = 0; k < nreq_blocks; k++)
    {
        assert(req_block_ends[k] >= req_block_starts[k]);
        rcnt = req_block_ends[k] - req_block_starts[k] + 1;
        LOG((1, "ncmpi_wait_all(file=%s, ncid=%d, request range = [%d, %d], num pending requests = %d)", pio_get_fname_from_file(file), file->pio_ncid, req_block_starts[k], req_block_ends[k], nreqs));

The code fragment above calls ncmpi_wait_all nreq_blocks times in a loop,
which may be less efficient than making a single call, for example:

ierr = ncmpi_wait_all(file->fh, NC_REQ_ALL, NULL, NULL);

This flushes all pending nonblocking requests.
FYI, NC_REQ_ALL was first introduced in PnetCDF 1.7.0.
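
To make the difference concrete, here is a minimal C sketch that models the two calling patterns without linking against PnetCDF. The names `stub_wait_all`, `wait_in_blocks`, `wait_all_at_once`, and `struct wait_stats` are hypothetical; the stub performs no I/O and only counts calls and completed requests. Because ncmpi_wait_all is a collective call, the assumption here is that each extra call in the blocked loop adds another round of synchronization and MPI-IO work, which is the overhead a single NC_REQ_ALL call would avoid.

```c
#include <assert.h>

/* Hypothetical bookkeeping for the stand-in wait routine below. */
struct wait_stats {
    int calls;   /* number of (collective) wait calls issued */
    int flushed; /* total number of requests completed */
};

/* Hypothetical stand-in for ncmpi_wait_all(): performs no I/O and only
 * records how many requests a single call would complete. */
static int stub_wait_all(struct wait_stats *st, int num_reqs)
{
    st->calls++;
    st->flushed += num_reqs;
    return 0; /* 0 plays the role of NC_NOERR */
}

/* Pattern from the fragment: one wait call per block of requests,
 * where block k covers indices [starts[k], ends[k]] inclusive. */
static int wait_in_blocks(struct wait_stats *st, int nreq_blocks,
                          const int *starts, const int *ends)
{
    for (int k = 0; k < nreq_blocks; k++) {
        assert(ends[k] >= starts[k]);
        int rcnt = ends[k] - starts[k] + 1;
        int ierr = stub_wait_all(st, rcnt);
        if (ierr != 0)
            return ierr;
    }
    return 0;
}

/* Suggested pattern: one call that completes every pending request,
 * analogous to ncmpi_wait_all(ncid, NC_REQ_ALL, NULL, NULL). */
static int wait_all_at_once(struct wait_stats *st, int nreqs)
{
    return stub_wait_all(st, nreqs);
}
```

With three blocks covering requests [0, 3], [4, 7], and [8, 9], `wait_in_blocks` issues three wait calls, while `wait_all_at_once(&st, 10)` completes the same ten requests in one call.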


jayeshkrishna · 2024-07-21T18:58:01Z

@dqwu: Some items to consider before we make any changes:

Please verify whether the code above actually affects performance (I am not sure it does). Also, ensure that large requests (> 2GB / INT_MAX, etc.) still work with newer versions of PnetCDF if we remove the code above. Check the PR that introduced this change for the reason it was added, and see whether simplifying the code helps with the latest versions of PnetCDF.

Also, if we offload waiting on all requests to PnetCDF, do we lose detailed error messages for specific requests? In other words, would we end up giving users non-specific error messages when writes fail?

wkliao · 2024-07-21T22:30:50Z

If the ncmpi_wait_all call was split into multiple calls
because of the >2GB limitation, then that makes sense.

FYI, this limitation was lifted in PnetCDF 1.13.0.
However, I think MPI-IO does not yet fully support requests > 2GB.
ROMIO was recently updated to support this, but the change
has yet to propagate to all MPI vendors.
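
If the split was indeed a workaround for the >2GB limit, one way such blocking could work is sketched below in C. This is not Scorpio's actual code (which is not shown in this issue); `make_req_blocks` and the per-request byte counts are hypothetical. It groups consecutive requests so that each block's total stays at or below a limit such as INT_MAX, producing inclusive [start, end] ranges like req_block_starts/req_block_ends in the fragment.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: group consecutive pending requests into blocks so
 * that the total bytes in each block stay at or below `limit` (e.g.
 * INT_MAX). Block k covers request indices [starts[k], ends[k]]
 * inclusive, matching how the fragment computes rcnt. A single request
 * larger than `limit` still gets a block of its own. Returns the number
 * of blocks; starts and ends must each have room for nreqs entries. */
static int make_req_blocks(const long long *req_bytes, int nreqs,
                           long long limit, int *starts, int *ends)
{
    int nblocks = 0;
    long long block_bytes = 0;

    for (int i = 0; i < nreqs; i++) {
        /* Open a new block for the first request, or when adding this
         * request would push the current block past the limit. */
        if (nblocks == 0 || block_bytes + req_bytes[i] > limit) {
            starts[nblocks] = i;
            ends[nblocks] = i;
            nblocks++;
            block_bytes = req_bytes[i];
        } else {
            /* Otherwise extend the current block by one request. */
            ends[nblocks - 1] = i;
            block_bytes += req_bytes[i];
        }
    }
    return nblocks;
}
```

For requests of 1.5 GB, 1.0 GB, 0.5 GB, and 2.0 GB with `limit = INT_MAX`, this yields three blocks: [0, 0], [1, 2], and [3, 3]. Passing a limit larger than the total size collapses everything into a single block, which corresponds to the single NC_REQ_ALL call suggested earlier.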

As for the error code, when NC_REQ_ALL is used,
ncmpi_wait_all returns the first error encountered.
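
To illustrate what per-request information is traded away, the minimal C sketch below scans a per-request status array, assuming one is filled in by ncmpi_wait_all when explicit request IDs and a statuses array are passed instead of NC_REQ_ALL with NULL arguments. `first_failed_request` and `count_failed_requests` are hypothetical helpers, and the negative values in the example are placeholders rather than specific PnetCDF error codes. With NC_REQ_ALL, the caller only gets the single return code described above, with no index or count of the failing requests.

```c
#include <assert.h>

/* Hypothetical helper: return the first non-zero status in a per-request
 * status array and store its index in *failed_idx. Returns 0 (standing
 * in for NC_NOERR) and sets *failed_idx to -1 if all requests succeeded. */
static int first_failed_request(const int *statuses, int nreqs,
                                int *failed_idx)
{
    for (int i = 0; i < nreqs; i++) {
        if (statuses[i] != 0) {
            *failed_idx = i;
            return statuses[i];
        }
    }
    *failed_idx = -1;
    return 0;
}

/* Hypothetical helper: count failed requests. This information is not
 * available when no statuses array is passed (as with NC_REQ_ALL). */
static int count_failed_requests(const int *statuses, int nreqs)
{
    int nfailed = 0;
    for (int i = 0; i < nreqs; i++) {
        if (statuses[i] != 0)
            nfailed++;
    }
    return nfailed;
}
```

For a status array such as `{0, 0, -40, 0, -35}`, this reports request 2 as the first failure and two failures in total; a caller could use that to name the specific variable or region in an error message.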

I am not sure whether per-request error codes are helpful
for Scorpio, as most errors are caught when the nonblocking
API calls are posted, e.g. type errors, boundary errors, etc.
Errors that occur in ncmpi_wait_all are related to MPI-IO.

jayeshkrishna assigned dqwu Jul 21, 2024

jayeshkrishna added the enhancement label Jul 21, 2024
