Asynchronous I/O requirements #489
Comments
Before we get all the way to per-file control, would per-file-type control work? All restart files in ADIOS vs. all history files in PNETCDF?
Tagging others who might participate in this discussion: @PeterCaldwell @mt5555 @bartgol @AaronDonahue
Regarding Rob's comment above: I suspect just having restarts in ADIOS would be sufficient. I'm not sure we have good timing info for restarts versus normal output. What fraction of our writing time is spent on restarts? I think the answer is "most"... I also can't imagine wanting some output files (not including restart files) in ADIOS and others in PNETCDF. Why wouldn't we have all outputs in a single format?
Interesting requirement from Andrew about needing to tell I/O to sprint during the last restart-file write at the end of a run. Otherwise the model would be done running and would be waiting around for the small I/O process to finish. I think ICON avoids this by actually launching separate slurm jobs for the output writing, so the main code could quit while restarts were still getting produced.
We should discuss the async model. Years ago, we were spending most of our time aggregating the data down to the I/O processors, and it was clear that just putting the I/O processors on their own ranks was not going to save us any time. Another option is to have the application create async MPI ranks, to which it can send data quickly, and then have those tasks call the synchronous SCORPIO. It may be easier for the application to do this - i.e., in the SCREAM case, with 6 MPI tasks per node, SCREAM would just create 6 more MPI tasks per node to collect data and then have those tasks call SCORPIO. I'm not sure which is the right approach - but we need to think about these options.
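A minimal sketch of that "application-created async ranks" option, not SCORPIO API: the application splits off extra I/O ranks, compute ranks hand data to their paired I/O rank with a nonblocking send and keep going, and the I/O rank calls the existing synchronous write path. `write_with_scorpio` is a hypothetical placeholder for that call, and the even/odd pairing assumes an even total rank count.

```cpp
#include <mpi.h>
#include <vector>

// Stand-in for a synchronous SCORPIO write (the real code would define a
// decomposition and call the library's write routines here).
void write_with_scorpio(const double* data, int n) { (void)data; (void)n; }

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Even ranks compute, odd ranks do I/O (e.g. 6 compute + 6 I/O per node).
  const bool is_io = (rank % 2 == 1);
  MPI_Comm role_comm;  // per-role communicator, e.g. for initializing the I/O library
  MPI_Comm_split(MPI_COMM_WORLD, is_io ? 1 : 0, rank, &role_comm);

  const int n = 1 << 20;
  std::vector<double> buf(n);

  if (!is_io) {
    // Compute rank: fill 'buf', hand it off to the paired I/O rank, and
    // continue the simulation while the send drains.
    MPI_Request req;
    MPI_Isend(buf.data(), n, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &req);
    // ... advance the model here ...
    MPI_Wait(&req, MPI_STATUS_IGNORE);  // 'buf' is reusable after this
  } else {
    // I/O rank: receive the handed-off data, then write synchronously.
    MPI_Recv(buf.data(), n, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    write_with_scorpio(buf.data(), n);
  }

  MPI_Comm_free(&role_comm);
  MPI_Finalize();
  return 0;
}
```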
I think the most important use case for us is when we run on GPUs. In this case, we do have to copy back to host for I/O anyway, so my thought was to pass the data to another MPI rank during this copy: if the deep_copy back to host is performed by separate I/O ranks, then the compute ranks are free to reuse the device arrays and continue the simulation. There is no aggregation in this model; it's simply a "hand-off" of data from one rank to another. That said, I don't know if this is something we can implement in SCORPIO, since SCORPIO requires host pointers. It seems like an approach that the app using SCORPIO has to implement.
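One way the hand-off could look, as a rough sketch only and assuming a GPU-aware MPI plus Kokkos (none of this is SCORPIO API): the compute rank posts a send directly from its device view, so the device-to-host movement is absorbed by the transfer to the I/O rank, which lands the data in host memory and then calls the usual synchronous write path. `write_with_scorpio` is again a hypothetical placeholder.

```cpp
#include <Kokkos_Core.hpp>
#include <mpi.h>
#include <vector>

// Stand-in for the synchronous SCORPIO write call made on the I/O rank.
void write_with_scorpio(const double* host_data, int n) { (void)host_data; (void)n; }

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  Kokkos::initialize(argc, argv);
  {
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int n = 1 << 20;

    if (rank % 2 == 0) {
      // Compute rank: the field lives in device memory as a Kokkos view.
      Kokkos::View<double*> field("field", n);
      MPI_Request req;
      // With a GPU-aware MPI we can pass the device pointer directly, so the
      // compute rank never performs the host deep_copy itself.
      MPI_Isend(field.data(), n, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &req);
      // ... continue the timestep with other device arrays ...
      MPI_Wait(&req, MPI_STATUS_IGNORE);  // 'field' can be reused after this
    } else {
      // I/O rank: receive into host memory and write synchronously.
      std::vector<double> host_buf(n);
      MPI_Recv(host_buf.data(), n, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
               MPI_STATUS_IGNORE);
      write_with_scorpio(host_buf.data(), n);
    }
  }
  Kokkos::finalize();
  MPI_Finalize();
  return 0;
}
```

Double-buffering the device view would let the compute rank proceed even before the send completes, at the cost of extra device memory.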
These are suggested features for the asynchronous I/O support in the library.
@ambrad: