
Asynchronous I/O requirements #489

Open
jayeshkrishna opened this issue Oct 17, 2022 · 6 comments


jayeshkrishna commented Oct 17, 2022

These are suggested features for the asynchronous I/O support in the library.

@ambrad:

  • Summit: 6 resource sets (RS) per node. One rank per RS drives the single GPU in that RS; a second rank per RS drives I/O for the RS.
  • Can choose async vs. sync per file, e.g., do it only for restart files (see the sketch after this list).
  • If the MPI communication bandwidth used for I/O can be large, provide an option to throttle it. E.g., a mid-run restart file could take several hours to write asynchronously with throttled communication; there would probably have to be a throttle-override call telling PIO to run at max speed at the end of a run.
  • All operations besides writing that can be async should support it: e.g., openfile, initdecomp, closefile.
  • Related: support ADIOS vs. PNETCDF per file.
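
None of the flags or throttle calls below exist in SCORPIO today; this is a hypothetical sketch, in PIO-style C, of what the per-file async choice plus a comm throttle with an end-of-run override might look like. `PIO_MODE_ASYNC` and `PIOc_set_io_throttle` are invented names; `PIOc_createfile`, `PIOc_closefile`, and `PIOc_finalize` are existing calls.

```c
#include <pio.h>

/* HYPOTHETICAL SKETCH: PIO_MODE_ASYNC and PIOc_set_io_throttle are
 * invented names illustrating the requested features; they are not
 * part of SCORPIO today. */

void write_restart_async(int iosysid, const char *fname)
{
    int ncid;
    int iotype = PIO_IOTYPE_PNETCDF;

    /* Per-file choice: a mode flag marks only this file's operations
     * as asynchronous; other files stay synchronous. */
    PIOc_createfile(iosysid, &ncid, &iotype, fname,
                    PIO_CLOBBER | PIO_MODE_ASYNC);

    /* Throttle the background MPI traffic so compute ranks keep
     * making progress; a mid-run restart may then take hours, which
     * is acceptable while the model is still advancing. */
    PIOc_set_io_throttle(iosysid, 0.10);  /* e.g., 10% of peak */

    /* ... PIOc_def_dim / PIOc_def_var / PIOc_write_darray as usual;
     * with the async flag these would enqueue work and return. */

    PIOc_closefile(ncid);  /* also async: returns before data lands */
}

void end_of_run(int iosysid)
{
    /* Throttle override: the model is done computing, so let the
     * final restart write run at max speed. */
    PIOc_set_io_throttle(iosysid, 1.0);
    PIOc_finalize(iosysid);  /* would drain pending async work */
}
```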

rljacob commented Oct 17, 2022

Before we get all the way to per-file, would per-file-type work? E.g., all restart files in ADIOS vs. all history files in PNETCDF?
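
For what it's worth, the existing PIO-style C API already takes the iotype per file at creation time, so per-file-type selection could be a pure application-side policy. A minimal sketch, assuming `PIO_IOTYPE_ADIOS` is available in the SCORPIO build (the helper name is illustrative):

```c
#include <pio.h>

/* Illustrative policy: restart files go to ADIOS, everything else to
 * PnetCDF. PIO_IOTYPE_ADIOS is assumed available in this build. */
static int iotype_for_file(int is_restart)
{
    return is_restart ? PIO_IOTYPE_ADIOS : PIO_IOTYPE_PNETCDF;
}

int create_output_file(int iosysid, const char *fname, int is_restart,
                       int *ncid)
{
    int iotype = iotype_for_file(is_restart);
    /* PIOc_createfile accepts the iotype per file, so per-file-type
     * selection needs no new API, only a policy in the caller. */
    return PIOc_createfile(iosysid, ncid, &iotype, fname, PIO_CLOBBER);
}
```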


ambrad commented Oct 17, 2022

Tagging others who might participate in this discussion: @PeterCaldwell @mt5555 @bartgol @AaronDonahue


PeterCaldwell commented Oct 17, 2022

Regarding Rob's comment above: I suspect just having restarts in ADIOS would be sufficient. I'm not sure we have good timing info for restarts versus normal output; what fraction of our writing time is spent on restarts? I think the answer is "most". I also can't imagine wanting some output files (not including restart files) in ADIOS and others in PNETCDF; why wouldn't we write all outputs in a single format?

PeterCaldwell commented:

Interesting requirement from Andrew about needing to tell I/O to sprint during the last restart-file write at the end of a run. Otherwise the model would be done running and would be waiting around for the small I/O process to finish. I think ICON avoids this by actually launching separate Slurm jobs for the output writing, so the main code can quit while restarts are still being produced.


mt5555 commented Oct 17, 2022

We should discuss the async model. Years ago, we were spending most of our time aggregating the data down to the I/O processors, and it was clear that just putting the I/O processors on their own ranks was not going to save us any time. Another option is to have the application create async MPI ranks to which it can send data quickly, and then have those tasks call the synchronous SCORPIO. It may be easier for the application to do this; i.e., in the SCREAM case, with 6 MPI tasks per node, SCREAM would just create 6 more MPI tasks per node to collect data and then have those tasks call SCORPIO. I'm not sure which is the right approach, but we need to think about these options.
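
A minimal sketch of this application-side model in plain MPI, using an illustrative even/odd rank split (a real split would follow the node layout, e.g., 6 extra ranks per node for SCREAM); the tag and field size are made up:

```c
#include <mpi.h>

/* Illustrative split: odd-numbered ranks become dedicated I/O ranks
 * that receive data and call ordinary synchronous SCORPIO; even ranks
 * compute and hand off quickly. Assumes an even total rank count. */

#define FIELD_TAG 42   /* made-up tag */
#define NFIELD    1024 /* made-up field size */

void step_with_handoff(MPI_Comm world)
{
    int rank;
    MPI_Comm_rank(world, &rank);

    int is_io = (rank % 2 == 1);
    MPI_Comm role_comm;  /* separates compute ranks from I/O ranks */
    MPI_Comm_split(world, is_io, rank, &role_comm);

    if (!is_io) {
        static double field[NFIELD];
        MPI_Request req;
        /* Compute rank: send to its partner I/O rank and keep going. */
        MPI_Isend(field, NFIELD, MPI_DOUBLE, rank + 1, FIELD_TAG,
                  world, &req);
        /* ... continue time stepping; wait only when the buffer
         * must be reused. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else {
        double field[NFIELD];
        MPI_Recv(field, NFIELD, MPI_DOUBLE, rank - 1, FIELD_TAG,
                 world, MPI_STATUS_IGNORE);
        /* I/O rank: initialize SCORPIO on role_comm and call the
         * synchronous write path (PIOc_write_darray etc.) here; the
         * compute ranks never block on the file write itself. */
    }
    MPI_Comm_free(&role_comm);
}
```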


bartgol commented Oct 17, 2022

I think the most important use case for us is when we run on GPUs. In that case we have to copy data back to the host for I/O anyway, so my thought was to pass the data to another MPI rank during this copy: if the deep copy back to host is performed by separate I/O ranks, then the compute ranks are free to reuse the device arrays and continue the simulation. There is no aggregation in this model; it is simply a "hand-off" of data from one rank to another.

That said, I don't know whether this is something we can implement in SCORPIO, since SCORPIO requires host pointers. It seems like an approach that the application using SCORPIO has to implement.
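
A minimal sketch of this hand-off, assuming a CUDA-aware MPI so the send can be posted directly from the device buffer (with a plain MPI the compute rank would cudaMemcpyAsync to a staging buffer first); the tag and function names are illustrative:

```c
#include <stdlib.h>
#include <mpi.h>

/* Illustrative hand-off, assuming a CUDA-aware MPI: the compute rank
 * posts a send directly from its device buffer; the I/O rank receives
 * into host memory, so the device-to-host copy rides the message.
 * SCORPIO then only ever sees a host pointer, on the I/O rank. */

#define FIELD_TAG 7  /* made-up tag */

void compute_rank_handoff(double *d_field /* device pointer */, int n,
                          int io_rank, MPI_Comm comm, MPI_Request *req)
{
    /* Legal on a device buffer with a CUDA-aware MPI. The compute
     * rank may reuse d_field after MPI_Wait(req, ...) and keep time
     * stepping while the I/O rank handles the write. */
    MPI_Isend(d_field, n, MPI_DOUBLE, io_rank, FIELD_TAG, comm, req);
}

void io_rank_receive_and_write(int n, int compute_rank, MPI_Comm comm)
{
    double *h_field = malloc(n * sizeof *h_field);
    MPI_Recv(h_field, n, MPI_DOUBLE, compute_rank, FIELD_TAG, comm,
             MPI_STATUS_IGNORE);
    /* h_field is an ordinary host pointer, so the existing synchronous
     * SCORPIO path (e.g., PIOc_write_darray) works unchanged here. */
    free(h_field);
}
```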
