Asynchronous I/O requirements #489
Comments
Before we get all the way to per-file control, would per-file-type control work? All restart files in ADIOS vs. all history files in PNETCDF?
Tagging others who might participate in this discussion: @PeterCaldwell @mt5555 @bartgol @AaronDonahue
Regarding Rob's comment above: I suspect just having restarts in ADIOS would be sufficient. I'm not sure we have good timing info for restarts versus normal output. What fraction of our writing time is spent on restarts? I think the answer is "most"... I also can't imagine wanting some output files (not including restart files) in ADIOS and others in PNETCDF. Why wouldn't we have all outputs in a single format?
Interesting requirement from Andrew about needing to tell I/O to sprint during the last restart-file write at the end of a run. Otherwise the model would be done running and would be waiting around for the small I/O process to finish. I think ICON avoids this by actually launching separate slurm jobs for the output writing, so the main code could quit while restarts were still getting produced.
We should discuss the async model. Years ago, we were spending most of our time aggregating the data down to the I/O processors, and it was clear that just putting the I/O processors on their own ranks was not going to save us any time. Another option is to have the application create async MPI ranks, to which it can send data quickly, and then have those tasks call the synchronous SCORPIO. It may be easier for the application to do this - i.e., in the SCREAM case, with 6 MPI tasks per node, SCREAM would just create 6 more MPI tasks per node to collect data and then have those tasks call SCORPIO. I'm not sure which is the right approach - but we need to think about these options.
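A minimal sketch of that "application-created async ranks" option, not SCORPIO API: the application splits off extra I/O ranks, compute ranks hand data to their paired I/O rank with a nonblocking send and keep going, and the I/O rank calls the existing synchronous write path. `write_with_scorpio` is a hypothetical placeholder for that call, and the even/odd pairing assumes an even total rank count.

```cpp
#include <mpi.h>
#include <vector>

// Stand-in for a synchronous SCORPIO write (the real code would define a
// decomposition and call the library's write routines here).
void write_with_scorpio(const double* data, int n) { (void)data; (void)n; }

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Even ranks compute, odd ranks do I/O (e.g. 6 compute + 6 I/O per node).
  const bool is_io = (rank % 2 == 1);
  MPI_Comm role_comm;  // per-role communicator, e.g. for initializing the I/O library
  MPI_Comm_split(MPI_COMM_WORLD, is_io ? 1 : 0, rank, &role_comm);

  const int n = 1 << 20;
  std::vector<double> buf(n);

  if (!is_io) {
    // Compute rank: fill 'buf', hand it off to the paired I/O rank, and
    // continue the simulation while the send drains.
    MPI_Request req;
    MPI_Isend(buf.data(), n, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &req);
    // ... advance the model here ...
    MPI_Wait(&req, MPI_STATUS_IGNORE);  // 'buf' is reusable after this
  } else {
    // I/O rank: receive the handed-off data, then write synchronously.
    MPI_Recv(buf.data(), n, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    write_with_scorpio(buf.data(), n);
  }

  MPI_Comm_free(&role_comm);
  MPI_Finalize();
  return 0;
}
```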
I think the most important use case for us is when we run on GPUs. In this case, we do have to copy back to host for I/O anyway, so my thought was to pass the data to another MPI rank during this copy: if the deep_copy back to host is performed by separate I/O ranks, then the compute ranks are free to reuse the device arrays and continue the simulation. There is no aggregation in this model; it's simply a "hand-off" of data from one rank to another. That said, I don't know if this is something we can implement in SCORPIO, since SCORPIO requires host pointers. It seems like an approach that the app using SCORPIO has to implement.
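One way the hand-off could look, as a rough sketch only and assuming a GPU-aware MPI plus Kokkos (none of this is SCORPIO API): the compute rank posts a send directly from its device view, so the device-to-host movement is absorbed by the transfer to the I/O rank, which lands the data in host memory and then calls the usual synchronous write path. `write_with_scorpio` is again a hypothetical placeholder.

```cpp
#include <Kokkos_Core.hpp>
#include <mpi.h>
#include <vector>

// Stand-in for the synchronous SCORPIO write call made on the I/O rank.
void write_with_scorpio(const double* host_data, int n) { (void)host_data; (void)n; }

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  Kokkos::initialize(argc, argv);
  {
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int n = 1 << 20;

    if (rank % 2 == 0) {
      // Compute rank: the field lives in device memory as a Kokkos view.
      Kokkos::View<double*> field("field", n);
      MPI_Request req;
      // With a GPU-aware MPI we can pass the device pointer directly, so the
      // compute rank never performs the host deep_copy itself.
      MPI_Isend(field.data(), n, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &req);
      // ... continue the timestep with other device arrays ...
      MPI_Wait(&req, MPI_STATUS_IGNORE);  // 'field' can be reused after this
    } else {
      // I/O rank: receive into host memory and write synchronously.
      std::vector<double> host_buf(n);
      MPI_Recv(host_buf.data(), n, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
               MPI_STATUS_IGNORE);
      write_with_scorpio(host_buf.data(), n);
    }
  }
  Kokkos::finalize();
  MPI_Finalize();
  return 0;
}
```

Double-buffering the device view would let the compute rank proceed even before the send completes, at the cost of extra device memory.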
These are suggested features for the asynchronous I/O support in the library.
@ambrad: