P3 lookup text file is being read by all MPI ranks -- can cause issue with filesystems #6654

ndkeen · 2024-10-01T18:25:28Z

For cases that use P3 (I assumed there were such cases in E3SM, but I'm not currently finding any in the set I've been testing), we read a small text file in a poorly parallel way: every MPI rank reads the same file.
I was surprised to find we are doing this; it was surely a mistake, as this is never a good idea.
While the file is small, it still causes issues with the filesystems, and NERSC admins are noticing.
It could also cause a slowdown (or even a stall/hang).

I have been testing a quick fix that has rank 0 read the file and broadcast the data to the other ranks. It seems to be BFB, but will need work to implement properly.
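For readers unfamiliar with the pattern, here is a minimal sketch of "one rank reads, then broadcasts". It is not the actual E3SM change (that code is Fortran/C++); it is an illustration under stated assumptions: the function name `read_table_on_root`, the `SerialComm` stand-in, and the file layout (plain whitespace-separated floats) are all hypothetical. The `comm` argument only needs `Get_rank()` and `bcast(obj, root=...)`, which matches the interface of an mpi4py communicator.

```python
def read_table_on_root(path, comm, root=0):
    """Read a text table of floats on one rank and broadcast it to all ranks.

    Assumes (hypothetically) that the file holds whitespace-separated floats,
    one table row per non-empty line.
    """
    table = None
    if comm.Get_rank() == root:
        # Only the root rank touches the filesystem.
        with open(path) as f:
            table = [[float(tok) for tok in line.split()]
                     for line in f if line.strip()]
    # Every rank calls bcast; non-root ranks receive the root's data.
    return comm.bcast(table, root=root)


class SerialComm:
    """Hypothetical single-rank stand-in so the sketch runs without MPI."""

    def Get_rank(self):
        return 0

    def bcast(self, obj, root=0):
        return obj
```

Under MPI, one would presumably pass `MPI.COMM_WORLD` from mpi4py instead of `SerialComm()`. The lowercase `bcast` pickles Python objects; for large numeric arrays, the buffer-based `Bcast` with NumPy arrays is usually the faster choice.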

NERSC has even suggested we move our inputdata from CFS (which uses DVS) to scratch (Lustre).
They have said we can have scratch space that is not purged for this purpose.
In general, I've been testing the performance of reading from scratch and it seems about the same, but if the sole reason for moving is to avoid complications such as this, hopefully we can just fix the code.

I also made an issue in scream (will link), as the same problem exists there, but the implementation of the fix may be slightly different.

mahf708 · 2024-10-01T19:17:36Z

xref #5953. I hope we can fix that too when we fix this. In general, I find the P3 table business quite complicated, and it has caused a number of issues for me and others.

ndkeen · 2024-10-01T21:36:27Z

Is this data something that could be placed directly on the heap at init time?

mahf708 · 2024-10-01T21:56:43Z

Yes; also my understanding is that the whole table could actually be calculated at runtime ...

ndkeen · 2024-10-01T22:11:11Z

Oh, that would be even better. The file in question is, I think, this one. It's 3.2 MB and 31k lines (of text floats).

perlmutter-login37% ls -lh /global/cfs/cdirs/e3sm/inputdata/atm/scream/tables/p3_lookup_table_1.dat-v4.1.1
-rw-rw-r--+ 1 ndk ndk 3.2M Sep 28 11:37 /global/cfs/cdirs/e3sm/inputdata/atm/scream/tables/p3_lookup_table_1.dat-v4.1.1
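To put the file size in context, here is a back-of-the-envelope sketch of the aggregate volume requested when every rank reads its own copy. The helper name `aggregate_read_gib` and the rank count in the usage note are hypothetical; only the 3.2M size comes from the listing above. It is a rough upper bound that ignores any caching, and it says nothing about the number of open/metadata requests, which may matter more to the filesystem.

```python
def aggregate_read_gib(file_mib, n_ranks):
    """Total data requested, in GiB, if each of n_ranks reads the whole file."""
    return file_mib * n_ranks / 1024.0
```

For example, `aggregate_read_gib(3.2, 8192)` gives roughly 25.6 GiB for a hypothetical 8192-rank job, versus 3.2 MiB when a single rank reads and broadcasts.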

erinethomas · 2024-10-03T20:20:13Z

A very similar issue is occurring with E3SM+WW3 coupled simulations on Perlmutter. It takes over 45 minutes to initialize the wave model with the current settings. The time WW3 spends reading the input file also scales linearly with the number of nodes requested: doubling the number of nodes doubles the initialization time. This is very problematic. The only solution I have found is moving the file that WAVEWATCH III reads out of the CFS directory.

ndkeen · 2024-10-04T21:52:24Z

Erin: For this issue, I just want to find a solution for a specific place in the code where it's clear we are having all ranks read the same file. In general, that's a bad idea, but in practice we might only run into problems at large MPI rank counts. It sounds like what you are seeing is general slowness of reading files from CFS, which could be caused by different things, though if you do know that all ranks are reading the file serially, you certainly want to fix that. Reading from scratch would still have the same problem. It might be better to create a new issue describing the problem you are seeing and how to reproduce it.

erinethomas · 2024-10-04T21:55:25Z

Hi Noel, thanks for the feedback. I can make a new issue. Reading the file from scratch does NOT seem to have the same issue, so it seems to be just CFS.

mahf708 · 2024-10-04T21:59:22Z

> hi Noel - Thanks for the feedback - I can make a new issue. Reading the file from scratch does NOT seem to have the same issue. so it seems to be just CFS.

I too would be very curious to see your example, so I hope you can open an issue and point us to the code and a reproducer. I assume you don't see any issue on Chrysalis or any other HPC system, right?

mahf708 · 2024-10-04T23:19:57Z

@erinethomas when you have a chance, could you please test with the file on CFS but change the type to cdf5? Long story short, I had a similar issue (for me, it got completely stuck reading the file), and when I moved the file from CFS to SCRATCH, it worked. When I changed the file type from classic to cdf5, it also worked.

erinethomas · 2024-10-07T20:09:41Z

@mahf708 - the file is an ASCII text file... I will open a new issue soon to discuss the specifics further.

sarats · 2024-10-08T06:23:32Z

> NERSC has even suggesting we move our inputdata from CFS (which uses DVS) to scratch (Lustre).
> They have said we can have scratch space that is not purged for this purpose.

I think this is good to do as Lustre is better suited for this purpose.

ndkeen · 2024-11-14T18:45:36Z

A branch that reads the P3 lookup table on one rank and broadcasts it to the others is here:
ndk/p3/read-txt-table-with-1rank

ndkeen added the Machine Files, inputdata, pm-cpu, and input file labels Oct 1, 2024

mahf708 added the help wanted label Oct 1, 2024

mahf708 added the EAMxx and eam labels Oct 1, 2024
