Runtime errors with ERS.f09_g16_g.MALISIA
#6600
OK, the issue may be a system issue and/or intermittent.
I will leave this open until I see a few more daily passes.
Thanks @ndkeen -- please let me know what you figure out. Do the log or error files give you any more clues?
Noting what looks to be a different failure in the same test: on pm-cpu, the test has failed the last 4-5 times.
ERS.f09_g16_g.MALISIA.gcp12_gnu
ERS.f09_g16_g.MALISIA
We merged a MALI PR around the time it started having issues. Can you tell whether it's the first run that fails or the second one (the restart)?
I suspect it's a gnu compiler issue -- the same test passes with intel. I'm testing it with gnu on chrysalis right now.
It passed on chrysalis with gnu, so it's not that.
I am pretty sure that in both cases (gcp12 and pm-cpu), it was the second run of ERS that failed.
The gcp12 test failed today in the same way as described above. I can also reproduce the failure on pm-cpu.
I just ran some tests on gcp12 with an increased number of MPI tasks, and the chance of this error does seem to grow as the task count increases. At 512 tasks, the test failed in the same way 3 times in a row on this machine. The default tests use only 16 tasks for GLC. I made a complete copy of the case (with the 3 failures) on perlmutter here:
fwiw, I also just tested
Let me check with the MALI people -- something is not right. Thanks for all the testing.
MALI update to fix issues from an earlier PR that caused sporadic test failures
Including a variable that was deactivated in the globalStats stream caused sporadic failures during the second run of some ERS tests on several platform/compiler combinations. That variable is now included only when MALI is using Albany. This PR also updates a namelist default that had been missed; that change does not change answers.
Fixes #6600
[NML] for configurations with MALI
[BFB]
On gcp12, I'm seeing an error with a test that was working before.
In init