Sixth reconciliation PR from production/RRFS.v1 #896
base: develop
Conversation
@MatthewPyle-NOAA @dkokron When running UFS regression tests with this code, the control_restart_p8_intel test is failing with a segmentation fault. If you have access to Hera, you can see the logs here: /scratch1/BMC/gmtb/Grant.Firl/stmp2/Grant.Firl/FV3_RT/rt_1304460/control_restart_p8_intel The err log shows the first non-library error as: 0x0000000002232a28 fv_io_mod_mp_fv_io_read_restart_() /scratch1/BMC/gmtb/Grant.Firl/ufs-weather-model-grantfirl/FV3/atmos_cubed_sphere/tools/fv_io.F90:495 This seems to be related to the changes that were made in this PR. Could you please help to debug this? |
I don't have access to Hera. Does this failure happen on Acorn too? |
So far, it's been noted on Hera and Hercules. I'm guessing it would also show up on Acorn, but I don't have access. |
@BrianCurtis-NOAA normally runs the UFS regression suite on Acorn. Is he scheduled to do the same for this release? |
This PR isn't necessarily associated with a release. It's just bringing changes from the RRFSv1 release branch into the main develop branch. Perhaps we can ask @BrianCurtis-NOAA to run the failing test only on Acorn for you to see if it fails there and you can have access to the run directory with logs? |
I've run the UFS regression suite before. I was just wondering if Brian had done this already on Acorn. Will the following clone get me to a state where I can reproduce the failure? git clone --recursive git@github.com:ufs-community/ufs-weather-model.git CloneUFSForRegressionTesting |
I don't know if they had gotten to Acorn yet or not, but they stopped testing once this issue was noticed. I'd use: |
Runs fine on Acorn. Please try disabling the ENABLE_PARALLELRESTART feature in your Hera build; see line 39 of FV3/atmos_cubed_sphere/CMakeLists.txt. |
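For readers following the thread, here is a minimal sketch of turning the feature off at configure time instead of editing CMakeLists.txt. It assumes ENABLE_PARALLELRESTART is an ordinary CMake cache option, so a `-D` override on the command line takes precedence over the default. `SRC_DIR` and `BUILD_DIR` are hypothetical placeholders, not paths from this PR.

```shell
# Sketch only: assumes ENABLE_PARALLELRESTART is a regular CMake cache option.
# SRC_DIR and BUILD_DIR are hypothetical placeholders for the source tree
# and the out-of-source build directory.
SRC_DIR="${SRC_DIR:-.}"
BUILD_DIR="${BUILD_DIR:-build}"

# The -D<var>=<value> form sets a CMake cache entry at configure time.
PARALLEL_RESTART_FLAG="-DENABLE_PARALLELRESTART=OFF"

# Print the configure command rather than running it, so the snippet is
# safe to try anywhere; paste the printed line into a shell to configure.
echo "cmake -S ${SRC_DIR} -B ${BUILD_DIR} ${PARALLEL_RESTART_FLAG}"
```

A regression-test build may wrap the configure step in its own scripts, in which case the flag would need to be passed through whatever mechanism those scripts provide.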
Interesting, thanks. I'll try your suggestion on Hera and report back on this PR. |
@dkokron The offending test on Hera passes with ENABLE_PARALLELRESTART set to OFF. |
@bensonr |
@dkokron @bensonr @jkbk2004 FYI, the error is reproducible on Hera. The test completes successfully with @jkbk2004 @MatthewPyle-NOAA How important is it for #818 to be in the develop branch for the SRW App? I could split this PR into just the CCPP component that should pass/merge and then leave the debugging of #818 for a separate PR if that makes sense to do so. |
@grantfirl I would say this functionality isn't important for the SRW App. It is a pretty WCOSS-targeting change for running the RRFS efficiently there. But can it be included with |
Thanks @MatthewPyle-NOAA I'm fine with changing @dkokron @bensonr We can add an issue in atmos_cubed_sphere for someone to debug this later if it is ever necessary for the broken platforms to have this functionality in the future? |
I agree. Set ENABLE_PARALLELRESTART to OFF by default. Then it can be turned ON only on WCOSS2 for now until the issues are resolved on other platforms. |
…lt to avoid RT errors on some platforms (Hera, Hercules at least)
@DusanJovic-NOAA @dkokron @MatthewPyle-NOAA I've turned it off in NOAA-GFDL/GFDL_atmos_cubed_sphere#365 in order to pass RTs for this PR. I'll add an issue in atmos_cubed_sphere for someone to debug this when there is an opportunity. |
@dkokron @MatthewPyle-NOAA By turning ENABLE_PARALLELRESTART OFF in atmos_cubed_sphere, does it mean that the functionality being introduced into the develop branch is effectively not being tested at all by UFS regression tests? If so, should a regression test be added/augmented to turn this on for a platform that it does work on? |
@grantfirl Good question. It might be nice to have, but it feels like there is always a push to reduce the number of regression tests, so I'm not really sure. |
I think RRFS depends on this feature, so I vote yes on having the feature enabled when the RT suite is run on WCOSS2. |
@dkokron Do you think the ENABLE_PARALLELRESTART CMake option could be moved somewhere under https://github.com/ufs-community/ufs-weather-model/tree/develop/cmake ? I mean, to make the option machine-specific. |
I'm not part of the RRFS team. |
@jkbk2004 Making it a machine specific option (OFF everywhere but for WCOSS2) sounds good to me. |
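The machine-specific idea discussed above could look roughly like the sketch below: a small shell helper that chooses the flag value from a machine name. The function name `parallel_restart_flag` and the lowercase machine identifiers are hypothetical, chosen only for illustration; the real build system may identify platforms differently and may set the option in CMake rather than in shell.

```shell
# Hypothetical helper: pick the ENABLE_PARALLELRESTART setting per machine.
# Only WCOSS2 gets ON, following the suggestion in this thread; every other
# platform falls through to OFF.
parallel_restart_flag() {
  case "$1" in
    wcoss2) echo "-DENABLE_PARALLELRESTART=ON" ;;
    *)      echo "-DENABLE_PARALLELRESTART=OFF" ;;
  esac
}

# Show the flag that would be chosen for a few illustrative machine names.
for machine in wcoss2 hera hercules; do
  echo "${machine}: $(parallel_restart_flag "${machine}")"
done
```

Defaulting to OFF in the catch-all branch means a new or unrecognized platform gets the configuration that already passes the regression tests.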
Description
This PR contains changes from 2 PRs that went into production/RRFS.v1:
Provides the changes needed to take advantage of the FMS parallel I/O changes. Changes courtesy of Dan Kokron of GDIT. To be paired with changes in atmos_cubed_sphere.
This PR adds a new diagnostic, instantaneous downward shortwave flux at the surface assuming clear-sky conditions.
Issue(s) addressed
None
Testing
Tested on Hera using full rt.conf. See logs in UWM PR: =
Dependencies
ufs-community/ccpp-physics#235
NOAA-GFDL/GFDL_atmos_cubed_sphere#365
Requirements before merging