update restart tests for coupled model #316

Closed
DeniseWorthen opened this issue Dec 3, 2020 · 7 comments · Fixed by #354

Comments

@DeniseWorthen
Collaborator

Description

After the merge of PR #304, we should update and/or modify existing restart tests for the coupled model.

Solution

This Issue has several aspects, some of which still need to be decided, so I consider this issue open for discussion.

I believe that at a minimum, we should

  • retire the c96mx025 restart test. This was carried over from ufs-s2s and should be replaced by a c96mx100 restart test if we want to retain a low resolution restart test.

  • implement where possible checkpoint-restarting for the restart tests, reducing by one the number of tests that need to be run.

  • implement a restart test for frac_grid.

  • implement an 'overlap' restart test, meaning the test will overlap the end of one day. Such a test would be, for example, a restart from hour 12, running for 36h and comparing to a continuous 48h forecast (12h/36h/48h). This references existing Issue #293, "add coupled model restart test overlapping 24 hr time boundary". If this is implemented, is a 'non-overlap' restart test still required (12h/12h/24h)?
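To make the restart/run/total notation concrete, here is a minimal Python sketch that classifies a triple such as 12h/36h/48h as 'overlap' or 'non-overlap'. The `RestartTriple` class and its method names are hypothetical, introduced only for illustration; they are not part of the regression test scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartTriple:
    """Hypothetical helper describing a restart test as restart/run/total hours."""
    restart_hour: int  # forecast hour at which the restart files are read
    run_hours: int     # length of the restarted segment
    total_hours: int   # length of the continuous reference forecast

    def is_consistent(self) -> bool:
        # The restarted segment must end exactly where the reference run ends.
        return self.restart_hour + self.run_hours == self.total_hours

    def crosses_day_boundary(self) -> bool:
        # First 24h boundary after the restart hour; the test is an
        # 'overlap' test only if the run continues past that boundary.
        first_boundary = (self.restart_hour // 24 + 1) * 24
        return first_boundary < self.total_hours


overlap = RestartTriple(12, 36, 48)      # 12h/36h/48h
non_overlap = RestartTriple(12, 12, 24)  # 12h/12h/24h
```

Under this definition, 12h/12h/24h stops exactly at the end of the first day and so does not count as an overlap test, whereas 12h/36h/48h runs through hour 24.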

What is not clear yet to me is which resolutions should be tested for restart and how.

  • The benchmark+frac_grid configuration is the closest to what will be implemented, however it is also our most resource intensive test and we cannot include waves in a restart test at this point. Eventually this would also need to be a L127 test.

  • If we have a benchmark+frac_grid restart test, are other restart tests (c96mx100) still required?

@DeniseWorthen DeniseWorthen added the enhancement New feature or request label Dec 3, 2020
@junwang-noaa
Collaborator

I'd suggest setting up a restart test at C96mx100 using the benchmark+frac_grid configuration (except for the resolution). If the restart works with this setting, I wouldn't expect it to fail in the high resolution benchmark+frac_grid test.

@DeniseWorthen DeniseWorthen self-assigned this Dec 3, 2020
@JessicaMeixner-NOAA
Collaborator

For MOM6, the set-up is very different at 1deg versus 1/4 deg. If the restart test is only at 1deg, many aspects of the code that would be used operationally would go untested, so the test would not really tell us about restart reproducibility in the context of MOM6. I know there is a desire to make tests as small and short as possible, but this is likely not sufficient for MOM6 testing of restarts. @jiandewang can provide more specific details if required.

@junwang-noaa
Collaborator

@JessicaMeixner-NOAA @jiandewang can you provide information on which features are used in the benchmark but cannot be used at low resolutions? Also, would those features affect the coupled model in terms of the model interface? Can high resolution standalone MOM6 tests cover these features, including restart reproducibility? I am asking because ufs currently supports 4 applications, so we do want fast RT turnaround time to avoid delays.

@DeniseWorthen
Collaborator Author

Currently, the plan is to:

  1. remove the c96mx025 12h/12h/1d restart test and replace it with a c96mx100 12h/36h/48h test. Using checkpoint restarts, this will require two tests: a 48h test with restarts written at 12h intervals and a restart test from the first 12h restart integrating for 36h.

  2. Add a restart test for the benchmark configuration using 3h/3h/6h. We don't currently test the benchmark configuration at 6h, so this will require new tests. The other option would be to change the current benchmark test from 1d to 6h. This would also reduce the time required to run the benchmark tests.

  3. Implement a fractional grid restart test matching what is done for (2) above.
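The checkpoint-restart pattern in item 1 can be sketched with a toy model. This is an illustration under stated assumptions, not the regression test system: `step` and `run` are hypothetical stand-ins for the coupled model, and a single float stands in for the model state and restart files.

```python
import math


def step(state: float, hour: int) -> float:
    """Toy deterministic 'model' step standing in for one forecast hour."""
    return 0.5 * state + math.sin(hour)


def run(state: float, start_hour: int, n_hours: int, restart_interval: int = 0):
    """Integrate n_hours from start_hour.

    Returns the final state and a dict mapping forecast hour to the
    checkpoint written at that hour (empty if restart_interval is 0).
    """
    checkpoints = {}
    for hour in range(start_hour, start_hour + n_hours):
        state = step(state, hour)
        completed = hour + 1
        if restart_interval and completed % restart_interval == 0:
            checkpoints[completed] = state
    return state, checkpoints


# Test 1: continuous 48h control run writing restarts every 12h.
control_final, restarts = run(0.0, 0, 48, restart_interval=12)

# Test 2: restart from the first 12h checkpoint and integrate for 36h.
restart_final, _ = run(restarts[12], 12, 36)

# The restart test passes only if the two end states are identical.
assert restart_final == control_final
```

Because the control run itself writes the hour-12 checkpoint, no separate 12h run is needed to produce the restart files, which is where the saving of one test comes from.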

@DeniseWorthen
Collaborator Author

I have a branch where I've implemented the above items as well as added in Shan's frac grid bmark wave tests from her PR #326 (including options to use L127 input).

For 2), I changed the default time for the cpld_bmark test to 6 hours (from 1d) and used that for the 3h/3h/6h restart test. I think we should probably also reduce the length to 6 hours for both the existing bmark_wave test and the new fractional grid bmark_wave test. These are really long tests and I'm not sure we gain anything by testing 24hrs vs 6hrs.

I've also implemented a 12h/36h/48h restart test at c192mx050 for the frac grid. My idea was that the physics of the 1/2 deg MOM6 is most similar to the 1/4deg MOM6 according to @jiandewang. This is at least a frac grid long restart test although not at the resolution of the bmark.

I also added a debug test for frac grid (c96mx100).

The current number of cpld tests we actually run is 14. The new count is 19; if we set all the bmark tests to 6 hours, that would help.

@junwang-noaa
Collaborator

junwang-noaa commented Dec 11, 2020 via email

@DeniseWorthen
Collaborator Author

On orion, I get:

cpld_control_c192 (1d): 3 min, 288 PE
cpld_control_c384 (1d): 21 min, 318 PE

cpld_bmark (1d): 13 min, 480 PE
cpld_bmark_wave (1d): 23 min, 520 PE
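One way to compare these tests is by PE-minutes (wall-clock minutes times PE count). That metric is an assumption made here for illustration and is not defined in the thread; the short sketch below only does the arithmetic on the Orion numbers above.

```python
# (wall-clock minutes, PE count) for the 1d tests reported on Orion
timings = {
    "cpld_control_c192": (3, 288),
    "cpld_control_c384": (21, 318),
    "cpld_bmark": (13, 480),
    "cpld_bmark_wave": (23, 520),
}

# Assumed cost measure: minutes multiplied by the number of PEs.
pe_minutes = {name: minutes * pes for name, (minutes, pes) in timings.items()}

for name, cost in pe_minutes.items():
    print(f"{name}: {cost} PE-minutes")
```

By this measure cpld_bmark_wave is the most expensive of the four at 11960 PE-minutes, and cpld_control_c192 the cheapest at 864.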


3 participants