Update utest and ci #603

Merged

Conversation

MinsukJi-NOAA
Contributor

@MinsukJi-NOAA MinsukJi-NOAA commented May 27, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. The ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers, and place the PR in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

  • Update utest script: Handle global, regional, and coupled tests
  • Update CI workflow: When builds are complete, save only the executable (~77 MB) as an artifact instead of the entire Docker image (~730 MB), which saves time (~7 min) and improves robustness. This will matter as more tests are added to the CI and/or larger images (e.g. Intel oneAPI) are used
  • Add regional thread and decomposition tests to CI
  • Fix thread tests on EC2 so that each thread runs on its own core
  • Use a slightly different ubuntu-hpc image to match the uid and gid within ec2 instances
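
The artifact change in the second bullet amounts to copying one file out of a built image rather than exporting the whole image. The sketch below is only an illustration of that idea, not the workflow used in this PR: the image tag, container name, and executable path are hypothetical, and the function merely builds the standard `docker create` / `docker cp` / `docker rm` command lines without running them.

```python
from typing import List


def extract_commands(image: str, exe_path: str, dest: str,
                     container: str = "extract-tmp") -> List[List[str]]:
    """Build the docker commands that copy one file out of an image.

    All names are caller-supplied; nothing here is taken from the
    actual ufs-weather-model CI workflow.
    """
    return [
        # Create (but do not start) a container from the image.
        ["docker", "create", "--name", container, image],
        # Copy the single executable from the container filesystem to the host.
        ["docker", "cp", f"{container}:{exe_path}", dest],
        # Remove the temporary container.
        ["docker", "rm", container],
    ]


if __name__ == "__main__":
    # Hypothetical image tag and path, for illustration only.
    for cmd in extract_commands("ufs-build:latest", "/opt/app/model.exe", "./model.exe"):
        print(" ".join(cmd))
```

With the sizes quoted above, the extracted file would be roughly a tenth of the image (~77 MB versus ~730 MB), which is where the upload and download time would be saved.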

Issue(s) addressed

#543
Includes the code changes in #560, and thus replaces #560.

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3

Dependencies

None

@BrianCurtis-NOAA
Collaborator

AutoRT: orion jobs sitting in pending, jet started but lots pending still. Will need to add labels on once Hera is back online.

@BrianCurtis-NOAA
Collaborator

Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/654368039/20210602191514/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_39017
Test control_c384gdas 036 failed in check_result failed
Test control_c384gdas 036 failed in run_test failed
Please make changes and add the following label back:
jet-intel-RT

@MinsukJi-NOAA
Contributor Author

A runtime error occurs when running the control (with L127) test in Docker. However, the exact same configuration runs fine with GNU on Hera.
https://github.com/ufs-community/ufs-weather-model/runs/2732472388?check_suite_focus=true

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
free(): invalid next size (normal)
Program received signal SIGABRT: Process abort signal.
#3  0x561f5bacd317 in __mersenne_twister_MOD_random_number_t
	at /home/builder/ufs-weather-model/stochastic_physics/mersenne_twister.F:362
#4  0x561f5bacde03 in __mersenne_twister_MOD_random_number_s
	at /home/builder/ufs-weather-model/stochastic_physics/mersenne_twister.F:325

@BrianCurtis-NOAA
Collaborator

Machine: orion
Compiler: intel
Job: RT
Repo location: /work/noaa/nems/emc.nemspara/autort/pr/654368039/20210602141510/ufs-weather-model
Please manually delete: /work/noaa/stmp/bcurtis/stmp/bcurtis/FV3_RT/rt_227346
Test cpld_controlfrac 003 failed in check_result failed
Test cpld_controlfrac 003 failed in run_test failed
Test cpld_restart_bmarkfrac_v16 022 failed in check_result failed
Test cpld_restart_bmarkfrac_v16 022 failed in run_test failed
Please make changes and add the following label back:
orion-intel-RT

@BrianCurtis-NOAA
Collaborator

Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/654368039/20210602191514/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_39017
Test control_c384gdas 036 failed in check_result failed
Test control_c384gdas 036 failed in run_test failed
Please make changes and add the following label back:
jet-intel-RT

That test failed on this file: Comparing GFSFLX.GrbF00 .........NOT OK

@MinsukJi-NOAA
Contributor Author

A runtime error occurs when running the control (with L127) test in Docker. However, the exact same configuration runs fine with GNU on Hera.
https://github.com/ufs-community/ufs-weather-model/runs/2732472388?check_suite_focus=true

With QUILTING=.false., control is able to run.

@junwang-noaa
Collaborator

junwang-noaa commented Jun 3, 2021 via email

@MinsukJi-NOAA
Contributor Author

  • On Orion, the following 4 tests were rerun and passed:
    cpld_controlfrac, cpld_restartfrac, cpld_bmarkfrac_v16, cpld_restart_bmarkfrac_v16
  • On Jet, the following test was rerun and passed:
    control_c384gdas
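
For anyone preparing a rerun list, the failed test names can be pulled out of an AutoRT report mechanically. This is a minimal sketch that assumes the line shape seen in the reports quoted in this conversation ("Test <name> <number> failed in <stage> failed"); the function name is hypothetical and the pattern may not cover other AutoRT output.

```python
import re
from typing import List

# Line shape inferred from the reports quoted in this thread, e.g.
# "Test control_c384gdas 036 failed in check_result failed".
_FAILED = re.compile(r"^Test (\S+) (\d+) failed in (\S+) failed$")


def failed_tests(report: str) -> List[str]:
    """Return failed test names in first-seen order, without duplicates."""
    names: List[str] = []
    for line in report.splitlines():
        match = _FAILED.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    report = """\
Machine: orion
Compiler: intel
Job: RT
Test cpld_controlfrac 003 failed in check_result failed
Test cpld_controlfrac 003 failed in run_test failed
Test cpld_restart_bmarkfrac_v16 022 failed in check_result failed
Test cpld_restart_bmarkfrac_v16 022 failed in run_test failed
"""
    print(failed_tests(report))
```

Each test appears once in the result even though a report lists it for both the check_result and run_test stages.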

Collaborator

@climbfuji climbfuji left a comment

Do we know why the control L127 + post tests don't work in Docker?

@MinsukJi-NOAA
Contributor Author

Do we know why the control L127 + post tests don't work in Docker?

@junwang-noaa suggested WRITE_DOPOST=false, which made control L127 work in Docker. I do not know the underlying reason.

@junwang-noaa
Collaborator

junwang-noaa commented Jun 3, 2021 via email

@MinsukJi-NOAA MinsukJi-NOAA merged commit 2c882f0 into ufs-community:develop Jun 3, 2021
@MinsukJi-NOAA MinsukJi-NOAA mentioned this pull request Jun 3, 2021
@MinsukJi-NOAA MinsukJi-NOAA deleted the feature/ci-workflow branch June 3, 2021 17:23
pjpegion pushed a commit to NOAA-PSL/ufs-weather-model that referenced this pull request Apr 4, 2023
* Switch ccpp-physics submodue.

* ccpp ufs-community#17: git action ci yaml update.
Labels
No Baseline Change