Error in regression test compare netcdf files #296

Closed
MinsukJi-NOAA opened this issue Nov 20, 2020 · 1 comment · Fixed by #396
Labels
bug Something isn't working

Comments

@MinsukJi-NOAA
Contributor

Description

On Orion, @jiandewang observed regression tests passing even though he knows the results differ from the baseline. The log file indicates that the alternate comparison method (compare_ncfile.py) is being used.

To Reproduce:

I have not been able to reproduce this. In my run, the same regression tests failed, with compare_ncfile.py detecting differences in the netCDF files.

Output logs:
Jiande's:

baseline dir = /work/noaa/nems/emc.nemspara/RT/NEMSfv3gfs/develop-20201118/INTEL/cpld_control_ccpp
working dir  = /work/noaa/stmp/jiwang/stmp/jiwang/FV3_RT/rt_133263/cpld_control_prod
Checking test 001 cpld_control results ....
 Comparing phyf024.tile1.nc ............ALT CHECK......OK
 Comparing dynf024.tile1.nc ............ALT CHECK......OK
 Comparing RESTART/coupler.res .........OK
 Comparing RESTART/fv_core.res.nc .........OK
 Comparing RESTART/fv_core.res.tile1.nc ............ALT CHECK......OK

Mine:

baseline dir = /work/noaa/nems/emc.nemspara/RT/NEMSfv3gfs/develop-20201118/INTEL/cpld_control_ccpp
working dir  = /work/noaa/stmp/jminsuk/stmp/jminsuk/FV3_RT/rt_222345/cpld_control_prod
Checking test 001 cpld_control results ....
 Comparing phyf024.tile1.nc ............ALT CHECK......NOT OK
 Comparing dynf024.tile1.nc ............ALT CHECK......NOT OK
 Comparing RESTART/coupler.res .........OK
 Comparing RESTART/fv_core.res.nc .........OK
 Comparing RESTART/fv_core.res.tile1.nc ............ALT CHECK......NOT OK
@MinsukJi-NOAA MinsukJi-NOAA added the bug Something isn't working label Nov 20, 2020
@MinsukJi-NOAA MinsukJi-NOAA self-assigned this Nov 20, 2020
@MinsukJi-NOAA
Contributor Author

I was able to reproduce @jiandewang's results with the following changes in rt.sh:

-  export PATH=/work/noaa/nems/emc.nemspara/soft/miniconda3/bin:$PATH
-  export PYTHONPATH=/work/noaa/nems/emc.nemspara/soft/miniconda3/lib/python3.8/site-packages
-  ECFLOW_START=/work/noaa/nems/emc.nemspara/soft/miniconda3/bin/ecflow_start.sh
+  export PATH=/work/noaa/fv3-cam/djovic/ecflow/bin:$PATH
+  export PYTHONPATH=/work/noaa/fv3-cam/djovic/ecflow/lib/python2.7/site-packages
+  ECFLOW_START=/work/noaa/fv3-cam/djovic/ecflow/bin/ecflow_start.sh

With the above changes:

  1. compare_ncfile.py fails because it cannot load the libraries it needs (import *)
  2. Line 276 in rt_utils.sh does not catch this failure and leaves 0 as the value of d
  3. The netCDF file comparisons are therefore reported as successful
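The failure mode in step 1 can be shown in isolation: when a Python script dies on an unresolvable import, the interpreter exits with a nonzero status, so a wrapper that checks that status (rather than defaulting a result variable to 0) cannot mistake the crash for a pass. This is a minimal sketch that assumes nothing about the actual code at line 276 of rt_utils.sh; `run_child`, `verdict`, and `module_that_does_not_exist` are hypothetical names, with the child process standing in for compare_ncfile.py under a broken PYTHONPATH.

```python
import subprocess
import sys


def run_child(source: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the child's traceback out of the test log
        text=True,
    )
    return result.returncode


def verdict(status: int) -> str:
    """Treat only an explicit zero status as a pass; anything else is not OK."""
    return "OK" if status == 0 else "NOT OK"


if __name__ == "__main__":
    # Hypothetical stand-in for compare_ncfile.py failing on its imports:
    # an uncaught ImportError makes the interpreter exit with status 1.
    broken = run_child("import module_that_does_not_exist")
    healthy = run_child("pass")
    print(broken, verdict(broken))
    print(healthy, verdict(healthy))
```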

In Jiande's case, rt.sh was picking up previous instances of ecflow_server (/work/noaa/fv3-cam/djovic/ecflow/bin/ecflow_server) instead of the new one (/work/noaa/nems/emc.nemspara/soft/miniconda3/bin/ecflow_server).
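Which installation wins by name depends on the order of directories in PATH: the first directory containing an executable with the requested name is used. A quick way to confirm the resolution from Python is `shutil.which`; the sketch below builds two throwaway directories to show that the earlier PATH entry takes precedence. It only checks name resolution and cannot tell whether an already-running server was started from an older installation. The helper names `resolve` and `make_fake_executable` are hypothetical, and the fake executables assume a POSIX system.

```python
import os
import shutil
import stat
import tempfile
from typing import List, Optional


def resolve(cmd: str, dirs: List[str]) -> Optional[str]:
    """Return the path `cmd` resolves to when PATH is exactly `dirs`, in order."""
    return shutil.which(cmd, path=os.pathsep.join(dirs))


def make_fake_executable(directory: str, name: str) -> str:
    """Create a tiny executable file so shutil.which can find it (POSIX only)."""
    path = os.path.join(directory, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        make_fake_executable(old, "ecflow_server")
        make_fake_executable(new, "ecflow_server")
        # The first directory listed wins, regardless of which copy is newer.
        print(resolve("ecflow_server", [old, new]))
        print(resolve("ecflow_server", [new, old]))
```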

MinsukJi-NOAA added a commit to MinsukJi-NOAA/ufs-weather-model that referenced this issue Nov 23, 2020
MinsukJi-NOAA added a commit to MinsukJi-NOAA/ufs-weather-model that referenced this issue Jan 26, 2021
MinsukJi-NOAA added a commit to MinsukJi-NOAA/ufs-weather-model that referenced this issue Feb 1, 2021
* Fix rt.sh -n option to work with new rt.conf MACHINES format.
* Fix compare_nc ufs-community#296.
* Make compare_ncfile.py return 2 if the files differ, since it returns 1 when an error occurs.
* Add variable dimension check in compare_ncfile.py.
* Use compare_ncfile only if it's a *.nc file. Add error messages.
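These commits separate three outcomes: 0 when the files match, 1 when the comparison itself could not run, and 2 when the files were read but differ. The sketch below illustrates that convention together with a per-variable dimension check and the *.nc dispatch. It is not the actual compare_ncfile.py: to stay self-contained it models a netCDF file as a hypothetical dictionary of {variable: (dimensions, values)} compared with exact equality, and the names `compare_datasets`, `compare_nc`, `uses_nc_compare`, and `status_label` (including its ERROR label) are illustrative assumptions.

```python
import sys
from typing import Callable, Dict, List, Tuple

# Exit-code convention from the commit message above.
IDENTICAL = 0  # files match
ERROR = 1      # the comparison itself failed (e.g. a library could not be imported)
DIFFERENT = 2  # files were read successfully but differ

# Hypothetical in-memory stand-in for a netCDF file: {name: (dims, values)}.
Dataset = Dict[str, Tuple[Tuple[str, ...], List[float]]]


def uses_nc_compare(path: str) -> bool:
    """Only *.nc files go through the netCDF comparison."""
    return path.endswith(".nc")


def compare_datasets(base: Dataset, test: Dataset) -> int:
    if base.keys() != test.keys():
        return DIFFERENT
    for name, (base_dims, base_vals) in base.items():
        test_dims, test_vals = test[name]
        if base_dims != test_dims:  # variable dimension check
            return DIFFERENT
        if base_vals != test_vals:
            return DIFFERENT
    return IDENTICAL


def compare_nc(path_a: str, path_b: str, load: Callable[[str], Dataset]) -> int:
    try:
        return compare_datasets(load(path_a), load(path_b))
    except Exception as exc:  # a failure must never be reported as "identical"
        print(f"ERROR comparing {path_a} and {path_b}: {exc}", file=sys.stderr)
        return ERROR


def status_label(code: int) -> str:
    """Map an exit code to a log label; ERROR is a hypothetical third label."""
    if code == IDENTICAL:
        return "OK"
    if code == DIFFERENT:
        return "NOT OK"
    return "ERROR"
```

Under this convention a wrapper can report NOT OK for code 2 and stop for code 1, so an import failure no longer passes silently.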
MinsukJi-NOAA added a commit to MinsukJi-NOAA/ufs-weather-model that referenced this issue Feb 12, 2021
epic-cicd-jenkins pushed a commit that referenced this issue Apr 17, 2023
This PR modifies the script that runs the WE2E tests (run_experiments.sh) as well as the individual WE2E configuration files to allow tests to run on hera and cheyenne without the need to manually change settings (e.g. directories) in the individual test configuration files.  This capability can be easily extended to other platforms by adding appropriate stanzas in run_experiments.sh.

## DESCRIPTION OF CHANGES:
* Set the following workflow parameters in the run_experiments.sh script and write them to the workflow configuration file instead of having them defined in each WE2E configuration file (i.e. remove them from each WE2E configuration file):
 MACHINE, ACCOUNT, EXPT_SUBDIR, USE_CRON_TO_RELAUNCH, CRON_RELAUNCH_INTVL_MNTS, VERBOSE
Note that all these parameters except EXPT_SUBDIR can now be set on the command line when calling run_experiments.sh.  If they are not set on the command line, they get set to default values.  Also, EXPT_SUBDIR always gets set to the name of the WE2E test.
* Add new arguments stmp, ptmp, and verbose to run_experiments.sh so that users can specify them on the command line if they don't like the defaults.
* In run_experiments.sh, source the default workflow configuration file (config_defaults.sh) to have all user-specifiable workflow variables defined in some way (even if some of those are set to nonsensical default values).
* Put in a check to make sure the CCPP physics suite definition file exists in the ufs-weather-model repo.
* Bug fix: Change the default blending halo (HALO_BLEND) to 0 (no blending) to avoid a bug in the halo-blending PR. The bug is that the make_lbcs task does not create a blending zone (i.e. it assumes halo_blend is zero).

## TESTS CONDUCTED:
Ran all WE2E tests except user_donwload_extrn_files on hera and cheyenne. All passed except regional_010, which has a preexisting bug related to FV3 namelist settings. Did not run user_donwload_extrn_files because it interrupts the progression of the test script (it needs improvements, but this PR does not affect downloading of external model files from NOMADS).

Note that on cheyenne, the make_ics, make_lbcs, and run_post tasks often have to be run multiple times before they succeed (especially the latter two).