Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to diags handling #633

Merged
merged 1 commit into from
Oct 30, 2024
Merged

Update to diags handling #633

merged 1 commit into from
Oct 30, 2024

Conversation

forsyth2
Copy link
Collaborator

@forsyth2 forsyth2 commented Oct 16, 2024

Issue resolution

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

1. Does this do what we want it to do?

Objectives:

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have added or modified at least one "min-case" configuration file to test this change. Every objective above is represented in at least one cfg.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

  • Testing: this pull request introduces an important feature or bug fix that we must test often. I have updated the weekly-test configuration files, not just a "min-case" one.
  • Testing: this pull request adds at least one new possible parameter to the cfg. I have tested using this parameter with and without any other parameter that may interact with it.

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

3. Is this well documented?

Required:

  • Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.
  • Documentation in default.ini has been updated. (Parameter documentation redirects to there).

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commits checks have passed.

@forsyth2 forsyth2 self-assigned this Oct 16, 2024
@forsyth2 forsyth2 force-pushed the issue-622-land-checks branch from 9f8e681 to c3bbb5b Compare October 16, 2024 19:12
@forsyth2
Copy link
Collaborator Author

Checked for each set in "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","zonal_mean_2d_stratosphere","enso_diags","qbo","diurnal_cycle","streamflow","tropical_subseasonal","tc_analysis","lat_lon_land that dependencies are added.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 16, 2024

As far as I can tell, e3sm_diags.py never had a dependency addition for when lat_lon_land was in sets. That is, #622 (comment) has been a bug ever since #548 was merged and hasn't been caught until now. Perhaps the climo task was just always happening to finish first. (Notice no lat_lon_land in https://github.com/E3SM-Project/zppy/blob/c3f463da95011a6beeab0c6d9fc96a47355d3ab7/zppy/e3sm_diags.py, which is e3sm_diags.py after #548 was merged).

In any case, an explicit dependency check has been added with this PR.

@forsyth2
Copy link
Collaborator Author

The min-case test min_case_e3sm_diags_lat_lon_land also doesn't work. It appears that one wasn't tested run as part of merging #608. (Note the caveat "I haven't actually run these minimal cases because there are quite a few of them. My plan is to modify them as needed when actually using them to debug/test future pull requests." on #608 (comment))

Error description
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_lat_lon_land_mvm_1_output/test-633/v3.LR.historical_0051/post/scripts
$ grep -v "OK" *status
ERROR (3)
$ grep -in error climo_land_monthly_climo_1985-1988.o607365 
64:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
68:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
72:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
76:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
80:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
84:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
88:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
89:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
91:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
93:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
95:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
97:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
98:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
102:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
106:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
107:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
112:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
113:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
115:srun: error: chr-0499: task 0: Exited with exit code 1
116:srun: error: chr-0495: task 0: Exited with exit code 1
117:srun: error: chr-0495: task 0: Exited with exit code 1
118:srun: error: chr-0495: task 0: Exited with exit code 1
119:srun: error: chr-0499: task 0: Exited with exit code 1
120:srun: error: chr-0494: task 0: Exited with exit code 1
121:srun: error: chr-0494: task 0: Exited with exit code 1
122:srun: error: chr-0494: task 0: Exited with exit code 1
123:srun: error: chr-0499: task 0: Exited with exit code 1
127:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
131:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
135:ncks: ERROR nco_rgr_wgt() Hail Mary algorithm reports final failure since product of identifed horizontal dimension sizes in the 2D input data file does not equal the map-file 1D dimension size = 21600.
136:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
138:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
140:ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
142:srun: error: chr-0496: task 0: Exited with exit code 1
143:srun: error: chr-0496: task 0: Exited with exit code 1
144:srun: error: chr-0496: task 0: Exited with exit code 1
145:ncclimo: ERROR monthly regrid cmd_rgr[1] failed. Debug this:

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_lat_lon_land_mvm_2_output/test-633/v3.LR.historical_0051/post/scripts
$ grep -v "OK" *status
climo_land_monthly_climo_1995-1998.status:ERROR (3)
e3sm_diags_lnd_monthly_mvm_lnd_model_vs_model_1995-1998_vs_1985-1988.status:WAITING 607367
# similar regridding errors

@forsyth2
Copy link
Collaborator Author

So far so good on my latest version of the cfg from #622 (comment) though: the other two diags tasks are running while the tc_analysis diags task waits for tc_analysis to finish.

@forsyth2
Copy link
Collaborator Author

Test the cfg from #622

Re: the issues with the cfg at #622 (comment).

The cfg (5th iteration from the original cfg)
# Test `main` after merging #623
[default]
# NEW in v3: Set `dry_run = False`
dry_run = False
input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
#output = /lcrc/group/e3sm2/ac.zhang40/E3SMv3/v3.LR.historical_0920
output = /lcrc/group/e3sm/ac.forsyth2/issue_622v5/v3.LR.historical_0051
case = v3.LR.historical_0051
#www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.zhang40/E3SMv3_0920
www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/issue_622v5
partition = compute
environment_commands = "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh"
campaign = "water_cycle"

[climo]
active = True
years = "1985:2014:30",
walltime = "1:00:00"

  [[ atm_monthly_180x360_aave ]]
  input_subdir = "archive/atm/hist"
  mapping_file = map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  frequency = "monthly"

  [[ atm_monthly_diurnal_8xdaily_180x360_aave ]]
  input_subdir = "archive/atm/hist"
  input_files = "eam.h3"
  mapping_file = map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  vars = "PRECT"
  frequency = "diurnal_8xdaily"

  [[ land_monthly_climo ]]
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = archive/lnd/hist
  vars = ""

[ts]
active = True
years = "1985:2014:30"
walltime = "00:50:00"

  [[ atm_monthly_180x360_aave ]]
  input_subdir = "archive/atm/hist"
  input_files = "eam.h0"
  frequency = "monthly"
  mapping_file = /home/ac.zender/data/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  vars = "FSNTOA,FLUT,FSNT,FLNT,FSNS,FLNS,SHFLX,QFLX,TAUX,TAUY,PRECC,PRECL,PRECSC,PRECSL,TS,TREFHT,CLDTOT,CLDHGH,CLDMED,CLDLOW,U,ICEFRAC,LANDFRAC,OCNFRAC,PS,CLDICE,CLDLIQ,T,AODDUST"
  ts_fmt = "cmip"

  [[ atm_daily_180x360_aave ]]
  input_subdir = "archive/atm/hist"
  input_files = "eam.h1"
  frequency = "daily"
  mapping_file = /home/ac.zender/data/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  # Needed for Wheeler Kiladis
  vars = "FLUT,PRECT,U850"

  [[ land_monthly ]]
  input_subdir = "archive/lnd/hist"
  input_files = "elm.h0"
  frequency = "monthly"
  mapping_file = map_r05_to_cmip6_180x360_aave.20231110.nc
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILICE,SOILLIQ,SOILWATER_10CM,TSA,TSOI,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
  extra_vars = "landfrac"
  ts_fmt = "cmip"

  [[ rof_monthly ]]
  input_subdir = "archive/rof/hist"
  input_files = "mosart.h0"
  mapping_file = ""
  frequency = "monthly"
  vars = "RIVER_DISCHARGE_OVER_LAND_LIQ"
  extra_vars = 'areatotal2'

[tc_analysis]
active = True
scratch = "/lcrc/globalscratch/$USER"
# Make walltime very short to reproduce this error
# NEW in v4: try with longer run time
walltime = "01:00:00"
years = "1985:2014:30",

[e3sm_diags]
active = True
walltime = "4:00:00"
years = "1985:2014:30",
ts_num_years = 30
# NEW in v3: Added `ref_end_yr` (zppy.utils.ParameterNotProvidedError: ref_end_yr)
ref_end_yr = 2014
ref_start_yr = 1985
ref_final_yr = 2014
multiprocessing = True
num_workers = 8

  [[ atm_monthly_180x360_aave ]]
  short_name = 'v3.LR.historical_0051'
  grid = '180x360_aave'
  reference_data_path = '/lcrc/soft/climate/e3sm_diags_data/obs_for_e3sm_diags/climatology'
  obs_ts = '/lcrc/soft/climate/e3sm_diags_data/obs_for_e3sm_diags/time-series'
  dc_obs_climo = '/lcrc/group/e3sm/public_html/e3sm_diags_test_data/unit_test_complete_run/obs/climatology'
  climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave"
  climo_diurnal_frequency = "diurnal_8xdaily"
  ts_daily_subsection = "atm_daily_180x360_aave"
  sets="lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","qbo","diurnal_cycle","zonal_mean_2d_stratosphere","aerosol_aeronet","tropical_subseasonal","tc_analysis",
  output_format_subplot = "pdf",

  [[ lnd_monthly_mvm_lnd ]]
  # Test model-vs-model using the same files as the reference
  grid = 'native'
  # NEW in v5: renamed this parameter
  climo_land_subsection = "land_monthly_climo"
  diff_title = "Difference"
  partition = "compute"
  qos = "regular"
  short_name = v3.LR.piControl
  ref_name = "20231209.v3.LR.piControl-spinup.chrysalis"
  ref_start_yr = 0051
  ref_final_yr = 0100
  ref_years = "0051-0100",
  reference_data_path = "/lcrc/group/e3sm/ac.zhang40/tests/20231209.v3.LR.piControl-spinup.chrysalis_land_diags/post/lnd/native/clim"
  run_type = "model_vs_model"
  sets = "lat_lon_land",
  short_ref_name = "20231209.v3.LR.piControl-spinup"
  swap_test_ref = False
  tag = "model_vs_model"
  ts_num_years_ref = 50

  [[atm_monthly_180x360_aave_mvm]]
  ref_years = "0001-0050",
  ref_start_yr = 1
  ref_final_yr = 50
  ts_num_years = 30 
  ts_num_years_ref = 10
  ts_subsection = "atm_monthly_180x360_aave"
  short_name = 'v3alpha04-COARE.piControl'
  grid = '180x360_aave'
  ref_name = '20230924.v3alpha04_trigrid.piControl.chrysalis'
  short_ref_name = 'v3alpha04-CTL.piControl'
  tag = 'v3alpha04i-COARE_vs_CTL'
  run_type = "model_vs_model"
  reference_data_path = '/lcrc/group/e3sm2/ac.xzheng/E3SMv3_dev/20230924.v3alpha04_trigrid.piControl.chrysalis/post/atm/180x360_aave/clim'
  climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave"
  climo_diurnal_frequency = "diurnal_8xdaily"
  climo_subsection = "atm_monthly_180x360_aave"
  sets="lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","qbo","diurnal_cycle","zonal_mean_2d_stratosphere","aerosol_budget"
  diff_title = 'Difference'
$ cd /lcrc/group/e3sm/ac.forsyth2/issue_622v5/v3.LR.historical_0051/post/scripts
$ grep -v "OK" *status
# No errors

These links show viewers:

Some sets still need better handling

Looking at the sets used in this cfg, it looks like "aerosol_aeronet" and "aerosol_budget" don't appear much in the zppy code:

$ git grep -n aerosol
zppy/templates/cryosphere.cfg:2:sets= "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","enso_diags","qbo","diurnal_cycle","annual_cycle_zonal_mean","streamflow","zonal_mean_2d_stratosphere","aerosol_aeronet","aerosol_budget"
zppy/templates/default.ini:205:# All available sets (17) = "aerosol_aeronet","aerosol_budget","annual_cycle_zonal_mean","area_mean_time_series","cosp_histogram","diurnal_cycle","enso_diags","lat_lon","meridional_mean_2d","polar","qbo","streamflow","tc_analysis", "tropical_subseasonal", "zonal_mean_2d","zonal_mean_2d_stratosphere","zonal_mean_xy"
zppy/templates/default.ini:209:sets = string_list(default=list("lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","zonal_mean_2d_stratosphere","aerosol_aeronet","aerosol_budget"))
zppy/templates/water_cycle.cfg:2:sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","enso_diags","qbo","diurnal_cycle","annual_cycle_zonal_mean","streamflow","zonal_mean_2d_stratosphere","aerosol_aeronet","aerosol_budget"

That means those 2 sets don't occur in either the .py or .bash file for E3SM Diags. These were added in #289 and #352. However, these 2 PRs only updated the default sets for e3sm_diags. It appears that parameter checking and dependency checking in e3sm_diags.py and e3sm_diags.bash have not included them. I suppose the dependency handling hasn't come up because they've typically been included with other sets that did force a dependency requirement.

Example error from /lcrc/group/e3sm/ac.forsyth2/issue_622v5/v3.LR.historical_0051/post/scripts/e3sm_diags_atm_monthly_180x360_aave_mvm_v3alpha04i-COARE_vs_CTL_1985-2014_vs_0001-0050.o607385:

2024-10-16 15:19:15,254 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.aerosol_budget_driver
Traceback (most recent call last):
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/e3sm_diags/parameter/core_parameter.py", line 266, in _run_diag
    single_result = module.run_diag(self)
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/e3sm_diags/driver/aerosol_budget_driver.py", line 148, in run_diag
    metrics_dict_test[aerosol] = generate_metrics_dic(
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/e3sm_diags/driver/aerosol_budget_driver.py", line 82, in generate_metrics_d\
ic
    elvemis = data.get_climo_variable(f"{aerosol}_CLXF", season)
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/e3sm_diags/driver/utils/dataset.py", line 161, in get_climo_variable
    variables = self._get_climo_var(filename, *args, **kwargs)
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/e3sm_diags/driver/utils/dataset.py", line 386, in _get_climo_var
    derived_var = func(*variables)
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/e3sm_diags/derivations/acme.py", line 1654, in <lambda>
    (("bc_a?_CLXF",), lambda *x: molec_convert_units(sum(x), 12.0)),
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0_chrysalis/lib/python3.10/site-packages/e3sm_diags/derivations/acme.py", line 175, in molec_convert_units
    if var.units == "molec/cm2/s":
AttributeError: 'int' object has no attribute 'units'

@chengzhuzhang I'll look into this more, but let me know if anything stands out as immediately wrong (either in how zppy is calling Diags or in Diags itself).

Action items to do before merging this PR

  • Debug the missing set described above
  • Update e3sm_diags.py and e3sm_diags.bash to include parameter/dependency checks for the aerosol sets.
  • default.ini shows 17 possible sets for e3sm_diags. Update the weekly comprehensive tests to test all 17 -- that is, make sure none were inadvertently missed in Improve input validation and testing #628.
  • Get the min-case mvm-land test working

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang I need more context for the "aerosol_aeronet","aerosol_budget" sets. #289 and #352 only added these to default sets.

  • That implies everything about them is covered through CoreParameter, correct?
  • I'm assuming the only dependency is on the climo task? Is that true?

AttributeError: 'int' object has no attribute 'units' implies somehow Diags is getting an int instead of a variable, but it's unclear if that's because of something in the Diags code or a missing processing step in zppy. If there really doesn't need to be extra processing of these two sets beyond adding them to the default sets, that would seem to imply the error is in E3SM Diags then.

# min_case_e3sm_diags_depend_on_ts: "enso_diags","qbo",
# min_case_e3sm_diags_diurnal_cycle: "diurnal_cycle",
# min_case_e3sm_diags_streamflow: "streamflow",
# min_case_e3sm_diags_tc_analysis: "tc_analysis",
# min_case_e3sm_diags_tropical_subseasonal: "tropical_subseasonal",
sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","zonal_mean_2d_stratosphere","enso_diags","qbo","diurnal_cycle","streamflow","tc_analysis","tropical_subseasonal",
sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","zonal_mean_2d_stratosphere","enso_diags","qbo","diurnal_cycle","streamflow","tc_analysis","tropical_subseasonal","aerosol_aeronet","aerosol_budget"
Copy link
Collaborator Author

@forsyth2 forsyth2 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 17 sets excluding area_mean_time_series which I think isn't used much now?

@chengzhuzhang
Copy link
Collaborator

@chengzhuzhang I need more context for the "aerosol_aeronet","aerosol_budget" sets. #289 and #352 only added these to default sets.

* That implies everything about them is covered through `CoreParameter`, correct?

Yes, this is correct

* I'm assuming the only dependency is on the `climo` task? Is that true?

Yes, this is true

AttributeError: 'int' object has no attribute 'units' implies somehow Diags is getting an int instead of a variable, but it's unclear if that's because of something in the Diags code or a missing processing step in zppy. If there really doesn't need to be extra processing of these two sets beyond adding them to the default sets, that would seem to imply the error is in E3SM Diags then.

This seems to be a diags error.

@forsyth2 forsyth2 changed the title Add checks for lat_lon_land Update to diags handling Oct 18, 2024
@forsyth2
Copy link
Collaborator Author

@chengzhuzhang I've renamed this PR to better cover that the changes affect a number of sets. We need two changes to the e3sm_diags code base before I can proceed though:

Thanks!

@chengzhuzhang
Copy link
Collaborator

Thanks. yes, it makes sense to update the upstream tool first.

@chengzhuzhang
Copy link
Collaborator

chengzhuzhang commented Oct 23, 2024

@chengzhuzhang I've renamed this PR to better cover that the changes affect a number of sets. We need two changes to the e3sm_diags code base before I can proceed though:

* Move the non-empty `dat` file check to the `tc_analysis` set in E3SM Diags. Then I can remove that error in the `tc_analysis` task in `zppy`. (Addresses [Improve input validation and testing #628 (comment)](https://github.com/E3SM-Project/zppy/pull/628#discussion_r1805574596))

* Fix the aerosol type (int vs variable) bug. Then, I can properly test the `aerosol` sets in `zppy` (Addresses [Update to diags handling #633 (comment)](https://github.com/E3SM-Project/zppy/pull/633#issuecomment-2420037419))

Thanks!

@forsyth2 item one has been taken care of by E3SM-Project/e3sm_diags@ffbdf28.
Though I'm not sure if item two is a blocker for zppy, we can improve error handling in e3sm diags (which I filed an issue), but it doesn't seem to be critical for running zppy...

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 23, 2024

The only thing is no plots will generate for the aerosol sets if we run into that bug in the weekly testing too (now that the aerosol sets have been added to weekly testing). But correct, zppy can still function.

@chengzhuzhang
Copy link
Collaborator

The only thing is no plots will generate for the aerosol sets if we run into that bug in the weekly testing too (now that the aerosol sets have been added to weekly testing). But correct, zppy can still function.

The aerosl_budget table won't be created because the variables are not included in the standard v3 output that zppy weekly tests run with.

@forsyth2 sorry I accidentally messed up your comment by editing...

@forsyth2 forsyth2 force-pushed the issue-622-land-checks branch 3 times, most recently from 527aed4 to f30b399 Compare October 29, 2024 18:28
@forsyth2 forsyth2 force-pushed the issue-622-land-checks branch from f30b399 to cd8d349 Compare October 30, 2024 00:38
Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang This is ready for review. You can skip tests/integration/generated since those are auto-generated files. In this self-review, I've outlined 3 key changes and made other explainer comments.

@@ -35,4 +35,4 @@ walltime = "#expand diags_walltime#"

[[ atm_monthly_180x360_aave ]]
climo_subsection = "atm_monthly_180x360_aave"
sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","zonal_mean_2d_stratosphere",
sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","zonal_mean_2d_stratosphere","aerosol_aeronet","aerosol_budget",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key Change 1: adding in the aerosol sets to tests.

@@ -34,7 +34,7 @@ short_name = "#expand case_name#"
walltime = "#expand diags_walltime#"

[[ lnd_monthly_mvm_lnd ]]
climo_subsection = "land_monthly_climo"
climo_land_subsection = "land_monthly_climo"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key Change 2: specifying a climo subsection specific to land

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 I have a clarification question regarding to backward compatibility. What would be the consequence if in an old zppy config file, a user uses climo_subsection = "land_monthly_climo"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so that wouldn't have actually had any effect. And therefore there is nothing to be backwards compatible with.

This block is new in this PR:

    if "lat_lon_land" in c["sets"]:
        check_parameter_defined(c, "climo_land_subsection")
        dependencies.append(
            os.path.join(
                script_dir, f"climo_{c['climo_land_subsection']}{status_suffix}"
            )
        )

Notice, lat_lon_land wasn't included in:

def add_climo_dependencies(
    c: Dict[str, Any], dependencies: List[str], script_dir: str
) -> None:
    depend_on_climo: Set[str] = set(
        [
            "lat_lon",
            "zonal_mean_xy",
            "zonal_mean_2d",
            "polar",
            "cosp_histogram",
            "meridional_mean_2d",
            "annual_cycle_zonal_mean",
            "zonal_mean_2d_stratosphere",
            "aerosol_aeronet",
            "aerosol_budget",
        ]
    )

So basically, including climo_subsection as a parameter in this block of the cfg was simply unused information before this PR.

It only appeared like it was useful, because the land climo task often just happens to finish before the lnd_monthly_mvm_lnd diags task runs anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. I think this PR is good to go!

# min_case_e3sm_diags_depend_on_ts: "enso_diags","qbo",
# min_case_e3sm_diags_diurnal_cycle: "diurnal_cycle",
# min_case_e3sm_diags_streamflow: "streamflow",
# min_case_e3sm_diags_tc_analysis: "tc_analysis",
# min_case_e3sm_diags_tropical_subseasonal: "tropical_subseasonal",
# TODO: Add "tc_analysis" back in after empty dat is resolved.
sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","zonal_mean_2d_stratosphere","enso_diags","qbo","diurnal_cycle","streamflow","tropical_subseasonal",
# TODO: Add "aerosol_budget" back in once that's working for v3.
sets = "lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","zonal_mean_2d_stratosphere","enso_diags","qbo","diurnal_cycle","streamflow","tropical_subseasonal","aerosol_aeronet",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left "aerosol_budget" off the weekly v3 test since that won't work at the moment.

Comment on lines +536 to +545
c = {"sets": ["diurnal_cycle"]}
c.update(base)
dependencies = []
self.assertRaises(
ParameterNotProvidedError,
add_climo_dependencies,
c,
dependencies,
"script_dir",
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test that climo_diurnal_subsection is checked for if diurnal_cycle is requested.

(In this unit test, we have "guess_path_parameters": True, "guess_section_parameters": True, so an error will be raised if this parameter is not provided. These options are off-by-default, meaning zppy will instead try to guess the value.)

Comment on lines +547 to +561
c = {"sets": ["lat_lon_land"], "climo_land_subsection": "lndsub"}
c.update(base)
dependencies = []
add_climo_dependencies(c, dependencies, "script_dir")
self.assertEqual(dependencies, ["script_dir/climo_lndsub_1980-1990.status"])
c = {"sets": ["lat_lon_land"]}
c.update(base)
dependencies = []
self.assertRaises(
ParameterNotProvidedError,
add_climo_dependencies,
c,
dependencies,
"script_dir",
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test that climo_land_subsection is checked for if lat_lon_land is requested.

Comment on lines +464 to +469
def test_check_parameter_defined(self):
c = {"a": 1, "b": 2, "c": ""}
check_parameter_defined(c, "a")
self.assertRaises(ParameterNotProvidedError, check_parameter_defined, c, "c")
self.assertRaises(ParameterNotProvidedError, check_parameter_defined, c, "d")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test the check_parameter_defined function.

Comment on lines +244 to +245
"aerosol_aeronet",
"aerosol_budget",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked the aerosol sets as depending on climo.

Comment on lines -93 to -99
# If cyclones_stitch file is empty, exit
if ! [ -s ${result_dir}cyclones_stitch_${file_name}.dat ]; then
cd {{ scriptDir }}
echo 'ERROR (1)' > {{ prefix }}.status
exit 1
fi

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key Change 3: Removed this block since this is not an error per-se, as discussed. Instead, E3SM Diags will do this check via E3SM-Project/e3sm_diags@ffbdf28

@forsyth2 forsyth2 marked this pull request as ready for review October 30, 2024 00:50
@chengzhuzhang
Copy link
Collaborator

chengzhuzhang commented Oct 30, 2024

@forsyth2 the PR looks good, and thanks for the comments! I just had a clarification regarding the change of climo_subsection for land.

@chengzhuzhang chengzhuzhang self-requested a review October 30, 2024 18:25
Copy link
Collaborator

@chengzhuzhang chengzhuzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

@forsyth2
Copy link
Collaborator Author

Thanks @chengzhuzhang! Let me know if my answer to your question is sufficient, and then I can merge this.

@forsyth2 forsyth2 merged commit 408e9ad into main Oct 30, 2024
4 checks passed
@forsyth2 forsyth2 deleted the issue-622-land-checks branch October 30, 2024 19:31
zhangshixuan1987 pushed a commit to zhangshixuan1987/zppy that referenced this pull request Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Issues with tc_analysis in zppy
2 participants