Add support for multiple time series datasets via glob and fix enso_diags set #866

Merged
merged 27 commits into from
Oct 28, 2024

@tomvothecoder (Collaborator) commented on Oct 8, 2024

Description

Summary of changes

  • Add support for glob filepaths
    • Replace open_dataset() with open_mfdataset() in _get_time_series_dataset_obj()
    • Update _get_time_series_filepaths() and _get_matching_time_series_filepaths() to return a list of matching filepath(s), or None if no filepaths are found
    • Call _subset_and_load_vars() on time series datasets
      • Update the list of variables to subset for time series datasets: add areatotal2, lat, and lon for the arm_diags set
    • Update _get_time_slice() to parse the start and end years from a list of filepath(s) via _parse_years_from_filepaths()
  • Fix enso_diags-related errors
    • Fix calculate_nino_index_model() not catching the correct IOError message before falling back to the "TS" variable to get the SST dataset
    • Fix slow time slice retrieval with bounds: call .load() on the time series dataset before downstream operations
    • Results after fixes (successful):
     2024-10-09 13:48:27,949 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['results_dir', 'diff_title', 'num_workers', 'short_test_name', 'multiprocessing']
     2024-10-09 13:48:27,950 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['results_dir', 'diff_title', 'num_workers', 'short_test_name', 'multiprocessing']
     2024-10-09 13:48:27,950 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['results_dir', 'diff_title', 'test_name', 'num_workers', 'multiprocessing']
     2024-10-09 13:48:33,776 [INFO]: e3sm_diags_driver.py(_save_env_yml:58) >> Saved environment yml file to: model_vs_obs_1987-1988/prov/environment.yml
     2024-10-09 13:48:33,777 [INFO]: e3sm_diags_driver.py(_save_parameter_files:69) >> Saved command used to: model_vs_obs_1987-1988/prov/cmd_used.txt
     2024-10-09 13:48:33,781 [INFO]: e3sm_diags_driver.py(_save_python_script:133) >> Saved Python script to: model_vs_obs_1987-1988/prov/run_script.py
     2024-10-09 13:48:34,866 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:48:34,866 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:34,948 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:34,948 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:35,361 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:35,362 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:35,465 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:48:35,465 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: LHFLX
     2024-10-09 13:48:35,465 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: FLNS
     2024-10-09 13:48:36,805 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:48:38,792 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FLNS-feedback/feedback-FLNS-NINO3-TS-NINO3.png
     2024-10-09 13:48:38,792 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FLNS-feedback/feedback-FLNS-NINO3-TS-NINO3.png
     2024-10-09 13:48:38,793 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:38,798 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:38,902 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.json
     2024-10-09 13:48:38,930 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:39,038 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: SHFLX
     2024-10-09 13:48:40,874 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-feedback/feedback-SHFLX-NINO3-TS-NINO3.png
     2024-10-09 13:48:40,874 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-feedback/feedback-SHFLX-NINO3-TS-NINO3.png
     2024-10-09 13:48:40,875 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:40,879 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:41,062 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:41,163 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: LHFLX
     2024-10-09 13:48:43,234 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.png
     2024-10-09 13:48:43,234 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.png
     2024-10-09 13:48:43,235 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:48:43,238 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:43,270 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-feedback/feedback-LHFLX-NINO3-TS-NINO3.png
     2024-10-09 13:48:43,270 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-feedback/feedback-LHFLX-NINO3-TS-NINO3.png
     2024-10-09 13:48:43,271 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:43,273 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:43,369 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:43,403 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:43,472 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:48:43,472 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: TAUY
     2024-10-09 13:48:43,503 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: TAUX
     2024-10-09 13:48:44,604 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:48:45,429 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-feedback/feedback-TAUX-NINO4-TS-NINO3.png
     2024-10-09 13:48:45,429 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-feedback/feedback-TAUX-NINO4-TS-NINO3.png
     2024-10-09 13:48:45,430 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:48:45,437 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:45,598 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:45,697 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:48:45,697 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: NET_FLUX_SRF
     2024-10-09 13:48:46,323 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.json
     2024-10-09 13:48:49,068 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.png
     2024-10-09 13:48:49,068 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.png
     2024-10-09 13:48:49,069 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:48:49,076 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:48:49,246 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:48:49,349 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: NET_FLUX_SRF
     2024-10-09 13:49:16,382 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:49:17,699 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-feedback/feedback-NET_FLUX_SRF-NINO3-TS-NINO3.png
     2024-10-09 13:49:17,699 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-feedback/feedback-NET_FLUX_SRF-NINO3-TS-NINO3.png
     2024-10-09 13:49:17,702 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
     2024-10-09 13:49:17,721 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:49:17,917 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:49:18,033 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: FSNS
     2024-10-09 13:49:18,714 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.json
     2024-10-09 13:49:20,821 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FSNS-feedback/feedback-FSNS-NINO3-TS-NINO3.png
     2024-10-09 13:49:20,821 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FSNS-feedback/feedback-FSNS-NINO3-TS-NINO3.png
     2024-10-09 13:49:20,823 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:49:20,829 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:49:20,957 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:49:21,058 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:49:21,058 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: TAUX
     2024-10-09 13:49:22,389 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:49:22,591 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.png
     2024-10-09 13:49:22,591 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.png
     2024-10-09 13:49:22,594 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:49:22,597 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:49:22,732 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:49:22,837 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:49:22,837 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: SHFLX
     2024-10-09 13:49:24,111 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.json
     2024-10-09 13:49:24,240 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:49:25,922 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.json
     2024-10-09 13:49:27,114 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.png
     2024-10-09 13:49:27,114 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.png
     2024-10-09 13:49:27,116 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
     2024-10-09 13:49:27,122 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
     2024-10-09 13:49:27,284 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
     2024-10-09 13:49:27,387 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
     2024-10-09 13:49:27,388 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: PRECT
     2024-10-09 13:49:30,922 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.png
     2024-10-09 13:49:30,922 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.png
     2024-10-09 13:49:33,589 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
     2024-10-09 13:49:34,665 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.json
     2024-10-09 13:49:37,054 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.png
     2024-10-09 13:49:37,054 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.png
     2024-10-09 13:49:37,102 [INFO]: main.py(create_viewer:132) >> enso_diags model_vs_obs_1987-1988/viewer
     2024-10-09 13:49:37,216 [INFO]: main.py(create_viewer:135) >> ('ENSO Diagnostics', 'enso_diags/index.html')
     2024-10-09 13:49:37,220 [INFO]: e3sm_diags_driver.py(main:392) >> Viewer HTML generated at model_vs_obs_1987-1988/viewer/index.html
     2024-10-09 13:49:37,223 [INFO]: logger.py(move_log_to_prov_dir:106) >> Log file saved in model_vs_obs_1987-1988/prov/e3sm_diags_run.log
  • Fix tc_analysis
    • The TE stitch file below is empty, which causes an IndexError in _get_vars_from_te_stitch():
    • /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/tc-analysis_1987_1988/cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
        2024-10-09 16:25:18,711 [ERROR]: core_parameter.py(_run_diag:343) >> Error in e3sm_diags.driver.tc_analysis_driver
      Traceback (most recent call last):
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 340, in _run_diag
          single_result = module.run_diag(self)
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/driver/tc_analysis_driver.py", line 91, in run_diag
          test_data["metrics"] = generate_tc_metrics_from_te_stitch_file(test_te_file)
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/driver/tc_analysis_driver.py", line 181, in generate_tc_metrics_from_te_stitch_file
          te_stitch_vars = _get_vars_from_te_stitch(lines, max_len, num_storms)
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/driver/tc_analysis_driver.py", line 258, in _get_vars_from_te_stitch
          year_start = int(lines[0].split("\t")[2])
      IndexError: list index out of range
      2024-10-09 16:25:22,252 [WARNING]: e3sm_diags_driver.py(main:378) >> There was not a single valid diagnostics run, no viewer created.
      2024-10-09 16:25:22,253 [ERROR]: run.py(run_diags:91) >> Error traceback:
      Traceback (most recent call last):
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/run.py", line 89, in run_diags
          params_results = main(params)
        File "/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/e3sm_diags_driver.py", line 397, in main
          if parameters_results[0].fail_on_incomplete and (
      IndexError: list index out of range
  • Fix streamflow
    • Add support for finding .nc files under nested directories using glob and regex. Open question: if multiple sub-directories contain the same matching files, how would e3sm_diags know which .nc files to use?
      • Alternative: update zppy to pass the exact path of the root directory containing the file
      • e.g., for streamflow: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/rof/native/ts/monthly/2yr/
  • Add unit tests
  • Fix integration tests
  • Perform regression testing with the full run script on time series datasets to ensure the changes work for all sets (in progress)
  • Ensure the zppy run script produces results
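The glob filepath support summarized above (return a list of filepaths or None, then parse the start and end years from that list) can be sketched with the standard library. This is a minimal illustration, not the code in this PR: the helpers `get_matching_filepaths()` and `parse_years_from_filepaths()` are hypothetical, and the `<var>_<YYYYMM>_<YYYYMM>.nc` naming pattern (e.g. `TS_198701_198812.nc`) is an assumption.

```python
import glob
import os
import re
from typing import List, Optional, Tuple

# Assumed naming convention: <var>_<YYYYMM>_<YYYYMM>.nc, e.g. "TS_198701_198812.nc".
# This regex is illustrative only; it is not taken from e3sm_diags.
TS_FILENAME_RE = re.compile(
    r"^(?P<var>.+)_(?P<start>\d{4})\d{2}_(?P<end>\d{4})\d{2}\.nc$"
)


def get_matching_filepaths(root_dir: str, var: str) -> Optional[List[str]]:
    """Return sorted filepaths for ``var`` in ``root_dir``, or None if none match."""
    matches = []
    for path in glob.glob(os.path.join(root_dir, f"{var}_*.nc")):
        m = TS_FILENAME_RE.match(os.path.basename(path))
        # The glob "TS_*.nc" would also catch e.g. "TS_EXTRA_...", so check the
        # parsed variable name exactly.
        if m is not None and m.group("var") == var:
            matches.append(path)
    # An empty list becomes None so callers can test for "no files found".
    return sorted(matches) or None


def parse_years_from_filepaths(filepaths: List[str]) -> Tuple[int, int]:
    """Return (earliest start year, latest end year) across all filepaths."""
    starts, ends = [], []
    for path in filepaths:
        m = TS_FILENAME_RE.match(os.path.basename(path))
        if m is None:
            raise ValueError(f"Unrecognized time series filename: {path}")
        starts.append(int(m.group("start")))
        ends.append(int(m.group("end")))
    return min(starts), max(ends)
```

The resulting list can be passed straight to `xarray.open_mfdataset()`, which accepts either a glob string or a list of paths.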
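The SST-to-TS fallback in calculate_nino_index_model() follows a try/except pattern along these lines. This is a hedged sketch: `get_sst_with_fallback()` and the injected `load` callable are hypothetical stand-ins for the real e3sm_diags dataset API. Matching on the "No files found for target variable" prefix (copied from the log output in this description) is an assumption about how the correct IOError is recognized.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")

logger = logging.getLogger(__name__)

# Prefix of the "file not found" message seen in the e3sm_diags logs.
# Matching on it is an assumption made for this sketch.
NOT_FOUND_MSG = "No files found for target variable"


def get_sst_with_fallback(load: Callable[[str], T]) -> T:
    """Load "SST"; fall back to "TS" only when the SST files are missing.

    ``load`` is a hypothetical loader that raises IOError when no files
    exist for the requested variable.
    """
    try:
        return load("SST")
    except IOError as e:
        # Re-raise unrelated I/O errors instead of silently switching variables.
        if NOT_FOUND_MSG not in str(e):
            raise
        logger.info(
            "Handling the following exception by looking for surface temperature: %s",
            e,
        )
        logger.info(
            "Simulated sea surface temperature not found, using surface temperature instead."
        )
        return load("TS")
```

Checking the message instead of catching every IOError keeps genuine read failures visible. The trade-off is a dependency on the exact wording of the error.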
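The tc_analysis fix replaces the opaque IndexError with an explicit ValueError when the TE stitch file is empty (compare the `ValueError: The file ... is empty.` line in the 2024-10-22 log). Below is a minimal sketch of that guard. The helper names `read_te_stitch_lines()` and `get_start_year()` are hypothetical. The year parsing mirrors the `int(lines[0].split("\t")[2])` expression shown in the traceback.

```python
from typing import List


def read_te_stitch_lines(te_stitch_file: str) -> List[str]:
    """Read a TE stitch file, failing early with a clear message if it is empty."""
    with open(te_stitch_file) as f:
        lines = f.readlines()
    if not lines:
        # Without this guard, lines[0] in the parser raises an opaque IndexError.
        raise ValueError(f"The file {te_stitch_file} is empty.")
    return lines


def get_start_year(lines: List[str]) -> int:
    """Return the third tab-separated field of the first line as the start year."""
    return int(lines[0].split("\t")[2])
```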
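The open question for streamflow is what to do when several sub-directories contain the same matching file. A recursive glob that refuses to guess makes this concrete. This is a hypothetical sketch, and `find_nc_files_recursive()` is not an e3sm_diags function. It returns None when nothing matches and raises a ValueError when the same filename appears in more than one directory.

```python
import glob
import os
from collections import defaultdict
from typing import Dict, List, Optional


def find_nc_files_recursive(root_dir: str, pattern: str = "*.nc") -> Optional[List[str]]:
    """Return all files matching ``pattern`` in ``root_dir`` and its sub-directories.

    Returns None if nothing matches. Raises ValueError if the same filename is
    found in more than one directory, because there is no safe way to pick one.
    """
    # "**" with recursive=True matches root_dir itself and every nested directory.
    paths = sorted(glob.glob(os.path.join(root_dir, "**", pattern), recursive=True))
    if not paths:
        return None

    # Group full paths by basename to detect the same file in different directories.
    by_name: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        by_name[os.path.basename(path)].append(path)

    duplicates = {name: found for name, found in by_name.items() if len(found) > 1}
    if duplicates:
        raise ValueError(f"Same file found in multiple sub-directories: {duplicates}")
    return paths
```

Raising on duplicates instead of silently picking one keeps the behavior predictable. Having zppy pass the exact root directory, as in the alternative above, avoids the ambiguity altogether.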
2024-10-22 16:58:04,001 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['diff_title', 'short_test_name', 'multiprocessing', 'num_workers', 'results_dir']
2024-10-22 16:58:04,002 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['diff_title', 'short_test_name', 'multiprocessing', 'num_workers', 'results_dir']
2024-10-22 16:58:04,003 [INFO]: run.py(_add_parent_attrs_to_children:372) >> ['diff_title', 'test_name', 'multiprocessing', 'num_workers', 'results_dir']
2024-10-22 16:58:08,773 [INFO]: e3sm_diags_driver.py(_save_env_yml:58) >> Saved environment yml file to: model_vs_obs_1987-1988/prov/environment.yml
2024-10-22 16:58:08,775 [INFO]: e3sm_diags_driver.py(_save_parameter_files:69) >> Saved command used to: model_vs_obs_1987-1988/prov/cmd_used.txt
2024-10-22 16:58:08,778 [INFO]: e3sm_diags_driver.py(_save_python_script:133) >> Saved Python script to: model_vs_obs_1987-1988/prov/ipykernel_launcher.py
2024-10-22 16:58:09,589 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
2024-10-22 16:58:09,589 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
2024-10-22 16:58:09,590 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
2024-10-22 16:58:09,844 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:09,844 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:09,845 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:10,200 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:10,203 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:10,204 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:10,321 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
2024-10-22 16:58:10,321 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
2024-10-22 16:58:10,322 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: SHFLX
2024-10-22 16:58:10,322 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: TAUY
2024-10-22 16:58:10,322 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: PRECT
2024-10-22 16:58:11,445 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
2024-10-22 16:58:11,594 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
2024-10-22 16:58:12,398 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-feedback/feedback-SHFLX-NINO3-TS-NINO3.png
2024-10-22 16:58:12,398 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-feedback/feedback-SHFLX-NINO3-TS-NINO3.png
2024-10-22 16:58:12,401 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
2024-10-22 16:58:12,413 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:12,547 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:12,646 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: LHFLX
2024-10-22 16:58:13,019 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.json
2024-10-22 16:58:13,345 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.json
2024-10-22 16:58:14,330 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-feedback/feedback-LHFLX-NINO3-TS-NINO3.png
2024-10-22 16:58:14,330 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-feedback/feedback-LHFLX-NINO3-TS-NINO3.png
2024-10-22 16:58:14,332 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
2024-10-22 16:58:14,345 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:14,478 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:14,577 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: TAUX
2024-10-22 16:58:15,967 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.png
2024-10-22 16:58:15,967 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/PRECT-response/regression-coefficient-prect-over-nino34.png
2024-10-22 16:58:16,043 [INFO]: tc_analysis_driver.py(generate_tc_metrics_from_te_stitch_file:173) >> 
Generating TC Metrics from TE Stitch Files
2024-10-22 16:58:16,044 [INFO]: tc_analysis_driver.py(generate_tc_metrics_from_te_stitch_file:174) >> ============================================
2024-10-22 16:58:16,064 [ERROR]: core_parameter.py(_run_diag:343) >> Error in e3sm_diags.driver.tc_analysis_driver
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 340, in _run_diag
    single_result = module.run_diag(self)
  File "/gpfs/fs1/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/driver/tc_analysis_driver.py", line 91, in run_diag
    test_data["metrics"] = generate_tc_metrics_from_te_stitch_file(test_te_file)
  File "/gpfs/fs1/home/ac.tvo/E3SM-Project/e3sm_diags/e3sm_diags/driver/tc_analysis_driver.py", line 182, in generate_tc_metrics_from_te_stitch_file
    raise ValueError(f"The file {te_stitch_file} is empty.")
ValueError: The file /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/tc-analysis_1987_1988/cyclones_stitch_v3.LR.historical_0051_1987_1988.dat is empty.
2024-10-22 16:58:16,252 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-feedback/feedback-TAUX-NINO4-TS-NINO3.png
2024-10-22 16:58:16,252 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-feedback/feedback-TAUX-NINO4-TS-NINO3.png
2024-10-22 16:58:16,255 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
2024-10-22 16:58:16,267 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:16,400 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:16,499 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
2024-10-22 16:58:16,500 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: NET_FLUX_SRF
2024-10-22 16:58:16,740 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.png
2024-10-22 16:58:16,740 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUY-response/regression-coefficient-tauy-over-nino34.png
2024-10-22 16:58:16,743 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
2024-10-22 16:58:16,755 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:16,892 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:16,992 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
2024-10-22 16:58:16,994 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: TAUX
2024-10-22 16:58:17,518 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
2024-10-22 16:58:19,153 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.json
2024-10-22 16:58:21,370 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
2024-10-22 16:58:21,989 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.png
2024-10-22 16:58:21,989 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/TAUX-response/regression-coefficient-taux-over-nino34.png
2024-10-22 16:58:23,069 [INFO]: streamflow_driver.py(run_diag:47) >> Variable: RIVER_DISCHARGE_OVER_LAND_LIQ
2024-10-22 16:58:23,354 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.json
2024-10-22 16:58:27,235 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.png
2024-10-22 16:58:27,235 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-response/regression-coefficient-net_flux_srf-over-nino34.png
2024-10-22 16:58:27,240 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
2024-10-22 16:58:27,252 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:27,385 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:27,483 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
2024-10-22 16:58:27,484 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: SHFLX
2024-10-22 16:58:27,971 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
2024-10-22 16:58:29,613 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.json
2024-10-22 16:58:34,558 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.png
2024-10-22 16:58:34,558 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/SHFLX-response/regression-coefficient-shflx-over-nino34.png
2024-10-22 16:58:34,561 [INFO]: enso_diags_driver.py(run_diag_map:71) >> run_type: model_vs_obs
2024-10-22 16:58:34,573 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:58:34,707 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:58:34,806 [INFO]: enso_diags_driver.py(run_diag_map:88) >> Season: ANN
2024-10-22 16:58:34,807 [INFO]: enso_diags_driver.py(run_diag_map:92) >> Variable: LHFLX
2024-10-22 16:58:35,295 [INFO]: enso_diags_driver.py(run_diag_map:99) >> Selected region: 20S20N
2024-10-22 16:58:36,942 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.json
2024-10-22 16:58:40,433 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.png
2024-10-22 16:58:40,433 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/LHFLX-response/regression-coefficient-lhflx-over-nino34.png
2024-10-22 16:58:54,317 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/streamflow/RIVER_DISCHARGE_OVER_LAND_LIQ_GSIM/seasonality_map.png
2024-10-22 16:58:54,317 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/streamflow/RIVER_DISCHARGE_OVER_LAND_LIQ_GSIM/seasonality_map.png
2024-10-22 16:59:03,468 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/streamflow/RIVER_DISCHARGE_OVER_LAND_LIQ_GSIM/annual_map.png
2024-10-22 16:59:03,468 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/streamflow/RIVER_DISCHARGE_OVER_LAND_LIQ_GSIM/annual_map.png
2024-10-22 16:59:04,002 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/streamflow/RIVER_DISCHARGE_OVER_LAND_LIQ_GSIM/annual_scatter.png
2024-10-22 16:59:04,002 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/streamflow/RIVER_DISCHARGE_OVER_LAND_LIQ_GSIM/annual_scatter.png
2024-10-22 16:59:04,005 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
2024-10-22 16:59:04,017 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:59:04,155 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:59:04,253 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: NET_FLUX_SRF
2024-10-22 16:59:08,342 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-feedback/feedback-NET_FLUX_SRF-NINO3-TS-NINO3.png
2024-10-22 16:59:08,342 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/NET_FLUX_SRF-feedback/feedback-NET_FLUX_SRF-NINO3-TS-NINO3.png
2024-10-22 16:59:08,346 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
2024-10-22 16:59:08,359 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:59:08,492 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:59:08,591 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: FSNS
2024-10-22 16:59:10,286 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FSNS-feedback/feedback-FSNS-NINO3-TS-NINO3.png
2024-10-22 16:59:10,286 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FSNS-feedback/feedback-FSNS-NINO3-TS-NINO3.png
2024-10-22 16:59:10,289 [INFO]: enso_diags_driver.py(run_diag_scatter:168) >> run_type: model_vs_obs
2024-10-22 16:59:10,302 [INFO]: enso_diags_driver.py(calculate_nino_index_model:299) >> Handling the following exception by looking for surface temperature: No files found for target variable SST or derived variables ([('sst',), ('TS', 'OCNFRAC'), ('SST',)]) in /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr.
2024-10-22 16:59:10,436 [INFO]: enso_diags_driver.py(calculate_nino_index_model:307) >> Simulated sea surface temperature not found, using surface temperature instead.
2024-10-22 16:59:10,535 [INFO]: enso_diags_driver.py(run_diag_scatter:201) >> Variable: FLNS
2024-10-22 16:59:12,833 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FLNS-feedback/feedback-FLNS-NINO3-TS-NINO3.png
2024-10-22 16:59:12,833 [INFO]: utils.py(_save_plot:91) >> Plot saved in: model_vs_obs_1987-1988/enso_diags/FLNS-feedback/feedback-FLNS-NINO3-TS-NINO3.png
2024-10-22 16:59:12,889 [INFO]: main.py(create_viewer:132) >> enso_diags model_vs_obs_1987-1988/viewer
2024-10-22 16:59:13,058 [INFO]: main.py(create_viewer:135) >> ('ENSO Diagnostics', 'enso_diags/index.html')
2024-10-22 16:59:13,059 [INFO]: main.py(create_viewer:132) >> streamflow model_vs_obs_1987-1988/viewer
2024-10-22 16:59:13,090 [INFO]: main.py(create_viewer:135) >> ('Streamflow', 'streamflow/index.html')
2024-10-22 16:59:13,095 [INFO]: e3sm_diags_driver.py(main:392) >> Viewer HTML generated at model_vs_obs_1987-1988/viewer/index.html
2024-10-22 16:59:13,098 [INFO]: logger.py(move_log_to_prov_dir:106) >> Log file saved in model_vs_obs_1987-1988/prov/e3sm_diags_run.log
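
In the log above, every enso_diags variable first logs the missing-SST exception and then falls back to TS. The sketch below shows that fallback pattern, including the check on the exception message before falling back, which the PR summary describes as the calculate_nino_index_model() fix. `load_variable()`, `get_nino_source()`, and the dict-based store are hypothetical stand-ins, not the e3sm_diags implementation.

```python
import logging

logger = logging.getLogger("nino_index_sketch")


def load_variable(store, name):
    """Hypothetical loader: return store[name] or raise IOError if absent."""
    if name not in store:
        raise IOError(f"No files found for target variable {name}")
    return store[name]


def get_nino_source(store):
    """Return (variable_name, data), preferring SST and falling back to TS."""
    try:
        return "SST", load_variable(store, "SST")
    except IOError as e:
        # Fall back only when the error says the files are missing; any other
        # I/O problem is re-raised instead of being silently swallowed.
        if "No files found" not in str(e):
            raise
        logger.info(
            "Handling the following exception by looking for surface temperature: %s", e
        )

    logger.info(
        "Simulated sea surface temperature not found, using surface temperature instead."
    )
    return "TS", load_variable(store, "TS")


# Usage: a model run that only provides TS, as in the log above.
name, data = get_nino_source({"TS": [300.1, 300.4]})
```

Matching on the message keeps unrelated I/O errors (for example, a corrupt file) from being masked by the TS fallback.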

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder added the cdat-migration-fy24 CDAT Migration FY24 Task label Oct 8, 2024
@tomvothecoder tomvothecoder self-assigned this Oct 8, 2024
@tomvothecoder tomvothecoder changed the title Add support for multiple time series datasets via glob Add support for multiple time series datasets via glob and fix enso_diags and tc_analysis Oct 8, 2024
@tomvothecoder tomvothecoder changed the title Add support for multiple time series datasets via glob and fix enso_diags and tc_analysis Add support for multiple time series datasets via glob and fix enso_diags, streamflow, tc_analysis Oct 9, 2024
@tomvothecoder
Collaborator Author

@forsyth2 For the failing tc_analysis set on main and the cdat-migration-fy24 branch, I found that the TE stitch file is empty. This causes the ambiguous Python error about not being able to parse the lines of the file.

Can you fix this file? /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/tc-analysis_1987_1988/cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
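
One way to make this failure explicit is to validate the stitch file before parsing it. A later commit in this branch (#876) adds a catch for a non-existent or empty TE stitch file; the sketch below is not that code, only a minimal illustration, and `check_te_stitch_file()` and `is_rejected()` are hypothetical helpers.

```python
import os
import tempfile


def check_te_stitch_file(path):
    """Fail early, with a clear message, if the TE stitch file is unusable.

    Hypothetical helper: checking existence and size up front turns an
    ambiguous line-parsing failure into an explicit error.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"TE stitch file not found: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"TE stitch file is empty: {path}")
    return path


def is_rejected(path):
    """Return True if check_te_stitch_file() refuses the given path."""
    try:
        check_te_stitch_file(path)
    except (FileNotFoundError, ValueError):
        return True
    return False


with tempfile.TemporaryDirectory() as tmp_dir:
    empty = os.path.join(tmp_dir, "cyclones_stitch_empty.dat")
    open(empty, "w").close()  # zero-byte file, like the one described above
    nonempty = os.path.join(tmp_dir, "cyclones_stitch_ok.dat")
    with open(nonempty, "w") as f:
        f.write("placeholder line\n")
    missing = os.path.join(tmp_dir, "does_not_exist.dat")

    empty_rejected = is_rejected(empty)
    missing_rejected = is_rejected(missing)
    nonempty_accepted = not is_rejected(nonempty)
```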

@chengzhuzhang
Contributor

chengzhuzhang commented Oct 10, 2024

@tomvothecoder thank you for helping identify this problem. This version of the v3 low-resolution datasets (documented here) shows much lower TC activity, so it is likely that no TC was detected during the testing period (1987-1988). As a result, no file called "cyclones_stitch_v3.LR.historical_0051_1987_1988.dat" was generated by the upstream process, and the test result that skipped the tc_analysis figure is expected for v3.

@tomvothecoder
Collaborator Author

> @tomvothecoder thank you for helping identify this problem. This version of the v3 low-resolution datasets (documented here) shows much lower TC activity, so it is likely that no TC was detected during the testing period (1987-1988). As a result, no file called "cyclones_stitch_v3.LR.historical_0051_1987_1988.dat" was generated by the upstream process, and the test result that skipped the tc_analysis figure is expected for v3.

Thank you for clarifying. It sounds like @forsyth2 should exclude tc_analysis from zppy for now and open a separate GitHub issue to add it back later.

@tomvothecoder
Collaborator Author

tomvothecoder commented Oct 10, 2024

@forsyth2 The run script in zppy specifies input dir paths that contain sub-directories. e3sm_diags cannot determine which .nc files to use if there are multiple sub-directories containing the same matching filenames.

Instead, the zppy run script should specify the exact input directory path containing the .nc files that e3sm_diags should use.

Why?

I tried implementing functionality to parse the input data path for .nc files nested under these sub-directories. However, this breaks down when multiple sub-directories contain files matching the same input file pattern.

For example:

  • Input Path: /input_path/
  • Pattern: a_file.{13}.nc
  • Dir path 1: /input_path/dir_1/a_file_195001_198501.nc -> we want this file
  • Dir path 2: /input_path/dir_2/a_file_195001_198501.nc -> we don't want this file, but it will still be considered a match

Fix -> set Input Path to /input_path/dir_1/
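
A minimal sketch of the ambiguity using Python's standard glob module. It approximates the regex-style pattern above with the glob a_file_*.nc (an assumption), and `find_matches()` is hypothetical; this is not e3sm_diags' own file-matching code.

```python
import glob
import os
import tempfile


def find_matches(input_path, pattern, recursive):
    """Return sorted paths under input_path whose file names match pattern."""
    if recursive:
        # "**" with recursive=True also descends into every sub-directory.
        return sorted(glob.glob(os.path.join(input_path, "**", pattern), recursive=True))
    # Non-recursive: only files directly inside input_path are considered.
    return sorted(glob.glob(os.path.join(input_path, pattern)))


with tempfile.TemporaryDirectory() as input_path:
    # Reproduce the example layout: the same filename in two sub-directories.
    for sub_dir in ("dir_1", "dir_2"):
        os.makedirs(os.path.join(input_path, sub_dir))
        open(os.path.join(input_path, sub_dir, "a_file_195001_198501.nc"), "w").close()

    # Searching recursively from the parent picks up BOTH copies.
    recursive_count = len(find_matches(input_path, "a_file_*.nc", recursive=True))
    # Pointing directly at dir_1 yields only the intended file.
    exact_count = len(
        find_matches(os.path.join(input_path, "dir_1"), "a_file_*.nc", recursive=False)
    )
```

Nothing in the filenames themselves distinguishes the two copies, which is why the fix is to narrow the input path rather than to make the matching smarter.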

Fixes for zppy e3sm_diags run script

I used the e3sm.py run script from the provenance here and noticed that the test_data_path parameters aren't exact (e.g., streamflow_param.test_data_path = 'rof').

enso_diags:

  • Instead of: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/
  • It should be: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr
(e3sm_diags_dev_673) [ac.tvo@chrlogin1 e3sm_diags]$ ls -al /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr
total 1249540
drwxrws---+ 4 ac.forsyth2 E3SM      4096 Sep 17 16:13 .
drwxrws---+ 3 ac.forsyth2 E3SM      4096 Sep 17 16:13 ..
drwxrws---+ 2 ac.forsyth2 E3SM      4096 Sep 17 16:13 1985_1986
drwxrws---+ 2 ac.forsyth2 E3SM      4096 Sep 17 16:13 1987_1988
-rw-rw----+ 1 ac.forsyth2 E3SM   6759312 Sep 17 16:13 CLDHGH_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759312 Sep 17 16:13 CLDHGH_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759312 Sep 17 16:13 CLDLOW_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759312 Sep 17 16:13 CLDLOW_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759320 Sep 17 16:13 CLDMED_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759320 Sep 17 16:13 CLDMED_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759316 Sep 17 16:13 CLDTOT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759316 Sep 17 16:13 CLDTOT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759392 Sep 17 16:13 FLNS_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759392 Sep 17 16:13 FLNS_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759400 Sep 17 16:13 FLNT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759400 Sep 17 16:13 FLNT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759404 Sep 17 16:13 FLUT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759404 Sep 17 16:13 FLUT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759392 Sep 17 16:13 FSNS_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759392 Sep 17 16:13 FSNS_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759396 Sep 17 16:13 FSNT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759396 Sep 17 16:13 FSNT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759416 Sep 17 16:13 FSNTOA_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759416 Sep 17 16:13 FSNTOA_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759316 Sep 17 16:13 PRECC_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759316 Sep 17 16:13 PRECC_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759324 Sep 17 16:13 PRECL_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759324 Sep 17 16:13 PRECL_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759320 Sep 17 16:13 PRECSC_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759320 Sep 17 16:13 PRECSC_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759332 Sep 17 16:13 PRECSL_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759332 Sep 17 16:13 PRECSL_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759348 Sep 17 16:13 QFLX_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759348 Sep 17 16:13 QFLX_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759364 Sep 17 16:13 SHFLX_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759364 Sep 17 16:13 SHFLX_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759284 Sep 17 16:13 TAUX_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759284 Sep 17 16:13 TAUX_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759292 Sep 17 16:13 TAUY_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759292 Sep 17 16:13 TAUY_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759352 Sep 17 16:13 TREFHT_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759352 Sep 17 16:13 TREFHT_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759336 Sep 17 16:13 TS_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM   6759336 Sep 17 16:13 TS_198701_198812.nc
-rw-rw----+ 1 ac.forsyth2 E3SM 504428552 Sep 17 16:13 U_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM 504428552 Sep 17 16:13 U_198701_198812.nc
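
Each variable in this directory is split across two 2-year files, so the time range for a variable has to span every matching file. Below is a minimal sketch of deriving that overall year range from the filenames. It assumes the `<VAR>_<YYYYMM>_<YYYYMM>.nc` naming seen in the listing, and `parse_year_range()` is a hypothetical helper.

```python
import re

# Assumed naming convention from the listing above: <VAR>_<YYYYMM>_<YYYYMM>.nc
TS_FILENAME_RE = re.compile(r"^(?P<var>.+)_(?P<start>\d{6})_(?P<end>\d{6})\.nc$")


def parse_year_range(filenames):
    """Return (start_year, end_year) spanning all matching time series files."""
    starts, ends = [], []
    for name in filenames:
        m = TS_FILENAME_RE.match(name)
        if m is None:
            continue  # skip entries such as the 1985_1986 sub-directory
        starts.append(int(m["start"][:4]))
        ends.append(int(m["end"][:4]))
    if not starts:
        raise ValueError("No filenames matched <VAR>_<YYYYMM>_<YYYYMM>.nc")
    return min(starts), max(ends)


years = parse_year_range(["1985_1986", "TS_198501_198612.nc", "TS_198701_198812.nc"])
```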

streamflow:

  • Instead of: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/rof/
  • It should be: /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/rof/native/ts/monthly/2yr
(e3sm_diags_dev_673) [ac.tvo@chrlogin1 e3sm_diags]$ ls -al /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/rof/native/ts/monthly/2yr
total 50690
drwxrws---+ 2 ac.forsyth2 E3SM     4096 Sep 17 16:13 .
drwxrws---+ 3 ac.forsyth2 E3SM     4096 Sep 17 16:13 ..
-rw-rw----+ 1 ac.forsyth2 E3SM 25938612 Sep 17 16:13 RIVER_DISCHARGE_OVER_LAND_LIQ_198501_198612.nc
-rw-rw----+ 1 ac.forsyth2 E3SM 25938612 Sep 17 16:13 RIVER_DISCHARGE_OVER_LAND_LIQ_198701_198812.nc

@forsyth2
Collaborator

forsyth2 commented Oct 11, 2024

@tomvothecoder Thank you for looking into all this.

> should exclude tc_analysis from zppy for now and open a separate GitHub issue to add it back later.

I also added a failure if the stitch file is empty (included in E3SM-Project/zppy#628) to catch this in the future. But yes, for now, we'll just have to exclude tc_analysis from v3 testing.

> Instead, the zppy run script should specify the exact input dir path containing the input .nc files that should be used by e3sm_diags

I'm going to have to dive deeper into this. In theory, zppy is constructing the exact paths given other parameters passed to it. E.g., in e3sm_diags.bash:

ts_dir_source={{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
ts_daily_dir={{ output }}/post/atm/{{ grid }}/ts/daily/{{ '%dyr' % (ts_num_years) }}
ts_rof_dir_source="{{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr"
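
For comparison, rendering the ts_dir_source template line with values inferred from the enso_diags paths quoted earlier (an assumption, since zppy's actual inputs are not shown here) gives exactly the .../ts/monthly/2yr directory recommended above. A minimal Python sketch of that rendering, where `render_ts_dir_source()` is hypothetical:

```python
import posixpath


def render_ts_dir_source(output, grid, ts_num_years):
    """Mirror the ts_dir_source template line from e3sm_diags.bash in Python."""
    # {{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
    return posixpath.join(
        output, "post", "atm", grid, "ts", "monthly", "%dyr" % ts_num_years
    )


# Assumed inputs, inferred from the enso_diags paths quoted earlier in this thread.
output = (
    "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/"
    "test-diags-no-cdat-20240917/v3.LR.historical_0051"
)
rendered = render_ts_dir_source(output, "180x360_aave", 2)
```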

@tomvothecoder
Collaborator Author

> I'm going to have to dive deeper into this. In theory, zppy is constructing the exact paths given other parameters passed to it. E.g., in e3sm_diags.bash:

Got it. I will do final testing then merge this PR soon for you to try in zppy.

Collaborator Author

@tomvothecoder tomvothecoder left a comment

Hey @chengzhuzhang, I would appreciate a PR review to ensure my code changes look good to you as well.

Collaborator Author

This is the main file of interest for adding support for globbing .nc files.

Contributor

@chengzhuzhang chengzhuzhang left a comment

The PR looks good! I only have one minor comment.

@chengzhuzhang
Contributor

@tomvothecoder, to follow up on our discussion this morning: for the file path pattern, if a ref_name is specified, it is added to the file path pattern, so the scenario I described is already taken care of.

if ref_name is None:
    glob_dir = root_path
    filepath_pattern = os.path.join(root_path, filename_pattern)
else:
    glob_dir = os.path.join(root_path, ref_name)
    filepath_pattern = os.path.join(root_path, ref_name, filename_pattern)

@tomvothecoder
Collaborator Author

> @tomvothecoder, to follow up on our discussion this morning: for the file path pattern, if a ref_name is specified, it is added to the file path pattern, so the scenario I described is already taken care of.
>
> if ref_name is None:
>     glob_dir = root_path
>     filepath_pattern = os.path.join(root_path, filename_pattern)
> else:
>     glob_dir = os.path.join(root_path, ref_name)
>     filepath_pattern = os.path.join(root_path, ref_name, filename_pattern)

Just noting that this logic has been verified in previous regression testing, and there are unit tests that cover it.
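
Here is a self-contained sketch of the snippet above. It is wrapped in a hypothetical `build_glob_pattern()` function and run against two hypothetical reference directories that contain identically named files. Scoping the pattern to `ref_name` restricts the search to a single directory.

```python
import glob
import os
import tempfile


def build_glob_pattern(root_path, filename_pattern, ref_name=None):
    """Return (glob_dir, filepath_pattern), scoped to ref_name when given."""
    if ref_name is None:
        glob_dir = root_path
        filepath_pattern = os.path.join(root_path, filename_pattern)
    else:
        # The ref_name sub-directory narrows the search, so same-named files
        # under sibling sub-directories are never matched.
        glob_dir = os.path.join(root_path, ref_name)
        filepath_pattern = os.path.join(root_path, ref_name, filename_pattern)
    return glob_dir, filepath_pattern


with tempfile.TemporaryDirectory() as root:
    # Two hypothetical reference datasets with identically named files.
    for ref in ("ref_a", "ref_b"):
        os.makedirs(os.path.join(root, ref))
        open(os.path.join(root, ref, "TS_198501_198612.nc"), "w").close()

    _, scoped = build_glob_pattern(root, "TS_*.nc", ref_name="ref_a")
    scoped_matches = [os.path.relpath(p, root) for p in glob.glob(scoped)]
```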

@tomvothecoder tomvothecoder changed the title Add support for multiple time series datasets via glob and fix enso_diags, streamflow, tc_analysis Add support for multiple time series datasets via glob, fix enso_diags set, various smaller fixes Oct 22, 2024
@tomvothecoder tomvothecoder changed the title Add support for multiple time series datasets via glob, fix enso_diags set, various smaller fixes Add support for multiple time series datasets via glob, fix enso_diags set, and raise exception for non-existent or empty TE stitch file Oct 22, 2024
@tomvothecoder tomvothecoder force-pushed the refactor/861-time-series-multiple branch from 1c2ab1f to a27b15b Compare October 25, 2024 20:00
@tomvothecoder tomvothecoder changed the title Add support for multiple time series datasets via glob, fix enso_diags set, and raise exception for non-existent or empty TE stitch file Add support for multiple time series datasets via glob and fix enso_diags set Oct 25, 2024
Collaborator Author

@tomvothecoder tomvothecoder left a comment

Final self-review.

tests/e3sm_diags/driver/utils/test_dataset_xr.py (outdated, resolved)
tests/e3sm_diags/driver/utils/test_dataset_xr.py (outdated, resolved)
Comment on lines -361 to -372
-if self.is_climo:
-    ds = self._get_climo_dataset(season)
-    return ds
-elif self.is_time_series:
+if self.is_time_series:
     ds = self.get_time_series_dataset(var)
     ds_climo = climo(ds, self.var, season).to_dataset()

     return ds_climo
-else:
-    raise RuntimeError(
-        "This Dataset object could not be identified as either a climatology "
-        "(`self.is_climo`) or time series dataset (`self.is_time_series`)."
-    )
Collaborator Author

Removed the else branch because it can never be reached (the dataset is always either climo or time series).

-# filename if the season is in ["ANN", "DJF", "MAM", "JJA", "SON"].
-if season in ["ANN", "DJF", "MAM", "JJA", "SON"]:
+# filename. This is a more general pattern for model only data.
+if season in MODEL_ONLY_SEASONS:
Collaborator Author

Related to #871 and #712

e3sm_diags/driver/tc_analysis_driver.py (outdated, resolved)
e3sm_diags/derivations/formulas.py (outdated, resolved)
@tomvothecoder tomvothecoder merged commit e4e7c86 into cdat-migration-fy24 Oct 28, 2024
4 checks passed
@tomvothecoder tomvothecoder deleted the refactor/861-time-series-multiple branch October 28, 2024 21:23
@tomvothecoder
Collaborator Author

Regression tests show the same results as before.

@forsyth2 I just merged this PR to cdat-migration-fy24 for you to test on zppy when you get around to it.

tomvothecoder added a commit that referenced this pull request Dec 5, 2024
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit test failures
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using the pipe operator to represent Union -- it still doesn't work even with `from __future__ import annotations`
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests so they pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to optionally return an `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF files (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <[email protected]>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <[email protected]>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <[email protected]>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <[email protected]>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <[email protected]>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using the pipe operator (`|`) to represent Union -- still doesn't work even with `from __future__ import annotations`
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests so they pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to optionally return an `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
tomvothecoder added a commit that referenced this pull request Dec 5, 2024
@forsyth2
Collaborator

@tomvothecoder Following up on #866 (comment) and #866 (comment):

e3sm_diags cannot determine which .nc files to use if there are multiple sub-directories containing the same matching filenames.

Has this actually caused an error or is it more a readability issue? I'm not sure if I missed that above.

Pre-CDAT-migration zppy

Code before E3SM-Project/zppy#651

For example, in https://github.com/E3SM-Project/zppy/blob/main/zppy/templates/e3sm_diags.bash, we have:

streamflow_param.test_data_path = '${ts_rof_dir_primary}'

Which is defined at:

{%- if "streamflow" in sets %}
{% if run_type == "model_vs_obs" %}
ts_rof_dir_primary=rof
{% elif run_type == "model_vs_model" %}
ts_rof_dir_primary=rof_test
{%- endif %}
ts_rof_dir_source="{{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr"
create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_primary} ${Y1} ${Y2} 7

That link creation function is:

create_links_ts_rof()
{
  ts_rof_dir_source=$1
  ts_rof_dir_destination=$2
  begin_year=$3
  end_year=$4
  error_num=$5
  mkdir -p ${ts_rof_dir_destination}
  cd ${ts_rof_dir_destination}
  v="RIVER_DISCHARGE_OVER_LAND_LIQ"
  xml_name=${v}_${begin_year}01_${end_year}12.xml
  cdscan -x ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
  if [ $? != 0 ]; then
    cd {{ scriptDir }}
    echo "ERROR (${error_num})" > {{ prefix }}.status
    exit ${error_num}
  fi
  cd ..
}

So, we've passed 'rof' along from ts_rof_dir_primary to ts_rof_dir_destination. In this function, we create that directory with mkdir -p ${ts_rof_dir_destination}. Meanwhile, "{{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr" gets passed in as ts_rof_dir_source, and the result of running cdscan on that directory is placed into the destination directory.

That is, we essentially have the following:

cd rof
cdscan -x ${xml_name} {{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr/${v}_*.nc
cd ..
# I believe the directory 'rof' contains only the XML file.
# Set `streamflow_param.test_data_path = 'rof'`

Notice that we are getting data from an exact path, but yes, E3SM Diags isn't given anything more than 'rof'.

Post-CDAT-migration zppy

Code after E3SM-Project/zppy#651

Here, this block:

  xml_name=${v}_${begin_year}01_${end_year}12.xml
  cdscan -x ${xml_name} ${ts_rof_dir_source}/${v}_*.nc

becomes:

cp ${ts_rof_dir_source}/${v}_*.nc ${v}_*.nc

Which means we essentially have:

cp {{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr/${v}_*.nc ${v}_*.nc

Here, the directory 'rof' now contains all the nc files directly. That is, there are no subdirectories.

Conclusions

In both cases (pre and post CDAT migration), it looks like the paths that don't have subdirectories specified are like that because they do not, in fact, have subdirectories.

This is why I asked whether you actually ran into an error. From visual inspection of the code, it looks like the directories we're feeding into e3sm_diags don't actually have subdirectories to specify.

@tomvothecoder
Collaborator Author

tomvothecoder commented Dec 19, 2024

Which means we essentially have:

cp {{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr/${v}_*.nc ${v}_*.nc

Here, the directory 'rof' now contains all the nc files directly. That is, there are no subdirectories.

In both cases (pre and post CDAT migration), it looks like the paths that don't have subdirectories specified are like that because they do not, in fact, have subdirectories.

Correct me if I'm wrong, but isn't /native a sub-directory of /rof/? And /rof/ is the test data directory being passed to e3sm_diags?

CDAT-based e3sm_diags Behavior

In the CDAT-based version of the e3sm_diags code, the exact file path for .nc files wasn't necessary. Instead, the code simply read the XML file located under {{ output }}/post/rof/. This XML file then provided the locations of all the .nc files within the path {{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr/.

CDAT-free e3sm_diags Behavior

In the CDAT-free version of e3sm_diags, we no longer reference the {{ output }}/post/rof/ directory directly. Instead, we must specify the full path: {{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr/. This allows the code to identify the matching .nc files for the specified variable(s).

Limitations of Directory Structure Handling

The file-matching logic in the CDAT-free code is limited in how many subdirectories it can handle, and this number can vary depending on the .nc files required for each diagnostic set. Since cdscan and XML are no longer used in the CDAT-free code, the exact directory must now be specified to locate the .nc files using Xarray/xCDAT.
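The matching limitation above can be illustrated with a short Python sketch. It assumes zppy-style file names of the form `<VAR>_*.nc`, and the helpers `find_time_series_files()` and `find_recursively()` are hypothetical, not the actual e3sm_diags file-matching code. A non-recursive pattern only finds files in the exact directory it is given, while a recursive pattern can return matching files from several subdirectories (here, overlapping 10-year and 5-year chunks), leaving no safe way to pick one set.

```python
import glob
import os
import tempfile
from typing import List, Optional


def find_time_series_files(data_dir: str, var: str) -> Optional[List[str]]:
    """Return sorted `<var>_*.nc` files directly under `data_dir`, or None.

    Hypothetical helper: the pattern is not recursive, so `data_dir` must be
    the directory that actually contains the files.
    """
    matches = sorted(glob.glob(os.path.join(data_dir, f"{var}_*.nc")))
    return matches or None


def find_recursively(root: str, var: str) -> List[str]:
    """Search every subdirectory of `root`; may return files from several dirs."""
    pattern = os.path.join(root, "**", f"{var}_*.nc")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    var = "RIVER_DISCHARGE_OVER_LAND_LIQ"
    with tempfile.TemporaryDirectory() as root:
        # Two chunk sizes of the same variable under one parent directory.
        layout = {
            "native/ts/monthly/10yr": ["185001_185912"],
            "native/ts/monthly/5yr": ["185001_185412", "185501_185912"],
        }
        for sub, ranges in layout.items():
            os.makedirs(os.path.join(root, sub))
            for r in ranges:
                open(os.path.join(root, sub, f"{var}_{r}.nc"), "w").close()

        print(find_time_series_files(root, var))  # None: files are nested deeper
        print(len(find_recursively(root, var)))   # 3: overlapping 1850-1859 data
        exact = os.path.join(root, "native/ts/monthly/10yr")
        print(len(find_time_series_files(exact, var)))  # 1
```

Matching only in the directory that is passed in keeps the behavior predictable, at the cost of requiring callers to point at the exact directory that holds the files.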

@forsyth2
Collaborator

Correct me if I'm wrong, but isn't /native a sub-directory of /rof/? And /rof/ is the test data directory being passed to e3sm_diags?

/native is a subdirectory of {{ output }}/post/rof/, yes. However, this line is actually copying the subdirectory native/ts/monthly/{{ ts_num_years }}yr/ file-by-file to the directory we're currently in:

In create_links_ts_rof, we have cd ${ts_rof_dir_destination}. That means the files are copied directly into ${ts_rof_dir_destination}.

That means that line is essentially:

cp {{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr/${v}_*.nc ${ts_rof_dir_destination}/${v}_*.nc

Now, let's jump up a level:

{%- if "streamflow" in sets %}
{% if run_type == "model_vs_obs" %}
ts_rof_dir_primary=rof
{% elif run_type == "model_vs_model" %}
ts_rof_dir_primary=rof_test
{%- endif %}
ts_rof_dir_source="{{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr"
create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_primary} ${Y1} ${Y2} 7
{% if run_type == "model_vs_model" %}
ts_rof_dir_source={{ reference_data_path_ts_rof }}/{{ ts_num_years_ref }}yr
ts_rof_dir_ref=ts_rof_ref
create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_ref} ${ref_Y1} ${ref_Y2} 8
{%- endif %}
{%- endif %}

So, ${ts_rof_dir_destination} is either ${ts_rof_dir_primary} (which is either rof or rof_test) or ${ts_rof_dir_ref} (which is ts_rof_ref).

Now, that code block is run after cd {{ scriptDir }}, so we're in {{ scriptDir }}.

That means that line is essentially:

cp {{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr/${v}_*.nc {{ scriptDir }}/rof/${v}_*.nc

Now, we have streamflow_param.test_data_path = '${ts_rof_dir_primary}'. It's {{ scriptDir }}/rof/${v}_*.nc that is getting passed into E3SM Diags, not {{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr/${v}_*.nc. The former doesn't have subdirectories.

the exact directory must now be specified to locate the .nc files using Xarray/xCDAT

@golaz @chengzhuzhang What was the original purpose of the create_link_... functions? I was thinking it was to create the xml file lists, but not all of them did that.

On main (https://github.com/E3SM-Project/zppy/blob/main/zppy/templates/e3sm_diags.bash), these lines create xml files (which isn't necessary post-CDAT-migration):

# create_links_ts
cdscan -x ${xml_name} -f ${v}_files.txt
# Replaced in https://github.com/E3SM-Project/zppy/pull/651/ with:
# cp ${file} ${ts_dir_destination}/${v}_${YYYY}*.nc

# create_links_ts_rof
cdscan -x ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
# Replaced in https://github.com/E3SM-Project/zppy/pull/651/ with:
# cp ${ts_rof_dir_source}/${v}_*.nc ${v}_*.nc

but these lines just create symbolic links:

# create_links_climo
cp -s ${climo_dir_source}/${nc_prefix}_*_${begin_year}??_${end_year}??_climo.nc .

# create_links_climo_diurnal
cp -s ${climo_diurnal_dir_source}/${nc_prefix}.*_*_${begin_year}??_${end_year}??_climo.nc .

So, it occurs to me that the changes in https://github.com/E3SM-Project/zppy/pull/651/files#top should have created symbolic links, not copies. In any case, we're clearly still creating copies/links to pass into Diags.

Are these symbolic links still necessary in the CDAT-migrated E3SM Diags? I can't seem to recall why it was implemented this way. From what @tomvothecoder says, it seems like it would be better to just pass in the directories directly.

(In any case, the e3sm_diags.bash template is overly complex for bash. I think we should pull out as much as we can into Python, for easier testing/debugging/maintainability as soon as possible).

@chengzhuzhang
Contributor

chengzhuzhang commented Dec 23, 2024

@golaz @chengzhuzhang What was the original purpose of the create_link_... functions? I was thinking it was to create the xml file lists, but not all of them did that.

Hi @forsyth2, creating symbolic links for climo files is still necessary. In zppy's climo tasks, climo files from multiple year chunks are stored in the same directory, but e3sm_diags can only read climo files from one year range. That is why we need to create symlinks to separate the multiple sets of climo files.
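As a rough illustration of that separation step, here is a Python sketch of what the `cp -s ${climo_dir_source}/${nc_prefix}_*_${begin_year}??_${end_year}??_climo.nc .` line does: select the climo files for one year range and symlink them into their own directory. The function name `link_climo_year_range()` and the four-digit, zero-padded year format are assumptions for illustration, not zppy code.

```python
import glob
import os
import tempfile
from typing import List


def link_climo_year_range(
    source_dir: str, dest_dir: str, nc_prefix: str, begin_year: int, end_year: int
) -> List[str]:
    """Symlink one year range of climo files into `dest_dir` (like `cp -s`).

    Mirrors the glob `<prefix>_*_<begin>??_<end>??_climo.nc`; years are
    assumed to be written as four zero-padded digits.
    """
    os.makedirs(dest_dir, exist_ok=True)
    pattern = os.path.join(
        source_dir, f"{nc_prefix}_*_{begin_year:04d}??_{end_year:04d}??_climo.nc"
    )
    links = []
    for src in sorted(glob.glob(pattern)):
        dst = os.path.join(dest_dir, os.path.basename(src))
        if not os.path.lexists(dst):  # skip links that already exist
            os.symlink(os.path.abspath(src), dst)
        links.append(dst)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        climo = os.path.join(tmp, "climo")
        os.makedirs(climo)
        # Two year chunks stored side by side, as in zppy's climo output.
        for name in (
            "v3_ANN_185001_185912_climo.nc",
            "v3_JJA_185006_185908_climo.nc",
            "v3_ANN_186001_186912_climo.nc",
        ):
            open(os.path.join(climo, name), "w").close()
        dest = os.path.join(tmp, "climo_1850-1859")
        links = link_climo_year_range(climo, dest, "v3", 1850, 1859)
        print([os.path.basename(p) for p in links])
        # ['v3_ANN_185001_185912_climo.nc', 'v3_JJA_185006_185908_climo.nc']
```

Linking rather than copying keeps each per-range directory cheap to build while the original files stay in one place.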

For monthly time-series files, the pre-CDAT-migration code needed XML files that essentially concatenate the time series across multiple files. In the CDAT-migrated diags, time slicing is handled within E3SM Diags, which I think is why @tomvothecoder suggested passing in directories directly without symbolic links or copying files around.
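To make the contrast with cdscan concrete, here is a minimal sketch, not the E3SM Diags implementation, of how a directory of time-series files could be handled without an XML index: parse each file's year range from its name, keep the files that overlap the requested years, then open them together and slice by time. It assumes zppy-style names (`<VAR>_<YYYYMM>_<YYYYMM>.nc`); `parse_years()`, `select_files_for_years()`, and `open_time_series()` are hypothetical names. The last function uses `xarray.open_mfdataset()`, which requires xarray, dask, and a netCDF backend.

```python
import os
import re
from typing import List, Tuple

# Assumed zppy-style time-series file name: <VAR>_<YYYYMM>_<YYYYMM>.nc
_YEARS_RE = re.compile(r"_(\d{4})\d{2}_(\d{4})\d{2}\.nc$")


def parse_years(path: str) -> Tuple[int, int]:
    """Return the (first_year, last_year) encoded in a file name."""
    match = _YEARS_RE.search(os.path.basename(path))
    if match is None:
        raise ValueError(f"Cannot parse a year range from {path!r}")
    return int(match.group(1)), int(match.group(2))


def select_files_for_years(
    paths: List[str], start_year: int, end_year: int
) -> List[str]:
    """Keep files whose year range overlaps [start_year, end_year]."""
    selected = []
    for path in sorted(paths):
        first, last = parse_years(path)
        if first <= end_year and last >= start_year:
            selected.append(path)
    return selected


def open_time_series(paths: List[str], start_year: int, end_year: int):
    """Open the selected files as one dataset and slice it to the year range."""
    import xarray as xr  # lazy import; needs dask and a netCDF backend

    files = select_files_for_years(paths, start_year, end_year)
    ds = xr.open_mfdataset(files, combine="by_coords")
    # Year-only strings work for both datetime64 and cftime time indexes.
    return ds.sel(time=slice(f"{start_year:04d}", f"{end_year:04d}"))


if __name__ == "__main__":
    v = "RIVER_DISCHARGE_OVER_LAND_LIQ"
    files = [f"{v}_{r}.nc" for r in ("185001_185912", "186001_186912", "187001_187912")]
    print(select_files_for_years(files, 1855, 1865))
    # ['RIVER_DISCHARGE_OVER_LAND_LIQ_185001_185912.nc',
    #  'RIVER_DISCHARGE_OVER_LAND_LIQ_186001_186912.nc']
```

Selecting files by name first keeps `open_mfdataset()` from opening chunks outside the requested years; the `.sel()` call then trims any partial overlap at the edges.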

@forsyth2
Collaborator

Ah ok, thanks for the info.

pass in directories directly without symbolic links or copying files around.

Great, I'll integrate that into E3SM-Project/zppy#651 and test that along with the changes that #907 introduces.

@chengzhuzhang chengzhuzhang mentioned this pull request Dec 23, 2024
tomvothecoder added a commit that referenced this pull request Jan 15, 2025
Labels
cdat-migration-fy24 CDAT Migration FY24 Task

3 participants