Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDAT Migration Phase 2: Refactor arm_diags set #842

Merged
merged 35 commits into from
Oct 1, 2024

Conversation

chengzhuzhang
Copy link
Contributor

@chengzhuzhang chengzhuzhang commented Aug 26, 2024

Description

Additional Changes

arm_diags_driver.py and arm_diags_viewer.py

  • Remove aerosol annual cycle driver so it is not used by any aset

derivations.py and formulas.py

  • Add variable derivations to derivations dictionary
  • Add formulas for deriving variables

arm_diags_plot.py

arm_diags_parameter.py

  • Add time_interval attribute to store hours for diurnal cycle

dataset_xr.py

  • Add feature that replicates the "co" flag from cdms2, which excludes the last time coordinate when subsetting if that time coordinate spans the next year (Dataset._exclude_sub_monthly_coord_spanning_year())

diurnal_cycle_xr.py

  • Update diurnal_cycle_xr.py to convert coordinate DataArrays to lists and convert for loop to list comprehension for performance gains

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@chengzhuzhang
Copy link
Contributor Author

Updates:
All sub sets code refactoring including (annual_cycle, diurnal_cycle(_zt), convection_onset, aerosol_activation) are completed.

Todo:
Fix mypy issues
Potential performance bottleneck from deriving variables using xarray dataset as input

@chengzhuzhang chengzhuzhang marked this pull request as ready for review September 9, 2024 17:25
@tomvothecoder tomvothecoder force-pushed the refactor/667-arm_diags_rebase branch from 7b52e83 to ef44fc6 Compare September 9, 2024 21:44
@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Sep 9, 2024

Potential performance bottleneck from deriving variables using xarray dataset as input

Thank you for identifying the two formulas code as potential bottlenecks. I confirmed that these computations with Xarray are indeed much slower than CDAT.

Solution

I ran a performance benchmark and found the solution to the slowness: we need to call .load(scheduler="sync") in _get_dataset_with_source_vars() to speed up the computation. I pushed commit ef44fc6 (#842) with this fix.

Benchmark Results

The first runtime is the current code and the second runtime is with .load(). I also ran e3sm_diags with commit ef44fc6 (#842) and confirmed a significant runtime improvement, similar to the benchmarks below. Performance is now on-par with CDAT.

"""
Results
----------
1. Elapsed time (Xarray non-chunked): 6.540755605790764 seconds
2. Elapsed time (Xarray non-chunked with .load()): 0.17097265785560012 seconds
3. Elapsed time (Xarray chunked): 0.1452920027077198 seconds
4. Elapsed time (numpy .values): 6.418793010059744 seconds
5. Elapsed time (numpy .data): 7.334999438840896 seconds
"""
Elapsed time (CDAT main branch, single runtime): 0.12261438369750977 seconds

@chengzhuzhang
Copy link
Contributor Author

The first runtime is the current code and the second runtime is with .load(). I also ran e3sm_diags with commit ef44fc6 (#842)

@tomvothecoder thank you for performing the timing test. Interesting that when I test your commit. Running through a configuration with each subset, it took 24 mins. Without, .load(), the total time is about 3 mins, which I consider at least on par with cdat code. In this case, maybe we should drop the .load() change?

@tomvothecoder
Copy link
Collaborator

The first runtime is the current code and the second runtime is with .load(). I also ran e3sm_diags with commit ef44fc6 (#842)

@tomvothecoder thank you for performing the timing test. Interesting that when I test your commit. Running through a configuration with each subset, it took 24 mins. Without, .load(), the total time is about 3 mins, which I consider at least on par with cdat code. In this case, maybe we should drop the .load() change?

I would think that loading the derived variables dataset into memory shouldn't slow down performance unless the datasets were extremely large (which we should use Dask chunking for).

I only benchmarked performance for the formula computations. I will benchmark a complete run to verify your findings and determine if we should revert the commit or not.

Side-note:

It could be that the logic I implemented already stores the dataset in-memory, since it merges multiple xr.Dataset objects opened via open_dataset() (uses numpy arrays) instead of using open_mfdataset() (uses Dask arrays). If this the case, I don't see how .load() would improve the speed of the formula computations though.

for var in vars_to_get:
ds = self._get_time_series_dataset_obj(var)
datasets.append(ds)
ds = xr.merge(datasets)
ds = squeeze_time_dim(ds)
ds.load(scheduler="sync")
return ds

@tomvothecoder
Copy link
Collaborator

RE: #842 (comment)
I found adding .load() in _get_dataset_with_source_vars() adds 2-4 minutes of runtime to a complete arm_diags run.

I reverted this change and will now address the pre-commit issues.

# Commit: 58361c49-4b1b-11ec-9b3b-9c5c8e2f5e4e (no .load())
# run_set function took 281.78 seconds to complete.
# run_set function took 332.79 seconds to complete.
# Commit: ef44fc6ffd538a5e257b097b99f7a1a79b79bc3b (with .load())
# run_set function took 472.81 seconds to complete.

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Sep 13, 2024

Hey @chengzhuzhang, I fixed the pre-commit issues in 2c248fb (#842). I also performed initial code cleanup since I would probably be doing that later anyways. Refer to the commit message and my review comments for more information.

I re-ran all sets and they completed successfully with these changes, although I noticed some of the diagnostic sets weren't done yet.

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review comment and some questions of FIXME comments.

e3sm_diags/driver/arm_diags_driver.py Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Outdated Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Outdated Show resolved Hide resolved
@chengzhuzhang
Copy link
Contributor Author

Thank you @tomvothecoder I will do another pass to see if there are other sets need to be refactored. Thanks a lot for fixing and clean this branch.

@chengzhuzhang
Copy link
Contributor Author

@tomvothecoder I applied png regression tests. And all figures are produced and results are as expected as noted in the notebook.
Also performance-wise, the refactored codes are similar to the original codes. wall time is ~ 10mins with 4 workers for the full arm_diags set.

@tomvothecoder
Copy link
Collaborator

Hey @chengzhuzhang, I noticed you have several other PRs you're working on right now. I'm happy to help finish up refactoring the last diag function in this PR. Let me know.

@chengzhuzhang
Copy link
Contributor Author

Hey @chengzhuzhang, I noticed you have several other PRs you're working on right now. I'm happy to help finish up refactoring the last diag function in this PR. Let me know.

@tomvothecoder it would be great if you could help me on finish this last diag, so that I'm not holding back the progress to merge!

@tomvothecoder
Copy link
Collaborator

Hey @chengzhuzhang, I noticed you have several other PRs you're working on right now. I'm happy to help finish up refactoring the last diag function in this PR. Let me know.

@tomvothecoder it would be great if you could help me on finish this last diag, so that I'm not holding back the progress to merge!

I found that no sets actually run "annual_cycle_aerosol" (arm_diags_model_vs_obs.cfg, arm_diags_model_vs_model.cfg). Unless we expect to run this diagnostic in the future, I think we can delete _run_diag_annual_cycle_aerosol instead of refactoring it.

@chengzhuzhang
Copy link
Contributor Author

@tomvothecoder good for catching this. I think for now we can just create an issue to log this problem, and we can add back the code at a later time.

@@ -152,6 +152,7 @@ def climo(dataset: xr.Dataset, var_key: str, freq: ClimoFreq):
# averaging.
dims = [dim for dim in dv.dims if dim != time_coords.name]
coords = {k: v for k, v in dv.coords.items() if k in dims}
climo = climo.squeeze(axis=0)
Copy link
Collaborator

@tomvothecoder tomvothecoder Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes bug when ncycle == 1 where the climo variable time axis is not being squeezed which causes the rebuilt dv_climo DataArray to fail with

FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_annual_cycle_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_DJF_season_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_MAM_season_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_JJA_season_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_SON_season_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_jan_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_climatology_for_derived_variable - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_dataset_xr.py::TestGetClimoDataset::test_returns_climo_dataset_using_climo_of_time_series_files - ValueError: different number of dimensions on data and dims: 3 vs 2

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Sep 27, 2024

  • Fix integration test failing
  • Address remaining FIXME and TODO: items in arm_diags_driver.py

Regression test results

.png regression test shows all plots are identical except the following, which have missing data (white spaces) compared to main:

  • Investigate the following plots because they have white spaces on dev branch for some reason -- CLOUD test variable -- not a concern, dev branch is doing the right thing by masking closer to the surface at 1000 mb pressure level
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
     * Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
     * Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
     * Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
     * Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png
     * Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining TODO and FIXME comments in arm_diags_driver.py

e3sm_diags/driver/arm_diags_driver.py Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Show resolved Hide resolved
e3sm_diags/driver/arm_diags_driver.py Outdated Show resolved Hide resolved
@tomvothecoder tomvothecoder changed the title CDAT Migration Phase 2: Refactor arm_diags set (try 2) CDAT Migration Phase 2: Refactor arm_diags set Sep 30, 2024
@tomvothecoder tomvothecoder force-pushed the refactor/667-arm_diags_rebase branch from 18357c5 to 7b53932 Compare October 1, 2024 17:49
Comment on lines +1406 to +1445
def _exclude_sub_monthly_coord_spanning_year(
self, ds_subset: xr.Dataset
) -> xr.Dataset:
"""
Exclude the last time coordinate for sub-monthly data if it extends into
the next year.

Excluding end time coordinates that extend to the next year is
necessary because downstream operations such as annual cycle climatology
should consist of data for full years for accurate calculations.

For example, if the time slice is ("0001-01-01", "0002-01-01") and
the last time coordinate is:
* "0002-01-01" -> exclude
* "0001-12-31" -> don't exclude

Parameters
----------
ds_subset : xr.Dataset
The subsetted dataset.

Returns
-------
xr.Dataset
The dataset with the last time coordinate excluded if necessary.

Notes
-----
This function replicates the CDAT cdms2 "co" slice flag (close, open).
"""
time_dim = xc.get_dim_keys(ds_subset, axis="T")
time_values = ds_subset[time_dim]
last_time_year = time_values[-1].dt.year.item()
second_last_time_year = time_values[-2].dt.year.item()

if self.is_sub_monthly and last_time_year > second_last_time_year:
ds_subset = ds_subset.isel(time=slice(0, -1))

return ds_subset

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replicates the "co" slice flag for sub-monthly data

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Oct 1, 2024

This PR should be good to go. The integration test image checker fails because there is now one additional plot for the sgpc1 region that was generated with a fix in this PR (related line)

I will merge this PR even though the integration tests fail. I will update the complete test results on LCRC using the cdat-migration-fy24 branch. The complete test results should also include the QBO wavelet plot from #850.

@tomvothecoder tomvothecoder merged commit 22cd0f5 into cdat-migration-fy24 Oct 1, 2024
2 of 4 checks passed
@tomvothecoder tomvothecoder deleted the refactor/667-arm_diags_rebase branch October 1, 2024 19:27
tomvothecoder added a commit that referenced this pull request Dec 5, 2024
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <[email protected]>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <[email protected]>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <[email protected]>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <[email protected]>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <[email protected]>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
tomvothecoder added a commit that referenced this pull request Dec 5, 2024
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <[email protected]>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <[email protected]>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <[email protected]>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <[email protected]>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <[email protected]>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
@chengzhuzhang chengzhuzhang mentioned this pull request Jan 9, 2025
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cdat-migration-fy24 CDAT Migration FY24 Task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants