Feature/support multiple lams to the Cutout class #113

Merged (15 commits) on Nov 8, 2024

paulina-t
Contributor

Summary

This PR extends the Cutout class to support multiple Limited Area Models (LAMs) together with a global dataset. Cutouts are applied hierarchically across the LAMs, which allows for overlapping regions and for datasets at different resolutions.

Motivation

The primary motivation for this change is to allow the anemoi-datasets package to perform cutouts for multiple LAM datasets (such as MEPS and AROME-Arctic) simultaneously, increasing flexibility in data handling.

Key Changes

- Multi-LAM support in cutout: Enhanced the Cutout class to support multiple LAM datasets, handling them hierarchically so that overlapping areas are masked sequentially.
- Grid handling and resolution support: Updated the grids to compute and return grid point counts for each LAM and for the global dataset, supporting varying resolutions.
- Hierarchical masking: Introduced hierarchical masking of latitude and longitude in overlapping regions for multiple LAMs. The LAMs are provided in a list, and the first element has the highest priority (i.e., it is not masked). The second LAM keeps only the points that are not in the first LAM, and so forth: each LAM keeps only the points that are not covered by any LAM earlier in the list. The global dataset is masked by all the LAMs.
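To make the priority order concrete, here is a minimal NumPy sketch of the masking described above. It is an illustration only and not the code in this PR: it assumes each domain can be approximated by its latitude/longitude bounding box, and the helpers `regular_grid`, `bounding_box`, `inside`, and `hierarchical_masks` are hypothetical names introduced for this example.

```python
import numpy as np


def regular_grid(south, north, west, east, step=1.0):
    """Return flattened (lats, lons) of a regular grid (hypothetical helper)."""
    lat_axis = np.arange(south, north + step / 2, step)
    lon_axis = np.arange(west, east + step / 2, step)
    lats, lons = np.meshgrid(lat_axis, lon_axis, indexing="ij")
    return lats.ravel(), lons.ravel()


def bounding_box(lats, lons):
    """Approximate a domain by its lat/lon bounding box (an assumption)."""
    return lats.min(), lats.max(), lons.min(), lons.max()


def inside(box, lats, lons):
    """Boolean array: True where a point falls inside the bounding box."""
    south, north, west, east = box
    return (lats >= south) & (lats <= north) & (lons >= west) & (lons <= east)


def hierarchical_masks(grids):
    """Compute one keep-mask per grid.

    `grids` is a list of (lats, lons) pairs: LAMs in priority order,
    followed by the global dataset. A point is kept only if it lies
    outside every domain that appears earlier in the list.
    """
    masks = []
    higher_priority_boxes = []
    for lats, lons in grids:
        keep = np.ones(lats.shape, dtype=bool)
        for box in higher_priority_boxes:
            keep &= ~inside(box, lats, lons)
        masks.append(keep)
        higher_priority_boxes.append(bounding_box(lats, lons))
    return masks


# Two overlapping LAMs and a coarse stand-in for the global dataset.
lam_1 = regular_grid(0, 2, 0, 2)        # highest priority, never masked
lam_2 = regular_grid(1, 4, 1, 4)        # loses the points shared with lam_1
global_ds = regular_grid(-5, 5, -5, 5)  # masked by both LAMs

masks = hierarchical_masks([lam_1, lam_2, global_ds])
print([int(m.sum()) for m in masks])  # [9, 12, 100]
```

The three counts printed at the end correspond to the per-dataset grid point counts mentioned in the second bullet: all 9 points of the first LAM, 12 of the 16 points of the second, and 100 of the 121 global points.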

Testing

Notebook Testing: Created a new notebook, grids_multilam.ipynb, based on grids.ipynb, to plot and visually verify that the Cutout class handles multiple LAM datasets correctly. This includes testing that the cutout works for:
- LAMs with different resolutions
- LAMs that do not overlap
- LAMs entirely contained within the domain of another LAM
- LAMs with a coarser resolution than the global dataset
- Compatibility with thinning and cropping

Note that the notebook relies on datasets generated from the files grids*.yaml. I added these files for reproducibility, but you might not want to include them in the repository. Let me know if there are other datasets you prefer to use.

Dependencies

No new dependencies: This update builds on existing functionality in anemoi-datasets without introducing new dependencies.
Compatibility: Fully compatible with single LAMs as well.

Notes for Reviewers

Potential areas of impact:
- Due to the significant updates in grid handling and masking logic, reviewers may want to verify compatibility with other classes or functions that rely on Cutout.
- The behavior of the method collect_supporting_arrays may need review. I modified it to store all LAM masks and the global mask, but please let me know if this differs from the desired output.
Documentation: Please let me know if additional documentation is needed to explain the multi-LAM functionality or if a style change is needed.

@FussyDuck
FussyDuck commented Nov 5, 2024

CLA assistant check
All committers have signed the CLA.

@b8raoult b8raoult merged commit e1ab0b8 into ecmwf:develop Nov 8, 2024
10 of 16 checks passed
b8raoult added a commit that referenced this pull request Nov 8, 2024
floriankrb added a commit that referenced this pull request Nov 15, 2024
* Add contributors (#105)

* Add contributors
Co-authored-by: Mario Santa Cruz

* Feature/masks (#104)

* add masks

Co-authored-by: Florian Pinault

* Feature/new checkpoints (#106)

* add masks

* Feature/new datasets (#99)

* main changes

* bugfix

* few bugs and add unit tests

* work with more planetary computer ds

* add optional dependencies

* qa

* make test optional when adls is not installed (#110)

* make test optional when adls is not installed

* changelog

* tests

* tests

* split tests

* Xarray-zarr example dataset recipe (#108)

* add a working xarray-zarr example

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: b8raoult

* missing copyrights

* missing copyrights (#111)

* missing copyrights

* fixing --test (changing only the behaviour of creating datasets with --test)

* more on testing

* fix tests

* Feature/support multiple lams to the Cutout class (#113)

* Enhance Cutout class to support multiple LAMs with hierarchical masking.

---------

Co-authored-by: Paulina Met.
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#112)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/psf/black-pre-commit-mirror: 24.8.0 → 24.10.0](psf/black-pre-commit-mirror@24.8.0...24.10.0)
- [github.com/astral-sh/ruff-pre-commit: v0.6.9 → v0.7.2](astral-sh/ruff-pre-commit@v0.6.9...v0.7.2)
- [github.com/tox-dev/pyproject-fmt: 2.2.4 → v2.5.0](tox-dev/pyproject-fmt@2.2.4...v2.5.0)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update .pre-commit-config.yaml (#120)

* fix qa for the notebook on multilam (#123)

* feature/fix qa (#124)

* fix qa for the notebook on multilam
* fix qa for the yaml for multilam

* Naming guidelines not available to all users. fixing #119 (#125)

* upload with ssh (#94)

* add copy to ssh target

* Feature/new checkpoints (#107)

* add masks
* save masks to checkpoint
* name supporting_arrays
* better support for cutout
* force np.datetime64 is seconds
---------

Co-authored-by: Florian Pinault

* Feature/merge (#126)

* save masks to checkpoint
* force np.datetime64 is seconds
* Call filters from anemoi-transform
* when merging datasets, consider missing dates
* add gcd for frequency

---------

Co-authored-by: Florian Pinault

* Feature/use anemoi transform (#127)

* Call filters from anemoi-transform

---------

Co-authored-by: Florian Pinault

* Revert "Feature/merge (#126)" "Feature/new checkpoints (#107)" "upload with ssh (#94)"

* redo "Revert "Feature/merge (#126)" "Feature/new checkpoints (#107)" "upload with ssh (#94)"

* fix merge

* Update to documentation on using missing dates (#128)

* Updated docs on using datasets with missing dates

* Simplify CI: run on develop and on Sundays; disable downstream-ci-hpc; test only Python 3.11; test only once when PRs are updated; use the shortest name so the full description can be read on GitHub CI; test only on Ubuntu (same change as for anemoi-utils ecmwf/anemoi-utils#42) (#129)

* skipping long tests (#132)

---------

Co-authored-by: Matthew Chantry
Co-authored-by: b8raoult
Co-authored-by: Mariah Pope
Co-authored-by: Timothy Smith
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Baudouin Raoult
Co-authored-by: paulina-t
Co-authored-by: Paulina Met.
Co-authored-by: Jesper Dramsch
Co-authored-by: Håvard Homleid Haugen