Reusable SpatioTemporal functions #40

Merged
weiji14 merged 7 commits into master from geo_to_spatiotemporal
May 30, 2020


weiji14
Owner

@weiji14 weiji14 commented May 28, 2020

Because time 🕛 matters! Found a better name than geo.py - spatiotemporal.py! Name inspired by the SpatioTemporal Asset Catalog, which is itself a cool project!

Also making a proper 'data package' following https://intake.readthedocs.io/en/latest/data-packages.html, so that we can reuse our data in our jupyter notebook scripts and tests. Basically allowing for:

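```python
# Load via the catalog bundled with the package ...
dataset: xr.Dataset = deepicedrain.catalog.icesat2atl06.to_dask()
# ... or via the intake plugin entry point
dataset: xr.Dataset = intake.cat.atlas_cat.icesat2atl06.to_dask()
```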

TODO in this PR:

  • Rename the BBox class to Region, revamp the geospatial Region.subset function (0e89b7f)
  • Package atlas_catalog.yaml into deepicedrain and test it (7ea9c8d, 02904bb, 3c3d021)
  • Move time conversion code (GPS delta_time to utc_time) to spatiotemporal.py and test it (9956922)
  • Move geographic reprojection code (EPSG:4326 to EPSG:3031) to spatiotemporal.py and test it (4b64234)

TODO in future PRs:

References:

Found a better name than geo.py - spatiotemporal.py! Because time matters. Also renaming the BBox class to Region as the double capital letters in a row didn't look Pythonic. The Region.subset function has been revamped to be a lot more user friendly, giving the actual subsetted xr.Dataset instead of just the boolean array.
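To illustrate the difference between returning a boolean mask and returning the subsetted data, here is a minimal, dependency-free sketch. The `Region` dataclass below is hypothetical and works on plain lists of `(x, y)` tuples; it is not the class from this PR, which operates on an xr.Dataset.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Region:
    """Hypothetical bounding box in projected x/y coordinates (illustration only)."""

    name: str
    xmin: float
    xmax: float
    ymin: float
    ymax: float

    def mask(self, points: Sequence[Point]) -> List[bool]:
        """Old-style result: one boolean per point, True if inside the box."""
        return [
            self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax
            for x, y in points
        ]

    def subset(self, points: Sequence[Point]) -> List[Point]:
        """New-style result: the points themselves, already filtered."""
        return [p for p, inside in zip(points, self.mask(points)) if inside]


region = Region(name="demo", xmin=0, xmax=10, ymin=0, ymax=10)
points = [(1.0, 1.0), (11.0, 5.0), (5.0, 5.0)]
inside_points = region.subset(points)
```

Returning the filtered data saves the caller from applying the mask by hand.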
@weiji14 weiji14 added the enhancement ✨ New feature or request label May 28, 2020
@weiji14 weiji14 added this to the v0.2.0 milestone May 28, 2020

weiji14 added 3 commits May 28, 2020 22:52
Enable importing of the ATLAS intake catalog straight from deepicedrain! This functions almost like a test fixture, letting us load ICESat-2 data easily in our scripts, i.e. keeping things DRY. Managed to get rid of the pytest fixture in test_calculate_delta.py which did the sample data loading from the catalog before. Renamed the very generic catalog.yaml to a slightly less generic atlas_catalog.yaml. Added some description metadata to that catalog file, and include nested in at11_test_case. Also ignoring .h5 data files now.
So that calling `deepicedrain.catalog` will actually work when people `pip install deepicedrain` without cloning the git repository (otherwise a FileNotFoundError is raised). Added a plugin to pyproject.toml so that the ATLAS catalog can be loaded via intake through `intake.cat.atlas_cat` too! Done by moving atlas_catalog.yaml and tests/ into the deepicedrain folder.

This relies on a bit of magic (the good kind), using the Python 3.7+ importlib.resources module to locate the atlas_catalog.yaml file via a relative path where the package is installed (in site-packages). Basically following a modified version of https://github.com/intake/intake-examples/tree/04bbe1880f2a4d2c74c6ea9c54385c380c1b9a1e/data_package. Had to use {{ CATALOG_DIR }} to link to the test_catalog.yaml file too. A side effect of this is that we're bundling all our tests into the `deepicedrain` Python package, which might be bad for file size but good for finding test examples I guess.
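The lookup described above can be sketched roughly as follows. This is not the code from this PR: the package name `demo_icepkg` and the helper `packaged_file` are made up for illustration, and the sketch uses `importlib.resources.files()`, which needs Python 3.9+ (on 3.7/3.8, `importlib.resources.path()` plays a similar role).

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def packaged_file(package: str, filename: str) -> Path:
    """Locate a data file that ships inside an importable package."""
    # files() imports the package and returns its directory as a Traversable
    return Path(str(importlib.resources.files(package) / filename))


# Build a throwaway package (hypothetical name) so the sketch runs anywhere
site_dir = Path(tempfile.mkdtemp())
pkg_dir = site_dir / "demo_icepkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "atlas_catalog.yaml").write_text("sources: {}\n")

# Pretend the package is installed by putting its parent folder on sys.path
sys.path.insert(0, str(site_dir))
importlib.invalidate_caches()

found = packaged_file("demo_icepkg", "atlas_catalog.yaml")
print(found.name, found.exists())
```

The point of going through the import system is that the file is found relative to wherever the package was installed, not relative to the current working directory.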
Ensure that the ATLAS intake catalog is able to be loaded, and secretly document its usage a little bit. Make the GitHub Actions test pass by busting the poetry cache through bumping json5 from 0.9.4 to 0.9.5.
@weiji14 weiji14 force-pushed the geo_to_spatiotemporal branch from ea8e8c6 to 3c3d021 Compare May 29, 2020 01:37
Turn the ICESat-2 delta_time to utc_time conversion code in our jupyter notebook into a well tested function! The cool bit is that we can pass in either a dask or numpy backed xarray.DataArray, and get the equivalent output, with dimensions and coordinates preserved! Gotta love [NEP18](https://numpy.org/neps/nep-0018-array-function-protocol.html). Added a chunks statement to test_catalog.yaml, and ensure the file is cached in a relative path. Had to make sure to read the test atl11_dataset using dask and close it properly after each test (?) or subsequent tests will fail, seeing a numpy.array instead of a dask.array (?). Should do proper setup/teardown next time. Also bumping up cftime from 1.1.1.2 to 1.1.3 and certifi from 2019.11.28 to 2020.4.5.1 to bust the CI cache, just in case.
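The arithmetic behind the conversion is just "epoch plus elapsed time". Below is a minimal standard-library sketch, not the function added in this PR (which works on xarray/dask/numpy objects). It assumes `delta_time` is given in seconds since an epoch of 2018-01-01T00:00:00 (the ATLAS Standard Data Product epoch, stated here as an assumption), and it ignores leap seconds. The name `delta_seconds_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Assumed reference epoch for ICESat-2 delta_time values
ATLAS_EPOCH = datetime(2018, 1, 1, tzinfo=timezone.utc)


def delta_seconds_to_utc(
    delta_seconds: Iterable[float], start_epoch: datetime = ATLAS_EPOCH
) -> List[datetime]:
    """Add each elapsed-seconds value to the epoch to get an absolute timestamp."""
    return [start_epoch + timedelta(seconds=float(s)) for s in delta_seconds]


utc_times = delta_seconds_to_utc([0.0, 86400.5])
print(utc_times)
```

With array libraries the same idea becomes a single broadcast addition of a datetime64 epoch and a timedelta64 array, which is why the array container type does not need special handling.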
@weiji14 weiji14 force-pushed the geo_to_spatiotemporal branch from 2551cdb to 9956922 Compare May 30, 2020 00:05
@weiji14 weiji14 marked this pull request as ready for review May 30, 2020 03:09
@weiji14 weiji14 force-pushed the geo_to_spatiotemporal branch from 4b64234 to 88a7553 Compare May 30, 2020 03:24
Collapse the geographic reprojection code into a one-liner! Basically wraps around pyproj, and handles lazy dask.DataFrame and xarray.DataArray objects by including the soon-to-be-released workaround for handling __array__ objects (scheduled for pyproj 3.0). Reinstated the 'catalog' variable in atl06_play.ipynb, as it's used further down the notebook. Also hashing Python files in deepicedrain to check whether we should bust the CI cache to reinstall `deepicedrain`, instead of manually bumping dependencies each time. That said, we'll bump up pyzmq from 19.0.0 to 19.0.1 and keep doing random bumps until this branch is merged into master.
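For readers curious what the EPSG:4326 to EPSG:3031 step does numerically, here is a dependency-free sketch of the south-polar stereographic formulas on the WGS84 ellipsoid (standard parallel 71°S, central meridian 0°). This is a hand-written stand-in for illustration, not the pyproj wrapper from this PR; the function name `lonlat_to_xy` is hypothetical, and for real work pyproj should be preferred.

```python
import math

# WGS84 ellipsoid constants
WGS84_A = 6378137.0  # semi-major axis in metres
WGS84_F = 1 / 298.257223563  # flattening
WGS84_E = math.sqrt(WGS84_F * (2 - WGS84_F))  # first eccentricity

# EPSG:3031 parameter: latitude of true scale (71 degrees South)
LAT_TRUE_SCALE = math.radians(-71.0)


def _conformal_t(lat_rad: float) -> float:
    """Conformal term t for the south-pole case of the projection."""
    esin = WGS84_E * math.sin(lat_rad)
    return math.tan(math.pi / 4 + lat_rad / 2) / (
        ((1 + esin) / (1 - esin)) ** (WGS84_E / 2)
    )


def lonlat_to_xy(lon_deg: float, lat_deg: float) -> tuple:
    """Project longitude/latitude in degrees to x/y in metres."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    # Scale term evaluated at the latitude of true scale
    m_c = math.cos(LAT_TRUE_SCALE) / math.sqrt(
        1 - (WGS84_E * math.sin(LAT_TRUE_SCALE)) ** 2
    )
    # Distance from the pole in the projected plane
    rho = WGS84_A * m_c * _conformal_t(lat) / _conformal_t(LAT_TRUE_SCALE)
    # Central meridian is 0 degrees: 0E points "up" (+y), 90E points "right" (+x)
    return rho * math.sin(lon), rho * math.cos(lon)


x_pole, y_pole = lonlat_to_xy(0.0, -90.0)
x_east, y_east = lonlat_to_xy(90.0, -75.0)
print(x_pole, y_pole, x_east, y_east)
```

The South Pole maps to the origin, and points along 90°E land on the positive x axis.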
@weiji14 weiji14 force-pushed the geo_to_spatiotemporal branch from 88a7553 to be98dde Compare May 30, 2020 03:54
Provide an example of using `deepicedrain` on the main README.md page. Added a YUML diagram showing how data flows from ATL06 to ATL11. Bumped up pyparsing from 2.4.6 to 2.4.7 for good measure to bust the CI cache. Also listed a few related ICESat-2 projects on GitHub.
@weiji14 weiji14 merged commit 69174ad into master May 30, 2020
@weiji14 weiji14 deleted the geo_to_spatiotemporal branch May 30, 2020 04:23
weiji14 added a commit that referenced this pull request Jun 3, 2020
Patches #40. The deltatime_to_utctime converter didn't handle pandas.Series properly, as the start_epoch variable would have an index of 0, and the datetime + timedelta operation would only get applied at index 0 instead of along the whole column. Calling squeeze() converts the pandas.Series to a pandas.Timestamp, so that the addition operation is broadcast to the whole column. This also works on an xarray.DataArray and numpy.array. Doesn't work for a dask.Series, but we can work that out when the need arises.
weiji14 added a commit that referenced this pull request Jun 10, 2020
Patches #40. Allow for converting a single numpy timedelta64 value to datetime64, instead of raising a "ValueError: Could not convert object to NumPy timedelta".
weiji14 added a commit that referenced this pull request Sep 22, 2020
Bumping pyproj from 2.6.0 to 3.0.0 allows us to get rid of the __array__ workaround added in be98dde/#40. Also took the opportunity to improve the documentation of the function's input parameters and output results.
Labels
enhancement ✨ New feature or request
1 participant