Calculate dhdt with ICESat-2 ATL11 data over Antarctica #41
Merged
Conversation
Finding change where it's at, and quickly! There's nothing really special about calculating height range, it's just subtracting the minimum height from the maximum height. The `nanptp` function (which numpy doesn't have) merely accounts for NaN values, and I've put in the extra effort to parallelize it on dask using xr.apply_ufunc, so that it takes minutes to run on ~1 trillion points. Took a long detour packaging up `deepicedrain` properly before this dhdt notebook could be released, but it was worth it to polish out most of the cruft and reduce the amount of boilerplate preprocessing code. Also removed the need to close the atl11 test dataset, as using to_dask() instead of read() seems to be nicer.
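For reference, here's a minimal sketch of what a NaN-aware height range reduction parallelized with xr.apply_ufunc could look like; the `h_corr` and `cycle_number` names are assumptions based on ATL11 conventions, not necessarily what the deepicedrain code uses:

```python
import numpy as np
import xarray as xr


def nanptp(a: np.ndarray, axis: int = -1) -> np.ndarray:
    # Peak-to-peak (maximum minus minimum) range, ignoring NaN values;
    # numpy has ptp but no NaN-aware equivalent.
    return np.nanmax(a, axis=axis) - np.nanmin(a, axis=axis)


def nan_height_range(ds: xr.Dataset) -> xr.DataArray:
    # Reduce over the 'cycle_number' dimension in parallel on dask-backed chunks.
    return xr.apply_ufunc(
        nanptp,
        ds.h_corr,
        input_core_dims=[["cycle_number"]],
        dask="parallelized",
        output_dtypes=[ds.h_corr.dtype],
    )
```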
Performing linear regression in parallel, on 10 million points, in about 3 minutes, plus a few extra minutes of preprocessing time. Again, the rate of height change over time is just based on an ordinary least squares linear regression, nothing too fancy. The `nan_linregress` function (which wraps around scipy.stats.linregress) accounts for NaN values by masking them out, and the linregress results are packed into a single numpy.ndarray so that we can parallelize the computation using xr.apply_ufunc. This is based on a lot of research, looking at Stack Overflow answers and people's GitHub code snippets (all linked in this Pull Request). Will need to refactor a lot of elements in the coming week to keep things DRY, e.g. collapsing the datashade functionality into a one-liner. Probably need more tests too, and I've added test_nanptp_with_nan for good measure. Also patched the GitHub Actions CI again in 4aabf6e, which was missing the actual `--no-root` flag.
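As a rough sketch of the idea (the exact function in deepicedrain may differ, and the `delta_time`, `h_corr` and `cycle_number` names are assumed here), the NaN-masked regression can be wrapped so that every point returns a fixed-length array that xr.apply_ufunc knows how to parallelize:

```python
import numpy as np
import xarray as xr
from scipy.stats import linregress


def nan_linregress(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Ordinary least squares fit of y (height) against x (time),
    # masking out NaN values and returning a fixed-size result array.
    mask = ~np.isnan(x) & ~np.isnan(y)
    if mask.sum() < 2:  # not enough valid points to fit a line
        return np.full(shape=5, fill_value=np.nan)
    result = linregress(x=x[mask], y=y[mask])
    return np.array(
        [result.slope, result.intercept, result.rvalue, result.pvalue, result.stderr]
    )


def dhdt_linregress(ds: xr.Dataset) -> xr.DataArray:
    # Run the regression point by point over the 'cycle_number' dimension.
    return xr.apply_ufunc(
        nan_linregress,
        ds.delta_time,
        ds.h_corr,
        input_core_dims=[["cycle_number"], ["cycle_number"]],
        output_core_dims=[["regress_vars"]],
        vectorize=True,
        dask="parallelized",
        # tell dask the length of the new output dimension (slope, intercept, r, p, stderr)
        dask_gufunc_kwargs=dict(output_sizes={"regress_vars": 5}),
        output_dtypes=[np.float64],
    )
```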
Add a README.md file in the deepicedrain directory, listing out what each of the files (atlas_catalog.yaml, deltamath.py, spatiotemporal.py) is for! Also shifted the usage instructions up in the main README.md, and updated the teaser image to one of dhdt over Antarctica!
Putting the datashader functionality into the Region class, so that we can make use of the bounding box information! It takes in a pandas.DataFrame table of x, y, z points, and outputs an xarray.DataArray grid for visualization purposes at the pre-set scale. Did some simple algebra to set the correct aspect ratio with only plot_width as input. Standardized the variable names to be ds_* for xarray.Datasets and df_* for pandas.DataFrames. Storing all of the intermediate Zarr and Parquet data files in an ATLXI folder. Will update the plots in another commit once I sort out some issues, and maybe start a new file called visualization.py to handle the plotting code.
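Not the exact implementation, but a sketch of the aspect-ratio algebra and the rasterization step, assuming the Region class stores its bounding box as xmin/xmax/ymin/ymax attributes (those names and the method signature are guesses):

```python
import datashader
import pandas as pd
import xarray as xr


class Region:
    def __init__(self, name: str, xmin: float, xmax: float, ymin: float, ymax: float):
        # Bounding box in projected (e.g. Antarctic Polar Stereographic) coordinates.
        self.name = name
        self.xmin, self.xmax = xmin, xmax
        self.ymin, self.ymax = ymin, ymax

    def datashade(
        self,
        df: pd.DataFrame,
        x_var: str = "x",
        y_var: str = "y",
        z_var: str = "dhdt",
        plot_width: int = 1400,
    ) -> xr.DataArray:
        # Simple algebra: derive plot_height from plot_width so that pixels
        # keep the same scale in x and y, i.e. the correct aspect ratio.
        plot_height = int(plot_width * (self.ymax - self.ymin) / (self.xmax - self.xmin))

        # Rasterize the x, y, z point table onto a regular grid, taking the
        # mean z value of the points that fall inside each pixel.
        canvas = datashader.Canvas(
            plot_width=plot_width,
            plot_height=plot_height,
            x_range=(self.xmin, self.xmax),
            y_range=(self.ymin, self.ymax),
        )
        return canvas.points(source=df, x=x_var, y=y_var, agg=datashader.mean(column=z_var))
```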
Tidy up our rate of height change over time (dhdt) code, putting it into an xarray.Dataset with proper names, and keeping things snappy by using chunks when reading the intermediate Zarr stores. The hrange and dhdt plots have been updated to use the correct datashaded image aspect ratio as promised, both in the notebooks and the README.md! Also fixed the coordinates of Kamb Ice Stream, as they were actually at Whillans Ice Stream, a rookie x/y mistake.
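For anyone reproducing this, reading an intermediate Zarr store back lazily looks roughly like the snippet below; the store name under the ATLXI folder is hypothetical:

```python
import xarray as xr

# Hypothetical store name; substitute the actual Zarr store inside the ATLXI folder.
ds_dhdt: xr.Dataset = xr.open_zarr(store="ATLXI/ds_dhdt.zarr", chunks="auto")
print(ds_dhdt)  # chunked, dask-backed Dataset, so downstream computations stay lazy
```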
Finding change where it's at over Antarctica, and quickly before the ice melts! Calculating the rate of ice surface elevation change over time (dhdt) using ICESat-2 ATL11 data! For that, we'll do a simple linear regression to fit a trend line through the elevation points over time. Will focus on point locations where there is significant (>0.5 m) elevation change.
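To make that last point concrete, here's a minimal, self-contained sketch of the thresholding step on synthetic data; the `h_range` variable name is an assumption, and in the real workflow it would come from the height range calculation:

```python
import numpy as np
import xarray as xr

# Toy dataset standing in for the per-point height range across ICESat-2 cycles.
ds = xr.Dataset(data_vars={"h_range": ("ref_pt", np.array([0.1, 0.7, np.nan, 2.3]))})

# Keep only points with significant (>0.5 m) elevation change; NaN rows drop out too.
ds_significant = ds.where(cond=ds.h_range > 0.5, drop=True)
print(ds_significant.h_range.values)  # [0.7 2.3]
```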
Amazing to think how far we've come since ICESat-1, especially in terms of parallel compute code like dask. There's more than an order of magnitude increase in data, but it feels like we can do so much more now too.
TODO:

- `nanptp`, based on `numpy.ptp`, accounting for `NaN` values, as per ENH: adding new function, `np.nanptp` numpy/numpy#13220 (a6f3a31)
- `nan_linregress`, based on `scipy.stats.linregress`, see also WIP: Better parallel implementation of linear regression jbusecke/xarrayutils#62 (dedd0f4)

References:

- `xarray.apply_ufunc`