spacetime-elevation

End-of-hackweek summary

Update 26.11.23

New consolidated outcome notebooks 📔 can now be found in notebooks/, using functions coded consistently in spacetime/. 😁

Outcomes

This GeoSMART Hackweek project (initial pitch below) produced several investigations into Gaussian Process regression for big geospatial data. These may develop into discussions or PRs to upstream packages such as GPyTorch and PyKrige, and/or into a consolidation of advances in the GTSA package. More specifically, the main perspectives arising from the hackweek work are to:

  1. Ensure consistency of Gaussian Process regression between geostatistical kriging packages (e.g., PyKrige) and machine-learning packages (e.g., GPyTorch, scikit-learn),
  2. Add plotting tools and Leave-One-Out Cross-Validation for 1- or 2-D kernels in machine-learning Gaussian Process packages (learning from good practices in geostatistics),
  3. Provide a scalable GP regression implementation in space, time and space+time for big geospatial data, combined into a single package such as GTSA, that notably supports adaptive chunks based on estimated kernel lengthscales. This requires a wide range of tools: RioXarray, Geocube, Dask, Rechunker, Xbatcher and GPyTorch.
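
As an illustration of the second point, here is a minimal NumPy sketch of Leave-One-Out Cross-Validation for a GP with fixed hyperparameters. It assumes a zero-mean GP with a squared-exponential kernel and uses a standard identity that gives all LOO predictions from a single matrix inverse. The function names, the synthetic data and the hyperparameter values are illustrative assumptions, not code from this repository or from any of the packages above.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance matrix for 1-D inputs."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def loo_closed_form(K_noisy, y):
    """LOO predictive mean and variance from a single matrix inverse."""
    K_inv = np.linalg.inv(K_noisy)
    diag = np.diag(K_inv)
    loo_mean = y - (K_inv @ y) / diag
    loo_var = 1.0 / diag
    return loo_mean, loo_var


def loo_brute_force(K_noisy, y):
    """Same quantities, re-solving n times with one point removed each time."""
    n = y.size
    loo_mean = np.empty(n)
    loo_var = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        k_i = K_noisy[keep, i]
        weights = np.linalg.solve(K_noisy[np.ix_(keep, keep)], k_i)
        loo_mean[i] = weights @ y[keep]
        loo_var[i] = K_noisy[i, i] - weights @ k_i
    return loo_mean, loo_var


# Synthetic 1-D example (assumed values, for illustration only)
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 12)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)
K_noisy = rbf_kernel(x, lengthscale=1.5) + 0.05 * np.eye(x.size)  # kernel + noise variance

mean_fast, var_fast = loo_closed_form(K_noisy, y)
mean_slow, var_slow = loo_brute_force(K_noisy, y)

# Standardized LOO residuals, a common diagnostic in geostatistics
residuals = (y - mean_fast) / np.sqrt(var_fast)
```

If the kernel and noise level describe the data well, the standardized residuals should have a mean close to 0 and a variance close to 1, which is the same check commonly applied to kriging cross-validation.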

More updates to come as things develop! 😄

Participants

Initial project pitch

Ready to elevate your spatiotemporal prediction skills? 👾 📡 🌐

This is a project of the GeoSMART Hackweek, taking place Oct 23-27 2023 at University of Washington.

Its focus is advancing spatiotemporal prediction for big geospatial time series, in particular using Gaussian Processes. Our primary objective is to estimate the continuous evolution of snow- and ice-covered surface elevations; however, the methods and tools we develop will be generic to any geospatial time series whenever possible. We welcome team members who would like to explore other datasets, such as optical imagery, laser altimetry, synthetic aperture radar or temperature grids!

Part of this effort will include further development of the Geospatial Time Series Analysis (GTSA) package, which provides routines for time-stacking and fitting geospatial gridded datasets out-of-memory, and possibly of other upstream packages such as RioXarray and xDEM.

Graphic from GTSA:

Spatiotemporal prediction of surface elevation changes, and more

Summary

This project aims to produce continuous spatiotemporal estimates from spatially and temporally sparse measurements.

It is primarily a software and data science method-oriented project, with the following three points of focus (in decreasing order of envisioned work):

  1. Software development: Develop a core Python package for scalable 3D (2D space + 1D time) geospatial analysis, building on GTSA.
  2. Data science method: Practice the use of spatiotemporal prediction methods, in particular Gaussian Processes, for big remote sensing data.
  3. Applications: Apply to glacier elevation changes, or snow depth, or more.

Tools that will be used: Xarray, Dask, RioXarray, GPyTorch.

The problem

Observational data in Earth system science, whether ground or remote-sensing-based, is inherently sparse in space and time (e.g., point ground stations, fixed satellite footprint and revisit time). For climate variables such as glaciers and seasonal snow that have substantial seasonal and regional variabilities, it is therefore difficult to reconcile observations between sites and time periods. This limitation largely hampers estimations of past changes (e.g., glacier mass changes, seasonal snow water equivalent) and their ingestion into models for predictions.

Goals

We identify two short-term goals (doable within the Hackweek timespan):

  • Start developing a package for geospatial time series analysis of 3-D space-time arrays that allows existing methods to be applied to georeferenced data in a scalable manner,
  • Constrain the covariance of glacier or snow elevations to correctly understand and apply Gaussian Process regression.

And two long-term goals (extending after the Hackweek):

  • Reach a stable version of a tested, documented and open source package on geospatial time series analysis,
  • Publish a comparative study on the performance of spatio-temporal fitting methods (parametric, non-parametric, physically-informed) for surface elevation.

Background on proposed methods

Gaussian Processes are a promising avenue in non-parametric statistical modelling because, by learning the covariance structure of the data, they can provide a "best unbiased estimator" for a specific problem using only the data itself. Gaussian Processes have the significant advantage of not relying on physical assumptions (as in physically-based modelling) or on a fixed parametrization (as in other types of statistical modelling). Moreover, by learning the data covariance, Gaussian Process methods can generally provide reliable error estimates alongside their mean predictions.
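
To make the "mean plus error" idea concrete, here is a minimal NumPy sketch of GP regression on a sparse 1-D time series. It assumes a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters, whereas a real workflow would learn them from the data (for example with GPyTorch). All names and values are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def gp_predict(x_train, y_train, x_new, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train, lengthscale, variance)
    K += noise * np.eye(x_train.size)  # observation noise variance on the diagonal
    K_s = rbf_kernel(x_train, x_new, lengthscale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Sparse, irregular observations in time (synthetic values, not real elevations)
t_obs = np.array([0.0, 1.0, 2.5, 6.0, 7.0])
dh_obs = np.sin(t_obs)

# Regular prediction grid covering the observation gap between t=2.5 and t=6
t_new = np.linspace(0.0, 7.0, 15)
mean, std = gp_predict(t_obs, dh_obs, t_new, lengthscale=1.5)
```

The predicted standard deviation is small near the observations and grows inside the data gap, which is the behaviour that makes the error estimates useful for sparse measurements.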

There is a lot of overlap between Gaussian Processes and geostatistics: simple kriging is essentially the same concept as Gaussian Process regression under another name. However, the generalization of Gaussian Processes to other fields has accelerated related research, in particular on computational efficiency. In this respect, Gaussian Processes are now better suited to big data problems.
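
The equivalence can be checked numerically. The sketch below assumes a known zero mean, a Gaussian variogram written as `sill * (1 - exp(-h^2 / a^2))` and a squared-exponential kernel written as `variance * exp(-h^2 / (2 * l^2))`; under these particular conventions the two match when `a = sqrt(2) * l`. Packages use different parametrizations of the range and lengthscale, so the conversion factor is an assumption tied to the formulas written here, and the data are synthetic.

```python
import numpy as np


def gaussian_variogram(h, sill, corr_range):
    """Gaussian variogram model (one possible convention for the range)."""
    return sill * (1.0 - np.exp(-(h**2) / corr_range**2))


def rbf_covariance(h, variance, lengthscale):
    """Squared-exponential (RBF) covariance as a function of distance."""
    return variance * np.exp(-0.5 * h**2 / lengthscale**2)


rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 100.0, size=(20, 2))  # synthetic sample locations
z = rng.normal(size=20)                     # synthetic zero-mean values
xy0 = np.array([50.0, 50.0])                # prediction location

sill, lengthscale, nugget = 2.0, 15.0, 0.01
corr_range = np.sqrt(2.0) * lengthscale     # conversion for the two formulas above

h = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
h0 = np.linalg.norm(xy - xy0, axis=1)

# Simple kriging: covariance derived from the variogram, C(h) = sill - gamma(h)
C = sill - gaussian_variogram(h, sill, corr_range) + nugget * np.eye(len(z))
c0 = sill - gaussian_variogram(h0, sill, corr_range)
weights = np.linalg.solve(C, c0)
z_sk = weights @ z
var_sk = sill - weights @ c0

# GP regression: kernel matrix plus noise variance, then the posterior formulas
K = rbf_covariance(h, sill, lengthscale) + nugget * np.eye(len(z))
k0 = rbf_covariance(h0, sill, lengthscale)
z_gp = k0 @ np.linalg.solve(K, z)
var_gp = sill - k0 @ np.linalg.solve(K, k0)
```

Both routes give the same prediction and the same variance up to floating-point error. In practice, discrepancies between packages tend to come from differing parameter conventions and from how the mean and nugget are handled, not from the underlying mathematics.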

Background on proposed tools

Based on the above, for computational efficiency, we would use Gaussian Process packages. For scaling, it is best to compute on the GPU, which GPyTorch supports. To perform out-of-memory computations on large georeferenced datasets, we would combine Xarray, Dask and RioXarray.
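
The adaptive chunking mentioned in the outcomes is not specified in this README. The sketch below shows one possible interpretation: choosing a spatial chunk size and an overlap (halo) width in pixels from an estimated kernel lengthscale, so that each chunk sees most of the data correlated with its pixels. The helper `chunk_layout`, its parameters and its default multipliers are hypothetical and do not come from GTSA or any other package.

```python
import math


def chunk_layout(lengthscale_m, pixel_size_m, n_lengthscales=10,
                 halo_lengthscales=3, min_chunk_px=64):
    """Hypothetical helper: chunk edge and halo width, both in pixels.

    The chunk spans `n_lengthscales` lengthscales (with a lower bound),
    and the halo spans `halo_lengthscales` lengthscales, beyond which a
    squared-exponential covariance is close to zero.
    """
    chunk_px = max(min_chunk_px, math.ceil(n_lengthscales * lengthscale_m / pixel_size_m))
    halo_px = math.ceil(halo_lengthscales * lengthscale_m / pixel_size_m)
    return chunk_px, halo_px


# Example: a 300 m spatial lengthscale on a 30 m grid (assumed values)
chunk_px, halo_px = chunk_layout(lengthscale_m=300.0, pixel_size_m=30.0)

# A short lengthscale falls back to the minimum chunk size
small_chunk_px, small_halo_px = chunk_layout(lengthscale_m=10.0, pixel_size_m=30.0)
```

Values like these could then be passed to Xarray's `.chunk()` for the chunk size and to the `depth` argument of Dask's `map_overlap` for the halo, although how best to wire this together is left open here.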

To this end, we aim to use and build upon the existing toolset in the Geospatial Time Series Analysis (GTSA) package: https://github.com/friedrichknuth/gtsa.

Data

Analysis-ready datasets:

  1. Historical (~1930s-1990s) photogrammetric DEMs in CONUS, stacked as a Zarr file and chunked along the time dimension,
  2. Modern (2000s-2020s) ASTER and WorldView DEMs worldwide, stacked as a Zarr file and chunked along the time dimension,
  3. Glacier outline shapefiles.

Available via AWS S3.

Additional resources or background reading

Reading and learning:

Video:

Code examples:

Publications:

Tasks

Under construction