Skip to content
Andrew Hicks edited this page May 19, 2017 · 3 revisions

Rationale

The Open Data Cube library uses xarray to wrap Earth Observation satellite data from multiple files into a single array (mostly on spatial and temporal dimensions) The xarray.Dataset and DataArray objects used in the project are monkey-patched to have non-standard properties:

  • affine (affine.Affine object)
  • extent (datacube.utils.geometry.Geometry object representing polygon of the corners)
  • geobox (datacube.utils.geometry.Geobox object)

These are lazily computed using the crs attr:

  • crs (datacube.utils.geometry.CRS object, wrapper around osr.SpatialReference) (This causes problems when saving, as it is an object)

There is a desire to make these standard, easier to interact with, easier to create, etc

Existing work

Most of these ideas are superseded by a rasterio backend for xarray

Proposal

Interface

Register an xarray accessor: http://xarray.pydata.org/en/stable/internals.html#extending-xarray

  • geo (too generic, possible candidate for aospy or some other package)
  • geospatial (too long?
  • crs (collision with attrs and data_vars)
  • ???

CRS support

Options are:

NetCDF-CF

NetCDF is the basis for the structure of xarray, but the standard way to store a CRS defined by NetCDF-CF is to use the attrs of an empty data variable. We could keep this and promote it to a dimensionless coordinate.

Rasterio

Rasterio follows pyproj and uses PROJ.4 syntax in dict form as its native CRS syntax. If you want a WKT representation of the CRS, see the CRS class’s wkt attribute. This seems much more sensible!

Properties

Methods

  • reproject(proj, scale)
  • scale(scale, resampling) / resample()
  • add_labels() need better name adds coordinate
  • mask(shape) rasterize the shape and apply to the array
  • other rasterio style features

Note from rasterio (perrygeo commented on Mar 18, 2016): https://github.com/pydata/xarray/issues/790#issuecomment-198343073

re: projecting coordinates to lat-lng. If you consider the raster cells as independent points, you can project them independently but they will likely not be regularly spaced. With few exceptions, if you need to maintain a regular grid, transforming data between projections will alter the shape of the array and require resampling (GDAL and rasterio call the process "warping" to reflect this). There are decisions and tradeoffs to be considered with the various resampling methods, selecting new extents and cell sizes, etc so it's typically not something you want to do on-the-fly for analyses.

I think keeping the xarray coordinates as generic cartesian x-y makes sense, at least initially. Even in many GIS tools, analysis is done on a naive 2D plane and it's assumed that the inputs are of the same projection. I'd recommend doing any reprojection outside of xarray as a pre-processing step (with e.g. gdalwarp or rio warp).

Plotting

Dask

  • Optionally create chunks on GeoTIFF internal blocks, NetCDF chunks etc?
  • dask versions of the above functions, where possible...
  • windowed reproject on nearest resampling

Input

  • NetCDF
  • OpenDAP
  • GeoTIFF
  • GDAL/rasterio object
  • Adding geospatial information to an existing xarray
  • WCS / WCS2.0-EO

Output

  • standard use of xarray.Dataset.to_netcdf()
  • NetCDF-CF compliant file
  • GeoTIFF file
Clone this wiki locally