-
Notifications
You must be signed in to change notification settings - Fork 1
Home
The Open Data Cube library uses xarray to wrap Earth Observation satellite data from multiple files into a single array (mostly on spatial and temporal dimensions)
The xarray.Dataset
and DataArray
objects used in the project are monkey-patched to have non-standard properties:
- affine (
affine.Affine
object) - extent (
datacube.utils.geometry.Geometry
object representing polygon of the corners) - geobox (
datacube.utils.geometry.Geobox
object)
These are lazily computed using the crs
attr:
- crs (
datacube.utils.geometry.CRS
object, wrapper aroundosr.SpatialReference
) (This causes problems when saving, as it is an object)
There is a desire to make these standard, easier to interact with, easier to create, etc
Most of these ideas are superseded by a rasterio
backend for xarray
- https://github.com/pydata/xarray/pull/1070
- https://github.com/pydata/xarray/pull/1260
- https://anaconda.org/jhamman/rasterio_to_xarray/notebook
Register an xarray accessor: http://xarray.pydata.org/en/stable/internals.html#extending-xarray
- geo (too generic, possible candidate for aospy or some other package)
- geospatial (too long?
- crs (collision with attrs and data_vars)
- ???
Options are:
NetCDF is the basis for the structure of xarray
, but the standard way to store a CRS defined by NetCDF-CF is to use the attrs of an empty data variable.
We could keep this and promote it to a dimensionless coordinate.
Rasterio follows pyproj and uses PROJ.4 syntax in dict form as its native CRS syntax. If you want a WKT representation of the CRS, see the CRS class’s wkt attribute. This seems much more sensible!
-
affine
(using affine.Affine from https://github.com/sgillies/affine) -
extent
(using ogr.Geometry or wkt string?) -
crs
(using wkt string?)
reproject(proj, scale)
-
scale(scale, resampling)
/resample()
-
add_labels()
need better name adds coordinate -
mask(shape)
rasterize the shape and apply to the array - other rasterio style features
Note from rasterio (perrygeo commented on Mar 18, 2016): https://github.com/pydata/xarray/issues/790#issuecomment-198343073
re: projecting coordinates to lat-lng. If you consider the raster cells as independent points, you can project them independently but they will likely not be regularly spaced. With few exceptions, if you need to maintain a regular grid, transforming data between projections will alter the shape of the array and require resampling (GDAL and rasterio call the process "warping" to reflect this). There are decisions and tradeoffs to be considered with the various resampling methods, selecting new extents and cell sizes, etc so it's typically not something you want to do on-the-fly for analyses.
I think keeping the xarray coordinates as generic cartesian x-y makes sense, at least initially. Even in many GIS tools, analysis is done on a naive 2D plane and it's assumed that the inputs are of the same projection. I'd recommend doing any reprojection outside of xarray as a pre-processing step (with e.g. gdalwarp or rio warp).
- Optionally create chunks on GeoTIFF internal blocks, NetCDF chunks etc?
- dask versions of the above functions, where possible...
- windowed reproject on
nearest
resampling
- NetCDF
- OpenDAP
- GeoTIFF
- GDAL/rasterio object
- Adding geospatial information to an existing xarray
- WCS / WCS2.0-EO
- standard use of
xarray.Dataset.to_netcdf()
- NetCDF-CF compliant file
- GeoTIFF file