Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Update rasterio backend to store CRS/nodata information in standard locations. #2308

Closed
snowman2 opened this issue Jul 24, 2018 · 10 comments

Comments

@snowman2
Copy link
Contributor

snowman2 commented Jul 24, 2018

Problem description

Currently the way data is stored in the dataaray when using xarray.open_rasterio the crs and nodata information is stored in an attributes. It would be nice to be able to have them stored in standard locations so that other tools (rasterio, QGIS, GDAL) can find the information properly after dumping to a file with to_netcdf().

Proposed solutions

The nodata should be loaded into _FillValue

I propose that the CRS information be stored using the CF spatial_ref convention as it is supported by the main open source GIS tools. To do so, you add the crs coordinate to the dataset/dataarray. And then, you add the spatial_ref attribute to the crs which is stored as a crs WKT string. Next, you add the grid_mapping attribute to all associated variables that contains the coordinate name crs as the grid_mapping.

Here is an example of how it would look on a dataset:

<xarray.Dataset>
Dimensions:      (x: 65, y: 31)
Coordinates:
  * x            (x) float64 ...
  * y            (y) float64  ...
    time         datetime64[ns] ...
    crs          int64 ...
Data variables:
    ndvi         (y, x) float64 ...
Attributes:

Here is how the crs or spatial_ref coodinate variable would look:

<xarray.DataArray 'crs' ()>
array(0)
Coordinates:
    time         datetime64[ns] ...
    crs          int64 0
Attributes:
    spatial_ref:  PROJCS["UTM Zone 15, Northern Hemisphere",GEOGCS["WGS 84",D...

And here is how it would look on the variables:

<xarray.DataArray 'ndvi' (y: 31, x: 65)>
array([[ ...]])
Coordinates:
  * x            (x) float64 ...
  * y            (y) float64 ...
    time         datetime64[ns] ...
    crs          int64 0
Attributes:
    grid_mapping:  crs

More information about this is in #2288.

@djhoese
Copy link
Contributor

djhoese commented Jul 24, 2018

@snowman2 This should mean that open_dataset should handle CRS specially too, right? Currently it doesn't seem to do anything special for the coordinate variable pointed to by grid_mapping.

@snowman2
Copy link
Contributor Author

I don't think so. The crs should already exist on the dataset beforehand and I think it may be a bit too much to expect it to add it to the dataset in that function as there are some that don't have it and not all xarray datasets are geospatially located.

@djhoese
Copy link
Contributor

djhoese commented Jul 24, 2018

I wouldn't expect it to add crs if there wasn't a grid_mapping specified, but if it was then I would. In a simple test where I did xr.open_dataset('my_nc.nc') which has a grid_mapping attribute, xarray does nothing special to create a crs or other named coordinate variable referencing the associated grid_mapping variable.

@snowman2
Copy link
Contributor Author

That is strange. Was the variable put in data_vars and not in coords?

@djhoese
Copy link
Contributor

djhoese commented Jul 25, 2018

This is the output of using xarray with a standard GOES-16 ABI L1B data file:

In [2]: import xarray as xr

In [3]: nc = xr.open_dataset('OR_ABI-L1b-RadF-M3C01_G16_s20181741200454_e20181741211221_c20181741211264.nc')

In [4]: nc.data_vars['Rad']
Out[4]: 
<xarray.DataArray 'Rad' (y: 10848, x: 10848)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    t        datetime64[ns] ...
  * y        (y) float32 0.151858 0.15183 0.151802 0.151774 0.151746 ...
  * x        (x) float32 -0.151858 -0.15183 -0.151802 -0.151774 -0.151746 ...
    y_image  float32 ...
    x_image  float32 ...
Attributes:
    long_name:              ABI L1b Radiances
    standard_name:          toa_outgoing_radiance_per_unit_wavelength
    sensor_band_bit_depth:  10
    valid_range:            [   0 1022]
    units:                  W m-2 sr-1 um-1
    resolution:             y: 0.000028 rad x: 0.000028 rad
    grid_mapping:           goes_imager_projection
    cell_methods:           t: point area: point
    ancillary_variables:    DQF

In [5]: nc.data_vars['goes_imager_projection']
Out[5]: 
<xarray.DataArray 'goes_imager_projection' ()>
array(-2147483647, dtype=int32)
Coordinates:
    t        datetime64[ns] ...
    y_image  float32 ...
    x_image  float32 ...
Attributes:
    long_name:                       GOES-R ABI fixed grid projection
    grid_mapping_name:               geostationary
    perspective_point_height:        35786023.0
    semi_major_axis:                 6378137.0
    semi_minor_axis:                 6356752.31414
    inverse_flattening:              298.2572221
    latitude_of_projection_origin:   0.0
    longitude_of_projection_origin:  -75.0
    sweep_angle_axis:                x

@fmaussion
Copy link
Member

Again, I'm yet to be convinced that this logic should live in xarray, regardless if netCDF or rasterio is used as a backend. The job of xarray is to read the rasterio files in a "xarray way" (e.g. in order to leverage dask) and exposing the file attributes to the user, ideally with semantics/attribute names which are as close as possible as the underlying data model (rasterio).

All the logic you describe could live in a dedicated library which would become a wrapper around xarray's open_* functions, and converting them to the format you describe.

@djhoese
Copy link
Contributor

djhoese commented Jul 25, 2018

@fmaussion I completely agree except now that all of this is being brought up I see why it may have been better to put the 'crs' in the coordinates of the DataArray returned by open_rasterio. Since two DataArrays in different projections are not and should not be considered "mergeable". But I can also see how this walks the line of special handling of a data format by trying to interpret things in a certain way, but that line is already pretty blurry in the case of reading rasterio compatible datasets.

@snowman2
Copy link
Contributor Author

@fmaussion, I guess the main reason for the proposed change is to be able to read in the raster using open_rasterio() and be able to generate a netCDF using to_netcdf() that would be compatible with the rasterio model when using rasterio.open().

Also, since: "xarray.Dataset is an in-memory representation of a netCDF file", to me it makes more sense to store the data in a standard netCDF location that is consistent regardless of the backend. In doing so, other parts of xarray and libraries build off of xarray won't be required to find the same data in multiple locations.

Thanks for taking the time to consider this idea. I have used it quite a bit and it has made the datasets compatible across the various GIS tools I use (rasterio, GDAL, QGIS). It has definitely made my life easier and so I wanted to share what I have learned. But, it is just an idea. I really appreciate the work of the xarray developers and whatever y'all decide is fine with me.

@fmaussion
Copy link
Member

I'm going to leave this decision to @shoyer and @jhamman !

If we go forward with this, we have a couple of possibilities:

  1. add crs to a coordinate variable and keep the attribute too (not so nice because doubled)
  2. add crs to a coordinate variable and deprecate the crs attribute

Same with _FillValue.

@snowman2
Copy link
Contributor Author

snowman2 commented Oct 9, 2018

Feel free to re-open if this becomes something of interest. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants