Add CRS/projection information to xarray objects #2288
Comments
This would also be very helpful for geoviews. |
This has been discussed from time to time, although not in such an extensive way: thanks! For me, the long list of discussion points that you mention is a clear argument for a dedicated package which will have to solve all these issues. A good example of difficulty to tackle is the fact that geo libraries often do not use the same standards (see the incompatibility between Regarding the xarray side: I also brought this up for my own geo-lib a while ago, and it seems that the cleanest solution to carry a |
@fmaussion Note that I am the one who started the PROJ.4 CRS in cartopy pull request (SciTools/cartopy#1023) and that it was this work that I copied to pyresample for my own pyresample work since I didn't want to wait for everything to be fleshed out in cartopy. You can see an example of the It's also these cartopy CRS issues that make me think that Cartopy CRS objects aren't the right solution for this type of logic as a "how to represent CRS objects". My experience (see: my cartopy PR 😉), from watching and talking with people at SciPy 2018, is that multiple projects have workarounds for passing their CRS/projection information to cartopy. In my biased experience/opinion PROJ.4 is or can be used in quite a few libraries/fields. If PROJ.4 or something that accepts PROJ.4 isn't used then we might as well come up with a new standard way of defining projections...just kidding. Side note: FYI the geotiff format does not currently accept the sweep axis parameter |
Also I should add the |
geopandas would be a great template for a geoxarray package ;-) |
Correct me if I'm wrong, but from the xarray side it would already be enough if there is a way in xarray to have a special The solution currently is to store What could also be a possibility (without knowing how the internals would look like) is to have a registry of "special attributes" names which would always be preserved by xarray's operations. This registry would live in xarray and can be updated by downstream libraries and/or accessors. (xref: #1614 ). [1] : my preference goes for a simple PROJ4 string |
@fmaussion I guess you're right. And that set of attributes to keep during certain operations would be very nice in my satpy library. We currently have to do a lot of special handling of that. The one thing that a crs coordinate (PROJ.4 dict or str) doesn't handle is specifying what other coordinates define the X/Y projection coordinates. This logic also helps with non-uniform datasets where a longitude and latitude coordinate are needed. Of course, a downstream library could just define some type of standard for this. However, there are edge cases where I think the default handling of these coordinates by xarray would be bad. For example, satpy doesn't currently use
But I guess that is intended behavior and if the |
I think it would make more sense to think about using multiple |
I've thought about this a little more and I agree with @fmaussion that this doesn't need to be added to xarray. I think if "we", developers who work with projected datasets, can agree that "crs" in an xarray objects coordinates is a PROJ.4 string then that's half the battle of passing them between libraries. If not a PROJ.4 string, other ideas (dict?)? I initially had the idea to start a new One thing that just came to mind while typing this that is another difficulty is that there will still be the need to have an object like pyresample's When I started typing this I thought I had it all laid out in my head, not anymore. 😢 |
I am really excited about this discussion. I know of other libraries that have done the same thing and have written internal libraries myself. If possible, I would hope that we could follow the CF convention on this as it makes the output netCDF file compatible with QGIS, GDAL, and rasterio when written using To do so, you add the
And then, you add the Next, you add the See an example here. After that, you could store all kinds of information inside the |
@snowman2 I thought about that too, but here are the reasons I came up with for why this might not be the best idea:
The result of this github issue should either be a new package that solves all (90+%) of these topics or an easy to implement, easy to use, geolocation description best practice so that libraries can more easily communicate. I think with the CF standard CRS object we would definitely need a new library to provide all the utilities for converting to and from various things. Lastly, I don't know if I trust CF to be the one source of truth for stuff like this. If I've missed some other obvious benefits of this or if working with WKT or the CF standard CRS attributes isn't actually that complicated let me know. |
Here is an example of how it would look on a dataset:
Here is how the
And here is how it would look on the variables:
@djhoese Whether or not we use the CF convention is not what I am concerned about. What I think would benefit the most people is with the file format to be able to do Another benefit is that it keeps the Also, as a side note if you use the center pixel coordinates, then GDAL, rasterio, and QGIS are able to read in the file and determine its affine/transform without a problem. For the new library, if you have a For example, using the recommended method to extend xarray, you could add a crs property:

    from rasterio.crs import CRS
    ........
        @property
        def crs(self):
            """:obj:`rasterio.crs.CRS`:
                Projection from `xarray.DataArray`
            """
            if self._crs is not None:
                return self._crs
            try:
                # look in grid_mapping
                self._crs = CRS.from_string(self._obj.coords[self._obj.grid_mapping].spatial_ref)
            except (AttributeError, KeyError):
                # missing grid_mapping attribute, coordinate, or spatial_ref
                raise ValueError("Spatial reference not found.")
            return self._crs

And if you call your extension

    ds.geo.crs

To get proj.4 string:

    ds.geo.crs.to_string()

To get WKT string:

    ds.geo.crs.wkt

To get EPSG code:

    ds.geo.crs.to_epsg()

|
@snowman2 Awesome. Thanks for the info, this is really good stuff to know. In your own projects and use of raster-like data, do you ever deal with non-uniform/non-projected data? How do you prefer to handle/store individual lon/lat values for each pixel? Also it looks like xarray would have to be updated to add the "crs" coordinate since currently it is not considered a coordinate variable. So a new library may need to have custom to_netcdf/open_dataset methods, right? It kind of seems like a new library may be needed for this although I was hoping to avoid it. All of the conversions we've talked about could be really useful to a lot of people. I'm not aware of an existing library that handles these conversions as one of its main purposes and they always end up as a "nice utility" that helps the library as a whole. It seems like a library to solve this issue should be able to do the following:
Beyond reading/writing NetCDF and geotiff files I would be worried that this new library could easily suffer from major scope creep. Especially since this is one of the main purposes of the satpy library, even if it is dedicated to satellite imagery right now. @snowman2 I'm guessing the data cube project has similar use cases. If the reading/writing is limited to a specific set of formats then I could see pyresample being a playground for this type of functionality. The main reason for a playground versus a new from-scratch package would be the use of existing utilities in pyresample assuming resampling is a major feature of this new specification. Yet another braindump...complete. |
I have dealt with non-uniform data in the geographic projection. I have found it easiest to deal with it if you can determine the original projection and project the coordinates back to that projection so it is uniform. But, I am by no means an expert in this arena. Most of the time I work with "normal" data.
rasterio/GDAL/QGIS all seem to use the centroid.
Actually, it is not difficult to add as it stands:

    ds.coords['crs'] = 0
    ds.coords['crs'].attrs = dict(spatial_ref='PROJCS["UTM Zone 15, Northern Hemisphere",GEOGCS["WGS 84",D...')

But, if a Example:

    ds.geo.set_crs("+init=epsg:4326")

I think that minor modifications will be needed once the crs is set properly on the xarray dataset. Because after that, the I could see the first pass of the extension/library simply performing:
|
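The point above about center pixel coordinates can be made concrete with a small sketch. It assumes uniformly spaced 1D `x` and `y` coordinate lists holding pixel centers, and derives a GDAL-style six-element geotransform whose origin is the outer corner of the first pixel. The helper names `geotransform_from_centers` and `pixel_to_world` are hypothetical and are not part of xarray, rasterio, or GDAL.

```python
def geotransform_from_centers(x, y):
    """Build a GDAL-style geotransform from uniformly spaced pixel centers.

    x, y: 1D sequences of pixel-center coordinates (at least two values each).
    Returns (x_origin, pixel_width, 0.0, y_origin, 0.0, pixel_height), where
    the origin is the outer corner of the first pixel, half a pixel away
    from its center.
    """
    dx = x[1] - x[0]
    dy = y[1] - y[0]  # negative when y decreases with the row index (north-up)
    return (x[0] - dx / 2, dx, 0.0, y[0] - dy / 2, 0.0, dy)


def pixel_to_world(gt, col, row):
    """Map fractional pixel indices (col, row) to CRS coordinates."""
    world_x = gt[0] + col * gt[1] + row * gt[2]
    world_y = gt[3] + col * gt[4] + row * gt[5]
    return world_x, world_y


# Hypothetical 3x2 grid with 1-unit pixels; values are pixel centers.
x_centers = [0.5, 1.5, 2.5]
y_centers = [1.5, 0.5]
gt = geotransform_from_centers(x_centers, y_centers)
```

With this layout, `pixel_to_world(gt, 0.5, 0.5)` gives back the center of the first pixel, which is the round trip a reader such as GDAL has to perform when it is handed center coordinates.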
Regarding non-uniform datasets, I think we have a small misunderstanding. I'm talking about things like data from polar-orbiting satellites where the original data is only geolocated by longitude/latitude values per pixel and the spacing between these pixels is not uniform so you need every original longitude and latitude coordinate to properly geolocate the data (data, longitude, and latitude arrays all have the same shape). When it comes to the topics in this issue this is a problem because you would expect the lat/lon arrays to be set as coordinates but if you are dealing with dask arrays that means that these values are now fully computed (correct me if I'm wrong). For your example of adding a In your example of methods is |
That is interesting, I am definitely not an expert with non-uniform datasets. From the satellite datasets I have used, the 2D latitude and longitude coordinates are stored in the datasets and are not super useful. I usually have to use other ways to recreate the grid coordinates in the original projection (ex. SMAP uses the EASE Grid 2.0 but it stores the latitude/longitude of the points in the file) or reproject & flatten the coordinates. I have had to do this with weather data and made an xarray extension pangaea to handle it. So, that is what I was referring to when I misunderstood your question.
The files I have created have the The CF stuff is supported by rasterio, GDAL, QGIS and that is why I like it. If there is another way that is as well supported, I am not opposed to it.
The |
Ok so the netcdf files that you have created and are reading with This means that to properly associate a CRS with a DataArray/Dataset this new library would require its own version of |
It is not in the dimension, it is the coordinate attribute in the variable. That is handled automatically by xarray when writing From the ncdump:
It would definitely be a good idea to ensure that the |
I was talking about Also note that having the |
The example I gave was just demonstrating that the dimension is not required for the I agree with the functionality that would support standardizing the This all sounds like it is heading in a good direction. 👍 |
I was talking with @dopplershift the other day on gitter and he brought up a very important point: no matter how CRS information is represented the user should be able to access the individual parameters (reference longitude, datum, etc). This led me to think that a new CRS class is probably needed, even though I wanted to avoid it, because it would likely be one of the easiest ways to provide access to the individual parameters. There are already cartopy CRS objects that IMO are difficult to create and rasterio CRS objects that require gdal which is a pretty huge dependency to require users to install just to describe their data. That said, I think no matter how it is coded I don't want to duplicate all the work that has been done in rasterio/gdal for handling WKT and converting between different CRS formats. The other thing I've been pondering during idle brain time is: is it better for this library to require an xarray object to have projection information described in one and only one way (a CRS object instance for example) or should the xarray accessor handle multiple forms of this projection information? Does having a CRS object in |
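To illustrate what "access to the individual parameters" could mean for a PROJ.4 string, here is a rough sketch that splits such a string into a dictionary. The function name `parse_proj4` and the example string are made up for illustration; this is a toy parser, not how any of the libraries mentioned here handle it, and a real implementation would want to delegate to an existing CRS library.

```python
def parse_proj4(proj_str):
    """Split a PROJ.4 string into a dict of parameters.

    Numeric values become floats, other values stay strings, and bare
    flags such as ``+no_defs`` map to True. Tokens without a leading
    ``+`` are ignored.
    """
    params = {}
    for token in proj_str.split():
        if not token.startswith("+"):
            continue
        key, sep, value = token[1:].partition("=")
        if not sep:
            params[key] = True  # flag without a value
            continue
        try:
            params[key] = float(value)
        except ValueError:
            params[key] = value  # non-numeric, e.g. a projection or datum name
    return params


# Hypothetical Lambert conformal conic definition, for illustration only.
crs_params = parse_proj4("+proj=lcc +lat_1=25 +lat_2=25 +lon_0=-95 +datum=WGS84 +no_defs")
reference_longitude = crs_params["lon_0"]
```

A dict like this covers the "reference longitude, datum, etc." use case, but it does nothing for WKT or EPSG codes, which is part of why a dedicated CRS class keeps coming up.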
Lots of good thoughts there. I think a lot depends on who you plan on having for a user base. I like My preference is to have the CRS object be something created/retrieved by the accessor based on information in the file. If it is not, users will have to remove the CRS object when using |
For the user base I think if we can cover as many groups as possible that would be best. I know there are plenty of people who need to describe CRS information in their data, but don't use geotiffs and therefore don't really need rasterio/gdal. The group I immediately thought of was the metpy group which is why I talked to @dopplershift in the first place. The immediate need for this group (based on his scipy talk) will be people reading NetCDF files and putting the data on a cartopy plot. I think @dopplershift and I agreed that when it comes to problems building/distributing software dealing with this type of data the cause is almost always gdal/libgdal. I'm in favor of making it optional if possible. For the I just did a search for "geoxarray" on github and @wy2136's repositories came up where they are importing a |
Sorry for the confusion from the |
@wy2136 Very cool. We have the ability in satpy (via pyresample) to create cartopy CRS objects and therefore cartopy plots from our xarray DataArray objects: https://github.com/pytroll/pytroll-examples/blob/master/satpy/Cartopy%20Plot.ipynb It would be nice if we could work together in the future since it looks like you do a lot of the same stuff. When I make an official "geoxarray" library I think I'm going to make a "geo" accessor for some of these same operations (see above conversations). |
Also, please let me know about your ideas for features you would like to see in a crs library, either here or by raising an issue. Creating library-specific objects might be a good idea, if you want to give some use cases and reasoning. In what ways are library-specific crs objects difficult to create? Don’t they provide shortcuts for creating from proj4 strings? |
@karimbahgat Thanks for the info and questions. As for xarray, it is a generic container format (array + dimensions + coordinates for those dimensions + attributes) but resembles the format of data stored in netcdf files. It can technically hold any N-dimensional data. This issue in particular is about what a good "standard" way is for multiple libraries to represent CRS information in xarray's objects. I think the lack of documentation in pycrs is my biggest hurdle right now as I don't know how I'm supposed to use the library, but I want to. It may also be that my use cases for CRS information are different than yours, but the structure of the package is not intuitive to me. But again a simple example of passing a PROJ.4 string to something and getting a CRS object would solve all that. I'll make some issues on pycrs when I get a chance (add travis/appveyor tests, add documentation, base classes for certain things, etc). For geotiff's CRS I think with most geotiff-reading libraries you load the CRS info as a PROJ.4 string. |
Agree about the documentation, so I have vastly improved the documentation now, with a full list of examples for all functionality. Hopefully that makes it clearer how to use PyCRS and whether it would be a good fit for your project. I have also added doctests of all the readme examples with Travis CI for both Py2 and 3. Appreciate all the issues raised and suggestions 👍 Thinking of releasing this as the first major stable version 1.0.0 on pypi for more reliability, once the issues have been cleared up. |
May be late on the wagon, we were playing around with xarray dataset to address some general geospatial problems. We have published the preliminary library at xgeo. I would love to collaborate together if you guys have interest. |
@Geosynopsis Cool. Your library is the third library that does something similar to what's discussed here (at least recently created ones). I'm glad there are so many people who need this functionality. The packages are: My un-started geoxarray project where I've tried to move these types of conversations (https://github.com/geoxarray/geoxarray) and rioxarray (https://github.com/corteva/rioxarray) which combines xarray and rasterio and was started by @snowman2. Given what your project is trying to do maybe you could add the geopandas functionality on to rioxarray instead of a separate package? Let's discuss in an issue on rioxarray if possible, feel free to start it. |
Here is an alpha version of a CRSIndex heavily drawing on @benbovy's RasterIndex As a non-expert, I very arbitrarily chose to propagate CRS info using a so |
That's nice @dcherian 👍 |
That's great @dcherian! Some comments (notably regarding your notes in your linked notebook): A lot of boilerplate code in your But as you suggest it, it would be nice if we could also reuse the CRS-related logic with other kinds of index structures (like kd-trees). I've been thinking a bit about the general issue of flexible geospatial xarray indexes but I'm not sure yet how best it could be solved.
This should be supported with #6800 (I need to re-submit a PR targeting
Yes this should be clarified, i.e., whether the
There's some discussion in #4366 about adding a new
That's a tricky one to improve.
Maybe related to #6836 ?
I agree. The index should probably be dropped in that case (i.e., reduction of both the x and y dimensions), leaving the |
Agreed.
Yeah I don't think the PandasMetaIndex is a good pathway for CRSIndex, since we'd want other underlying tree-like structures.
Hadn't seen that. That would be great!
I actually didn't understand what I was supposed to do with
My takeaway was that on the API side, we should prioritize adding Re: reduction to scalar, I agree it seems tricky. |
I've proposed adding CRSIndex to rioxarray to experiment with propagating CRS info in existing workflows: corteva/rioxarray#588 |
For anyone interested, we continued with @scottyhq experimenting on this topic in https://github.com/benbovy/xproj. Yet another Xarray extension project on this topic :-). We expanded on @dcherian's idea of attaching the CRSIndex to a scalar, "spatial_ref"-like coordinate (#2288 (comment)), with the main difference that the CRSIndex is attached only to that coordinate, leaving other geospatial coordinates with their own index. This is just a rough proof-of-concept but hopefully it will quickly reach a state where one can easily try it together with other Xarray geospatial tools. |
Problem description
This issue is to start the discussion for a feature that would be helpful to a lot of people. It may not necessarily be best to put it in xarray, but let's figure that out. I'll try to describe things below to the best of my knowledge. I'm typically thinking of raster/image data when it comes to this stuff, but it could probably be used for GIS-like point data.
Geographic data can be projected (uniform grid) or unprojected (nonuniform). Unprojected data typically has longitude and latitude values specified per-pixel. I don't think I've ever seen non-uniform data in a projected space. Projected data can be specified by a CRS (PROJ.4), a number of pixels (shape), and extents/bbox in CRS units (xmin, ymin, xmax, ymax). This could also be specified in different ways like origin (X, Y) and pixel size. Seeing as xarray already computes all coords data it makes sense for extents and array shape to be used. With this information provided in an xarray object any library could check for these properties and know where to place the data on a map.
So the question is: Should these properties be standardized in xarray Dataset/DataArray objects and how?
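As a rough sketch of the relationship described above, the following shows how extents plus shape determine per-pixel coordinates, and how the extents can be recovered from uniformly spaced coordinates. It assumes the extents are the outer edges of the grid, that coordinates refer to pixel centers, and that row 0 is the top of the image. The function names `pixel_center_coords` and `extents_from_coords` are hypothetical.

```python
def pixel_center_coords(extents, shape):
    """Return (x, y) lists of pixel-center coordinates.

    extents: (xmin, ymin, xmax, ymax) outer edges of the grid in CRS units.
    shape: (rows, cols) of the data array.
    """
    xmin, ymin, xmax, ymax = extents
    rows, cols = shape
    dx = (xmax - xmin) / cols
    dy = (ymax - ymin) / rows
    x = [xmin + (i + 0.5) * dx for i in range(cols)]
    # Row 0 is assumed to be the top of the image, so y decreases with row.
    y = [ymax - (j + 0.5) * dy for j in range(rows)]
    return x, y


def extents_from_coords(x, y):
    """Recover (xmin, ymin, xmax, ymax) from uniformly spaced pixel centers."""
    dx = x[1] - x[0]
    dy = y[0] - y[1]  # positive pixel height, since y decreases with row
    return (x[0] - dx / 2, y[-1] - dy / 2, x[-1] + dx / 2, y[0] + dy / 2)


# Hypothetical 2-row by 4-column grid covering x in [0, 4] and y in [0, 2].
x, y = pixel_center_coords((0.0, 0.0, 4.0, 2.0), (2, 4))
recovered = extents_from_coords(x, y)
```

Because the two representations are interchangeable for uniform grids, a standard could store either one; the open question in this issue is which one libraries should agree on.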
Related libraries and developers
I know @WeatherGod also showed interest on gitter.
Complications and things to consider
pyproj.Proj object?