Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Use the recommended method for xarray extensions. #519

Closed
snowman2 opened this issue Jul 23, 2018 · 17 comments
Closed

Proposal: Use the recommended method for xarray extensions. #519

snowman2 opened this issue Jul 23, 2018 · 17 comments

Comments

@snowman2
Copy link
Contributor

snowman2 commented Jul 23, 2018

It would be nice for the datacube project to use the recommended method for extending xarray as issues may arise if overriding existing methods when users use open datacube as it will also impact how xarray behaves standalone.

This is currently used here.

This will prevent potential conflicts that may arise in the future if more methods/properties are added and will ensure that it will not impact the xarray module itself.

The xarray extension could be called odc.

So instead of:

xds.geobox

It would be:

xds.odc.geobox

Thoughts?

@snowman2 snowman2 changed the title Proposal: Use the official method for xarray extensions. Proposal: Use the recommended method for xarray extensions. Jul 23, 2018
@alfredoahds
Copy link

This is another proposal stemming from some of our internal use cases. We've got a few helper methods that give users access to additional metadata and provide a few shortcuts that help with subsetting/serialization. Using an xarray extension would allow for a pretty simple pathway to adding new features without having to patch the Dataset/DataArray classes themselves.

Additionally, some of the geo_xarray functionality could probably find a home there as well.

@Kirill888
Copy link
Member

Thanks @snowman2 and @alfredoahds this looks like a good improvement suggestion over existing solution and opens a door for more convenience methods. I'm on board with this proposal.

only possible concern is "how long xarray library had this interface?". But even if it's relatively recent feature it's worthwhile having, even if it means requiring more recent xarray.

@alfredoahds
Copy link

I'm fairly certain the accessor registration functionality was introduced in v0.9.0 just looking at the history on the xarray.core.extensions module. I'll verify that before proceeding, but it looks like that should work just fine with the pin you all have set in the requirements/setup.py. Looking forward to submitting a PR!

@andrewdhicks
Copy link
Contributor

There are a few other groups that also use crs and affine that we want to be compatible with.
There's a discussion going on here: pydata/xarray#2288
If possible, we should try to use whatever comes out of this and have as little odc-specific stiff as possible.

We added the property monkey-patching just before xarray released the accessor registration functionality, which was some time ago.

I've been wanting to move out the geo_xarray stuff for a while, but I was hoping the xarray.open_rasterio method would introduce a bit of consistency to the geo-side of things. I didn't get much further than writing it up here: https://github.com/andrewdhicks/geo-xarray/wiki

@Kirill888
Copy link
Member

Agree with @andrewdhicks that we should try to keep crs/affine compatible with whatever other libraries decide to use eventually. I still think that having odc.* extension for other user convenience methods is still useful. Think methods like export_cog or to_geojson or to_rgb, things like that, although not sure how much better ds.odc.to_rgb() compared to datacube.{some_ns}.to_rgb(ds)

@snowman2
Copy link
Contributor Author

snowman2 commented Jul 24, 2018

If you want your crs to be compatible with GDAL, QGIS, and rasterio, you need to store your CRS in a specific way. I posted it in pydata/xarray#2288. I would recommend doing that and then having a property/method ds.odc.crs that pulls it out into a rasterio.crs.CRS. Also, if you have the data stored properly, GDAL & rasterio automatically determine the affine/transform from the coordinates.

@snowman2
Copy link
Contributor Author

@andrewdhicks I like what you have for ideas for porting geo_xarray.py to a standalone library geoxarray. I should have looked at that sooner. Is this something that the datacube developers are planning on moving into its own package inside the opendatacube project?

@andrewdhicks
Copy link
Contributor

I was thinking it would be a new repo under the opendatacube github org. It got put on the backburner while we waited for the rasterio features in xarray to become stable (that wiki page is over a year old) but it looks like it might be time to pick it up again. I'm hoping to find some time in the next month, but that might get overtaken by the work the pyresample folks want to do over on xarray#2288.

Would you be looking to use the library in pangea, and would it have the same sort of requirements as odc?

I hadn't realised this list had gotten so big:
http://xarray.pydata.org/en/latest/faq.html#what-other-projects-leverage-xarray

@snowman2
Copy link
Contributor Author

snowman2 commented Jul 27, 2018

That would be really great if you could create the new repo/project. I think that would solve most of the odc proposal mentioned in this issue.

As a side note, I have taken it, moved it to an xarray extension, and used it in our own library internally to support users of the opendatacube plugin we have. I have also added features/bugfixes to it internally, and can share pieces of it (after following the correct internal procedures) if you decide to put it up.

Also, I had some thoughts this morning about potential names for it. Do you think it would make sense to keep the functionality limited to a rasterio-like schema? I was thinking that doing so could set the scope and help make what it does more clear. If so, it could be called something like rioxarray and the xarray accessor could be rio.

@snowman2
Copy link
Contributor Author

snowman2 commented Apr 16, 2019

rioxarray is now available: https://corteva.github.io/rioxarray/

@andrewdhicks
Copy link
Contributor

Wow @snowman2 this is great!
@uchchwhash - you should have a look at this

@snowman2
Copy link
Contributor Author

snowman2 commented Oct 4, 2019

Is there interest to use rioxarray in datacube?

@omad
Copy link
Member

omad commented Oct 15, 2019

@snowman2 Definitely! But, it's still in the backlog with a lot of other internal changes we want to make. The core developers have been doing a lot more application and product delivery work lately, not so much time for internals.

@snowman2
Copy link
Contributor Author

Sounds good. Would it be useful for me to have a go at integrating it into the code?

@omad
Copy link
Member

omad commented Oct 15, 2019

Thanks for the offer! I think we need more detailed conversation with @Kirill888 , @uchchwhash and @alexgleith . To make sure we're all on the same page and not doubling up or wasting effort. I'll try and sound out availability.

@snowman2
Copy link
Contributor Author

Sounds like a good plan to me 👍

@stale
Copy link

stale bot commented Aug 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants