Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestions for ShapelySTRTreeIndex #3

Closed
benbovy opened this issue Nov 15, 2022 · 6 comments · Fixed by #7
Closed

Suggestions for ShapelySTRTreeIndex #3

benbovy opened this issue Nov 15, 2022 · 6 comments · Fixed by #7

Comments

@benbovy
Copy link
Member

benbovy commented Nov 15, 2022

Here are a few suggestions for ShapelySTRTreeIndex:

Automatically create a new index from selected geometries

Currently, calling .sel() or .isel() with a coordinate or dimension baked by a ShapelySTRTreeIndex won't add a new index in the resulting Dataset / DataArray. Users have to manually call .set_xindex() again if they want to further narrow the selection.

We should probably create a new index instance automatically for convenience. This is supported by the Xarray Index API.

Lazy index

Building the R-Tree may be expensive. If we allow automatic creation of new index instances like suggested above, it is probably a good idea to defer the R-Tree build until it is actually needed. If I understand correctly, Geopandas follows a similar approach?

Lazy or direct build could be controlled by an __init__ option. When setting the index explicitly via .set_xindex(), deferring the build is probably not what users want.

Support arbitrary dimensions

This may not be high-priority, but it might be useful to support coordinates with an arbitrary number of dimensions. A few use-cases are detailed here.

I'm not sure how this would work for "orthogonal" indexing (i.e., each dimension indexed separately) since the selection result may be a sparse n-d coordinate, but for "point-wise" advanced indexing (see docs) this could work pretty well. Here is an example of implementation for converting selected indices returned from a flat index back to the original coordinate shape.

Other index operations (alignment, roll, etc.) may be hard to support with n-d coordinates.

Support slice objects passed to sel?

Inspired from GeoSeries.cx. Here it would looks like ds.sel(geom_coord=(xslice, yslice)). That said, using a shapely.box works already great so slices may not add much value here.

Spatial join and Xarray alignment

Should we leverage the full capabilities of ShapelySTRTreeIndex for automatic alignment of Xarray objects with geometry coordinates? It may be tricky:

  • xarray.align() exposes a join option but doesn't support arbitrary options. It would be hard to add such support as this function is used extensively in Xarray internals for many operations. We would need to find other ways of passing predicate to the underlying shapely.STRtree.query() (e.g., index __init__ option? context manager?)
  • Xarray alignment is a complex procedure that relies on Index.join() and Index.reindex_like() and where those two methods are called at different stages of the procedure. Not sure how we could avoid two repetitive calls to shapely.STRtree.query().

Probably it is a bad idea and it'd be better that:

  • ShapelySTRTreeIndex only supports exact alignment
  • we support spatial join via a distinct API, like in GeoPandas
@benbovy
Copy link
Member Author

benbovy commented Nov 15, 2022

ShapelySTRTreeIndex only supports exact alignment

"Exact alignment" is confusing, it would either mean:

  1. two indexes are exactly equal (all geometries perfectly match)
  2. perform join (inner, outer, etc.) by looking at which geometries exactly match and which geometries don't

Does shapely provide ways to do 1 and/or 2 efficiently? Is it possible to create and use a "classic" pandas.Index from an array of shapely geometries?

@martinfleis
Copy link
Member

Thanks for this!

Automatically create a new index from selected geometries

+1 on that

Lazy index

We can potentially move the tree creation to sel before there is any query but with shapely 2.0, the tree creation is no longer that expensive as it used to be with the rtree package. But it is an easy change so it is worth it.

Support arbitrary dimensions

I think it worth exploring but I'll need to spend a bit more time with this to wrap my mind around the idea and implementation details. It is certainly not a top priority now.

Support slice objects passed to sel?

Geopandas takes the bounds passed to cx and creates the box itself, so this would practically just wrap the current solution into a more convenient solution. There's just a bit overhead on computation of total_bounds if you don't pass all 4 coords.

Spatial join and Xarray alignment

On this I am not sure, especially because I don't have a mental model of how xarray's join works under the hood.

  • two indexes are exactly equal (all geometries perfectly match)
  • perform join (inner, outer, etc.) by looking at which geometries exactly match and which geometries don't

Does shapely provide ways to do 1 and/or 2 efficiently? Is it possible to create and use a "classic" pandas.Index from an array of shapely geometries?

A perfect match can be tested using equals or equals_exact following the STRTree query but there are edge cases where this is not super reliable due to floating points and other issues.

pandas Index can be created and in the example notebook it is a pandas index what I did before you created the custom one. It just has a bit limited functionality.

One other thing I would add is to support query-based selection for arrays (already have it on a local branch, will push later).

@benbovy
Copy link
Member Author

benbovy commented Nov 16, 2022

pandas Index can be created and in the example notebook it is a pandas index what I did before you created the custom one. It just has a bit limited functionality.

That would be already useful.

Here is a broader proposal: instead of a ShapelySTRTreeIndex why not provide an SpatialVectorIndex (or any better name) that would encapsulate:

  • an xarray.indexes.PandasIndex to provide all the basic functionality already offered by pandas indexes in Xarray (alignment, etc.) and that could be used for “strict” selection of geometries (exact matches look-up)
  • a shapely.STRtree for advanced geometry-based selection (nearest / query) via .sel and spatial join via custom API (accessor)
  • a pyproj.CRS (Support re-projecting of coordinates #5) for CRS-aware alignment, re-projection, etc.

Assuming that it is reasonably cheap to create both a shapely tree (lazy) and a pandas index from an array of geometries…

One other thing I would add is to support query-based selection for arrays

Ah interesting! I'm curious on how does the resulting DataArray / Dataset look like.

@martinfleis
Copy link
Member

Here is a broader proposal: instead of a ShapelySTRTreeIndex why not provide an SpatialVectorIndex

That sounds good!

I assume it will be relatively cheap to create both. I guess that if we subclass xarray.indexes.PandasIndex and override only what requires special spatial approach, it will work.

I'm curious on how does the resulting DataArray / Dataset look like.

see #4

@benbovy
Copy link
Member Author

benbovy commented Nov 17, 2022

@martinfleis I'm happy to give it a shot, unless you are already on it?

@martinfleis
Copy link
Member

Go for it!

@benbovy benbovy mentioned this issue Nov 18, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants