Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time-varying geometries #78

Open
dcherian opened this issue Aug 27, 2024 · 10 comments
Open

time-varying geometries #78

dcherian opened this issue Aug 27, 2024 · 10 comments

Comments

@dcherian
Copy link
Contributor

We are seeing lots of questions about handling time-varying geometries:

  • For example, can we have varying geometries for the Great Salt Lake as its area changes over the years, along with Sentinel-2 surface reflectance of shortwave infrared bands?
  • What happens when the geometries themselves vary with time? national boundaries change; glaciers move.

Would be good to describe this use case in the docs as appropriate.

@martinfleis
Copy link
Member

I believe this means that geometry should be values of the DataArray, not coordinates, right? Like stuff described in this paper (code). If you also want reflectance at the same time, you would need a Dataset with one DataArray of reflectance and the other of geometries, with coordinates being time and either ID of a geom or a GeometryIndex but that would need to reflect only one point in time.

While this use case is not directly supported by Xvec, it is certainly within scope. And given shapely is based on numpy.ufuncs there's nothing blocking us dumping geoms to values. We'll just need to build some CRS support and related methods to get the basic functionality. For more I'd need to understand the exact use case and its needs.

cc-ing @loreabad6 in this discussion as this is something she's been working on within the R space. I will speak with @loreabad6 about this in a few weeks to ensure the same support she built in R lands in Xvec but I am not entirely sure how much work it is.

@xarray-contrib xarray-contrib deleted a comment Aug 27, 2024
@dcherian
Copy link
Contributor Author

Technically the indexes can be backed by nD variables, so in theory there's a possible extension. Of course @benbovy is the expert here.

@loreabad6
Copy link

Thanks @martinfleis for the ping, I also sent this at the webinar but this is the working repo for the current implementation in R: https://github.com/loreabad6/post
I am currently working on details for the class for tabular formats (since post aims to exchange between nested data frames and array representations) but I hope that before the SDSL I can add a couple of examples on array operations with time-variant geometries as shown in the webinar today (excellent work by the way @e-marshall!)

@martinfleis
Copy link
Member

@e-marshall will a recording be available? I didn't make it in the end due to family duties.

@martinfleis
Copy link
Member

Technically the indexes can be backed by nD variables, so in theory there's a possible extension

Do you have an example? That sounds like a complicated case to wrap a head around.

@dcherian
Copy link
Contributor Author

Many here pydata/xarray#7041 (kd tree index, raster index)

@benbovy
Copy link
Member

benbovy commented Aug 28, 2024

I mentioned a few possible use cases of nD indexed coordinate variables with a GeometryIndex some time ago here: https://discourse.pangeo.io/t/vector-data-cubes/2904/12. I'd be curious to see other examples!

I guess that one possible way to support it in xvec would be to:

  1. keep storing the indexed geometries as a 1-dimensional pandas.Index wrapped into a lightweight nD-compatible array adapter.
  2. keep track of the original nD shape of the coordinate variable in GeometryIndex.
  3. for data selection, one could use numpy.unravel_index to compute the indices along each dimension of the nD coordinate from the selected flat indices.

Implementing 1 and 2 would need a little bit of work around Xarray's PandasIndex. Something like this might work:

  • GeometryIndex.from_variables() should pass a flattened coordinate variable to PandasIndex.from_variables()
  • GeometryIndex.create_variables() should encapsulate the pandas.Index into a wrapper almost identical to PandasIndexingAdapter but that keeps track of the coordinate's original nD shape (note: we could support that in Xarray to avoid duplicating wrapper classes, if that makes sense). We also need to return an xarray.Variable object here since xarray.IndexVariable is still limited to the 1-dimensional case and needs to be refactored (and I think it is pretty safe now to use a regular Variable for an indexed coordinate).

Regarding 3, this is the approach used in Xoak (see here). Xoak doesn't use Xarray indexes yet but the logic remains the same and could probably be implemented in GeometryIndex.

Note that I haven't looked into the other features provided by GeometryIndex (e.g., query) so there may be complications in the n-dimensional case.

Also from what I understand the serialization of nD geometry variables using CF-conventions may be tricky, although this is not an issue specific to indexes.

@e-marshall
Copy link

e-marshall commented Aug 28, 2024

@e-marshall will a recording be available? I didn't make it in the end due to family duties.

@martinfleis I believe it will be, but I'm not sure exactly when. I'll add a link to this thread when it is available! Here is the link

@benbovy
Copy link
Member

benbovy commented Sep 18, 2024

Inspiring talk @loreabad6 at the SDSL workshop today! It would be nice to have some of the Post features implemented here as well.

A tentative API:

dataset.xvec.summarise_geometry(
    coord_name=(var_name, summary_dim, summary_func)
)

where summarise_geometry is a convenient method that:

  1. takes an input n-dimensional variable var_name, which must be present in the dataset, contain shapely geometries and have at least the summary_dim dimension
  2. applies the summary_func function (or a pre-defined function if a string is passed) to that input variable, reducing all dimensions except summary_dim
  3. assign the result as a new 1-d coordinate with a GeometryIndex in the output dataset

(I assume reducing over a flexible number of dimensions is possible with shapely 2's universal functions?)

That could be useful to have (maybe alongside n-dimensional coordinate with a GeometryIndex if this proves to be possible)

@martinfleis
Copy link
Member

I assume reducing over a flexible number of dimensions is possible with shapely 2's universal functions?

Generally yes, you should be able to specify dimension of an array along which the reduction happens but I don't think this is properly tested so we may hit some weirdness assuming 1d array.

One thing we should figure out is the CRS metadata for geometries as values in the array as that either needs to be linked to the GeometryIndex or solved differently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants