Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell Boundary aware operations #10

Open
andersy005 opened this issue Jun 1, 2020 · 17 comments
Open

Cell Boundary aware operations #10

andersy005 opened this issue Jun 1, 2020 · 17 comments
Labels
opinion wanted User input requested

Comments

@andersy005
Copy link
Member

For the last few months, I've been working on xgriddedaxis, a tool for working with one-dimensional axes with their respective cell boundaries information. xgriddedaxis was motivated by the fact that xarray is not aware of cell boundary variables when performing operations such as resampling along the time coordinate. The main objective of xgriddedaxis is to provide a set of utilities that enables fluid translation between data at different intervals while being aware of the cell boundary variables.

Is this something that falls within cf-xarray's scope? If so, I am happy to help out with the implementation for this in cf-xarray.

Ccing @kmpaul, @matt-long

@andersy005
Copy link
Member Author

andersy005 commented Jun 1, 2020

This issue may overlap with #9, #8

@kmpaul
Copy link

kmpaul commented Jun 1, 2020

Based on the examples in the README, it sounds like xgriddedaxis should be included in cf-xarray. It depends on the bounds attribute, which infers CF-compliance, and it supplies some excellent remapping utilities.

@kmpaul
Copy link

kmpaul commented Jun 1, 2020

@andersy005 I'm not sure if I am remembering this correctly, but my memory tells me that we envisioned that the xgriddedaxis would fit into the existing xarray workflow of dealing with resampled if and only if the data variables had a bounds attribute and corresponding variable. Correct? In that sense, then this seems to fit best into cf-xarray.

@dcherian
Copy link
Contributor

  1. One solution here is to decode the bounds attribute to a Pandas IntervalIndex or a future xarray CellIndex. (Allow DataArray to hold cell boundaries as coordinate variables pydata/xarray#1475)
  2. It may be prudent to wait till xarray’s Index API is clear. There has been some discussion of allowing “regridders” to hook into xarray’s indexing functionality. (API for multi-dimensional resampling/regridding pydata/xarray#486, API design for pointwise indexing pydata/xarray#475)

xarray has CZI funding to make good progress on explicit indexes this year, so some answers to the above points should be available in the next few months.

However, should cf-xarray provide an interim solution? If so, I'd favour an explicit opt-in ds.cf.resample(T="D", weight_bounds=True).

@kmpaul
Copy link

kmpaul commented Jun 15, 2020

Personally, since I think we have an interim solution already, I like the idea of the ds.cf.resample interface.

@dcherian
Copy link
Contributor

I agree.

But since all the other wrapped functions just rewrite the arguments, I think it would be good to make the use of bounds an explicit opt-in behaviour. This would also let you do ds.cf.resample(T="D") when the bounds are absent.

@kmpaul
Copy link

kmpaul commented Jun 15, 2020

Ah. Yes. Agree. 👍

@andersy005
Copy link
Member Author

andersy005 commented Jun 17, 2020

But since all the other wrapped functions just rewrite the arguments, I think it would be good to make the use of bounds an explicit opt-in behaviour. This would also let you do ds.cf.resample(T="D") when the bounds are absent.

Will this generate new bounds as well or will it just use existing bounds? In other words, with ds.cf.resample(time="D", weight_bounds=True), will the user get a new time coordinate with corresponding time_bounds?

@dcherian
Copy link
Contributor

New bounds would be a good idea IMO.

@dcherian
Copy link
Contributor

This discussion also lines up with the "Climatological Statistics" section of the conventions: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#climatological-statistics

This seems like a useful feature under the ds.cf.clim namespace at least for simple reductions like cf.clim.mean(). But we shouldn't be reimplementing anything in xclim (https://xclim.readthedocs.io/en/stable/indicators.html)

@kmpaul
Copy link

kmpaul commented Jun 18, 2020

I'm concerned about colliding namespaces. Is it fair to say that we could implement functionality in xclim in a thin layer of cf-xarray, where it makes sense? That is, make xclim a dependency for some features of cf-xarray?

@dcherian
Copy link
Contributor

As far as I can tell, xclim doesn't do climatologies so there should be no overlap or collision.

@kmpaul
Copy link

kmpaul commented Jun 18, 2020

Ok. I guess I need to look more closely at xclim and learn what it actually does. 😄

@huard
Copy link

huard commented Sep 2, 2020

xclim was initially designed to compute "climate indicators", e.g. cooling_degree_days, maximum_annual_precipitation, etc. Since then, we've added bias correction algorithms and some ensemble analysis functionalities, but there has been no effort invested so far on climatological means.

If cf-array implemented climatological operations that enforce CF-Conventions, this is something we would certainly be interested in integrating. See Ouranosinc/xclim#74 for some related work that has stalled.

@dcherian
Copy link
Contributor

@malmans2 what do you think about this issue?

@dcherian dcherian added the opinion wanted User input requested label Feb 24, 2021
@dcherian
Copy link
Contributor

dcherian commented Feb 24, 2021

The code in xgriddedaxis is really small (!) so I think it's OK to copy it over.

I like the .cf.resample(time="M", weight_bounds=True) opt-in API Or we could do .cf.weighted_resample(time="M")

Under-the-hood, i think this should do

weights = cfxr.resample_weights(input, output, freq)
obj.weighted(weights).resample(time=freq)

which would require solving pydata/xarray#3937 first. We'd also have to test out using sparse weights in the weighted operation.

We'd have to add a CFResample class that attaches new bounds after an operation like .mean

We can explore upstreaming the weighting during discussions of the new xarray indexing API.

@dcherian
Copy link
Contributor

I've started here with groupby.weighted which will also allow .resample.weighted: pydata/xarray#5480.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
opinion wanted User input requested
Projects
None yet
Development

No branches or pull requests

4 participants