Integration with Uncertainties #3

Open
TomNicholas opened this issue Apr 7, 2020 · 10 comments

Comments

@TomNicholas
Member

TomNicholas commented Apr 7, 2020

It would be absolutely great to be able to propagate unit-aware arrays with uncertainties through xarray, but it's unclear to me to what extent pint is currently integrated with the Uncertainties package.

I also haven't thought much about whether this would present any additional challenges on the xarray side.

@jthielen
Collaborator

jthielen commented Apr 8, 2020

Right now, Pint does not simultaneously support NumPy and Uncertainties, so this integration will likely have to be on hold until both are supported together.

xref pydata/xarray#3509

@miketynes

@jthielen Do you happen to know if there are any plans to support integrating both packages at once?

@jthielen
Collaborator

As far as I'm aware, it's a desired feature but has no roadmap or timeline for implementation. Still, I'd encourage you to ask directly on the Pint issue, as the Pint maintainers themselves may have a better idea!

@miketynes

Thanks for your response. Somehow I didn't notice that this issue was actually on a different repo!

@varchasgopalaswamy

varchasgopalaswamy commented Sep 23, 2022

With regards to uncertainties, I think one difficulty with the existing uncertainties package is that you have to hand-write the propagation rules for each operation, which is definitely not feasible for the entire numpy library. If you used an auto-differentiation library like JAX, you could imagine trying to have that implement __array_function__ for all numpy functions. I tried, and was a little successful with getting something like this working here, but there's a ton of issues with the implementation that I don't think can be resolved with the hacky way I implemented it.
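To make the autodiff idea concrete, here is a minimal sketch of first-order uncertainty propagation driven by forward-mode automatic differentiation. It uses hand-rolled dual numbers from the standard library rather than JAX, and it is not the implementation linked above. The names `Dual`, `sin`, `propagate`, and `f` are hypothetical, and the inputs are assumed to be independent (no correlations).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode autodiff number: a value plus its derivative (hypothetical helper)."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    # Chain rule: d sin(u) = cos(u) du
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def propagate(func, values, sigmas):
    """Linear propagation, assuming independent inputs:
    sigma_f^2 = sum_i (df/dx_i * sigma_i)^2."""
    variance = 0.0
    for i, sigma in enumerate(sigmas):
        # Seed derivative 1 for input i and 0 for every other input.
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(values)]
        variance += (func(*args).deriv * sigma) ** 2
    nominal = func(*[Dual(v) for v in values]).value
    return nominal, math.sqrt(variance)


def f(x, y):
    return x * y + sin(x)


nominal, sigma = propagate(f, [2.0, 3.0], [0.1, 0.2])
print(nominal, sigma)
```

The point of the sketch is that only the primitive operations (`+`, `*`, `sin`) need derivative rules; any function composed from them gets its propagation rule for free, which is what an autodiff library would provide across a whole array API.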

Probably the best way to do it would be to re-implement it in xarray. Would there be any interest in this? I would definitely be interested in contributing towards getting something like this to work.

@TomNicholas
Member Author

@varchasgopalaswamy if you can get uncertainty support working in xarray in a general and interoperable way, people would love that! I would expect it to be a pretty big project though.

> re-implement it in xarray

Can you expand on what you mean by this? Making uncertainties work in a way that doesn't prevent you from using xarray as your top-level object, and also using different array types underneath (e.g. dask, cupy), would be critical for widespread adoption.

There are also some discussions about potentially wrapping JAX in xarray (I can find links if you want).

@varchasgopalaswamy

Agreed, I don't expect it to be easy - but I would find it useful for my research. We're thinking of adopting xarray throughout our code, which requires support for uncertainty propagation, so I have a vested interest in getting this to work!

> Can you expand on what you mean by this?

I am still new to xarray and I need to get more familiar with the data model and terminology, so I might not be saying this right.

Looking through this repo, it looks like how pint integration is being done is by either converting the value stored in a DataArray to a Quantity, or by having a "units" entry in attrs. Let's say you have two DataArrays that you add together. How does pint enter the picture here? Do you quantify the dataset values, add the two Quantity arrays together, and then create a new DataArray with the resulting Quantity array?

@TomNicholas
Member Author

TomNicholas commented Sep 25, 2022

> We're thinking of adopting xarray throughout our code, which requires support for uncertainty propagation, so I have a vested interest in getting this to work!

This is how every cool feature gets made :)

> Looking through this repo, it looks like how pint integration is being done is by either converting the value stored in a DataArray to a Quantity, or by having a "units" entry in attrs.

Xarray objects are wrappers of numpy arrays, or of arrays that can be treated as if they were numpy arrays (so-called "duck-typed arrays"). Xarray expects that the wrapped array exposes certain methods and attributes (e.g. .shape, .mean(), __add__()) that behave the same way as numpy's. The rules for what exactly counts as a numpy-like array have recently become a lot more formalized (thankfully) via the Python array API standard.

Xarray organises and aligns the wrapped data, but the actual computation is always performed by the underlying array type. So da + da will ultimately call ndarray.__add__ if each da is wrapping a numpy array. You can see what particular array object the DataArray is wrapping by calling .data.

So in the case of pint, xarray is wrapping a pint.Quantity, and all operations are delegated to the pint.Quantity object. (pint.Quantity happens to further wrap numpy.ndarray, but this isn't a requirement from xarray's perspective; it's just an implementation detail of pint.)
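The delegation pattern described above can be sketched with two toy classes. These are hypothetical stand-ins, not xarray's or pint's actual code: `ToyQuantity` plays the role of the wrapped unit-aware array and `ToyLabelled` plays the role of the DataArray. The sketch assumes 1-D data and identical dimension names on both operands.

```python
class ToyQuantity:
    """Hypothetical stand-in for a unit-aware duck array (not pint.Quantity)."""

    def __init__(self, magnitude, units):
        self.magnitude = list(magnitude)
        self.units = units

    @property
    def shape(self):
        return (len(self.magnitude),)

    def __add__(self, other):
        if not isinstance(other, ToyQuantity):
            return NotImplemented
        if other.units != self.units:
            raise ValueError(f"cannot add {self.units} and {other.units}")
        summed = [a + b for a, b in zip(self.magnitude, other.magnitude)]
        return ToyQuantity(summed, self.units)


class ToyLabelled:
    """Hypothetical stand-in for a DataArray: names the axes, delegates the math."""

    def __init__(self, data, dims):
        if len(dims) != len(data.shape):
            raise ValueError("one dimension name per axis is required")
        self.data = data  # the wrapped array, whatever type it is
        self.dims = tuple(dims)

    def __add__(self, other):
        if not isinstance(other, ToyLabelled):
            return NotImplemented
        if self.dims != other.dims:
            raise ValueError("this sketch only handles identical dims")
        # The wrapper never inspects units; the wrapped type does the computation.
        return ToyLabelled(self.data + other.data, self.dims)


length = ToyLabelled(ToyQuantity([1.0, 2.0], "m"), dims=("x",))
total = length + length
print(type(total.data).__name__, total.data.magnitude, total.data.units)
```

Because the wrapper only calls `+` on whatever it holds, an array type that also tracked uncertainties could in principle be dropped in without changing the wrapper, provided it exposes the same array-like interface.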

> Let's say you have two DataArrays that you add together. How does pint enter the picture here? Do you quantify the dataset values, add the two Quantity arrays together, and then create a new DataArray with the resulting Quantity array?

Basically yes, but if you start with a Dataset with quantified values, the other steps are handled automatically by xarray delegating to the wrapped pint.Quantity objects.

EDIT with some links:

This issue might interest you - it's about wrapping ragged-length arrays with xarray, but it talks about the API requirements in more detail. pydata/xarray#4285

Comment about JAX in xarray pydata/xarray#3232 (comment)

jax-ml/jax#1565

Project board for xarray wrapping all the arrays

@westurner

westurner commented Jan 1, 2025

> With regards to uncertainties, I think one difficulty with the existing uncertainties package is that you have to hand-write the propagation rules for each operation, which is definitely not feasible for the entire numpy library. If you used an auto-differentiation library like JAX, you could imagine trying to have that implement __array_function__ for all numpy functions. I tried, and was a little successful with getting something like this working here, but there's a ton of issues with the implementation that I don't think can be resolved with the hacky way I implemented it.
>
> Probably the best way to do it would be to re-implement it in xarray.

Do the error propagation rules for tensors differ from those for xarrays? Could there be a portable spec and test cases and a comparison document notebook for array/tensor broadcasting rules?

xarray implements different broadcasting rules than numpy;
xarray docs > api > ufuncs:
https://docs.xarray.dev/en/stable/api.html#universal-functions ::

for xarray objects backed by non-NumPy array types (e.g. cupy, sparse, or jax), they will ensure that the computation is dispatched to the appropriate backend.

A page in the xarray docs on tensors would be helpful:

  • Are xarrays tensors?
  • How do xarrays compare to numpy and tensor libraries like PyTorch, TensorFlow, xtensor, einops, and JAX.array in terms of (default?) broadcasting rules?
  • How do xarrays compare to [same] in terms of error propagation with uncertainties?
  • Are there specs for tensor metadata?
  • Are there specs for tensor error propagation and broadcasting?
  • Are xarrays named tensors?

Named Tensors, metadata, tensor ops,

(FWIW sympy's lambdify doesn't work w/ JAX; but sympy2jax does: https://github.com/patrick-kidger/sympy2jax#see-also-other-libraries-in-the-jax-ecosystem )

xarray_jax
https://github.com/allen-adastra/xarray_jax#sharp-edges

@keewis
Collaborator

keewis commented Jan 5, 2025

I'm going to try to answer your questions to the best of my knowledge, but I haven't looked too much into uncertainty propagation. While this issue is about integration of pint-xarray with uncertainties, I think this discussion would be better held on xarray's repository or (preferably) that of a specialized package like xarray-jax.

> Are xarrays tensors?

If you're referring to the ML term (an alias of "n-dimensional array"), then yes. If you're talking about the term in physics, then usually not. If you look for "array" in the xarray docs you should get a lot of documentation.

> Do the error propagation rules for tensors differ from those for xarrays?

There's no built-in error propagation in xarray; it is up to the wrapped library to define those rules.

> xarray implements different broadcasting rules than numpy

Indeed, xarray "aligns" data instead of broadcasting it (xr.broadcast is the equivalent of numpy-type broadcasting, which first transposes the data to match dimension names). However, the alignment either removes data or places a constant value (nan by default for float) into the expanded data, neither of which should affect the uncertainties.
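As a rough illustration of why alignment should leave uncertainties alone, here is a toy sketch using plain dictionaries keyed by coordinate label. It is not xarray's alignment code; `align` and `MISSING` are hypothetical names, and each entry is assumed to be a `(nominal, sigma)` pair. An inner join drops unmatched labels and an outer join pads them with nan; in both cases the surviving entries are passed through untouched.

```python
import math

# Placeholder (nominal, sigma) pair for entries added by an outer join.
MISSING = (math.nan, math.nan)


def align(left, right, join="inner"):
    """Align two {label: (nominal, sigma)} mappings on their labels (toy sketch)."""
    if join == "inner":
        # Keep only labels present on both sides: data is removed.
        labels = [label for label in left if label in right]
    elif join == "outer":
        # Keep every label: absent entries are padded with a constant.
        labels = list(left) + [label for label in right if label not in left]
    else:
        raise ValueError(f"unsupported join: {join!r}")
    return (
        {label: left.get(label, MISSING) for label in labels},
        {label: right.get(label, MISSING) for label in labels},
    )


a = {"t0": (1.0, 0.1), "t1": (2.0, 0.2)}
b = {"t1": (10.0, 0.5), "t2": (20.0, 0.6)}

inner_a, inner_b = align(a, b, join="inner")
outer_a, outer_b = align(a, b, join="outer")
print(inner_a, inner_b)
print(outer_a, outer_b)
```

Alignment here only selects or pads entries; no arithmetic touches the values, so there is nothing for an uncertainty rule to propagate until the wrapped array type performs the actual operation.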

> Are there specs for tensor metadata?

Yes, although those are domain specific. For climate science / meteorology / oceanography and related topics, there's the CF conventions, and there may be more for other domains.

> Are xarrays named tensors?

xarray.namedarray.NamedArray (and xarray.Variable) are "named tensors", or at least pretty close. xarray.DataArray provides additional features like coordinates and indexes, xarray.Dataset can be seen as a mapping of DataArrays, and xarray.DataTree as a tree of Datasets.


6 participants