Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of polyfit and polyval #3733

Merged
merged 23 commits into from
Mar 25, 2020

Conversation

aulemahal
Copy link
Contributor

@aulemahal aulemahal commented Jan 30, 2020

  • Closes Implement polyfit? #3349
  • Tests added
  • Passes isort -rc . && black . && mypy . && flake8
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

Following discussions in #3349, I suggest here an implementation of polyfit and polyval for xarray. However, this is still work in progress, a lot of testing is missing, all docstrings are missing. But, mainly, I have questions on how to properly conduct this.

My implementation mostly duplicates the code of np.polyfit, but making use of dask.array.linalg.lstsq and dask.array.apply_along_axis for dask arrays. The same method as in xscale.signal.fitting.polyfit, but I add NaN-awareness in a 1-D manner.
The version with numpy is also slightly different of np.polyfit because of the NaN skipping, but I wanted the function to replicate its behaviour. It returns a variable number of DataArrays, depending on the keyword arguments (coefficients, [ residuals, matrix rank, singular values ] / [covariance matrix]).
Thus giving a medium-length function that has a lot of duplicated code from numpy.polyfit. I thought of simply using a xr.apply_ufunc, but that makes chunking along the fitted dimension forbidden and difficult to return the ancillary results (residuals, rank, covariance matrix...).

Questions:
1 ) Are the functions where they should go?
2 ) Should xarray's implementation really replicate the behaviour of numpy's? A lot of extra code could be removed if we'd say we only want to compute and return the residuals and the coefficients. All the other variables are a few lines of code away for the user that really wants them, and they don't need the power of xarray and dask anyway.

@dcherian
Copy link
Contributor

Just a quick comment:

xr.apply_ufunc, but that makes chunking along the fitted dimension forbidden

dask="allowed" should work and is appropriate here since you're using a dask function.

difficult to return the ancillary results

yes we really should fix this. One trick is to stack the returned information in to one array and split that array up into multiple dataarrays after the apply_ufunc call.

@aulemahal
Copy link
Contributor Author

aulemahal commented Jan 30, 2020

Oh dask="allowed", that's true!
For the coeficients and residuals, I can easily do this stacking and unstacking and return only one variable. So, apply_ufunc could be used. However, it would not be very useful right now, considering they way I implemented the least-squares fit directly. Meaning: the left hand side computation and scaling and the not-fitted-dimensions stacking would be done on each call of numpy.polyfit if I did call it directly. Right now, we save some of this overhead by doing it once before iterating on 1d axes (which is anyway needed with skipna=True).

xarray/core/dataarray.py Outdated Show resolved Hide resolved
@aulemahal aulemahal mentioned this pull request Feb 3, 2020
xarray/core/computation.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
doc/computation.rst Outdated Show resolved Hide resolved
doc/computation.rst Outdated Show resolved Hide resolved
doc/computation.rst Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
@max-sixty
Copy link
Collaborator

This looks really excellent. Thanks @aulemahal !

Any thoughts @dcherian @TomNicholas before we merge?

@pep8speaks
Copy link

pep8speaks commented Mar 25, 2020

Hello @aulemahal! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-25 14:38:12 UTC

@max-sixty
Copy link
Collaborator

Thanks @aulemahal . Sorry I missed this: I don't think I get notifications when code is pushed, only on a comment. Ping on green and I'll merge.

@aulemahal
Copy link
Contributor Author

Ping. It be green.

@dcherian
Copy link
Contributor

I can take a look later this week if needed

@max-sixty
Copy link
Collaborator

Great, I'm merging, but post back here for further feedback and we can iterate.

Thank you very much @aulemahal !

@max-sixty max-sixty merged commit ec215da into pydata:master Mar 25, 2020
@aulemahal
Copy link
Contributor Author

Yay! Many thanks, this will greatly improve our projects.

dcherian added a commit to dcherian/xarray that referenced this pull request Mar 28, 2020
* upstream/master: (54 commits)
  Limit repr of arrays containing long strings (pydata#3900)
  expose a few zarr backend functions as semi-public api (pydata#3897)
  Use drawstyle instead of linestyle in plot.step. (pydata#3274)
  Implementation of polyfit and polyval (pydata#3733)
  misplaced quote in whatsnew (pydata#3889)
  Rename ordered_dict_intersection -> compat_dict_intersection (pydata#3887)
  Control attrs of result in `merge()`, `concat()`, `combine_by_coords()` and `combine_nested()` (pydata#3877)
  xfail test_uamiv_format_write (pydata#3885)
  Use `fixes` in PR template (pydata#3886)
  Tweaks to "how_to_release" (pydata#3882)
  whatsnew section for 0.16.0
  Release v0.15.1
  whatsnew for 0.15.1 (pydata#3879)
  update panel documentation (pydata#3880)
  reword the whats-new entry for unit support (pydata#3878)
  Raise error when assigning to IndexVariable.values & IndexVariable.data (pydata#3862)
  Re-enable tests xfailed in pydata#3808 and fix new CFTimeIndex failures due to upstream changes (pydata#3874)
  add spacing in the versions section of the issue report (pydata#3876)
  map_blocks: allow user function to add new unindexed dimension. (pydata#3817)
  Delete associated indexes when deleting coordinate variables. (pydata#3840)
  ...
dcherian added a commit to MeraX/xarray that referenced this pull request Mar 29, 2020
* upstream/master: (75 commits)
  Implement idxmax and idxmin functions (pydata#3871)
  Update pre-commit-config.yaml (pydata#3911)
  Revert "Use `fixes` in PR template (pydata#3886)" (pydata#3912)
  update the docstring of diff (pydata#3909)
  Un-xfail test_dayofyear_after_cftime_range (pydata#3907)
  Limit repr of arrays containing long strings (pydata#3900)
  expose a few zarr backend functions as semi-public api (pydata#3897)
  Use drawstyle instead of linestyle in plot.step. (pydata#3274)
  Implementation of polyfit and polyval (pydata#3733)
  misplaced quote in whatsnew (pydata#3889)
  Rename ordered_dict_intersection -> compat_dict_intersection (pydata#3887)
  Control attrs of result in `merge()`, `concat()`, `combine_by_coords()` and `combine_nested()` (pydata#3877)
  xfail test_uamiv_format_write (pydata#3885)
  Use `fixes` in PR template (pydata#3886)
  Tweaks to "how_to_release" (pydata#3882)
  whatsnew section for 0.16.0
  Release v0.15.1
  whatsnew for 0.15.1 (pydata#3879)
  update panel documentation (pydata#3880)
  reword the whats-new entry for unit support (pydata#3878)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement polyfit?
6 participants