Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(trivial) xarray.quantile silently resolves dask arrays #1524

Closed
crusaderky opened this issue Aug 24, 2017 · 9 comments
Closed

(trivial) xarray.quantile silently resolves dask arrays #1524

crusaderky opened this issue Aug 24, 2017 · 9 comments

Comments

@crusaderky
Copy link
Contributor

In variable.py, line 1116, you're missing a raise statement:

        if isinstance(self.data, dask_array_type):
            TypeError("quantile does not work for arrays stored as dask "
                      "arrays. Load the data via .compute() or .load() prior "
                      "to calling this method.")

Currently looking into extending dask.percentile() to support more than 1D arrays, and then use it in xarray too.

@rabernat
Copy link
Contributor

rabernat commented Aug 24, 2017 via email

@crusaderky
Copy link
Contributor Author

Dask only supports 1d. One would first need to expand dask to support N-dimensional arrays like numpy does. I plan to di it if/when I have the time

@jhamman
Copy link
Member

jhamman commented Aug 28, 2017

@crusaderky - thanks for this report. I just opened #1529 which takes care of the trivial part of this issue. If you want to tackle bringing dask.percentile in, that would be awesome.

@acrosby
Copy link

acrosby commented Jul 12, 2018

Now that SciPy is going on, is there any momentum here for trying to add the dask implementation in someway? This is an issue for some of our workloads, would be great if someone was looking into it or could point me in the direction to start adapting the current source to support it.

@shoyer
Copy link
Member

shoyer commented Jul 13, 2018

@acrosby if you're at SciPy, I'd be happy to chat about this tomorrow or over the weekend if you're staying for the sprints. This is not an immediate priority for me, but it would be straightforward to make quantile work over non-chunked dimensions by rewriting it to use apply_ufunc.

Approximate quantiles over chunked dimensions could be done by leveraging dask.array.percentile, but that algorithm has some accuracy concerns. See dask/dask#1225 for discussion.

@shoyer
Copy link
Member

shoyer commented Jul 13, 2018

See also dask/dask#3099

@acrosby
Copy link

acrosby commented Jul 18, 2018

Thanks @shoyer I had forgotten that the dask implementation has its own problems anyway.

@rafa-guedes
Copy link
Contributor

@shoyer does dask/dask#4677 solve those accuracy concerns?

@shoyer
Copy link
Member

shoyer commented Jul 23, 2019

@shoyer does dask/dask#4677 solve those accuracy concerns?

Yes, to some degree. I'm still troubled by that the "default" algorithm (which is selected by default) has no error bounds. It seems a little backwards to me to default to a fast algorithm with unknown accuracy.

Also, it still only works on 1D arrays, which would not be terribly useful for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants