-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(trivial) xarray.quantile silently resolves dask arrays #1524
Comments
Dask implements percentile now
http://dask.pydata.org/en/latest/array-api.html#dask.array.percentile
So perhaps our version of quantile can be refactored to accommodate actual lazy computation on dask arrays, rather than simply erroring.
In any case, I agree that automatic silent eager evaluation of dask arrays is bad.
…Sent from my iPhone
On Aug 24, 2017, at 11:54 AM, crusaderky ***@***.***> wrote:
In variable.py, line 1116, you're missing a raise statement:
if isinstance(self.data, dask_array_type):
TypeError("quantile does not work for arrays stored as dask "
"arrays. Load the data via .compute() or .load() prior "
"to calling this method.")
Currently looking into extending dask.percentile() to support more than 1D arrays, and then use it in xarray too.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Dask only supports 1d. One would first need to expand dask to support N-dimensional arrays like numpy does. I plan to di it if/when I have the time |
@crusaderky - thanks for this report. I just opened #1529 which takes care of the trivial part of this issue. If you want to tackle bringing dask.percentile in, that would be awesome. |
Now that SciPy is going on, is there any momentum here for trying to add the dask implementation in someway? This is an issue for some of our workloads, would be great if someone was looking into it or could point me in the direction to start adapting the current source to support it. |
@acrosby if you're at SciPy, I'd be happy to chat about this tomorrow or over the weekend if you're staying for the sprints. This is not an immediate priority for me, but it would be straightforward to make quantile work over non-chunked dimensions by rewriting it to use Approximate quantiles over chunked dimensions could be done by leveraging dask.array.percentile, but that algorithm has some accuracy concerns. See dask/dask#1225 for discussion. |
See also dask/dask#3099 |
Thanks @shoyer I had forgotten that the dask implementation has its own problems anyway. |
@shoyer does dask/dask#4677 solve those accuracy concerns? |
Yes, to some degree. I'm still troubled by that the "default" algorithm (which is selected by default) has no error bounds. It seems a little backwards to me to default to a fast algorithm with unknown accuracy. Also, it still only works on 1D arrays, which would not be terribly useful for us. |
In variable.py, line 1116, you're missing a raise statement:
Currently looking into extending dask.percentile() to support more than 1D arrays, and then use it in xarray too.
The text was updated successfully, but these errors were encountered: