[BUG] dask-cudf .describe() broken with NumPy 1.20 #7289
Comments
Should be fixed by dask/dask#7162.
This issue has been labeled
dask/dask#7162 has been merged; can we close this, @shwina?
@pentschev, with the latest nightlies this is still broken.
Dask: '2021.03.0'
Snippet:
Result:
@randerzander, could you check that you have at least the versions below? It works for me in a new environment created just now:
In [1]: import dask_cudf, cudf, numpy as np, dask, cupy as cp
In [2]: np.__version__
Out[2]: '1.20.1'
In [3]: cp.__version__
Out[3]: '8.5.0'
In [4]: dask.__version__
Out[4]: '2021.03.0'
In [5]: cudf.__version__
Out[5]: '0.19.0a+270.g267d29ba5a'
In [6]: dask_cudf.__version__
Out[6]: '0.19.0a+270.g267d29ba5a'
In [7]: df = cudf.DataFrame({'id': [0, 1, 2], 'val': [0, 1, 2]})
...: ddf = dask_cudf.from_cudf(df, npartitions=2)
...:
...: ddf.describe().compute()
...:
Out[7]:
        id  val
count  3.0  3.0
mean   1.0  1.0
std    1.0  1.0
min    0.0  0.0
25%    0.5  0.5
50%    1.0  1.0
75%    1.5  1.5
max    2.0  2.0
Oh, I see what happened: it's actually now broken for NumPy < 1.20, since dask/dask#7172. @shwina, the line https://github.com/dask/dask/pull/7172/files#diff-b0bd5609b1b3853c06a0e8bbe312694bffc952fd1b116bd4ecb6f85a0ea7874bR231 doesn't work with CuPy on NumPy < 1.20, because those NumPy versions lack NEP-35 (a.k.a., I see 4 options:
Requiring NumPy 1.20+ makes a lot of sense to me. There are a lot of really important improvements there, particularly for RAPIDS (of course you already know this, Peter), and I think we are going to find it hard to get things working for older NumPy versions.
This is my personal preference too, but I guess we need to know whether there are users who, for whatever reason, can't upgrade to NumPy 1.20+. If not, then we should explicitly pin
Yep, I'm happy to make that PR if we go that route 🙂
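As a rough illustration of what requiring NumPy 1.20+ means at runtime, here is a minimal sketch. It is not the change that was actually merged. The helper names `nep35_available` and `asarray_like` are hypothetical. The only real API they rely on is `np.asarray(..., like=...)`, the NEP-35 keyword that first shipped in NumPy 1.20. The version is compared as a tuple of integers because a plain string comparison would rank "1.9" above "1.20".

```python
import re

# Assumption for this sketch: NEP-35 (the ``like=`` keyword on NumPy
# array-creation functions) is available starting with NumPy 1.20.
NEP35_MIN_VERSION = (1, 20)


def nep35_available(numpy_version: str) -> bool:
    """Return True if a NumPy version string is new enough for ``like=``."""
    match = re.match(r"(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError(f"unrecognized NumPy version: {numpy_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Integer tuples compare numerically, so (1, 9) < (1, 20) as intended.
    return (major, minor) >= NEP35_MIN_VERSION


def asarray_like(data, reference):
    """Build an array of the same kind as ``reference`` (e.g. a CuPy array).

    Hypothetical helper: it fails loudly on NumPy < 1.20 instead of silently
    producing a host (NumPy) array when ``reference`` lives on the GPU.
    """
    import numpy as np  # imported lazily so nep35_available works without NumPy

    if not nep35_available(np.__version__):
        raise ImportError(
            f"NumPy >= 1.20 is required for NEP-35 dispatch, found {np.__version__}"
        )
    # With NEP-35, NumPy hands this call to reference.__array_function__,
    # so a CuPy ``reference`` should yield a CuPy array.
    return np.asarray(data, like=reference)
```

Failing with an explicit ImportError keeps the error close to its cause, rather than letting a later step (such as the quantile computation behind .describe()) break with a less obvious message.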
It looks like the code was already using the
So, do you have thoughts about 2-4, then?
I'm most in favor of number 3 personally, but I'm not sure how much of a rabbit hole that is.
@shwina, since you worked on this, do you see a way of approaching 3?
It's true, but in the previous code both
This is resolved.
The .describe() method of dask-cudf fails with cudf 0.18 (nightly) and NumPy v1.20. Minimal repro: