-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
where
dtype upcast with numpy 2
#8402
Comments
Thanks for the well written issue. I'm not sure what to do here: Perhaps @shoyer has thoughts. |
FYI it looks like this effects |
I think we should probably make an exception for Python built-in numbers when casting to arrays. Something closer to:
|
@shoyer do you have any memory why the scalar types are converted to arrays? I don't think your suggestion catches the case where |
If I remember correctly, the array API does not support python scalar types (or numpy scalars), so we have to convert them to 0D arrays (see also the discussion in #7721). |
I don't remember exactly why this is, but I'm pretty confident I would have
written unit tests to cover it. So I would suggest changing it and finding
out!
…On Fri, Nov 3, 2023 at 2:40 PM Justus Magin ***@***.***> wrote:
If I remember correctly, the array API does not support python scalar
types (or numpy scalars), so we have to convert them to 0D arrays (see also
the discussion in #7721 <#7721>).
—
Reply to this email directly, view it on GitHub
<#8402 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJJFVXCVZJUT6J6F6YGC3DYCVQENAVCNFSM6AAAAAA626T2DGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJTGEZTKNRWHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I started hacking around with your idea @shoyer and had the realization that diff --git a/xarray/core/duck_array_ops.py b/xarray/core/duck_array_ops.py
index 4f245e59..8fbe4731 100644
--- a/xarray/core/duck_array_ops.py
+++ b/xarray/core/duck_array_ops.py
@@ -202,13 +202,13 @@ def as_shared_dtype(scalars_or_arrays, xp=np):
arrays = [asarray(x, xp=cp) for x in scalars_or_arrays]
else:
- arrays = [asarray(x, xp=xp) for x in scalars_or_arrays]
+ arrays = [x if isinstance(x, (int, float, complex)) else asarray(x, xp=xp) for x in scalars_or_arrays]
# Pass arrays directly instead of dtypes to result_type so scalars
# get handled properly.
# Note that result_type() safely gets the dtype from dask arrays without
# evaluating them.
out_type = dtypes.result_type(*arrays)
- return [astype(x, out_type, copy=False) for x in arrays]
+ return [astype(x, out_type, copy=False) if hasattr(x, "dtype") else x for x in arrays] Without the second modification
I'll try investigating more tomorrow if I have the time, but I get 63 failures with no changes so... 🤷♂️ |
Ok, very interesting. I think this issue will need to be fixed to preserve behavior when numpy 2.0 comes out. So this test fails right now with numpy 2 (not including my changes):
As you can see it is specifically checking the result dtype and they are not equal (the result's type was upcast to 64-bit float). This is WITHOUT my changes. This test is fixed when I rerun the tests with my hacky solution mentioned above. As for tests that fail with my changes that didn't before:
This is kind of expected given how inelegant my update was. |
Are there any issues or milestones to track for general numpy 2 compatibility work? I'm running into this particular issue again in another one of my projects (with xarray main) and my little hack above suggested by shoyer seems to fix things still, but I don't know enough about the xarray internals to know if any of those changes are pull request worthy or bound to conflict with other things in xarray. |
Good timing on the bump! We have some convo here: #8844 AFAIK no one has looked at this though I nominally volunteered to do so. It sounds like you've made some progress. Can you open a PR please, it might be easier to help then? |
What happened?
I'm testing my code with numpy 2.0 and current
main
xarray and dask and ran into a change that I guess is expected given the way xarray does things, but want to make sure as it could be unexpected for many users.Doing
DataArray.where
with an integer array less than 64-bits and an integer as the new value will upcast the array to 64-bit integers (python'sint
). With old versions of numpy this would preserve the dtype of the array. As far as I can tell the relevant xarray code hasn't changed so this seems to be more about numpy making things more consistent.The main problem seems to come down to:
xarray/xarray/core/duck_array_ops.py
Line 218 in d933578
As this converts my scalar input
int
to a numpy array. If it didn't do this array conversion then numpy works as expected. See the MCVE for the xarray specific example, but here's the numpy equivalent:From a numpy point of view, the second where call makes sense that 2 arrays should be upcast to the same dtype so they can be combined. But from an xarray user point of view, I'm entering a scalar so I expect it to be the same as the first where call above.
What did you expect to happen?
See above.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
Numpy 1.x preserves the dtype.
Environment
The text was updated successfully, but these errors were encountered: