Add to_numpy() and as_numpy() methods #5568
Conversation
Irritating linting error:
@TomNicholas I think that should fix it. But it's become Haskell-esque!
Hello @TomNicholas! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-07-21 21:11:19 UTC
The only failure is the same as in #5600, so I think this should be ready to merge.
looks great. Thanks @TomNicholas
xarray/core/variable.py
Outdated
try:
    data = data.to_numpy()
except AttributeError:
Isn't this always failing? Should we avoid it until such a protocol actually exists?
That would be my inclination, too.
I can also imagine some other library implementing `to_numpy()` in a weird way, which could cause issues. At the least, there should be a defensive check of `isinstance(data, np.ndarray)` afterwards.
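A minimal sketch of the defensive pattern being suggested, in plain Python; the helper name `_coerce_to_numpy` and the exact fallback behaviour are illustrative, not the PR's final code:

```python
import numpy as np


def _coerce_to_numpy(data):
    # Hypothetical duck-typed coercion: use to_numpy() if the wrapped array
    # offers one, otherwise fall back to np.asarray().
    if hasattr(data, "to_numpy"):
        data = data.to_numpy()
    else:
        data = np.asarray(data)
    # The defensive check suggested above: guard against a library whose
    # to_numpy() returns something other than a numpy.ndarray.
    if not isinstance(data, np.ndarray):
        raise TypeError(f"expected numpy.ndarray, got {type(data)}")
    return data
```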
@@ -2540,3 +2542,68 @@ def test_clip(var):
        var.mean("z").data[:, :, np.newaxis],
    ),
)


@pytest.mark.parametrize("Var", [Variable, IndexVariable])
you could stick these in `VariableSubclassObjects`? Could also do that later if it's a good idea
I just had a go at that and it broke the pint-only test through interaction with `TestVariableWithDask`, which tries to chunk.
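For context, a minimal sketch of the kind of parametrized test under discussion (the actual tests added in the PR may differ):

```python
import numpy as np
import pytest
from xarray import IndexVariable, Variable


@pytest.mark.parametrize("Var", [Variable, IndexVariable])
def test_to_numpy_returns_ndarray(Var):
    # Build a small 1-d variable of either class and check the coercion.
    var = Var("x", np.arange(3))
    result = var.to_numpy()
    assert isinstance(result, np.ndarray)
    np.testing.assert_array_equal(result, np.arange(3))
```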
@dcherian I think `.to_numpy()` does actually work in the case of an
IndexVariable, because the underlying pandas index has that method. That's
not going to be used yet, but we should perhaps leave it in to prepare for
the flexible indexes?
Other than that I like your other suggestions, please commit and merge as
you see fit!
…On Fri, 16 Jul 2021, 10:44 Deepak Cherian wrote:
@dcherian approved this pull request.
looks great. Thanks @TomNicholas
------------------------------
In xarray/core/dataarray.py
<#5568 (comment)>:
> @@ -623,7 +623,7 @@ def __len__(self) -> int:
@property
def data(self) -> Any:
- """The array's data as a dask or numpy array"""
+ """The array's data as a numpy-like array"""
⬇️ Suggested change
- """The array's data as a numpy-like array"""
+ """
+ The DataArray's data as an array. The underlying array type
+ (e.g. dask, sparse, pint) is preserved.
+
+ See Also
+ --------
+ DataArray.to_numpy
+ DataArray.as_numpy
+ DataArray.values
+ """
------------------------------
In xarray/core/dataarray.py
<#5568 (comment)>:
> @@ -632,13 +632,42 @@ def data(self, value: Any) -> None:
@property
def values(self) -> np.ndarray:
- """The array's data as a numpy.ndarray"""
+ """
+ The array's data as a numpy.ndarray.
+
+ If the array's data is not a numpy.ndarray this will attempt to convert
+ it naively using np.array(), which will raise an error if the array
+ type does not support coercion like this.
⬇️ Suggested change
- type does not support coercion like this.
+ type does not support coercion like this (e.g. cupy).
------------------------------
In xarray/core/dataarray.py
<#5568 (comment)>:
> return self.variable.values
@values.setter
def values(self, value: Any) -> None:
self.variable.values = value
+ def to_numpy(self) -> np.ndarray:
+ """
+ Coerces wrapped data to numpy and returns a numpy.ndarray.
+
+ See also
+ --------
+ DataArray.as_numpy : Same but returns the surrounding DataArray instead.
+ Dataset.as_numpy
⬇️ Suggested change
- Dataset.as_numpy
+ Dataset.as_numpy
+ DataArray.values
+ DataArray.data
------------------------------
In xarray/core/dataarray.py
<#5568 (comment)>:
> +
+ See also
+ --------
+ DataArray.as_numpy : Same but returns the surrounding DataArray instead.
+ Dataset.as_numpy
+ """
+ return self.variable.to_numpy()
+
+ def as_numpy(self: T_DataArray) -> T_DataArray:
+ """
+ Coerces wrapped data and coordinates into numpy arrays, returning a DataArray.
+
+ See also
+ --------
+ DataArray.to_numpy : Same but returns only the data as a numpy.ndarray object.
+ Dataset.as_numpy : Converts all variables in a Dataset.
⬇️ Suggested change
- Dataset.as_numpy : Converts all variables in a Dataset.
+ Dataset.as_numpy : Converts all variables in a Dataset.
+ DataArray.values
+ DataArray.data
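A short usage sketch of how the two new methods differ on a chunked (dask-backed) DataArray; illustrative only, and assumes dask is installed:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y")).chunk({"x": 1})

arr = da.to_numpy()    # coerces the wrapped data and returns a plain numpy.ndarray
assert isinstance(arr, np.ndarray)

da_np = da.as_numpy()  # returns a DataArray whose wrapped data is now numpy
assert isinstance(da_np.data, np.ndarray)
```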
------------------------------
In xarray/core/pycompat.py
<#5568 (comment)>:
>
+ self.module = duck_array_module
+ self.version = duck_array_version
+ self.type = duck_array_type
+ self.available = self.version > "0.0.0"
⬇️ Suggested change
- self.available = self.version > "0.0.0"
+ self.available = duck_array_module is not None
v. minor: this seems cleaner?
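To illustrate the point, a simplified stand-in for an optional-dependency wrapper (not xarray's actual `DuckArrayModule`), with availability derived from whether the import succeeded rather than from a version-string comparison:

```python
from importlib import import_module


class OptionalModule:
    """Simplified sketch of wrapping an optional dependency."""

    def __init__(self, name):
        try:
            self.module = import_module(name)
            self.version = getattr(self.module, "__version__", "0.0.0")
        except ImportError:
            self.module = None
            self.version = "0.0.0"
        # The suggested pattern: availability follows from the import itself.
        self.available = self.module is not None
```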
------------------------------
In xarray/core/variable.py
<#5568 (comment)>:
> +dask_array_type = DuckArrayModule("dask").type
+cupy_array_type = DuckArrayModule("cupy").type
+sparse_array_type = DuckArrayModule("sparse").type
import these from pycompat as earlier?
------------------------------
In xarray/core/variable.py
<#5568 (comment)>:
> + try:
+ data = data.to_numpy()
+ except AttributeError:
Isn't this always failing? Should we avoid it until such a protocol
actually exists?
------------------------------
In xarray/core/variable.py
<#5568 (comment)>:
> @@ -1069,6 +1069,32 @@ def chunk(self, chunks={}, name=None, lock=False):
return self._replace(data=data)
+ def to_numpy(self) -> np.ndarray:
+ """Coerces wrapped data to numpy and returns a numpy.ndarray"""
+ # TODO an entrypoint so array libraries can choose coercion method?
+ data = self.data
+ try:
+ data = data.to_numpy()
+ except AttributeError:
+ if isinstance(data, dask_array_type):
+ data = self.compute().data
⬇️ Suggested change
- data = self.compute().data
+ data = data.compute()
to match the rest?
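Putting the snippet and the suggested change together, a hedged sketch of the coercion path (`dask_array_type` mirrors the PR's pycompat helpers; the final implementation may differ):

```python
import numpy as np


def to_numpy_sketch(data, dask_array_type=()):
    try:
        data = data.to_numpy()
    except AttributeError:
        if isinstance(data, dask_array_type):
            # Reviewer's suggestion: compute the dask array directly
            # instead of going back through self.compute().data.
            data = data.compute()
        data = np.asarray(data)  # generic fallback for other array types
    return data
```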
------------------------------
In xarray/tests/test_variable.py
<#5568 (comment)>:
> @@ -2540,3 +2542,68 @@ def test_clip(var):
var.mean("z").data[:, :, np.newaxis],
),
)
+
+
@pytest.mark.parametrize("Var", [Variable, IndexVariable])
you could stick these in VariableSubclassObjects? Could also do that later if it's a good idea
The only error is the fsspec error now.
Co-authored-by: Stephan Hoyer <[email protected]>
Okay thanks @shoyer - I've removed the type check and the attempt to call `to_numpy()`.
* main: (31 commits) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) pre-commit: autoupdate hook versions (pydata#5617) Add dataarray scatter with 3d support (pydata#4909) ...
* upstream/main: (31 commits) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) pre-commit: autoupdate hook versions (pydata#5617) Add dataarray scatter with 3d support (pydata#4909) ...
* upstream/main: (34 commits) Use same bool validator as other inputs (pydata#5703) conditionally disable bottleneck (pydata#5560) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) ...
* upstream/main: (307 commits) Use same bool validator as other inputs (pydata#5703) conditionally disable bottleneck (pydata#5560) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) ...
pre-commit run --all-files
whats-new.rst
api.rst