
Picking up #1118: Do not convert subclasses of ndarray unless required #2956

Closed
wants to merge 42 commits

Conversation

keewis (Collaborator) commented May 12, 2019

This picks up the work done in #1118 and converts the unittest-based
tests to pytest (since that is what the other tests use).

closes #1118

Some of the suggestions there are not implemented yet:

  • the disabling of bottleneck if an ndarray subclass is used
  • testing for operations like xarray.concat. This does not work with any of the unit libraries (at least, pint, astropy.units, unyt and quantities don't support it yet).
  • testing with other unit libraries that use subclasses (e.g. astropy.units or unyt)
  • renaming the test file to test_units_subclass.py
  • fix indexing (NumpyIndexingAdapter)

The CI requirements files have changed since then, so more files than
the two updated right now may need the quantities entry.

pep8speaks commented May 12, 2019

Hello @keewis! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-26 13:40:47 UTC

max-sixty (Collaborator)

Looks great! Thanks @keewis

I don't know this area that well, so I'll defer to someone else on a proper review

keewis (Collaborator, Author) commented May 20, 2019

Most of the items above are implemented in unit-support, including support for array wrappers with units like pint (if my tests are sufficient to decide that: pint still complains/warns about stripped units). The calls to asarray and asanyarray are skipped when the object defines any of the __array_*__ methods together with a unit or units attribute.

I did not push these commits to this PR because most of it does not have anything to do with subclasses. I did not open a new PR for the branch either because it builds on this one and because I'd like to receive feedback on any of this first.

Combining this with dask arrays should still fail, though.

shoyer (Member) commented May 20, 2019

This looks like a good start to me. So we can start making progress on this, I think it would make sense to hide this behavior behind an option, e.g., xarray.set_options(enable_experimental_ndarray_subclass_support=True).

I would consider writing some tests with two of the ndarray subclasses that ship with NumPy:

  • np.matrix
  • np.ma.MaskedArray

Both of these subclasses violate some important principles of subclassing, namely they aren't substitutable with objects of their parent class. Thus, I think it would be reasonable to explicitly "blacklist" them for use inside xarray.DataArray. I think we already explicitly convert MaskedArray objects, but perhaps we should also add a check for np.matrix objects.
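The substitutability problem with np.matrix can be demonstrated directly (behavior of current NumPy, where np.matrix is pending deprecation):

```python
import numpy as np

# np.matrix is not substitutable for ndarray: operations that reduce
# dimensionality on an ndarray keep matrices stubbornly 2-d, and *
# means matrix multiplication instead of elementwise multiplication.
m = np.matrix([[1, 2], [3, 4]])
a = np.array([[1, 2], [3, 4]])

print(a[0].ndim)      # 1 -- indexing an ndarray row drops a dimension
print(m[0].ndim)      # 2 -- a matrix row is still a 1x2 matrix
print((a * a)[0, 0])  # 1 -- elementwise multiplication
print((m * m)[0, 0])  # 7 -- matrix multiplication!
```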

In terms of operations, one big class that comes to mind are aggregations, e.g., .mean(). It would be good to make sure that these work.
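As a baseline, well-behaved ndarray subclasses do survive aggregations, since NumPy reductions wrap their results in the subclass. A minimal sketch with a hypothetical Quantity subclass (not pint's actual class):

```python
import numpy as np

class Quantity(np.ndarray):
    # minimal hypothetical ndarray subclass carrying a ``units`` attribute
    def __new__(cls, values, units=""):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # propagate units through views and wrapped results
        self.units = getattr(obj, "units", "")

q = Quantity([1.0, 2.0, 3.0], units="m")
result = q.mean()
print(float(result))          # 2.0
print(type(result).__name__)  # Quantity -- the subclass survives the reduction
```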



@pytest.fixture
def data():
Member:

This is a minor point, but I would recommend against using pytest fixtures for test data. (Though we do use this style currently in xarray in a few places.)

Fixtures are great for setup/teardown (e.g., cleaning up temporary files), but when your test depends on data values, it's best if those values can be found in the test itself. Otherwise, the logic in a test is not self-contained, and you have to understand a much bigger context of code.

All this is true even for helper functions, but pytest's fixtures are even more magical.

So in summary:

  • Prefer creating test data in a test itself, if your test relies on any aspect of the data values.
  • If you do want to re-use constructors, prefer using normal helper functions to fixtures. These are easier to understand.
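The helper-function style recommended above might look like this (a hypothetical test for illustration, not one from the PR):

```python
import numpy as np

def create_data():
    # A plain helper instead of a fixture: the explicit call at the test
    # site makes it obvious where the data comes from, and you can jump
    # straight to the definition.
    return np.arange(10.0)

def test_mean():
    data = create_data()
    assert data.mean() == 4.5
```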

keewis (Author):

right, so I used helper functions instead

keewis force-pushed the member-arrays-with-units branch from 44d796a to 5aee870 on August 20, 2019, 15:28
keewis (Collaborator, Author) commented Aug 20, 2019

I'm not too sure about putting _asarray into npcompat.py, but apart from that I would say it is ready to be merged.

However, the question of whether or not this PR is obsolete now still remains: if one were to replace quantities with pint, the test checking that this does not work without the proposed changes enabled is the only one that fails.

return x, y, xp


def create_coords(x=None, y=None, xp=None):
Collaborator:

You write these in whichever format works best for you; don't take this as a dictum to change. But:

You might find that writing these using pytest fixtures makes the code nicer, and is easier to write

Here, you'd

  • add @pytest.fixture to this function,
  • rename it to coords,
  • and then in any test function that needs its result, add its name as a parameter, like def test_units_in_data_and_coords(coords):

You can have fixtures taking data from other fixtures, so you can still have the chain you're creating here, and you get parameterization for free!
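The fixture version described above would look roughly like this (hypothetical names, for illustration):

```python
import numpy as np
import pytest

@pytest.fixture
def x():
    return np.arange(5.0)

@pytest.fixture
def coords(x):
    # fixtures can request other fixtures, so a chain of helpers
    # translates directly into chained fixtures
    return {"x": x}

def test_units_in_data_and_coords(coords):
    assert coords["x"].shape == (5,)
```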

keewis (Author):

well, that was what I did before, but when I created this PR, I was told in #2956 (comment) that data fixtures were adding complexity to the tests so I should either create the data in the tests themselves or use helper functions.

Member:

I don't think it particularly matters whether you write fixtures or helper functions. My point was more about trying to keep test-specific inputs and assertions as close to each other as possible, so you can read a test without constantly referring to a helper function that defines the inputs.

Collaborator:

I hadn't seen that, sorry you're caught in the middle there @keewis.

For this case, you can do it how you wish; I think @shoyer's emphasis was on the locality rather than the format, so there's no benefit to create_x over an x fixture

(then we can all have a separate discussion about locality vs generality without holding this work up)

keewis (Author):

should I leave it as is, then, or rewrite it so that the data is created in each individual test? None of these tests actually need all of data, dimensions and coordinates to be subclass instances. They are almost the same as the ones in the original PR. I'd be happy either way, though I do think rewriting may make the tests a little easier to understand.

Collaborator:

Assuming it's a straight swap between create_X and a fixture X (i.e. no change to the locality), I think they would be better as pytest fixtures. They're acceptable as either, though.

@shoyer what are your thoughts about the case at hand?

keewis (Author), Aug 20, 2019:

What I meant was that at the moment I have a single 2d array subclass as data, and both dimensions and an extra coordinate are also subclass instances. The dimensions (and the coordinate) only matter in the tests that use them, so most of the data arrays could be 1d arrays created in place.

keewis (Author):

I modified the tests to create the data in place.

Collaborator:

OK great! Cheers @keewis

shoyer (Member) commented Aug 20, 2019

However, the question of whether or not this PR is obsolete now still remains: if one were to replace quantities with pint, the test checking that this does not work without the proposed changes enabled is the only one that fails.

Does pint do __array_function__ yet? It looks like the answer is "not yet" but there's a branch with experimental support: hgrecco/pint#764

We could add the experimental option for allowing subclasses for testing purposes, but ultimately I think we want to insist that objects we use inside data implement __array_function__, regardless of whether they are a numpy array subclass or duck array. Otherwise there's no way complicated operations like concat are going to work properly.
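For reference, a duck array opts into NumPy functions like np.concatenate through the protocol roughly like this (a hypothetical UnitArray, not pint's actual implementation; requires NumPy >= 1.17):

```python
import numpy as np

class UnitArray:
    # hypothetical duck array: wraps raw values plus a unit string
    def __init__(self, values, units):
        self.values = np.asarray(values)
        self.units = units

    def __array_function__(self, func, types, args, kwargs):
        # only handle the functions we explicitly support; returning
        # NotImplemented makes NumPy raise a TypeError for the rest
        if func is np.concatenate:
            arrays = args[0]
            if any(arr.units != self.units for arr in arrays):
                raise ValueError("cannot concatenate mismatched units")
            return UnitArray(
                np.concatenate([arr.values for arr in arrays], **kwargs),
                self.units,
            )
        return NotImplemented

a = UnitArray([1, 2], "m")
b = UnitArray([3, 4], "m")
c = np.concatenate([a, b])   # dispatches to UnitArray.__array_function__
print(c.values, c.units)     # [1 2 3 4] m
```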

I'd love to add tests that verify that xarray can properly wrap arrays with units as soon as possible. We could even merge tests that are all marked "xfail" and that currently only pass when using the experimental version of pint from hgrecco/pint#764.

As we saw with sparse array support, this will be useful for two reasons:

  1. It will help us identify remaining issues that need to be solved either in xarray or in an array-with-units library like pint.
  2. There are also likely some subtle tweaks we can do inside xarray to better support arrays with units, e.g., displaying units in the xarray.Dataset repr.

dopplershift (Contributor)

Yeah, pint doesn't do __array_function__ yet. It'd be great to get this in, if only because it lowers the barrier to making sure xarray works with pint in the future.

keewis (Collaborator, Author) commented Aug 20, 2019

Do you have a rough idea of which operations these tests should check? The ones currently implemented are mean() and an xfailing concat. There is also sel(), but as it only tries to use values with the matching unit, it works even though it should probably fail.

jthielen (Contributor)

I'd love to add tests that verify that xarray can properly wrap arrays with units as soon as possible. We could even merge tests that are all marked "xfail" and that currently only pass when using the experimental version of pint from hgrecco/pint#764.

This is actually something I've been working towards recently, but I've run into some delays. I'm still working on the pint side of things, since there are a fair number of numpy functions not yet implemented and tested in hgrecco/pint#764.

@keewis Would you want to take the lead on these xarray + pint with __array_function__ tests, since you've been working on related tests here, or would you want me to see what I can come up with based on what I've been doing so far?

keewis (Collaborator, Author) commented Aug 20, 2019

I think I would like to tackle this, but I definitely will need some help with finding examples for some of the more advanced operations.

@shoyer should I add these in a new PR or should I repurpose this one?

shoyer (Member) commented Aug 20, 2019

You are welcome to continue here or in a new PR, whichever you prefer. I don't have a strong opinion, please do whichever seems easier for you.

I think the best examples of complex operations to check for compatibility can be found in xarray/tests/test_sparse.py and xarray/tests/test_dask.py

keewis (Collaborator, Author) commented Aug 21, 2019

then I would like to finish this one and modify the tests in a new one.

keewis mentioned this pull request on Aug 22, 2019
keewis (Collaborator, Author) commented Aug 26, 2019

We could add the experimental option for allowing subclasses for testing purposes, but ultimately I think we want to insist that objects we use inside data implement __array_function__, regardless of whether they are a numpy array subclass or duck array. Otherwise there's no way complicated operations like concat are going to work properly.

@shoyer: What kind of testing would that be? If this is something that would be removed again in the future and has no actual value, feel free to close (but don't forget the original PR). Otherwise, I'd be happy to continue working on this.

keewis (Collaborator, Author) commented Dec 25, 2019

Since we have been making a lot of progress using __array_function__ (which does not exclude subclasses but is much more flexible), this approach is outdated and I'm closing this and #1118.

@keewis keewis closed this Dec 25, 2019
@keewis keewis deleted the member-arrays-with-units branch December 25, 2019 14:14
Labels
topic-arrays related to flexible array support