-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Picking up #1118: Do not convert subclasses of ndarray
unless required
#2956
Conversation
so the unit information is kept on import.
Looks great! Thanks @keewis I don't know this area that well, so I'll defer to someone else on a proper review |
The namespace class has been useful for switching off the tests using only one decorator, but now that the tests get skipped if necessary on module level, it is not required anymore.
Most of the items above are implemented in unit-support, even support for array wrappers with units like I did not push these commits to this PR because most of it does not have anything to do with subclasses. I did not open a new PR for the branch either because it builds on this one and because I'd like to receive feedback on any of this first. Combining this with |
This looks like a good start to me. So we can start making progress on this, I think it would make sense to hide this behavior behind an option, e.g., I would consider writing some tests with two of the ndarray subclasses that ship with NumPy:
Both of these subclasses violate some important principles of subclassing, namely they aren't substitutable with objects of their parent class. Thus, I think it would be reasonable to explicitly "blacklist" them for use inside in xarray.DataArray. I think we already explicitly convert MaskedArray objects, but perhaps we should also add a check for In terms of operations, one big class that comes to mind are aggregations, e.g., |
xarray/tests/test_quantities.py
Outdated
|
||
|
||
@pytest.fixture | ||
def data(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a minor point, but I would recommend against using pytest fixtures for test data. (Though we do use this style currently in xarray in a few places.)
Fixtures are great for setup/teardown (e.g., cleaning up temporary files), but when you test depends on data values, it's best if the data values can be found in the test itself. Otherwise, the logic in a test is not self-contained, which means you have to understand a much bigger context of code.
All this is true for even helper functions, but pytest's fixtures are even more magical:
- They use magic name matching instead of Python's lookup rules (Fixtures without name matching magic pytest-dev/pytest#3834)
- They can add arbitrary teardown logic, beyond merely creating an object.
So in summary:
- Prefer creating test data in a test itself, if your test relies on any aspect of the data values.
- If you do want to re-use constructors, prefer using normal helper functions to fixtures. These are easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, so I used helper functions instead
this improves the readability of the information printed by pytest.
44d796a
to
5aee870
Compare
I'm not too sure about putting However, the question of whether or not this PR is obsolete now still remains: if one were to replace |
xarray/tests/test_quantities.py
Outdated
return x, y, xp | ||
|
||
|
||
def create_coords(x=None, y=None, xp=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You write these in whichever format works best for you; don't take this as a dictum to change. But:
You might find that writing these using pytest fixtures makes the code nicer, and is easier to write
Here, you'd
- add
@pytest.fixture
to this function, - rename it to
coords
, - and then any test function you wanted access to its result you'd add its name like
def test_units_in_data_and_coords(coords):
You can have fixtures taking data from other fixtures, so you can still have the chain you're creating here, and you get parameterization for free!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, that was what I did before, but when I created this PR, I was told in #2956 (comment) that data fixtures were adding complexity to the tests so I should either create the data in the tests themselves or use helper functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it particularly matters whether you write fixtures or helper functions. My point was more about trying to keep test-specific inputs and assertions as close to each other as possible, so you can read a test without constantly referring to a helper function that defines the inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't seen that, sorry you're caught in the middle there @keewis.
For this case, you can do it how you wish; I think @shoyer 's emphasis was on the locality rather than the format - so no benefit from create_x
over an x
fixture
(then we can all have a separate discussion about locality vs generality without holding this work up)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I then leave it as is or rewrite it so that the data is created in each individual test? None of these actually need all of data, dimensions and coordinates as subclass instances. Actually, they are almost the same as the ones in the original PR. I would be happy with both, though I do think rewriting may actually make the tests a little bit easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming it's as straight swap between create_X
and a fixture X
(i.e. no change to the locality), I think they would be better as pytest fixtures. They're acceptable as either, though.
@shoyer what are your thoughts about the case at hand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was that at the moment I have a single 2d array subclass as data and both dimensions and an extra coordinate are also subclass instances. The dimensions (and the coordinate) only matter in the tests they are used in, so most of the data arrays used could be in place created 1d arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the tests to create the data in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK great! Cheers @keewis
Does pint do We could add the experimental option for allowing subclasses for testing purposes, but ultimately I think we want to insist that objects we use inside I'd love to add tests that verify that xarray can properly wrap arrays with units as soon as possible. We could even merge tests that are all marked "xfail" and that currently only pass when using the experimental version of pint from hgrecco/pint#764. As we saw with sparse array support, this will be useful for two reasons:
|
Yeah, |
Do you have a rough idea of which operations these tests should check? The ones currently implemented are |
This is actually something that I've been working towards recently, but I've ran into some delays. I'm still working on the pint side of things since there is a fair number of numpy functions not yet implemented and tested in hgrecco/pint#764. @keewis Would you want to take the lead on these xarray + pint with |
I think I would like to tackle this, but I definitely will need some help with finding examples for some of the more advanced operations. @shoyer should I add these in a new PR or should I repurpose this one? |
You are welcome to continue here or in a new PR, whichever you prefer. I don't have a strong opinion, please do whichever seems easier for you. I think the best examples of complex operations to check for compatibility can be found in |
then I would like to finish this one and modify the tests in a new one. |
@shoyer: What kind of testing would that be? If this is something that would be removed again in the future and has no actual value, feel free to close (but don't forget the original PR). Otherwise, I'd be happy to continue working on this. |
Since we have been making a lot of progress using |
This picks up the work done in #1118, plus converting the
unittest
based tests to
pytest
(since that seems to be used in the othertests).
closes #1118
Some of the suggestions there are not implemented yet:
ndarray
subclass is usedxarray.concat
. This does not work with any of the unit libraries (at least,pint
,astropy.units
,unyt
andquantities
don't support it yet).astropy.units
orunyt
)test_units_subclass.py
NumpyIndexingAdapter
)The requirements files for CI changed since then, so maybe more than
the two right now need the
quantities
entry.