Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for numpy arrays (and dicts) to approx. #2492

Merged
merged 13 commits into from
Jul 6, 2017

Conversation

kalekundert
Copy link
Contributor

This pull request addresses #1994 and adds support for numpy arrays to approx(). It also adds support for dicts (and other mapping types), because doing so was low hanging fruit and I thought the old behavior (accept dicts but only compare their keys) was potentially surprising.

Adding support for numpy arrays ended up requiring major refactoring of the approx code, due to some intricacies of whether the __eq__ operator is called on the left or right operand, and how that can be affected by inheritance. But the external interface is unchanged, the new code is all documented, and everything is tested.

At the moment, this code uses the same algorithm to compare numpy arrays as it does to compare anything else. However, @RonnyPfannschmidt may be right that it would be better to defer to numpy.allclose(), as he was commenting on the thread for #1994. It wouldn't be hard to get either behavior, but I think it's worth trying to come to a consensus on.

I don't really think it matters. The two algorithms will only give different answers if the difference between the two numbers being compared is extremely close to the tolerance. If those fine differences matter to you, both algorithms allow fine control over the precise tolerance. If they don't, both algorithms have defaults that can reasonably say whether two numbers are close enough to be considered the same or not. The advantage of using the pytest algorithm is that it allows us to print out the tolerance in the repr string, which I find useful for debugging.

This fixes pytest-dev#1994.  It turned out to require a lot of refactoring because
subclassing numpy.ndarray was necessary to coerce python into calling
the right `__eq__` operator.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 91.919% when pulling 89292f0 on kalekundert:features into 9bd8907 on pytest-dev:features.

Copy link
Member

@RonnyPfannschmidt RonnyPfannschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change also begs fo additional documentation,
its not necessary to push trough in this pr, but it would be good to have it before the next release



try:
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a module level always imposed import of numy doesn't sit well with me, please discuss @nicoddemus

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means every time numpy is installed, it will be imported for a pytest run - no matter whether the tests actually use it or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we should try to avoid importing non-standard library at the module level.

Since we are ApproxNumpy subclasses np.ndarray (for reasons well explained in the docstring 👍 ), one way to implement this cleanly would be to move ApproxNumpy to its own module, and import it conditionally in approx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, I'll try to avoid importing numpy unless I really need it.

Another possibility might be to construct a class on the fly using type(). That would be a little more magical, but it has the advantage of keeping all the code in one file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can construct a class in a function and return it - combine that with a memoize pattern and its good to go without type() usage

Copy link
Member

@nicoddemus nicoddemus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall awesome work @kalekundert!



try:
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we should try to avoid importing non-standard library at the module level.

Since we are ApproxNumpy subclasses np.ndarray (for reasons well explained in the docstring 👍 ), one way to implement this cleanly would be to move ApproxNumpy to its own module, and import it conditionally in approx.

return iter(self.expected)

def _yield_comparisons(self, actual):
return zip(actual, self.expected)
Copy link
Member

@nicoddemus nicoddemus Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be itertools.izip in Python 2. Please add something like this to _pytest.compat:

if _PY2:
    zip = itertools.izip

And then import it here:

from _pytest.compat import zip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.


from collections import Mapping, Sequence
try:
String = basestring # python2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use STRING_TYPES from _pytest.compat instead

@@ -49,6 +310,8 @@ class approx(object):

>>> (0.1 + 0.2, 0.2 + 0.4) == approx((0.3, 0.6))
True
>>> {'a': 0.1 + 0.2, 'b': 0.2 + 0.4} == approx({'a': 0.3, 'b': 0.6})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think it's worth emphasizing that it will compare only the values

@@ -49,6 +310,8 @@ class approx(object):

>>> (0.1 + 0.2, 0.2 + 0.4) == approx((0.3, 0.6))
True
>>> {'a': 0.1 + 0.2, 'b': 0.2 + 0.4} == approx({'a': 0.3, 'b': 0.6})
True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please mention that it also supports numpy arrays

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

     The same syntax also works on sequences of numbers::
  
          >>> (0.1 + 0.2, 0.2 + 0.4) == approx((0.3, 0.6))
          True

     Dictionary *values*::    

          >>> {'a': 0.1 + 0.2, 'b': 0.2 + 0.4} == approx({'a': 0.3, 'b': 0.6})
          True

     And ``numpy`` arrays::
          <numpy array example>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me.


def test_numpy_array(self):
try:
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use pytest.importorskip instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think we should add numpy as a tox dependency to run the tests in tox.ini.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good to know. I always learn something about using pytest whenever I sumbit a pull request!


def test_numpy_array_wrong_shape(self):
try:
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for pytest.importorskip

"""
Perform approximate comparisons for numpy arrays.

This class must inherit from numpy.ndarray in order to allow the approx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, @kalekundert. I'll make some comments because I'm interested in using this :)

Nice documentation here, my gut reaction was "why inheritance when composition would be simpler", but this makes it clear that it's because approx also supports the syntax assert a == approx(b).

Do you think it's too late to break the approx interface and only allow assert approx(a) == b, since this could simplify the implementations a lot?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I think it is reasonable to document to users that approx should always be on the left side.

Btw, this wouldn't actually break anything because numpy arrays are not currently supported anyway. 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, but it would be inconsistent with approx usage for simple numbers and lists, since it currently supports both sides (it is indeed documented as being used on the right side (assert a == approx(b))).

Copy link
Member

@tadeu tadeu Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this again... the problem on forcing it to be on the left-side is that it would be weird to use the standard assert obtained == expected, since the "approximation" is usually done on the "expected" value, i.e., this makes much more sense and is easier to read:

assert calculate_something() == approx(5.0, rtol=1e-3)  # must be in range 5.0 +- 0.005

So it seems ok to complicate the implementation a little...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is reasonable to document to users that approx should always be on the left side.

Wait sorry, brain fart on my part, I totally meant on the right side:

assert f(x) == approx(3.0)

Which I think is the more natural way to write it, as it even reads well in English ("f(x) is approximately 3.0").

So I guess it is important to support this feature. Sorry for the noise.

ensured by the approx() delegator function.
"""
assert isinstance(expected, np.ndarray)
obj = super(ApproxNumpy, cls).__new__(cls, expected.shape)
Copy link
Member

@tadeu tadeu Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this creating a copy of the array? Or allocating an array unnecessarily?

There's a "view casting" operation that could be helpful on eliminating this copy: https://docs.scipy.org/doc/numpy-1.12.0/user/basics.subclassing.html#view-casting

But maybe the shape here could be () (and the real shape kept on an attribute such as self.shape), since it seems the inheritance is being done only to make __eq__ work, as the real array is on ApproxBase.expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't think about this code too much when I wrote it. I'll check out view-casting, but my initial impression is that just passing () as the shape will be the simplest and least error-prone approach.


def __eq__(self, actual):
try:
actual = np.array(actual)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.asarray should be more efficient if the argument is an array already


def __repr__(self):
item = lambda k, v: "'{0}': {1}".format(k, self._approx_scalar(v))
return '{' + ', '.join(item(k,v) for k,v in self.expected.items()) + '}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this code seems to be reimplementing dict repr. What about:

    def __repr__(self):
        return repr({k: self._approx_scalar(v)} for k, v in self.expected.items())

Would it work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to imitate a dict __repr__, or a list/tuple below - after all, those aren't dicts/lists/tuples, so if they act like they are, that can be pretty confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approx __repr__ is still distinguishable from the dict/list/tuple __repr__ because tolerances are included with each value. In other words, repr(approx(1.0)) will look like {'a': 1.0 ± 1e-6} rather than {'a': 1.0}.

Also, the reason I tried to mimic the format of the built-in data structures in the first place is that when I get assertion failures, I find it easier to spot the important differences if the actual and expected values are formatted the same. If they're formatted differently (i.e. one has braces and the other doesn't), my eye immediately goes to those differences first. So I think the braces improve clarity more than they might detract from it.


assert a12 != approx(a21)
assert a21 != approx(a12)

Copy link
Member

@tadeu tadeu Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a feature request 😄 , but it would be nice to have tests and documented behaviour for NaN's too, something like:

assert np.nan == approx(np.nan)
assert np.inf == approx(np.inf)
assert -np.inf == approx(-np.inf)
assert np.array([np.nan, np.inf, -np.inf]) == approx(np.array([np.nan, np.inf, -np.inf]))
assert np.nan != approx(np.inf)
assert np.inf != approx(-np.inf)
assert np.nan != approx(0.0)

Some applications of arrays rely on using NaN for missing values, so even though the default behaviour of NaN != NaN, it'd be nice to be able to test that the missing values are the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tests and documented behavior for float('inf') and float('nan'). Is that what you mean, or are you suggesting that np.inf and np.nan behave differently and should be tested/documented separately?

I like the idea of having an option to consider NaNs equal to each other, and it should be easy to implement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that what you mean, or are you suggesting that np.inf and np.nan behave differently and should be tested/documented separately?

Ahh, no, I've used np.nan because I'm more used to that, but float('inf') and float('nan') should work in the same way, perhaps even better because they don't need numpy.

I like the idea of having an option to consider NaNs equal to each other, and it should be easy to implement.

Yeah, this was the main suggestion, and to test Numpy arrays with these values too. :)


def __repr__(self):
open, close = '()' if isinstance(self.expected, tuple) else '[]'
return open + ApproxBase.__repr__(self) + close
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watch out because tuples of one element should end with a ,, like (1.0,).
Suggestion:

return type(self.expected)(self._approx_scalar(x) for x in self._yield_expected())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A custom sequence can have an __init__ you know nothing about - this will break for any custom sequence which has an __init__ which doesn't work like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this:

    def __repr__(self):
        seq_type = type(self.expected)
        if seq_type not in (tuple, list, set):
            seq_type = list
        return repr(seq_type(self._approx_scalar(x) for x in self._yield_expected()))

def _approx_scalar(self, x):
return ApproxScalar(x, rel=self.rel, abs=self.abs)

def _yield_expected(self, actual):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be unnecessary, most use cases are just iterating on self.expected, and the actual parameter is not present in the signature of the implementations in the derived classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, copy-and-paste bug.

- Avoid importing numpy unless necessary.

- Mention numpy arrays and dictionaries in the docs.

- Add numpy to the list of tox dependencies.

- Don't unnecessarily copy arrays or allocate empty space for them.

- Use code from compat.py rather than writing py2/3 versions of things
  myself.

- Avoid reimplementing __repr__ for built-in types.

- Add an option to consider NaN == NaN, because sometimes people use NaN
  to mean "missing data".
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 92.101% when pulling 8badb47 on kalekundert:features into 9bd8907 on pytest-dev:features.

It used to be a class, but it's a function now.
They seem like more trouble that they're worth.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 92.101% when pulling 5076955 on kalekundert:features into 9bd8907 on pytest-dev:features.

Travis was not successfully installing numpy with python<=2.6,
python<=3.3, or PyPy.  I decided that it didn't make sense to use numpy
for all the tests, so instead I made new testing environments
specifically for numpy.
Not compatible with python26.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 91.917% when pulling 5d24968 on kalekundert:features into 9bd8907 on pytest-dev:features.

I thought the file was just out of date, but adding py36 made Travis
complain "InterpreterNotFound: python3.6", so I guess it was correct as
it was.
return repr({
k: self._approx_scalar(v)
for k,v in self.expected.items()})
return repr(dict(
Copy link
Member

@RonnyPfannschmidt RonnyPfannschmidt Jun 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i propose using return "approx({data!r})".format(data=dict(...))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 91.917% when pulling 9597e67 on kalekundert:features into 9bd8907 on pytest-dev:features.

# Create the delegate class on the fly. This allow us to inherit from
# ``np.ndarray`` while still not importing numpy unless we need to.
import numpy as np
cls = type('ApproxNumpy', (ApproxNumpyBase, np.ndarray), {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each use with numpy creates a new type here, i suggest a inner function with a mutable default argument to cache it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I added a class method to take care of creating and caching the new type. I think it made the code a little more clear, too.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 91.886% when pulling c111e9d on kalekundert:features into 9bd8907 on pytest-dev:features.

@kalekundert
Copy link
Contributor Author

I think this is ready to be merged. Let me know if there's anything else that could be improved!

Slightly more general, probably doesn't make a difference.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 91.886% when pulling 7a1a439 on kalekundert:features into 9bd8907 on pytest-dev:features.

@RonnyPfannschmidt RonnyPfannschmidt merged commit ef62b86 into pytest-dev:features Jul 6, 2017
@kalekundert kalekundert deleted the features branch July 22, 2017 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants