Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a convenience function for floating-point comparisons #1441

Merged
merged 16 commits into from
Mar 15, 2016

Conversation

kalekundert
Copy link
Contributor

This pull request adds a convenience class to assert that two floating-point numbers (or two sets of numbers) are equal to each other within some margin. You give the class the number you're expecting to get, and the class overrides the "==" operator to do the comparison behind the scenes with a reasonable margin for error. Most unit testing frameworks have some sort of assertAlmostEqual() function to provide this functionality, but to my knowledge pytest doesn't have anything of the sort. I think this pull request provides a useful feature in a way that dovetails nicely with the way pytest deconstructs assert statements.

Tests and documentation are included. I ran all the tests on python2.7 and python3.4, but I'm a little worried that the way I use unicode in approx.__repr__ might not work in python3.2. I copied most of the new documentation below for convenience:

Due to the intricacies of floating-point arithmetic, numbers that we would intuitively expect to be the same are not always so:

>>> 0.1 + 0.2 == 0.3
False

This problem is commonly encountered when writing tests, e.g. when making sure that floating-point values are what you expect them to be. One way to deal with this problem is to assert that two floating-point numbers are equal to within some appropriate margin:

>>> abs((0.1 + 0.2) - 0.3) < 1e-6
True

However, comparisons like this are tedious to write and difficult to understand. Furthermore, absolute comparisons like the one above are usually discouraged in favor of relative comparisons, which can't even be easily written on one line. The approx class provides a way to make floating-point comparisons that solves both these problems:

>>> from pytest import approx
>>> 0.1 + 0.2 == approx(0.3)
True

approx also makes is easy to compare ordered sets of numbers, which would otherwise be very tedious:

>>> (0.1 + 0.2, 0.2 + 0.4) == approx((0.3, 0.6))
True

By default, approx considers two numbers to be equal if the relative error between them is less than one part in a million (e.g. 1e-6). Relative error is defined as abs(x - a) / x where x is the value you're expecting and a is the value you're comparing to. This definition breaks down when the numbers being compared get very close to zero, so approx will also consider two numbers to be equal if the absolute difference between them is less than one part in a trillion (e.g. 1e-12).

This was a challenge because it had to work in python2 and python3,
which have almost opposite unicode models, and I couldn't use the six
library.  I'm also not sure the solution I found would work in python3
before python3.3, because I use the u'' string prefix which I think was
initially not part of python3.
@RonnyPfannschmidt
Copy link
Member

Overall nice work, ill take a deeper look tomorow

@The-Compiler
Copy link
Member

Some random thoughts:

  • approx is described as a function once, and as a class once
  • How does this compare to Python 3.5's math.isclose? If it doesn't use the same algorithm, maybe it should? I know a lot of thought and discussion went into that one 😉
  • Maybe it should have some more unittests (maybe inspired by the ones of math.isclose)?

@nicoddemus
Copy link
Member

Nice work, thanks for the PR! 😄


def __repr__(self):
from collections import Iterable
utf_8 = lambda s: s.encode('utf-8') if sys.version_info.major == 2 else s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is not supported in py26: sys.version_info.major. You will have to use sys.version_info[0] == 2.

@tadeu
Copy link
Member

tadeu commented Mar 8, 2016

👍

@kalekundert
Copy link
Contributor Author

@The-Compiler Thanks for pointing out math.isclose, I didn't know about it. I did know about numpy.isclose, but for some reason I wasn't thinking about it when I wrote this code. So I just spent a few minutes comparing my class to those two functions. The basic algorithm is the same in all the cases, but there are a few meaningful differences:

Inf and NaN:

The standard library and numpy both handle these cases properly, but I don't. So I should fix that.

Default tolerances:
  • math.isclose(rel_tol=1e-9, abs_tol=0.0)
  • numpy.isclose(rtol=1e-5, atol=1e-8)
  • pytest.approx(rel=1e-6, abs=1e-12)

Seeing as how the standard library and numpy don't agree on the defaults, it's probably not that important. But I don't think it's a good idea for the default absolute tolerance to be 0.0. This means that math.isclose(x, 0.0) is the same as x == 0.0 by default, which strikes me as something that could be unpleasantly surprising. That behavior might be more correct in some mathematical sense, but it's probably less useful for testing. I could use the numpy defaults, which are a little bit more relaxed than the ones I made up.

Application of tolerances:
  • math.isclose: True if the relative tolerance is met w.r.t. either a or b or if the absolute tolerance is met.
  • numpy.isclose: Adds the relative tolerance w.r.t. b and the absolute tolerance. Asymmetric, because it doesn't consider the relative tolerance w.r.t. a.
  • pytest.approx: True if the relative tolerance is met w.r.t b or is the absolute tolerance is met. Asymmetric, but I think that makes sense for testing where b is the "expected value". In the special case that the user explicitly specifies an absolute tolerance but not a relative tolerance, only the absolute tolerance is considered.

I don't think these distinctions are really important, because you won't usually be close enough to the tolerance to notice the difference. That said, I like the way I did it the best. (I guess that's not surprising.) I like that if the user explicitly specifies an absolute tolerance, then that's exactly what gets used.

Overall I think I'll incorporate some ideas from the standard library and numpy, but I won't provide the exact same interface as either. I'll also steal some unit tests from the standard library, like you suggested, because they are much more thorough than the doctests I wrote.

@The-Compiler
Copy link
Member

Thanks for the explanations! I don't have any hard feelings either way, but it's good to know how it relates to the "previous art" 😄

@RonnyPfannschmidt
Copy link
Member

good research, i think its important to document those differences in a comprehensible manner

@nicoddemus
Copy link
Member

Excellent summary! I agree with @RonnyPfannschmidt, I would strongly encourage you to add it to the documentation.

This commit also:

- Dramatically increases the number of unit tests , mostly by borrowing
  from the standard  library's unit tests for math.isclose().

- Refactors approx() into two classes, one of which handles comparing
  individual numbers (ApproxNonIterable) and another which uses the
  first to compare individual numbers or sequences of numbers.
@kalekundert
Copy link
Contributor Author

I think this branch is ready to merge. Let me know what you think.

@@ -7,7 +7,8 @@
namespace in which your doctests run.
Thanks `@milliams`_ for the complete PR (`#1428`_).

*
* New ``approx()`` function for easily comparing floating-point numbers in
tests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a "Thanks @kalekundert for the complete PR". 😄

@nicoddemus
Copy link
Member

Looks very good to me! 😁

@nicoddemus
Copy link
Member

I agree it seems ready to merge, after others take another look!

return ', '.join(repr(x) for x in self.expected)

def __eq__(self, actual):
from collections import Iterable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do import collections at the top of the file instead?

@The-Compiler The-Compiler mentioned this pull request Aug 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants