Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests are brittle. Use the concept of in_delta #565

Closed
martinfrances107 opened this issue Dec 8, 2020 · 7 comments
Closed

Tests are brittle. Use the concept of in_delta #565

martinfrances107 opened this issue Dec 8, 2020 · 7 comments

Comments

@martinfrances107
Copy link
Contributor

martinfrances107 commented Dec 8, 2020

When I say brittle ... I mean for example

look at

    fn test_rotate_line() {
        let line0 = Line::from([(0., 0.), (0., 2.)]);
        let line1 = Line::from([(1., 0.9999999999999999), (-1., 1.)]);
        assert_eq!(line0.rotate(90.), line1);
    }

the 0.9999999999999999 is a hard coded rounding artifact -- the expected value is 1.0.

numerically two numbers x0 and x1 are considered equal if

(x0 - x1).abs() < T::EPSILON

I am coming from a related d3 "geo-spatial" library and they have a in_delta concept.
d3-geo the in_delta() function

In short a test helper function which compares objects on a point by point basis and asserts that two objects are identical if every thing is "in delta"

Anyway I am going to work on this.

@martinfrances107 martinfrances107 changed the title Tests are brittle. use the concept of in_delta Tests are brittle. Use the concept of in_delta Dec 8, 2020
@michaelkirk
Copy link
Member

michaelkirk commented Dec 8, 2020

Most places we try to use the approx crate when comparing floats - I'd recommend basing any approach on that for uniformity.

(as a side note, I think most of the time we actually want a "relatively equals" check vs a "within an epsilon" check, but that's somewhat pedantic.)

I haven't looked to see if it's possible - but I personally think it'd be ideal to be able to say something like:

    fn test_rotate_line() {
        let line0 = Line::from([(0., 0.), (0., 2.)]);
        let line1 = Line::from([(1., 1.0), (-1., 1.)]);
        approx::assert_relative_eq!(line0.rotate(90.), line1);
    }

Maybe we can implement the approx traits for geo? e.g. https://docs.rs/approx/0.4.0/approx/trait.RelativeEq.html

edited: clarity

@martinfrances107
Copy link
Contributor Author

Thank you for shaping my thinking, the comments above were really useful -- and save a lots of time for me.

I have the start of solution pushed into a branch in my area

abs_diff_eq

-- so not ready for a PR yet.

But I am far enough alone to know that I need to discuss a major breaking change.

First a minor comment

'relative equal' is for dissimilar type like i32 and f64.

What we need is AbsDiffEq - the approximate equality check for identical types.

For Point the solution looks like

p1.abs_diff_eq(&p2)

here is a fragment of the doc test in my branch

    /// use geo_types::Point;
    /// use approx::AbsDiffEq;
    /// let p = Point::new(1.0, 1.0);
    /// let p_x = Point::new(0.999999, 1.0);
    /// let p_y = Point::new(1.0, 1.00001);
    /// assert!(p.abs_diff_eq(&p_x, 1e-2));
    /// assert!(!p.abs_diff_eq(&p_x, 1e-12));

I need to signal a major breaking change for all the other geometry types Line, LineString, Polygon etc

There is a security issue we need to be aware of .. Any implementation of a trait need to be
A) in the module defining the trait ( in this case the abs_diff_eq trait is in approx )
or
B) in the module holding the struct ... ( in this case Point, Line, LineString are in geo-types. )

for security in cannot be defined in a third module C)

as you can see from my version of Line - below what this mean is that I need a breaking change that is to move the coords_iter() trait from the geo module into the geo-types module - to meet that security concern and so I do a zip which will be used to compare an object point by poiint

impl<T> AbsDiffEq for Line<T>
where
    T: CoordinateType + Float,
{
    type Epsilon = T;

    #[inline]
    fn default_epsilon() -> Self::Epsilon {
        T::from(1e-6).unwrap()
    }

    #[inline]
    fn abs_diff_eq(&self, other: &Line<T>, epsilon: Self::Epsilon) -> bool {
        let z = (*self).coords_iter().zip(*other.coords_iter());
        let mut is_match = true;
        for a_b in z {
            let (a, b) = a_b;
            if (a.x - b.x).abs() >= epsilon || (a.y - b.y).abs() >= epsilon {
                println!("Fail {:?} {:?}", a, b);
                is_match = false;
                break;
            }
        }
        is_match
    }
}

@michaelkirk
Copy link
Member

michaelkirk commented Dec 9, 2020

'relative equal' is for dissimilar type like i32 and f64.

My understanding is that "relative equality" aims to solve the repeated problem of "Which epsilon should I choose?".

diff < T::EPSILON makes sense for some numbers, but for very large and for very small numbers it's not a reasonable choice.

I really enjoyed this https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ which I think uses the same terminology.

Here's the relevant excerpt:

Epsilon comparisons

If comparing floats for equality is a bad idea then how about checking whether their difference is within some error bounds or epsilon value, like this:

bool isEqual = fabs(f1 – f2) <= epsilon;

With this calculation we can express the concept of two floats being close enough that we want to consider them to be equal. But what value should we use for epsilon?

Given our experimentation above we might be tempted to use the error in our sum, which was about 1.19e-7f. In fact, there’s even a define in float.h with that exact value, and it’s called FLT_EPSILON.

Clearly that’s it. The header file gods have spoken and FLT_EPSILON is the one true epsilon!

Except that that is rubbish. For numbers between 1.0 and 2.0 FLT_EPSILON represents the difference between adjacent floats. For numbers smaller than 1.0 an epsilon of FLT_EPSILON quickly becomes too large, and with small enough numbers FLT_EPSILON may be bigger than the numbers you are comparing (a variant of this led to a flaky Chromium test)!

For numbers larger than 2.0 the gap between floats grows larger and if you compare floats using FLT_EPSILON then you are just doing a more-expensive and less-obvious equality check. That is, if two floats greater than 2.0 are not the same then their difference is guaranteed to be greater than FLT_EPSILON. For numbers above 16777216 the appropriate epsilon to use for floats is actually greater than one, and a comparison using FLT_EPSILON just makes you look foolish. We don’t want that.

...

Relative epsilon comparisons

The idea of a relative epsilon comparison is to find the difference between the two numbers, and see how big it is compared to their magnitudes. In order to get consistent results you should always compare the difference to the larger of the two numbers. In English:

To compare f1 and f2 calculate diff = fabs(f1-f2). If diff is smaller than n% of max(abs(f1),abs(f2)) then f1 and f2 can be considered equal.

so tldr; the epsilon is chosen relative to the size of the values you're comparing.

I need to signal a major breaking change for all the other geometry types Line, LineString, Polygon etc
... to move the coords_iter() trait from the geo module into the geo-types module

Hmm... that might be possible. But how heinous would it be to instead implement the equality traits without using coords-iter?

Also, I wonder if it's worth putting the approx-equality impls behind a feature flag since we try to keep geo-types super svelte by default.

@martinfrances107
Copy link
Contributor Author

Hmm... that might be possible. But how heinous would it be to instead implement the equality traits without using coords-iter?

I have found a less "bull in a china shop approach" .. which will eventually look like succinct library grade code..

This is the point where I shut up and implement it.

@martinfrances107
Copy link
Contributor Author

Thanks for correcting me and providing a common set of definitions.. That really help michaelkirk++

I am about to upload and submit a PR.

This is more inline with library grade code. In short I started over.

To balance effort with usefulness I have only implemented 'relative_eq' for a few of the 10 geometry types.
Point, MultiPoint, Line, LineString. As more are need I am happy to add more.

In terms of review .. I mostly copied the algorithm for Points::relative_eq() from approx's default reference implementation. I used the euclidean distance .. but I considered using the simpler Manhattan distance.

An overview of the PR

  1. The reference implementation of relative_eq() makes use of the method abs_diff_eq() that is why it is has to be included.
  2. It handles points with coordinates of f64:INFINITY.
  3. When comparing data structure on a point-by-point basis tests are included to ensure the rejection data structures with dissimilar lengths. Given the way zipping works .. they would otherwise be accepted!

@martinfrances107
Copy link
Contributor Author

Ok here is the PR

#567

@michaelkirk
Copy link
Member

Done in #567, using approx and relative_eq

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants