Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added exception and unittest for dist_mat_to_vec of analysis/psa.py (Part of #597) #1183

Merged
merged 15 commits into from
Feb 4, 2017

Conversation

vedantrathore
Copy link
Contributor

@vedantrathore vedantrathore commented Jan 22, 2017

Part of #597

Changes made in this Pull Request:

  • Added unittest for validating errors
  • Added test for raising exceptions

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@vedantrathore
Copy link
Contributor Author

@kain88-de @jbarnoud can anyone please look into it?

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling the issue. There are some changes to make, though.

In addition to the in-line comments, there are a few things that I would like the tests to look at:

  • dist_mat_to_vec issue a warning in some cases, this should be tested; look at assert_warns as it may help you to do so;
  • what if one of the parameter is negative?
  • what if N is null?

Testing the edge cases is very important. They are what breaks first, they also what can be broken in a first place. Therefore, testing them is crucial to find potential unexpected outputs.

def test_dist_mat_to_vec_i_less_j(self):
"""Test that distance is correct if i < j"""
err_msg = "dist_mat_to_vec function returning wrong values"
assert_(PSA.dist_mat_to_vec(5, 3, 4) is 9, err_msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is statement tests identity, not equality. In the present case, you should use the == comparison; even better in that specific context would be to use assert_equal rather than assert_. With assert_equal the traceback is more informative as it tells you what value you got and what value you expected.

assert_(PSA.dist_mat_to_vec(5, 3, 4) is 9, err_msg)

def test_dist_mat_to_vec_i_greater_j(self):
"""Test that distance is correct if i < j"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste error. You forgot to reverse the comparison.

@@ -93,6 +93,16 @@ def test_dendrogram_produced(self):
err_msg = "Dendrogram dictionary object was not produced"
assert_(type(self.plot_data[1]) is dict, err_msg)

def test_dist_mat_to_vec_i_less_j(self):
"""Test that distance is correct if i < j"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is misleading: dist_mat_to_vec does not return a distance, but an index. The docstring of the function says:

Convert distance matrix indices (in the upper triangle) to the index of
the corresponding distance vector.

out of bounds of N"""

with self.assertRaises(ValueError):
PSA.dist_mat_to_vec(5, 6, 6)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add cases with only i out of bonds, and with only j out of bonds.

not Integers"""

with self.assertRaises(ValueError):
PSA.dist_mat_to_vec(5, '6', '7')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if i, or j is a float?

@vedantrathore
Copy link
Contributor Author

@jbarnoud Thanks I'll look into it

@vedantrathore
Copy link
Contributor Author

@jbarnoud Is this okay? do tell me if something is wrong

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better, but not all the test cases are addressed.

out of bounds of N"""

with self.assertRaises(ValueError):
PSA.dist_mat_to_vec(5, 6, 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have each of these line in a different with block, or in a different test. The way it is now makes it difficult to now which one failed if there is a failure.

@@ -138,7 +139,7 @@ def test_dist_mat_to_vec_func_bad_integers(self):

with self.assertRaises(ValueError):
PSA.dist_mat_to_vec(5, '6', '7')

PSA.dist_mat_to_vec(5, float(6), 7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: separate the assertions.

out of bounds of N"""

with self.assertRaises(ValueError):
PSA.dist_mat_to_vec(5, 6, 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to tell what index is out of bonds. It is obvious when you are in the context of writing the test, but it makes you loose quite a lot of time when you have to figure it out latter on.

@vedantrathore
Copy link
Contributor Author

@jbarnoud is it okay now?

@jbarnoud
Copy link
Contributor

It is OK for the in-line comments. I would like to see the edge cases I mentioned tested, though. Also, I would add one edge case to the list I made before: what if i == j?

@vedantrathore
Copy link
Contributor Author

@jbarnoud I think all the test cases are covered now?

@jbarnoud
Copy link
Contributor

I do not see those corresponding to this comment: #1183 (review)

@vedantrathore
Copy link
Contributor Author

@jbarnoud for the null and negative part, the function dist_mat_to_vec is not validating the inputs, so should I add a validation statement inside the function that raises ValueError when inputs is null or negative?

@jbarnoud
Copy link
Contributor

It is not an issue if i or j equals 0. Yet, there seems to be edge cases where the function is not defined and that are not accounted for:

  • i == j where the function is not defined
  • i < 0 or j < 0

@sseyler I think you wrote the function, did you omit these cases on purpose? Do you think we should add the checks?

@sseyler
Copy link
Contributor

sseyler commented Jan 25, 2017

@jbarnoud The couple of tests I included were there to quickly capture a couple of the possible erroneous inputs; they're definitely not exhaustive. I think you covered most of the corner cases, but to summarize (just in case), acceptable values of N, i, and j should be:

  • N ,i, and j integers
  • i > 0 and i < N-1 (i runs from 0 to N-2)
  • j > i and j < N (j runs from 1 to N-1)
  • N >= 2 (N is the length of a square distance matrix, so a distance vector is only defined when there are at least N = 2 objects to compare)

Assuming I haven't missed anything, all other values of N, i, and j don't make sense in this context. (So definitely unit test any cases falling outside of the above restrictions!)

By the way, thanks for the work on this, guys!

@vedantrathore
Copy link
Contributor Author

so @jbarnoud should make any changes or just rebase my commits?

@jbarnoud
Copy link
Contributor

@vedantrathore Well, writing the tests highlighted an issue with the code you were testing. Since the issue is small and easy to fix, it would be good to fix it in the same PR.

@vedantrathore
Copy link
Contributor Author

@jbarnoud do tell me if I added a wrong function. I think this function is already tested.

try:
validate_dist_mat_to_vec_inputs(N, i, j)
except Exception as E:
raise E
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want to catch the exception and to some special error handling with it then you don't need the try/except block. You can remove it here.

:param N: int
:param i: int
:param j: int
:return: null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We follow numpy style docs in new code. You can have a look into other files for the look.

iValue = int(i)
jValue = int(j)
except ValueError:
raise ValueError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer evaluation isn't done like that. Here we just check if the input can be converted to an integer. So '1' would still be a valid input here with this code. The correct test is isinstance(var, [class]). So here it would be

if not (isinstance(N, int) or isinstance(i, int) or isinstance(j, int)):
    raise ValueError("N,i,j all must be of type int")

Here you should also check if this still works with numpy integer types.

@jbarnoud
Copy link
Contributor

@vedantrathore I think you went a bit overkill here. A couple of elifs in the existing chain would probably suffice.

The case where i == j is already accounted for as it falls in the else. The cases where i, and j are too large are tested, but the tests are wrong (i == N should not be valid). What is missing are the case where i or j is negative, the case where N < 2, and the case where one value is not an integer. I would skip the type checking as we never check types elsewhere.

Having a function to test the inputs of the function is not a bad idea. It can make the code clearer as testing the input is a different task as using it. If you choose to go for the separate function, then the validation function should do all the validation, leaving only the logic to the main function. Since the main function we have is quite specific in its error messages, the validation function should keep the specificity; this means at the very least raising the exception with an explanation. As a general rule, giving a explanation when raising an exception is rarely a bad idea. Finally, but this is just a matter of style and taste, I would prefix the name of the validation function with a _ to signal it is a "private" function that is not exposed in the API.

And just a last thing: playing a bit with dist_mat_to_vec, I noticed that there is a tiny bug in it. Line 586 from the version in develop is a print but should be a warning issued with warnings.warn (or should there be a message at all?). Also, the error message is missing the call to format to replace the placeholders with the values of the index. So line 586 should be:

warnings.warn(war_str.format(j, i))

This is why we need tests! Adding coverage for an handful of lines of code opened a can of worms. A small can, but a van still. @vedantrathore you picked well your function to test 😃

I let you come up with your solution. I have one in mind if you are stuck.

@vedantrathore
Copy link
Contributor Author

@jbarnoud @kain88-de Thanks for the reviews, I'll take a look into it and rectify the error 😃

@vedantrathore
Copy link
Contributor Author

@jbarnoud Did I do it right? or should change something.

with self.assertRaises(ValueError):
PSA.dist_mat_to_vec(5, -1, 2)

# Check if i is negative
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy paste error. "Check if j is negative.

if i < 0 or j < 0 or N < 2:
error_str = "Matrix indices are invalid; i and j must be greater than 0 and N must be greater the 2"
raise ValueError(error_str)

if i > N or j > N:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to @sseyler comment, this test should be i > N - 1 or j > N. Yet, because the function knows how to look at the lower triangle too, it should be (j > i and (i > N - 1 or j > N)) or (j < i and (i > N or j > N - 1)).

Copy link
Contributor Author

@vedantrathore vedantrathore Feb 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbarnoud, but this doesn't include the case in which N > 2, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the case where i or j is negative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am only talking about replacing the line that says if i > N or j > N:. The value of N, and i and j being negative, are tested with the previous if.

@jbarnoud
Copy link
Contributor

jbarnoud commented Feb 1, 2017

You are almost there! Beside the typo I pointed in an in-line comment, you have two things to do before I can merge:

  • The line that tests if the index are in bound is wrong. It would be nice to fix it (see in-line comment) and to account for the change in the tests.
  • You need to update package/CHANGELOG to say that the argument validation of dist_mat_to_vec is fixed (refer to this PR); you also need to add yourself as an author in package/CHANGELOG, package/AUTHORS, testsuite/CHANGELOG (no need to add an entry, though), and testsuite/AUTHORS.

Once this is fixed, I think it is all good (assuming the tests pass of course).

@vedantrathore
Copy link
Contributor Author

@kain88-de are all the requested changes completed? If so could you please merge this.

@@ -573,7 +574,15 @@ def dist_mat_to_vec(N, i, j):
:Returns:
int, index (of the matrix element) in the corresponding distance vector
"""
if i > N or j > N:

if not (isinstance(N, int) or isinstance(i, int) or isinstance(j, int)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still work with numpy integers? So you pass np.int32(3) or np.int64(3) to the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kain88-de Okay, thanks! I'll look into it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kain88-de should I add diffrent clauses fr np.int32() and np.int64() and normal int?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution is to use isinstance(N, numbers.Integral).

@vedantrathore
Copy link
Contributor Author

@kain88-de Is this fine?

@@ -575,7 +575,7 @@ def dist_mat_to_vec(N, i, j):
int, index (of the matrix element) in the corresponding distance vector
"""

if not (isinstance(N, int) or isinstance(i, int) or isinstance(j, int)):
if not (isinstance(N, numbers.Integral) or isinstance(i, numbers.Integral) or isinstance(j, numbers.Integral)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should add tests as well for the numpy integers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh forgot that, my bad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kain88-de I was thinking about how to check whether input is numpy integer or not, I think we are only checking whether the input is not an integer, How should I validate whether the input is valid integer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current check is testing if the input is an integer. In you test you should just make sure that it's ok to use numpy integers like np.int32 in the function. That is important if one wants to use integers from numpy arrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @kain88-de means is that you should add a few lines in test_psa.py with numpy integer as inputs. The function should work of we give N=numpy.int32(4) or N=numpy.int16(4). and this has to be tested. It is likely that the input values for dist_mat_to_vec will come from a numpy array at some point, then their type will not be a plain int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kain88-de @jbarnoud I think this is what you meant, right?

@vedantrathore
Copy link
Contributor Author

@kain88-de I think It's done now, all the tests are passed.

@kain88-de kain88-de merged commit da70c1f into MDAnalysis:develop Feb 4, 2017
@kain88-de
Copy link
Member

Thanks

@vedantrathore
Copy link
Contributor Author

@jbarnoud @kain88-de I would like to complete this as my gsoc project. Any guidance on how to start with it?

@kain88-de
Copy link
Member

You can write us on the mailing list or the referenced bug to discuss this idea.

@orbeckst
Copy link
Member

orbeckst commented Feb 7, 2017

This merge introduced a ton of commits with identical messages and made the history look ugly/confusing. @kain88-de when you merge a PR like this the next time please squash/merge everything or ask for it to be rebased/squashed by the contributor. Thanks.

@MDAnalysis/coredevs should we rebase develop and squash all of the commits belonging to this PR, the force push? (In order to do this, branch protection has to be temporarily lifted.)

@jbarnoud
Copy link
Contributor

jbarnoud commented Feb 7, 2017

I would avoid force pushing on develop. Even though there are several commits with identical titles, at least they are identical, grouped in time, and meaningful. IMHO it is not harmful enough in the history to create a precedent of lifting the branch protection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants