-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added exception and unittest for dist_mat_to_vec of analysis/psa.py (Part of #597) #1183
Conversation
@kain88-de @jbarnoud can anyone please look into it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for tackling the issue. There are some changes to make, though.
In addition to the in-line comments, there are a few things that I would like the tests to look at:
dist_mat_to_vec
issue a warning in some cases, this should be tested; look atassert_warns
as it may help you to do so;- what if one of the parameter is negative?
- what if
N
is null?
Testing the edge cases is very important. They are what breaks first, they also what can be broken in a first place. Therefore, testing them is crucial to find potential unexpected outputs.
def test_dist_mat_to_vec_i_less_j(self): | ||
"""Test that distance is correct if i < j""" | ||
err_msg = "dist_mat_to_vec function returning wrong values" | ||
assert_(PSA.dist_mat_to_vec(5, 3, 4) is 9, err_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is
statement tests identity, not equality. In the present case, you should use the ==
comparison; even better in that specific context would be to use assert_equal
rather than assert_
. With assert_equal
the traceback is more informative as it tells you what value you got and what value you expected.
assert_(PSA.dist_mat_to_vec(5, 3, 4) is 9, err_msg) | ||
|
||
def test_dist_mat_to_vec_i_greater_j(self): | ||
"""Test that distance is correct if i < j""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-paste error. You forgot to reverse the comparison.
@@ -93,6 +93,16 @@ def test_dendrogram_produced(self): | |||
err_msg = "Dendrogram dictionary object was not produced" | |||
assert_(type(self.plot_data[1]) is dict, err_msg) | |||
|
|||
def test_dist_mat_to_vec_i_less_j(self): | |||
"""Test that distance is correct if i < j""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is misleading: dist_mat_to_vec
does not return a distance, but an index. The docstring of the function says:
Convert distance matrix indices (in the upper triangle) to the index of
the corresponding distance vector.
out of bounds of N""" | ||
|
||
with self.assertRaises(ValueError): | ||
PSA.dist_mat_to_vec(5, 6, 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add cases with only i
out of bonds, and with only j
out of bonds.
not Integers""" | ||
|
||
with self.assertRaises(ValueError): | ||
PSA.dist_mat_to_vec(5, '6', '7') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if i
, or j
is a float?
@jbarnoud Thanks I'll look into it |
@jbarnoud Is this okay? do tell me if something is wrong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better, but not all the test cases are addressed.
out of bounds of N""" | ||
|
||
with self.assertRaises(ValueError): | ||
PSA.dist_mat_to_vec(5, 6, 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have each of these line in a different with
block, or in a different test. The way it is now makes it difficult to now which one failed if there is a failure.
@@ -138,7 +139,7 @@ def test_dist_mat_to_vec_func_bad_integers(self): | |||
|
|||
with self.assertRaises(ValueError): | |||
PSA.dist_mat_to_vec(5, '6', '7') | |||
|
|||
PSA.dist_mat_to_vec(5, float(6), 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: separate the assertions.
out of bounds of N""" | ||
|
||
with self.assertRaises(ValueError): | ||
PSA.dist_mat_to_vec(5, 6, 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to tell what index is out of bonds. It is obvious when you are in the context of writing the test, but it makes you loose quite a lot of time when you have to figure it out latter on.
@jbarnoud is it okay now? |
It is OK for the in-line comments. I would like to see the edge cases I mentioned tested, though. Also, I would add one edge case to the list I made before: what if |
@jbarnoud I think all the test cases are covered now? |
I do not see those corresponding to this comment: #1183 (review) |
@jbarnoud for the null and negative part, the function dist_mat_to_vec is not validating the inputs, so should I add a validation statement inside the function that raises ValueError when inputs is null or negative? |
It is not an issue if
@sseyler I think you wrote the function, did you omit these cases on purpose? Do you think we should add the checks? |
@jbarnoud The couple of tests I included were there to quickly capture a couple of the possible erroneous inputs; they're definitely not exhaustive. I think you covered most of the corner cases, but to summarize (just in case), acceptable values of
Assuming I haven't missed anything, all other values of By the way, thanks for the work on this, guys! |
so @jbarnoud should make any changes or just rebase my commits? |
@vedantrathore Well, writing the tests highlighted an issue with the code you were testing. Since the issue is small and easy to fix, it would be good to fix it in the same PR. |
@jbarnoud do tell me if I added a wrong function. I think this function is already tested. |
package/MDAnalysis/analysis/psa.py
Outdated
try: | ||
validate_dist_mat_to_vec_inputs(N, i, j) | ||
except Exception as E: | ||
raise E |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't want to catch the exception and to some special error handling with it then you don't need the try/except block. You can remove it here.
package/MDAnalysis/analysis/psa.py
Outdated
:param N: int | ||
:param i: int | ||
:param j: int | ||
:return: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We follow numpy style docs in new code. You can have a look into other files for the look.
package/MDAnalysis/analysis/psa.py
Outdated
iValue = int(i) | ||
jValue = int(j) | ||
except ValueError: | ||
raise ValueError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer evaluation isn't done like that. Here we just check if the input can be converted to an integer. So '1'
would still be a valid input here with this code. The correct test is isinstance(var, [class])
. So here it would be
if not (isinstance(N, int) or isinstance(i, int) or isinstance(j, int)):
raise ValueError("N,i,j all must be of type int")
Here you should also check if this still works with numpy integer types.
@vedantrathore I think you went a bit overkill here. A couple of The case where Having a function to test the inputs of the function is not a bad idea. It can make the code clearer as testing the input is a different task as using it. If you choose to go for the separate function, then the validation function should do all the validation, leaving only the logic to the main function. Since the main function we have is quite specific in its error messages, the validation function should keep the specificity; this means at the very least raising the exception with an explanation. As a general rule, giving a explanation when raising an exception is rarely a bad idea. Finally, but this is just a matter of style and taste, I would prefix the name of the validation function with a And just a last thing: playing a bit with warnings.warn(war_str.format(j, i)) This is why we need tests! Adding coverage for an handful of lines of code opened a can of worms. A small can, but a van still. @vedantrathore you picked well your function to test 😃 I let you come up with your solution. I have one in mind if you are stuck. |
@jbarnoud @kain88-de Thanks for the reviews, I'll take a look into it and rectify the error 😃 |
@jbarnoud Did I do it right? or should change something. |
with self.assertRaises(ValueError): | ||
PSA.dist_mat_to_vec(5, -1, 2) | ||
|
||
# Check if i is negative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy paste error. "Check if j is negative.
package/MDAnalysis/analysis/psa.py
Outdated
if i < 0 or j < 0 or N < 2: | ||
error_str = "Matrix indices are invalid; i and j must be greater than 0 and N must be greater the 2" | ||
raise ValueError(error_str) | ||
|
||
if i > N or j > N: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to @sseyler comment, this test should be i > N - 1 or j > N
. Yet, because the function knows how to look at the lower triangle too, it should be (j > i and (i > N - 1 or j > N)) or (j < i and (i > N or j > N - 1))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbarnoud, but this doesn't include the case in which N > 2, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also the case where i or j is negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am only talking about replacing the line that says if i > N or j > N:
. The value of N
, and i
and j
being negative, are tested with the previous if
.
You are almost there! Beside the typo I pointed in an in-line comment, you have two things to do before I can merge:
Once this is fixed, I think it is all good (assuming the tests pass of course). |
@kain88-de are all the requested changes completed? If so could you please merge this. |
package/MDAnalysis/analysis/psa.py
Outdated
@@ -573,7 +574,15 @@ def dist_mat_to_vec(N, i, j): | |||
:Returns: | |||
int, index (of the matrix element) in the corresponding distance vector | |||
""" | |||
if i > N or j > N: | |||
|
|||
if not (isinstance(N, int) or isinstance(i, int) or isinstance(j, int)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still work with numpy integers? So you pass np.int32(3)
or np.int64(3)
to the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kain88-de Okay, thanks! I'll look into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kain88-de should I add diffrent clauses fr np.int32() and np.int64() and normal int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One solution is to use isinstance(N, numbers.Integral)
.
@kain88-de Is this fine? |
package/MDAnalysis/analysis/psa.py
Outdated
@@ -575,7 +575,7 @@ def dist_mat_to_vec(N, i, j): | |||
int, index (of the matrix element) in the corresponding distance vector | |||
""" | |||
|
|||
if not (isinstance(N, int) or isinstance(i, int) or isinstance(j, int)): | |||
if not (isinstance(N, numbers.Integral) or isinstance(i, numbers.Integral) or isinstance(j, numbers.Integral)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should add tests as well for the numpy integers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh forgot that, my bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kain88-de I was thinking about how to check whether input is numpy integer or not, I think we are only checking whether the input is not an integer, How should I validate whether the input is valid integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current check is testing if the input is an integer. In you test you should just make sure that it's ok to use numpy integers like np.int32 in the function. That is important if one wants to use integers from numpy arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What @kain88-de means is that you should add a few lines in test_psa.py
with numpy integer as inputs. The function should work of we give N=numpy.int32(4)
or N=numpy.int16(4)
. and this has to be tested. It is likely that the input values for dist_mat_to_vec
will come from a numpy array at some point, then their type will not be a plain int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kain88-de @jbarnoud I think this is what you meant, right?
@kain88-de I think It's done now, all the tests are passed. |
Thanks |
@jbarnoud @kain88-de I would like to complete this as my gsoc project. Any guidance on how to start with it? |
You can write us on the mailing list or the referenced bug to discuss this idea. |
This merge introduced a ton of commits with identical messages and made the history look ugly/confusing. @kain88-de when you merge a PR like this the next time please squash/merge everything or ask for it to be rebased/squashed by the contributor. Thanks. @MDAnalysis/coredevs should we rebase develop and squash all of the commits belonging to this PR, the force push? (In order to do this, branch protection has to be temporarily lifted.) |
I would avoid force pushing on develop. Even though there are several commits with identical titles, at least they are identical, grouped in time, and meaningful. IMHO it is not harmful enough in the history to create a precedent of lifting the branch protection. |
Part of #597
Changes made in this Pull Request:
PR Checklist