[JOSS Review] Software testing #10

Closed · oesteban opened this issue Sep 7, 2017 · 6 comments

oesteban commented Sep 7, 2017

The main missing point here is a "smoke test" (just checking that the software ran to completion, allowing any size of error in the output). Here is how it would work:

  1. Generate an example features file. Store it in your repo so Travis fetches it.
  2. Generate (locally) the outputs from that specific features file. Add them to the repo.
  3. Run the command line on that file (use coverage if you want to make sure that run is counted towards the coverage measure, allowing you to get rid of some tests; I go deeper on this below).
  4. Check that the output of 3) is the same as what you got in 2).

This does not check that the method worked correctly; it only checks that it did not break down (no smoke came out of the box).
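
A minimal sketch of such a test, assuming the example inputs and a locally pre-computed reference output live under examples/ (the file names, especially the reference file, are placeholders, not files that exist in the repo); it is shown through the API for brevity, but the same check can wrap the command line:

# Hypothetical smoke/regression test: run on a stored example and compare
# against a pre-computed reference output. All file names are illustrative.
import os

import numpy as np

from hiwenet import hiwenet  # import path is an assumption; the quoted tests below call it as hiwenet(...)

def test_smoke_against_stored_reference():
    examples = os.path.join(os.path.dirname(__file__), '..', 'examples')
    # assumes the example files contain plain numeric columns
    features = np.loadtxt(os.path.join(examples, 'features_1000.txt'))
    groups = np.loadtxt(os.path.join(examples, 'groups_1000.txt'))
    reference = np.loadtxt(os.path.join(examples, 'reference_edge_weights.txt'))

    ew = hiwenet(features, groups)

    # Exact match: the reference was generated locally (step 2) with this very
    # implementation, so any difference signals that something broke.
    assert np.array_equal(ew, reference)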

But you happen to have an alternative implementation, which is a great testing oracle. You will be able to 1) ensure your implementation matches the original; and 2) be more certain about the correctness of your software. To do so, replace the outputs in step 2) of the previous list with the outputs generated by your MATLAB implementation. In step 4) you will probably want to modify your test to accept a certain threshold of numerical tolerance. After all, you wouldn't need to repeat all the experiments of your preprint to provide some evidence that this implementation is equivalent.
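
In the sketch above, only the last two lines would change, for example (the MATLAB-exported file name is a placeholder, and the tolerances are arbitrary starting points):

# Compare against the MATLAB-generated reference with a numerical tolerance
# instead of exact equality.
matlab_ew = np.loadtxt(os.path.join(examples, 'matlab_edge_weights.txt'))
np.testing.assert_allclose(ew, matlab_ew, rtol=1e-5, atol=1e-8)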

The rest of the tests just check the conformity of inputs and outputs (

def test_dimensions():
    ew = hiwenet(features, groups)
    assert len(ew) == num_groups
    assert ew.shape[0] == num_groups and ew.shape[1] == num_groups

def test_too_few_groups():
    features, groups, group_ids, num_groups = make_features(100, 1)
    with raises(ValueError):
        ew = hiwenet(features, groups)

def test_too_few_values():
    features, groups, group_ids, num_groups = make_features(10, 500)
    with raises(ValueError):
        ew = hiwenet(features[:num_groups-1], groups)

def test_invalid_trim_perc():
    with raises(ValueError):
        ew = hiwenet(features, groups, trim_percentile=-1)
    with raises(ValueError):
        ew = hiwenet(features, groups, trim_percentile=101)

def test_invalid_weight_method():
    with raises(NotImplementedError):
        ew = hiwenet(features, groups, weight_method='dkjz.jscn')
    with raises(NotImplementedError):
        ew = hiwenet(features, groups, weight_method='somerandomnamenoonewoulduse')

def test_trim_not_too_few_values():
    with raises(ValueError):
        ew = hiwenet([0], [1], trim_outliers=False)

def test_trim_false_too_few_to_calc_range():
    with raises(ValueError):
        ew = hiwenet([1], groups, trim_outliers=False)

def test_not_np_arrays():
    with raises(ValueError):
        ew = hiwenet(list(), groups, trim_percentile=101)
    with raises(ValueError):
        ew = hiwenet(features, list(), trim_percentile=101)

def test_invalid_nbins():
    with raises(ValueError):
        ew = hiwenet(features, groups, num_bins=np.NaN)
    with raises(ValueError):
        ew = hiwenet(features, groups, num_bins=np.Inf)
    with raises(ValueError):
        ew = hiwenet(features, groups, num_bins=2)

def test_return_nx_graph():
    nxG = hiwenet(features, groups, return_networkx_graph=True)
    assert isinstance(nxG, nx.Graph)
    assert nxG.number_of_nodes() == num_groups
    assert nxG.number_of_edges() == num_links

def test_extreme_skewed():
    # Not yet sure what to test for here!!
    ew = hiwenet(10 + np.zeros(dimensionality), groups)
), which is fine.

Or they test argparse, which already takes care of the correctness and interpretation of the command line. I think these tests do not exercise the code much:

# CLI tests
def test_CLI_run():
    "function to hit the CLI lines."
    # first word is the script name (ignored)
    cur_dir = os.path.dirname(os.path.abspath(__file__))
    featrs_path = os.path.abspath(os.path.join(cur_dir, '..', 'examples', 'features_1000.txt'))
    groups_path = os.path.abspath(os.path.join(cur_dir, '..', 'examples', 'groups_1000.txt'))
    sys.argv = shlex.split('hiwenet -f {} -g {} -n 25'.format(featrs_path, groups_path))
    CLI()

def test_CLI_nonexisting_paths():
    "invalid paths"
    cur_dir = os.path.dirname(os.path.abspath(__file__))
    featrs_path = os.path.abspath(os.path.join(cur_dir, '..', 'examples', 'features_1000.txt'))
    groups_path = 'NONEXISTING_groups_1000.txt'
    sys.argv = shlex.split('hiwenet -f {} -g {} -n 25'.format(featrs_path, groups_path))
    with raises(IOError):
        CLI()
    featrs_path = 'NONEXISTING_features_1000.txt'
    groups_path = os.path.abspath(os.path.join(cur_dir, '..', 'examples', 'groups_1000.txt'))
    sys.argv = shlex.split('hiwenet -f {} -g {} -n 25'.format(featrs_path, groups_path))
    with raises(IOError):
        CLI()

def test_CLI_invalid_args():
    "invalid args"
    featrs_path = 'NONEXISTING_features_1000.txt'
    # arg aaa or invalid_arg_name doesn't exist
    sys.argv = shlex.split('hiwenet --aaa {0} -f {0} -g {0}'.format(featrs_path))
    with raises(SystemExit):
        CLI()
    sys.argv = shlex.split('hiwenet --invalid_arg_name {0} -f {0} -g {0}'.format(featrs_path))
    with raises(SystemExit):
        CLI()

def test_CLI_too_few_args():
    "testing too few args"
    sys.argv = ['hiwenet ']
    with raises(SystemExit):
        CLI()
    sys.argv = ['hiwenet -f check']
    with raises(SystemExit):
        CLI()
    sys.argv = ['hiwenet -g check']
    with raises(SystemExit):
        CLI()

(ref. openjournals/joss-reviews#380)


raamana commented Sep 8, 2017

Man, you're incredibly patient to write a detailed issue 👍

Anyway, if I understand you and the suggested test correctly, this will be testing the I/O aspects of hiwenet more so than the implementation itself. Yes? I think this is important if we were to offer both interfaces -- working on it.


oesteban commented Sep 8, 2017

I proposed two "flavors":

  1. Vanilla:
     • exercises the I/O aspects (as you will see, the coverage will not drop if you remove all the tests on the command line);
     • but more importantly, it will let you know when something is broken inside the extract function (meaning, you make a coding mistake and you catch the problem early).
  2. Cross-validated:
     • in addition to the vanilla flavor, you will provide proof that the Python implementation is close enough to the MATLAB implementation.


raamana commented Sep 8, 2017

Commit 9bcc307 should resolve item 1 here (involving the API, CLI and I/O).

The second one is more about matching medpy's implementation with the one coming from the MATLAB toolbox. This comparison is complicated and non-trivial, which I feel is slightly beyond the scope of this package/review. It is important for my research and I will dig into it as I try to make comparisons between them.

oesteban commented:

I believe it would make sense to extend the added test to all metrics, but I leave this decision to @raamana.

I also consider that test 2 is not that hard to implement: it just takes pre-calculating the features on a certain dataset with the MATLAB version and checking that they are close enough to the features calculated from the same dataset with Python; the dataset does not need to come from a real brain, it could just be randomly generated. This test would add real value to the testing effort. As I said, it is not something I consider mandatory, so I'll close this issue.
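
For illustration, a sketch of how such a shared random fixture could be generated once and then fed to both implementations (the shapes, seed and file names are arbitrary placeholders):

# One-off script to generate a random dataset that both the MATLAB and the
# Python implementations can read; all names and sizes are illustrative.
import numpy as np

rng = np.random.RandomState(42)          # fixed seed so the fixture is reproducible
num_vertices, num_groups = 1000, 25      # arbitrary sizes for the synthetic data

features = rng.randn(num_vertices)                        # synthetic per-vertex features
groups = rng.randint(1, num_groups + 1, num_vertices)     # synthetic group/patch labels

np.savetxt('examples/random_features.txt', features)
np.savetxt('examples/random_groups.txt', groups, fmt='%d')

# Run the MATLAB toolbox on these two files, store its edge-weight matrix
# (e.g. examples/matlab_edge_weights.txt), and compare it against the Python
# output with a numerical tolerance, as sketched earlier in this thread.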


raamana commented Nov 21, 2017

Thanks Oscar, it definitely makes sense to extend the CLI vs API matching test to all metrics. I haven't done it yet, as I believe that if it works for one, it should behave the same for the others, and I didn't want to lengthen the tests for no obvious reason. I will extend them.
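
For reference, a minimal sketch of how such a parametrized extension could look with pytest (the metric names, the CLI import path and the -w/-o flags are assumptions, not the actual interface):

# Run the API and the CLI with the same metric and check they agree.
import shlex
import sys

import numpy as np
import pytest

from hiwenet import hiwenet            # import paths are assumptions
from hiwenet import cli as CLI

METRICS = ['manhattan', 'chebyshev', 'cosine']   # placeholder metric names

@pytest.mark.parametrize('weight_method', METRICS)
def test_api_matches_cli(weight_method, tmp_path):
    features_path = 'examples/features_1000.txt'
    groups_path = 'examples/groups_1000.txt'
    out_path = tmp_path / 'ew_{}.txt'.format(weight_method)

    # API result
    api_ew = hiwenet(np.loadtxt(features_path), np.loadtxt(groups_path),
                     weight_method=weight_method)

    # CLI result (assuming -w selects the metric and -o names the output file)
    sys.argv = shlex.split('hiwenet -f {} -g {} -w {} -o {}'.format(
        features_path, groups_path, weight_method, out_path))
    CLI()

    np.testing.assert_allclose(api_ew, np.loadtxt(out_path))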

Regarding MATLAB vs Python: as I mentioned before, I expect them to differ, as they are implemented by different people, possibly with different equations, although from the same family. Moreover, I don't consider either to be a reference or ground truth; each is simply a different instantiation (with a different implementation). I agree with you that doing this is much better than not; I hope to do it when I am able to, or when I can find other resources (like a co-op project).


raamana commented Nov 21, 2017

The CLI vs API matching test has been extended to all metrics and semi-metrics now, via fb224f2. Thanks for pointing it out.
