[JOSS Review] Software testing #10

Closed · oesteban opened this issue Sep 7, 2017 · 6 comments

oesteban commented Sep 7, 2017

The main missing point here is a "smoke test" (just checking that the software ran to completion, allowing any size of error in the output). Here is how it would work:

  1. Generate an example features file. Store it in your repo so Travis fetches it.
  2. Generate (locally) the outputs from that specific features file. Add them to the repo.
  3. Run the command line on that file (use coverage if you want to make sure that run is counted towards the coverage measure, allowing you to get rid of some tests; I go deeper on this below).
  4. Check that the output of 3) is the same as what you got in 2).

This does not check that the method worked correctly; it only checks that it did not break down (no smoke came out of the box).
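
A minimal sketch of such a test, assuming the example inputs and a locally pre-computed reference output live under examples/ (the file names, especially the reference file, are placeholders, not files that exist in the repo); it is shown through the API for brevity, but the same check can wrap the command line:

# Hypothetical smoke/regression test: run on a stored example and compare
# against a pre-computed reference output. All file names are illustrative.
import os

import numpy as np

from hiwenet import hiwenet  # import path is an assumption; the quoted tests below call it as hiwenet(...)

def test_smoke_against_stored_reference():
    examples = os.path.join(os.path.dirname(__file__), '..', 'examples')
    # assumes the example files contain plain numeric columns
    features = np.loadtxt(os.path.join(examples, 'features_1000.txt'))
    groups = np.loadtxt(os.path.join(examples, 'groups_1000.txt'))
    reference = np.loadtxt(os.path.join(examples, 'reference_edge_weights.txt'))

    ew = hiwenet(features, groups)

    # Exact match: the reference was generated locally (step 2) with this very
    # implementation, so any difference signals that something broke.
    assert np.array_equal(ew, reference)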

But you happen to have an alternative implementation, which is a great testing oracle. You will be able to 1) ensure your implementation matches the original; and 2) be more certain about the correctness of your software. To do so, replace the outputs in step 2) of the previous list with the outputs generated by your MATLAB implementation. In step 4) you will probably want to modify your test to accept a certain threshold of numerical tolerance. After all, you wouldn't need to repeat all the experiments of your preprint to provide some evidence that this implementation is equivalent.
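
In the sketch above, only the last two lines would change, for example (the MATLAB-exported file name is a placeholder, and the tolerances are arbitrary starting points):

# Compare against the MATLAB-generated reference with a numerical tolerance
# instead of exact equality.
matlab_ew = np.loadtxt(os.path.join(examples, 'matlab_edge_weights.txt'))
np.testing.assert_allclose(ew, matlab_ew, rtol=1e-5, atol=1e-8)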

The rest of the tests just check the conformity of inputs and outputs (

def test_dimensions():
    ew = hiwenet(features, groups)
    assert len(ew) == num_groups
    assert ew.shape[0] == num_groups and ew.shape[1] == num_groups

def test_too_few_groups():
    features, groups, group_ids, num_groups = make_features(100, 1)
    with raises(ValueError):
        ew = hiwenet(features, groups)

def test_too_few_values():
    features, groups, group_ids, num_groups = make_features(10, 500)
    with raises(ValueError):
        ew = hiwenet(features[:num_groups-1], groups)

def test_invalid_trim_perc():
    with raises(ValueError):
        ew = hiwenet(features, groups, trim_percentile=-1)
    with raises(ValueError):
        ew = hiwenet(features, groups, trim_percentile=101)

def test_invalid_weight_method():
    with raises(NotImplementedError):
        ew = hiwenet(features, groups, weight_method='dkjz.jscn')
    with raises(NotImplementedError):
        ew = hiwenet(features, groups, weight_method='somerandomnamenoonewoulduse')

def test_trim_not_too_few_values():
    with raises(ValueError):
        ew = hiwenet([0], [1], trim_outliers=False)

def test_trim_false_too_few_to_calc_range():
    with raises(ValueError):
        ew = hiwenet([1], groups, trim_outliers=False)

def test_not_np_arrays():
    with raises(ValueError):
        ew = hiwenet(list(), groups, trim_percentile=101)
    with raises(ValueError):
        ew = hiwenet(features, list(), trim_percentile=101)

def test_invalid_nbins():
    with raises(ValueError):
        ew = hiwenet(features, groups, num_bins=np.NaN)
    with raises(ValueError):
        ew = hiwenet(features, groups, num_bins=np.Inf)
    with raises(ValueError):
        ew = hiwenet(features, groups, num_bins=2)

def test_return_nx_graph():
    nxG = hiwenet(features, groups, return_networkx_graph=True)
    assert isinstance(nxG, nx.Graph)
    assert nxG.number_of_nodes() == num_groups
    assert nxG.number_of_edges() == num_links

def test_extreme_skewed():
    # Not yet sure what to test for here!!
    ew = hiwenet(10 + np.zeros(dimensionality), groups)
), which is fine.

Or they test argparse, which already takes care of the correctness and interpretation of the command line. I think these tests do not exercise the code much:

# CLI tests
def test_CLI_run():
    "function to hit the CLI lines."
    # first word is the script name (ignored)
    cur_dir = os.path.dirname(os.path.abspath(__file__))
    featrs_path = os.path.abspath(os.path.join(cur_dir, '..', 'examples', 'features_1000.txt'))
    groups_path = os.path.abspath(os.path.join(cur_dir, '..', 'examples', 'groups_1000.txt'))
    sys.argv = shlex.split('hiwenet -f {} -g {} -n 25'.format(featrs_path, groups_path))
    CLI()

def test_CLI_nonexisting_paths():
    "invalid paths"
    cur_dir = os.path.dirname(os.path.abspath(__file__))
    featrs_path = os.path.abspath(os.path.join(cur_dir, '..', 'examples', 'features_1000.txt'))
    groups_path = 'NONEXISTING_groups_1000.txt'
    sys.argv = shlex.split('hiwenet -f {} -g {} -n 25'.format(featrs_path, groups_path))
    with raises(IOError):
        CLI()
    featrs_path = 'NONEXISTING_features_1000.txt'
    groups_path = os.path.abspath(os.path.join(cur_dir, '..', 'examples', 'groups_1000.txt'))
    sys.argv = shlex.split('hiwenet -f {} -g {} -n 25'.format(featrs_path, groups_path))
    with raises(IOError):
        CLI()

def test_CLI_invalid_args():
    "invalid args"
    featrs_path = 'NONEXISTING_features_1000.txt'
    # arg aaa or invalid_arg_name doesn't exist
    sys.argv = shlex.split('hiwenet --aaa {0} -f {0} -g {0}'.format(featrs_path))
    with raises(SystemExit):
        CLI()
    sys.argv = shlex.split('hiwenet --invalid_arg_name {0} -f {0} -g {0}'.format(featrs_path))
    with raises(SystemExit):
        CLI()

def test_CLI_too_few_args():
    "testing too few args"
    sys.argv = ['hiwenet ']
    with raises(SystemExit):
        CLI()
    sys.argv = ['hiwenet -f check']
    with raises(SystemExit):
        CLI()
    sys.argv = ['hiwenet -g check']
    with raises(SystemExit):
        CLI()

(ref. openjournals/joss-reviews#380)


raamana commented Sep 8, 2017

Man, you're incredibly patient to write a detailed issue 👍

Anyway, if I understand you and the suggested test correctly, this will be testing the I/O aspects of hiwenet more so than the implementation itself. Yes? I think this is important if we were to offer both interfaces -- working on it.


oesteban commented Sep 8, 2017

I proposed two "flavors":

  1. Vanilla:
     • exercises the I/O aspects (as you will see, the coverage will not drop if you remove all the tests on the command line);
     • but more importantly, it will let you know when something is broken inside the extract function (meaning, you make a coding mistake and you catch the problem early).
  2. Cross-validated:
     • in addition to the vanilla flavor, you will provide proof that the Python implementation is close enough to the MATLAB implementation.


raamana commented Sep 8, 2017

Commit 9bcc307 should resolve item 1 here (involving the API, CLI and I/O).

The second one is more about matching medpy's implementation with the one coming from the MATLAB toolbox. This comparison is complicated and non-trivial, which I feel is slightly beyond the scope of this package/review. It is important for my research and I will dig into it as I try to make comparisons between them.

oesteban commented:

I believe it would make sense to extend the added test to all metrics, but I leave this decision to @raamana.

I also consider that test 2 is not that hard to implement: it just takes pre-calculating the features on a certain dataset with the MATLAB version and checking that they are close enough to the features calculated from the same dataset with Python; the dataset does not need to come from a real brain, it could just be randomly generated. This test would add real value to the testing effort. As I said, it is not something I consider mandatory, so I'll close this issue.
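
For illustration, a sketch of how such a shared random fixture could be generated once and then fed to both implementations (the shapes, seed and file names are arbitrary placeholders):

# One-off script to generate a random dataset that both the MATLAB and the
# Python implementations can read; all names and sizes are illustrative.
import numpy as np

rng = np.random.RandomState(42)          # fixed seed so the fixture is reproducible
num_vertices, num_groups = 1000, 25      # arbitrary sizes for the synthetic data

features = rng.randn(num_vertices)                        # synthetic per-vertex features
groups = rng.randint(1, num_groups + 1, num_vertices)     # synthetic group/patch labels

np.savetxt('examples/random_features.txt', features)
np.savetxt('examples/random_groups.txt', groups, fmt='%d')

# Run the MATLAB toolbox on these two files, store its edge-weight matrix
# (e.g. examples/matlab_edge_weights.txt), and compare it against the Python
# output with a numerical tolerance, as sketched earlier in this thread.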


raamana commented Nov 21, 2017

Thanks Oscar, it definitely makes sense to extend the CLI vs API matching test to all metrics. I haven't done it yet, as I believe that if it works for one, it should behave the same for the others, and I didn't want to lengthen the tests for no obvious reason. I will extend them.
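
For reference, a minimal sketch of how such a parametrized extension could look with pytest (the metric names, the CLI import path and the -w/-o flags are assumptions, not the actual interface):

# Run the API and the CLI with the same metric and check they agree.
import shlex
import sys

import numpy as np
import pytest

from hiwenet import hiwenet            # import paths are assumptions
from hiwenet import cli as CLI

METRICS = ['manhattan', 'chebyshev', 'cosine']   # placeholder metric names

@pytest.mark.parametrize('weight_method', METRICS)
def test_api_matches_cli(weight_method, tmp_path):
    features_path = 'examples/features_1000.txt'
    groups_path = 'examples/groups_1000.txt'
    out_path = tmp_path / 'ew_{}.txt'.format(weight_method)

    # API result
    api_ew = hiwenet(np.loadtxt(features_path), np.loadtxt(groups_path),
                     weight_method=weight_method)

    # CLI result (assuming -w selects the metric and -o names the output file)
    sys.argv = shlex.split('hiwenet -f {} -g {} -w {} -o {}'.format(
        features_path, groups_path, weight_method, out_path))
    CLI()

    np.testing.assert_allclose(api_ew, np.loadtxt(out_path))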

Regarding MATLAB vs Python: as I mentioned before, I expect them to differ, as they are implemented by different people, possibly with different equations, although from the same family. Moreover, I don't consider either to be a reference or ground truth; each is simply a different instantiation (with a different implementation). I agree with you that doing this is much better than not; I hope to do it when I am able to, or when I can find other resources (like a co-op project).


raamana commented Nov 21, 2017

The CLI vs API matching test has been extended to all metrics and semi-metrics now, via fb224f2. Thanks for pointing it out.
