
Implement checksums for CircleCI integration tests #59

Open
rmarkello opened this issue May 17, 2018 · 22 comments
Labels: enhancement (issues describing possible enhancements to the project) · testing (issues related to improving testing in the project)

@rmarkello (Member)

The current mechanism for integration testing with CircleCI is to run the full tedana pipeline and then compare the content of the output files to previously generated files using np.allclose. This procedure ensures that the results of the tedana pipeline haven't changed in any appreciable way, which is critical as refactoring and updating occurs.

However, I was recently inspired by @WhitakerLab's BrainNetworksInPython repository, which uses Python's hashlib library to implement checksum comparisons of its generated outputs. I think this procedure could be applied with great success to the tedana repository!

Rather than having CircleCI re-download the previously generated outputs from Dropbox each time, they could be hashed once and kept in a JSON file in the tests/data folder. The outputs generated during testing could then be hashed and compared against this reference. This would also make testing locally a lot easier (no need to download the 1.9GB output directory!).
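As a rough sketch of this idea (the helper names, the example output filename, and the tests/data/checksums.json path are hypothetical, not existing tedana code), the comparison might look like:

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_outputs(output_dir, reference_json="tests/data/checksums.json"):
    """Compare checksums of generated outputs against a stored reference."""
    with open(reference_json) as fobj:
        reference = json.load(fobj)  # e.g., {"some_output.nii.gz": "ab12..."}
    mismatches = []
    for fname, expected in reference.items():
        observed = sha256sum(Path(output_dir) / fname)
        if observed != expected:
            mismatches.append(fname)
    return mismatches
```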

@KirstieJane (Member)

Eeeeeexcellent - I’m glad that’s helpful! I’m tagging @Islast because she did all that work so she could potentially pass on any advice etc ✨

@Islast commented May 18, 2018

I'm thrilled that you like our method @rmarkello 😄
My warning is that, because hashing doesn't allow you to use tolerances the way np.allclose does, you have to be very careful about random states (unless tedana is deterministic). We have a different hash for each environment we test in.
This is manageable, but it often feels like our continuous integration is more effective at alerting us to dependency updates interfering with the random state or the formatting of written outputs than it is at checking for behavioural changes in BrainNetworksInPython.

@rmarkello (Member Author)

This point is well taken, @Islast! Thank you so much for raising it (and thanks for pinging her in, @KirstieJane!).

While tedana is not deterministic, we do try to ensure consistent seeding for test purposes. However, you're completely right: given the sheer number of floating point operations that occur on the data in tedana, we might need numpy's ability to set tolerance thresholds!

I'm hopeful that with testing occurring inside a Docker container, things will be more-or-less consistent from run-to-run—enough so to at least give this a shot. Indeed, in some respects being made aware of dependency changes or alterations to output formats may be a feature of using the hashing approach, not a bug!

@pbellec commented May 19, 2018

Very cool. We have a similar testing strategy in NIAK and this idea will help a lot. BTW in our case, with proper seeding and a container we exactly replicate outputs.

@emdupre (Member) commented May 21, 2018

This is a great idea, @rmarkello !

I think I'd like to wait on implementing it until we resolve #14 (which is yielding issues in #18), since the ideal workflow will allow developers to test locally by just downloading a JSON file of the hash values, rather than the current 2GB of Dropbox data. This is definitely something for our 0.1.0 milestone, though!

emdupre added this to the 0.1.0 milestone on May 21, 2018
emdupre added the enhancement and testing labels on May 21, 2018
@tsalo (Member) commented Jan 11, 2019

Lately we've been making a bunch of breaking changes, so we've been fine with just checking for the existence of specific output files. However, once the final breaking PRs (#164 and #152) are merged in, I want to work on refactoring a lot of the more confusing code (e.g., fitmodels_direct and the component selection). I think we should try implementing this before that point, because I would like to make sure that the contents of the files are correct after cleaning up the code.

@jbteves (Collaborator) commented Apr 20, 2019

@tsalo @KirstieJane is this formally going to be part of the GSoC student's work if accepted?

@tsalo (Member) commented Apr 22, 2019

If they're able to do it, then definitely. It will have to come after unit tests though (since they're the low-hanging fruit).

@jbteves (Collaborator) commented Apr 22, 2019

Oh, no, totally, unit testing is higher priority. I'm just trying to make sure that there's a path forward for this issue. It seems like we should just see if the student can get to it this summer, and if not then we'll have to revisit it ourselves. Seem okay?

@effigies (Contributor)

Just a quick note that recent nibabel now has a nib-diff CLI tool that lets you set np.allclose-style tolerances on neuroimages, in case you do need something looser than a checksum.

@jbteves (Collaborator) commented Oct 29, 2019

Added to the sprint agenda.

@emdupre (Member) commented Nov 8, 2019

Is this still being considered, @rmarkello? I didn't see it in #418?

@rmarkello (Member Author)

Still being considered, for sure! We're going to need to add regression tests to actually compare the outputs of the integration tests to pre-generated data, but it doesn't (didn't) make sense to do that when so many PRs were going to totally mess with the actual data generation! All our tests would consistently be failing 😢

Once #435, #436, and @handwerkerd's eventual PR to overhaul the decision tree are integrated we can do this. In the interim I can open a PR with an example of what this would look like, though?

@stale (bot) commented Feb 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana! 🎉

stale bot added the stale label on Feb 8, 2020
@jbteves (Collaborator) commented Feb 13, 2020

@rmarkello do you still think this is workable?

stale bot removed the stale label on Feb 13, 2020
@rmarkello (Member Author)

I do, especially if we use nib-diff rather than checksums. That said, I've been totally delinquent on tedana, so I'm not sure about the status of some of the more major changes (e.g., the decision tree overhaul). I think it would probably be best to hold off on implementing these until then, because we know those are going to be breaking changes. But if others want to see these done sooner, that's fine; we'll just have to acknowledge those PRs will fail all the tests until we update the data to compare against!

@jbteves (Collaborator) commented Feb 14, 2020

Forgive me if this is incorrect, but doesn't nib-diff require that both files be in memory? In other words, don't you still have to download or upload a large amount of data to use it?

@rmarkello (Member Author)

You're totally right, they would both have to be in memory. I think checksums are just gonna be a real pain with floating point data, though, so I think we need to do something that's a bit more np.allclose-like. Since the test data are all down-sampled pretty aggressively, do you think it would be a problem to load two images in memory at a time? My thought is that it would never be more than two (the new image and the comparison / baseline).

As it stands, we are (or used to be? at one point we definitely were) downloading the comparison output data on CircleCI every time (and, if I remember correctly, just not using it)!

@jbteves (Collaborator) commented Feb 14, 2020

We're definitely uploading comparison output to CircleCI every time, and not using it. I thought the point of the checksums was to reduce the download/upload need, but I forgot about the upload of artifacts to CI rather than just checksums.
I don't think there's a memory constraint problem, especially with the truncated data. You would have to go pretty extreme to get a memory issue.
I wonder, then, about simply running np.allclose on the floating-point data. You could add this to the integration test file pretty straightforwardly, correct? What's the advantage of a checksum over that strategy?
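For illustration, a minimal np.allclose-style check on a single pair of images could look like the following sketch (the file paths are placeholders and the tolerances arbitrary, not values from the tedana test suite):

```python
import nibabel as nib
import numpy as np

# Hypothetical paths: the newly generated output and the stored baseline.
test_img = nib.load("outputs/some_output.nii.gz")
ref_img = nib.load("baseline/some_output.nii.gz")

# Compare voxel data with tolerances instead of requiring an exact (hashed) match.
assert np.allclose(
    test_img.get_fdata(), ref_img.get_fdata(), rtol=1e-5, atol=1e-8
)
```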

@rmarkello (Member Author)

At this point I'm definitely not recommending checksums! The benefit of nib-diff over a straight np.allclose is that the former also checks things like the file header, etc., to make sure you're not altering other things besides the actual data content.

Finally, it's worth noting we don't have to call nib-diff from the command line—we could just import the relevant functions (nibabel.cmdline.diff.diff) and use those in the integration tests!
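A sketch of that programmatic route, assuming nibabel.cmdline.diff.diff accepts a list of file paths and returns a mapping of differing fields (empty when the images match); the exact signature, return format, and any keyword arguments for np.allclose-style data tolerances should be verified against the installed nibabel version:

```python
from nibabel.cmdline.diff import diff

# Hypothetical file names; in the integration test these would be the
# freshly generated output and the stored baseline image.
differences = diff(
    ["outputs/some_output.nii.gz", "baseline/some_output.nii.gz"]
)

# Assumption: an empty result means the headers and data were judged
# identical; otherwise the mapping lists each differing field.
assert not differences, f"Images differ in: {list(differences)}"
```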

@jbteves (Collaborator) commented Feb 14, 2020

I'm following this now. I think we'd discussed it briefly at the hackathon, but unfortunately my memory is trash and I didn't write it down, so we're having this conversation a second time. 🤦‍♂️
Okay, awesome, thanks @rmarkello . Other stuff is higher priority but I think we have a pretty good way forward for this.

@tsalo (Member) commented Aug 6, 2020

It seems like we've converged on nib-diff over other possible checks. I noted that in #452. Once we have tests working for T2* estimation and optimal combination, I think we should at least look at doing it for the full integration tests.

stale bot removed the stale label on Aug 6, 2020
stale bot added the stale label on Nov 5, 2020
tsalo removed the stale label on Feb 5, 2021
ME-ICA deleted a comment from stale bot on Nov 19, 2022
ME-ICA deleted a comment from stale bot on Nov 19, 2022