Automatically test whether new versions of HARK crash DemARKs/REMARKs #29
Has anyone here looked at nbval (https://pypi.org/project/nbval/) for PyTest integration? Currently it conflicts with pytest-cov (see computationalmodelling/nbval#116), but it tests everything at least.
Thanks for pointing us to nbval. I hope that it does not test whether the results of a cell are byte-by-byte identical, because we've found that jupyter notebooks embed all kinds of things like timestamps and module version numbers and other cruft, which makes it hard to tell when there has been a MEANINGFUL change to the notebook. (We are using jupytext to handle that problem, but I would imagine a similar issue would come up in this context.)
- Chris Carroll
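For context, the jupytext workflow Chris mentions can be driven entirely from the command line; a minimal sketch, with an illustrative notebook path:

```bash
# Pair the notebook with a percent-format .py script; only the text version,
# which is free of execution counts, timestamps, and other metadata cruft,
# needs to be diffed and committed.
jupytext --set-formats ipynb,py:percent notebooks/Gentle-Intro-To-HARK.ipynb

# After editing either member of the pair, bring the two back in sync.
jupytext --sync notebooks/Gentle-Intro-To-HARK.ipynb
```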
It has a loose mode which doesn't test the output but does test that the notebooks run successfully.
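For concreteness, nbval's loose mode is exposed as a pytest flag; a minimal sketch of both modes, assuming the notebooks live under a notebooks/ directory:

```bash
# Strict mode: execute each notebook and compare the outputs of every cell
# against the outputs stored in the .ipynb file.
pytest --nbval notebooks/

# Loose ("lax") mode: just check that every cell executes without raising;
# stored outputs are ignored unless a cell is explicitly marked for checking.
pytest --nbval-lax notebooks/
```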
That sounds great, exactly what we need. @shaunagm, we might want to do this differently for DemARKs and REMARKs. For REMARKs the most important thing is that the do_min.py file runs from the command line; the notebooks tend to be gravy. But the DemARKs' whole point is to demonstrate things interactively, so nbval might be perfect for them.
@hameerabbasi and Keith and I have been chatting about this. We probably want to identify a set of notebooks which should always be up to date with HARK - maybe the "gentle introduction to HARK" notebook and other instructional examples. When a change to HARK broke those notebooks, we'd know we were introducing a breaking change and could mark the release / schedule the release accordingly, while updating the notebooks. But there are many notebooks we don't want to actively update every time we change HARK, so we can't test against them - or at least, we can't test that they execute, though we can (probably) test that they still load. I'm not sure what kind of change to HARK would be so catastrophic that it would prevent a notebook from even loading.
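As a rough illustration of the distinction between "still executes" and "still loads", a load-only check could be as simple as parsing each notebook without running it (paths illustrative):

```bash
# Parse and re-serialize each notebook without executing it; this catches a
# corrupted or unreadable .ipynb, though not breakage from changes to HARK's API.
for nb in notebooks/*.ipynb; do
    jupyter nbconvert --to notebook --stdout "$nb" > /dev/null \
        || echo "failed to load: $nb"
done
```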
One thing I've been wondering about is whether it is possible to have a second kind of testing for slow-running code (some of our do_min.py files might take 2-3 minutes, and if we were to test all of them there might be quite a delay before Travis approved them). Is this part of the reason you think some of the notebooks should be excluded from Travis testing? Because if not for that consideration, it seems to me a good workflow would be to test all the DemARK notebooks and REMARK do_min's by default, and then, if we are notified that one of them breaks, choose either to fix the problem (if it's easy) or to remove that notebook from the "master" branch until it is fixed. (I REALLY don't want to have notebooks posted that break when new users try to use them.) If the issue is the speed of the tests (not wanting to have to wait very long for Travis), then a second kind of testing that we could, say, run overnight and be informed about the next day after a merge might be a good approach. Does this make sense?
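One way the overnight idea could be wired up without slowing PR builds: Travis supports daily cron builds (configured in the repository settings), and cron-triggered jobs can be told apart inside the build script. A rough sketch, with hypothetical helper script names:

```bash
# In the Travis build script: run the slow notebook/do_min tests only when the
# build was triggered by the daily cron, not by an ordinary push or PR.
if [ "$TRAVIS_EVENT_TYPE" = "cron" ]; then
    ./run_slow_tests.sh   # hypothetical script executing every notebook and do_min.py
else
    ./run_fast_tests.sh   # hypothetical script covering only the quick checks
fi
```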
@llorracc - my feeling is that we should have two kinds of notebooks. Type A is "pinned" to a specific older version of HARK (and pinned to specific versions of any other dependencies). Those always work because they are a snapshot of history, frozen in time. Type B uses whatever the most recent version of HARK is. This is the kind of notebook we would test new versions of HARK against, and it would need to be changed as we changed HARK. Type B notebooks would almost always work, but we might occasionally break them by accident, because we would be maintaining them the same way we maintain HARK. Type A notebooks are much, much easier to have and seem like a good fit for REMARKs, i.e. notebooks that capture an implementation/replication of a specific paper. Type B is a form of testing/documentation and is more work to maintain, but that's the nature of tests/documentation - they always need to be changed along with the code. I don't really care about how long the tests take to run - it doesn't seem too bad so far, so I haven't considered it as a factor. This is more about avoiding the cognitive labor of having to update dozens or hundreds of notebooks every time we change HARK.
I see, that makes sense. What I was worried about is that somebody with an updated version of HARK might download a notebook that used to work but doesn't anymore, which would fluster them. But your "pinning" solution solves that problem. I think you're right about the REMARKs -- they are intended as a kind of snapshot of a moment in time and of the set of tools with which the problem was solved. Is there a way to set things up so that new content automatically gets "marked" with the HARK version number under which it was initially tested, or is that something we would have to put in by hand when we merge to master for the first time (say)?
I'm hoping that with most HARK version changes, not much will break. But possibly I'm overoptimistic about that ...
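One low-tech way the "marking" could be done, sketched here rather than settled in the thread: capture the installed econ-ark version at test time and commit it alongside the content (the file name below is hypothetical):

```bash
# Record the installed econ-ark version at merge/test time so the content
# carries a machine-readable note of what it was last verified against.
python -c "import pkg_resources; print(pkg_resources.get_distribution('econ-ark').version)" \
    > HARK_VERSION_TESTED.txt   # hypothetical file name
```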
What do you mean by "new content"? Like new DemARKs, REMARKs, etc.? I don't know of an automatic way to ensure that dependencies are pinned, so my feeling is we'll just need to be good about checking version numbers before merging PRs. But there may be a way I haven't heard about.
Yes, that's what I meant. So it should be part of a checklist before merging new content into master. Or I guess there could be one master requirements.txt file at the root of the REMARK which applies to all of the content therein (like do_min, do_mid, do_all, and other ways of using the code)?
Yes, a checklist seems like the solution here. We'll likely want to let different REMARKs have different requirements, but that'll depend a bit on whether we end up using mybinder, Colab, etc. to host them.
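For concreteness, a "Type A" pin could be as simple as a requirements.txt at the root of each REMARK; the version numbers below are purely illustrative:

```bash
# Illustrative pinned requirements for a frozen ("Type A") REMARK; the exact
# version numbers would be whatever the REMARK was last verified against.
cat > requirements.txt <<'EOF'
econ-ark==0.10.1
numpy==1.16.3
matplotlib==3.0.3
EOF

# Anyone reproducing the REMARK recreates that environment first:
pip install -r requirements.txt
```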
econ-ark/HARK#280 starts to work on this |
With Mridul's help we have set this up to work on DemARK.
Copying a comment from @llorracc in another issue:
Since we do not yet have much in the way of unit tests, I'm hoping that we can repurpose something that we DO have: Our DemARKs and REMARKs. They reside in separate repos, but we could make those repos "submodules" in the HARK repo, pointing always to the master branch of DemARK and REMARK.
A minimal "test" of new code is that it doesn't break any of our existing working DemARK and REMARK examples. This would be a good task for a sprinter who knows a reasonable amount about CI and testing, but nothing about HARK.
The first step would be for Travis to "update" the submodules to the latest versions of DemARK and REMARK. Then it seems like it should be possible to get Travis to run some pseudo-code like the following (in my native language of bash):
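(A sketch of what that step might look like, assuming the submodules are checked out at DemARK/ and REMARK/ and the DemARK notebooks live under DemARK/notebooks/:)

```bash
# Pull the DemARK and REMARK submodules forward to the tip of master.
git submodule update --init --remote DemARK REMARK

# Execute every DemARK notebook; any cell that raises fails the build.
for nb in DemARK/notebooks/*.ipynb; do
    jupyter nbconvert --to notebook --execute \
        --ExecutePreprocessor.timeout=600 --stdout "$nb" > /dev/null \
        || exit 1
done
```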
and see whether any of the runs crashes. (Now that we impose CI for PRs on everything, this would also mean that any revision of an existing DemARK or REMARK would automatically be tested upon creation.) For REMARKs it is only slightly more complicated. Since the instructions for creating a do_min.py file say that it should take no more than a couple of minutes to run, something like this should work:
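(Again only a sketch, assuming each REMARK keeps a do_min.py at its top level and that five minutes is an acceptable per-script budget:)

```bash
# Run every REMARK's do_min.py with a five-minute budget per script; a crash
# or a timeout in any of them fails the build.
for script in REMARK/*/do_min.py; do
    ( cd "$(dirname "$script")" && timeout 300 python do_min.py ) || exit 1
done
```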