Automated tests for functionality in DemARKs #543

sbenthall · 2020-02-24T21:52:57Z

The DemARKs take a long time to execute.

But smaller tests of their functionality could be set up as automated tests in HARK.

That would increase test coverage in a meaningful way (i.e., we could catch more errors that might break DemARKs before they are committed.)

sbenthall · 2020-02-26T19:28:44Z

sbenthall · 2020-02-26T19:40:41Z

Oh, whoops--several of these have been moved to examples/ already.
However, this is all the more reason to have automated tests for them--it's functionality explicitly supported by HARK.

sbenthall · 2020-02-26T19:42:06Z

There is already a test based on BufferStock.

llorracc · 2020-02-27T00:11:51Z

A first step would be to write a script that just goes through them all and executes them one by one and records how long each one takes. We could then incorporate those that run quickly (in a couple of minutes or less, say) into the automated tests, and mark the others as ones with a "to-do" item of writing tests that run quickly.
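A minimal sketch of such a script follows. It assumes the DemARKs have been exported to standalone `.py` files in one directory. The `time_scripts` helper is hypothetical and is not an existing HARK tool. It runs each script in a fresh Python process and records the wall-clock time and exit code. It also sets matplotlib's `MPLBACKEND=Agg` so that plotting calls do not block waiting on a window.

```python
import os
import subprocess
import sys
import time
from pathlib import Path


def time_scripts(directory, timeout=None):
    """Run every *.py file in `directory` in a fresh interpreter.

    Returns (filename, seconds, returncode) tuples in alphabetical order.
    Hypothetical helper for illustration. If `timeout` is given and a script
    exceeds it, subprocess.run kills it and raises subprocess.TimeoutExpired.
    """
    # Non-interactive matplotlib backend so plt.show() does not block.
    env = {**os.environ, "MPLBACKEND": "Agg"}
    results = []
    for script in sorted(Path(directory).glob("*.py")):
        start = time.perf_counter()
        # cwd is the script's folder so relative data paths still resolve.
        proc = subprocess.run(
            [sys.executable, script.name],
            cwd=script.parent,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        elapsed = time.perf_counter() - start
        results.append((script.name, elapsed, proc.returncode))
    return results
```

For example, `for name, secs, code in time_scripts("demarks"): print(name, round(secs, 1), code)` prints one line per script, where `demarks` is a placeholder directory name. The exit code is recorded as well, so a script that fails quickly is not mistaken for a fast, healthy one.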


-- - Chris Carroll

MridulS · 2020-02-27T10:37:27Z

Had a script which did the timing.

[notebook script, time in seconds, rounded to 0.1 s]

Structural-Estimates-From-Empirical-MPCs-Fagereng-et-al.py  169.7
Nondurables-During-Great-Recession.py  82.0
DCEGM-Upper-Envelope.py  4.1
IndShockConsumerType.py  3.8
TractableBufferStock-Interactive.py  1.8
GenIncProcessModel.py  97.3
DiamondOLG.py  3.8
KeynesFriedmanModigliani.py  6.5
Micro-and-Macro-Implications-of-Very-Impatient-HHs.py  44.6
IncExpectationExample.py  440.8
Gentle-Intro-To-HARK-Buffer-Stock-Model.py  4.1
KinkedRconsumerType.py  4.1
KrusellSmith.py  568.4
ConsPortfolioModelDoc.py  5.9
ChangeLiqConstr.py  2.0
PerfForesightConsumerType.py  2.5
Gentle-Intro-To-HARK-PerfForesightCRRA.py  2.4
Chinese-Growth.py  149.0
Alternative-Combos-Of-Parameter-Values.py  8.3
LifecycleModelExample.py  4.6
HoweWeSolveIndShockConsumerType.py  1.6
Uncertainty-and-the-Saving-Rate.py  279.4
FisherTwoPeriod.py  1.9
MPC-Out-of-Credit-vs-MPC-Out-of-Income.py  1.7

sbenthall · 2020-02-27T13:11:05Z

Thanks @MridulS, that's super helpful.

llorracc · 2020-02-27T13:16:20Z

So maybe we should just do the ones that take < 10 seconds?

PS. I believe the "master" version of some of these has been moved to Seb's excellent new documentation "examples" location, such as ConsPortfolioModelDoc.py.
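A small sketch of applying that cutoff to MridulS's timing run above. The runtimes are copied from that run and rounded to 0.01 s. `split_by_runtime` is a hypothetical helper. It sorts by runtime and then by name, which also gives the ascending-runtime order proposed later in the thread.

```python
# Runtimes in seconds from the timing run above (rounded to 0.01 s).
TIMINGS = {
    "Structural-Estimates-From-Empirical-MPCs-Fagereng-et-al.py": 169.72,
    "Nondurables-During-Great-Recession.py": 82.00,
    "DCEGM-Upper-Envelope.py": 4.08,
    "IndShockConsumerType.py": 3.79,
    "TractableBufferStock-Interactive.py": 1.78,
    "GenIncProcessModel.py": 97.29,
    "DiamondOLG.py": 3.76,
    "KeynesFriedmanModigliani.py": 6.50,
    "Micro-and-Macro-Implications-of-Very-Impatient-HHs.py": 44.61,
    "IncExpectationExample.py": 440.81,
    "Gentle-Intro-To-HARK-Buffer-Stock-Model.py": 4.10,
    "KinkedRconsumerType.py": 4.05,
    "KrusellSmith.py": 568.42,
    "ConsPortfolioModelDoc.py": 5.89,
    "ChangeLiqConstr.py": 1.95,
    "PerfForesightConsumerType.py": 2.53,
    "Gentle-Intro-To-HARK-PerfForesightCRRA.py": 2.40,
    "Chinese-Growth.py": 149.02,
    "Alternative-Combos-Of-Parameter-Values.py": 8.33,
    "LifecycleModelExample.py": 4.60,
    "HoweWeSolveIndShockConsumerType.py": 1.57,
    "Uncertainty-and-the-Saving-Rate.py": 279.39,
    "FisherTwoPeriod.py": 1.92,
    "MPC-Out-of-Credit-vs-MPC-Out-of-Income.py": 1.70,
}

THRESHOLD_SECONDS = 10.0


def split_by_runtime(timings, threshold):
    """Partition scripts into (fast, slow) name lists.

    Both lists are ordered by runtime, with ties broken alphabetically.
    """
    ordered = sorted(timings.items(), key=lambda item: (item[1], item[0]))
    fast = [name for name, secs in ordered if secs < threshold]
    slow = [name for name, secs in ordered if secs >= threshold]
    return fast, slow


fast, slow = split_by_runtime(TIMINGS, THRESHOLD_SECONDS)
```

With a 10-second cutoff this gives 16 fast scripts and 8 slow ones. The slow list runs from Micro-and-Macro-Implications-of-Very-Impatient-HHs.py (about 45 s) up to KrusellSmith.py (about 568 s).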


-- - Chris Carroll

sbenthall · 2020-02-27T13:27:28Z

I am going to make tests based on the DemARKs on a case-by-case basis, in roughly ascending order of execution time (ties broken alphabetically).

Writing an automated test suite is not just a matter of copy-and-paste.
I'm going to try to design them as high-quality tests. That involves:

- separating different functionality into different tests
- checking a variety of test cases for each unit of functionality
- trying to get good overall coverage of the functionality
- reducing the amount of time it takes to run the tests, while maintaining functional coverage
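To make those goals concrete, here is a sketch that uses a hypothetical toy model rather than HARK code: a two-period consumer with log utility ln(c1) + β ln(c2) and budget c1 + c2/R = y1 + y2/R. The closed-form solution is c1 = (y1 + y2/R)/(1 + β) and c2 = βR·c1. Each property gets its own test method: the budget constraint binds, the Euler equation holds, and consumption is positive. `subTest` runs each test over a small, cheap grid of parameter cases. `fisher_two_period` is illustrative only.

```python
import unittest


def fisher_two_period(y1, y2, R, beta):
    """Optimal (c1, c2) for log utility ln(c1) + beta * ln(c2).

    Budget: c1 + c2 / R = y1 + y2 / R. Hypothetical toy model used only to
    illustrate test design; not taken from HARK or any DemARK.
    """
    wealth = y1 + y2 / R          # lifetime wealth in period-1 units
    c1 = wealth / (1.0 + beta)    # log utility: consume 1/(1+beta) of wealth now
    c2 = beta * R * c1            # Euler equation: 1/c1 = beta * R / c2
    return c1, c2


# A small grid of parameter cases, kept cheap so the suite runs fast.
CASES = [
    dict(y1=1.0, y2=1.0, R=1.03, beta=0.96),
    dict(y1=0.0, y2=2.0, R=1.10, beta=0.90),  # all income arrives late
    dict(y1=5.0, y2=0.0, R=1.00, beta=1.00),  # no discounting, no interest
]


class TestFisherTwoPeriod(unittest.TestCase):
    # One property per test method, so a failure pinpoints what broke.

    def test_budget_constraint_binds(self):
        for case in CASES:
            with self.subTest(**case):
                c1, c2 = fisher_two_period(**case)
                wealth = case["y1"] + case["y2"] / case["R"]
                self.assertAlmostEqual(c1 + c2 / case["R"], wealth)

    def test_euler_equation_holds(self):
        for case in CASES:
            with self.subTest(**case):
                c1, c2 = fisher_two_period(**case)
                # Marginal utility of log consumption is 1/c.
                self.assertAlmostEqual(1.0 / c1, case["beta"] * case["R"] / c2)

    def test_consumption_is_positive(self):
        for case in CASES:
            with self.subTest(**case):
                c1, c2 = fisher_two_period(**case)
                self.assertGreater(c1, 0.0)
                self.assertGreater(c2, 0.0)
```

Run it with `python -m unittest` on the file. Splitting the checks this way means a regression in, say, the Euler-equation logic fails one clearly named test instead of one large end-to-end run.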

As there are a lot of DemARKs, I expect to make progress on this gradually over time.

But my goal here is to put in place a high-quality test suite that covers the functionality of the code without taking an excessive amount of time.

MridulS · 2020-02-27T13:34:04Z


+1 on all. Do have a look at https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/tests/test_ConsMarkovModel.py#L53, where I started building "unit" tests for ConsumptionSaving's checkMarkovInputs. Overall functional tests are definitely a better/faster way of getting coverage, but unit tests are my favourite: I found a couple of bugs just by writing that one unit test.
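As an illustration of the kind of unit test meant here, consider a sketch of a hypothetical validator for a Markov transition matrix. It checks that the matrix has the right number of rows, that no row is ragged, that probabilities are non-negative, and that each row sums to one. `check_markov_inputs` below is not HARK's `checkMarkovInputs`, and the real checks may differ. Each failure mode gets its own small test.

```python
import unittest


def check_markov_inputs(transition, n_states, tol=1e-9):
    """Raise ValueError unless `transition` is a valid n_states x n_states
    Markov matrix. Hypothetical validator, not HARK's checkMarkovInputs."""
    if len(transition) != n_states:
        raise ValueError(f"expected {n_states} rows, got {len(transition)}")
    for i, row in enumerate(transition):
        if len(row) != n_states:
            raise ValueError(f"row {i} has {len(row)} entries, expected {n_states}")
        if any(p < 0.0 for p in row):
            raise ValueError(f"row {i} has a negative probability")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {i} sums to {sum(row)}, not 1")


class TestCheckMarkovInputs(unittest.TestCase):
    def test_valid_matrix_passes(self):
        check_markov_inputs([[0.9, 0.1], [0.2, 0.8]], n_states=2)

    def test_wrong_number_of_rows(self):
        with self.assertRaises(ValueError):
            check_markov_inputs([[1.0]], n_states=2)

    def test_ragged_row(self):
        with self.assertRaises(ValueError):
            check_markov_inputs([[0.5, 0.5], [1.0]], n_states=2)

    def test_row_does_not_sum_to_one(self):
        with self.assertRaises(ValueError):
            check_markov_inputs([[0.5, 0.4], [0.2, 0.8]], n_states=2)

    def test_negative_probability(self):
        with self.assertRaises(ValueError):
            check_markov_inputs([[1.2, -0.2], [0.2, 0.8]], n_states=2)
```

Tests like these run in milliseconds. They exercise edge cases, such as ragged rows or negative entries, that a full model run would rarely hit.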


We can always have two test suites: one for smaller functional and unit tests, and another for simulation tests and compute-intensive models.
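One way to sketch that split with only the standard library is to gate the slow suite behind an environment variable. The variable name `RUN_SLOW_TESTS` is a hypothetical convention, not something HARK defines. The default run stays fast, and a scheduled CI job could set the variable to include the compute-intensive tests. Both test bodies are placeholders.

```python
import os
import unittest

# Hypothetical opt-in switch: slow tests run only when RUN_SLOW_TESTS=1 is set.
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"


class FastTests(unittest.TestCase):
    def test_quick_check(self):
        # Placeholder for a fast unit or functional test.
        self.assertEqual(sum(range(10)), 45)


@unittest.skipUnless(RUN_SLOW, "set RUN_SLOW_TESTS=1 to run compute-intensive tests")
class SlowSimulationTests(unittest.TestCase):
    def test_long_simulation(self):
        # Placeholder for a compute-intensive model solve or long simulation.
        total = sum(i * i for i in range(2_000_000))
        self.assertGreater(total, 0)
```

`RUN_SLOW_TESTS=1 python -m unittest` runs both suites, while a plain `python -m unittest` reports the slow class as skipped. If the project uses pytest, custom markers serve the same purpose.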


llorracc · 2020-02-27T13:40:07Z

Hmmm, my sense is that many of the DemARKs don't merit such an in-depth treatment. It would also be redundant: many of the DemARKs use the same functionality in the same way, so handcrafted tests for each of them would add much less value than the same time spent elsewhere.

PS. The KrusellSmith DemARK should really be a REMARK. Mridul, why don't you open an issue for that and assign it to yourself? We can make it the second case on which we test the "reproduce" tool for REMARKs.


-- - Chris Carroll

sbenthall · 2020-02-27T13:55:18Z

@llorracc, one other part of my approach that I didn't mention: if I see that a DemARK uses functionality that is already tested, I won't write a test for it.

Basically, I'm just aiming to apply common sense and software engineering best practices here.

@MridulS I see what you are saying about unit, functional, and more advanced test suites, and thanks for pointing towards that example you wrote. I think your point about how writing good tests can lead to bug discovery is right on. We don't know what bugs may be in the existing code because we don't have tests covering all the functionality!

sbenthall · 2020-02-27T14:22:11Z

FisherTwoPeriod is a cool demo of Jupyter widgets but does not appear to use any currently uncovered HARK library features. I'll check it off.

sbenthall · 2020-03-03T21:56:19Z

As an example of how to speed up the tests:

GenIncProcessModel.py, with a runtime of 97 seconds, runs a simulation with T_sim = 500.
Lowering T_sim reduces the test time dramatically while exercising the same functionality.
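The reasoning can be sketched with a hypothetical stand-in for a simulation loop. `simulate_log_income` is not HARK code; it simulates log permanent income as a Gaussian random walk. The cost of a loop like this grows roughly linearly in T_sim times the number of agents. A horizon of 5 therefore runs the same code path as T_sim = 500 at a small fraction of the cost, and the test still checks output shape and seed reproducibility.

```python
import random
import unittest


def simulate_log_income(T_sim, n_agents, sigma=0.1, seed=0):
    """Simulate log permanent income as a random walk for n_agents over
    T_sim periods. Hypothetical stand-in for a HARK simulation."""
    rng = random.Random(seed)
    history = []
    current = [0.0] * n_agents
    for _ in range(T_sim):
        # Each period adds an independent N(0, sigma^2) shock per agent.
        current = [x + rng.gauss(0.0, sigma) for x in current]
        history.append(current)
    return history


class TestSimulation(unittest.TestCase):
    # A short horizon exercises the same code path as a long one.
    T_SIM = 5

    def test_history_shape(self):
        history = simulate_log_income(self.T_SIM, n_agents=3)
        self.assertEqual(len(history), self.T_SIM)
        self.assertTrue(all(len(row) == 3 for row in history))

    def test_same_seed_is_reproducible(self):
        a = simulate_log_income(self.T_SIM, n_agents=3, seed=42)
        b = simulate_log_income(self.T_SIM, n_agents=3, seed=42)
        self.assertEqual(a, b)
```

The trade-off is that a short horizon cannot check long-run properties, such as convergence of a simulated distribution. Those checks belong in a slower suite.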

sbenthall · 2020-03-05T21:29:13Z

DemARKs that depend on a lot of custom code are not appropriate for being turned into automated tests of the core libraries.
I'm going to check off some of these. Current examples: Chinese-Growth and IncExpectationExample.

sbenthall · 2020-03-06T15:03:35Z

Uncertainty-and-the-Saving-Rate appears to be a demonstration of cstwMPC functionality.
The cstwMPC code is slated for heavy refactoring soon, and parts of it may be removed from the HARK library. See #334, #449, #522.

As @llorracc has ownership of that code at the moment, I'll remove automated testing of it from the scope of this issue.

sbenthall · 2020-03-06T16:23:07Z

The KrusellSmith case is complicated by the fact that the core classes don't yet have functioning default behavior: #557

Checking it off the list for now. That means all DemARKs have now been either ticketed, tested in #547, or exempted.

llorracc · 2020-03-06T16:35:25Z

Awesome! Thanks.


-- - Chris Carroll

llorracc · 2020-03-13T18:17:09Z

> FisherTwoPeriod is a cool demo of Jupyter widgets but does not appear to use any currently uncovered HARK library features. I'll check it off.

@sbenthall, yes. A few of the items in DemARK are basically just lecture notes for my first-year course. Almost all of them could in principle be modified to use HARK, and I hope eventually to do that, but for the moment several of them are just straight Jupyter/Python. So you are right to exclude them.

sbenthall self-assigned this Feb 24, 2020

sbenthall mentioned this issue Feb 26, 2020

adding test_ConsPortfolioModel #546

Merged

sbenthall added a commit to sbenthall/HARK that referenced this issue Feb 27, 2020

adding tests based on ChangeLiqConstr DemARK, see econ-ark#543

9f6e360

sbenthall mentioned this issue Feb 27, 2020

Tests from DemARK #547

Merged

sbenthall closed this as completed Mar 12, 2020
