functional simulation test #1148
Conversation
Codecov Report — Base: 73.73% // Head: 73.62% // Decreases project coverage by 0.11%.
Additional details and impacted files:
@@ Coverage Diff @@
## master #1148 +/- ##
==========================================
- Coverage 73.73% 73.62% -0.11%
==========================================
Files 72 72
Lines 11561 11513 -48
==========================================
- Hits 8524 8476 -48
Misses 3037 3037
☔ View full report at Codecov.
In order to preserve the old values in case a low-precision test is desired, I'll comment out the old tests and leave them in the suite.
np.mean(self.economy.MrkvNow_hist),
0.4818181818181818
)
In conversations with Chris and others, this type of test was controversial.
Notice it is not testing for a single simulation value, but for the mean of a long history of simulations. Chris liked these tests because they are informative (if the mean of 10k simulations is 30% off of where we expect it to be, there is something wrong). You (Seb) disagreed because technically there is a greater-than-zero probability of that event occurring even if all the code is fine.
I see both points.
I'd vote for keeping them (because they are informative) but giving them a very loose tolerance. Say, 10% relative tolerance for the mean of a 10k-obs simulation.
However, I'm happy to revise my position if this is heretical from a software-engineering/comp-sci point of view.
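A minimal sketch of what such a loose-tolerance moment test could look like, reusing the MrkvNow_hist check from the snippet above (the test name is illustrative; the recorded target is the value from the old test, and rtol=0.10 is the 10% suggestion made here):

```python
import numpy as np

def test_markov_state_mean(self):
    # Loose-tolerance check on a simulated moment, rather than an exact
    # match against a seed-specific value.  rtol=0.10 follows the 10%
    # relative tolerance suggested above for the mean of a 10k-obs simulation.
    np.testing.assert_allclose(
        np.mean(self.economy.MrkvNow_hist),
        0.4818181818181818,  # target recorded from an earlier run
        rtol=0.10,
    )
```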
@@ -185,15 +189,15 @@ def test_methods(self):

# testing update_solution_terminal()
self.assertEqual(
self.agent.solution_terminal.cFunc[0](10,self.economy.MSS),
self.agent.solution_terminal.cFunc[0](10, 13.32722),
This test checks a solution, not a simulation. I'd leave it in; there is nothing random about it.
What is economy.MSS?
Oh, it's self.kSS * self.RfreeSS + self.wRteSS. Of course. I'll change it back.
Actually, MSS is stochastic, so this test and the one above can go.
10
)

self.assertAlmostEqual(
self.economy.agents[0].solution[0].cFunc[0](
10,self.economy.MSS
10, 13.32722
Leave this one in. It checks a solution, not a simulation.
@@ -100,4 +100,5 @@ def test_simulation(self):
self.agent.initialize_sim()
self.agent.simulate()

self.assertAlmostEqual(np.mean(self.agent.history["mLvl"]), 1.2043946738813716)
# simulation test -- seed/generator specific
This tests for a moment, not a specific draw. Might leave it in or not.
@@ -36,15 +36,19 @@ def test_simOnePeriod(self):
self.pcct.track_vars += ["aNrm"]
self.pcct.initialize_sim()

self.assertFalse(np.any(self.pcct.shocks["Adjust"]))
# simulation test -- seed/generator specific
Is agent.pcct an analogue of agent.history?
Here, self.pcct refers to the AgentType object. This is testing the stored values of that object's shocks dictionary after some number of simulated steps.
So, this is a shock and should be removed.
self.assertAlmostEqual(self.pcct.shocks["PermShk"][0], 0.85893446)

self.assertAlmostEqual(self.pcct.shocks["TranShk"][0], 1.0)
self.assertAlmostEqual(
These are what you have called 'functional' tests.
I want to ask what kind of error you see these catching.
The only way I can see this failing is someone messing up the transition equations of the simulation method, or introducing a bug in the time indexing of hark.core so that the states and shocks get badly out of sync. Both are valuable, I just want to know if you see other possible cases.
These are the generalization of the specific value test that makes the fewest assumptions.
I think you have pointed out two ways the test could fail.
I've been thinking about it but can't come up with others. Why do you ask?
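For concreteness, a sketch of the kind of "functional" test being discussed: it checks that the simulated state and shock histories satisfy the model's transition equation, so it fails when the transition code or the time indexing gets out of sync, regardless of the RNG. The variable names follow the track_vars used elsewhere in this PR; the specific relation mNrm_t = bNrm_t + TranShk_t is an assumption about the model.

```python
import numpy as np

def test_transition_consistency(self):
    # Functional test: simulated histories should satisfy the model's
    # transition equation no matter which generator produced the shocks.
    # Assumes the normalized transition mNrm_t = bNrm_t + TranShk_t.
    hist = self.agent_infinite.history
    np.testing.assert_allclose(
        hist["mNrm"],
        hist["bNrm"] + hist["TranShk"],
        rtol=1e-10,
    )
```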
self.assertAlmostEqual(self.pcct.controls["Share"][0], 0.8627164488246847)
self.assertAlmostEqual(self.pcct.controls["cNrm"][0], 1.67874799)
# a drawn shock; may not be robust to RNG/distribution implementations
I'd remove this one? It tests a specific draw
ok
@@ -94,8 +95,9 @@ def test_simulated_values(self):
self.agent.simulate()

self.assertAlmostEqual(self.agent.MPCnow[1], 0.5711503906043797)
I think this test might have to go too. MPCnow depends on current assets, which are stochastic.
good to know, thanks
# simulation tests -- seed/generator specific
# But these are based on aggregate population statistics.
# WARNING: May fail stochastically, or based on specific RNG types.
self.assertAlmostEqual(c_std2, 0.0376882)
If we keep these in we might want a low tolerance for them?
that's a good idea.
But if there's no objection to me removing them, and it sounds like there isn't, I will.
I'll respond to your point about tests of sampled moments in the main thread.
@@ -58,7 +59,8 @@ def test_simulated_values(self):
self.agent.simulate()
self.assertAlmostEqual(self.agent.MPCnow[1], 0.5711503906043797)
MPCnow is stochastic
@@ -65,19 +65,20 @@ def test_simulation(self):
) # This implicitly uses the assign_parameters method of AgentType

# Create PFexample object
self.agent_infinite.track_vars = ["mNrm"]
self.agent_infinite.track_vars = ["bNrm", "mNrm", "TranShk"]
self.agent_infinite.initialize_sim()
self.agent_infinite.simulate()

self.assertAlmostEqual(
Another one of those we might keep in with a high tolerance
What I've done here (just to mix it up...) is keep the test of the moment (the mean) but turn it into a test of the transition function.
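Presumably something along these lines: the cross-sectional mean is still tested, but through the transition equation rather than against a hard-coded simulated value (again, the relation mNrm = bNrm + TranShk is an assumption about the model, and the test name is illustrative):

```python
import numpy as np

def test_mean_transition(self):
    # Test the moment (the cross-sectional mean) via the transition
    # function instead of pinning a seed-specific number.
    hist = self.agent_infinite.history
    np.testing.assert_allclose(
        np.mean(hist["mNrm"], axis=1),
        np.mean(hist["bNrm"] + hist["TranShk"], axis=1),
        rtol=1e-10,
    )
```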
HARK/tests/test_distribution.py
Outdated

def test_MVNormal(self):

## Are these tests generator/backend specific?
dist = MVNormal()

self.assertTrue(
This one is generator-specific
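If a multivariate-normal check is worth keeping, a generator-agnostic alternative is to compare sample moments of a large draw against the known parameters with loose tolerances. The sketch below uses NumPy directly rather than HARK's MVNormal class, purely to illustrate the pattern:

```python
import numpy as np

def test_mvnormal_moments():
    # Moment-based check: loose tolerances on the sample mean and covariance
    # instead of asserting generator-specific draw values.
    mu, Sigma = np.zeros(2), np.eye(2)
    draws = np.random.default_rng(0).multivariate_normal(mu, Sigma, size=100_000)
    np.testing.assert_allclose(draws.mean(axis=0), mu, atol=0.02)
    np.testing.assert_allclose(np.cov(draws, rowvar=False), Sigma, atol=0.05)
```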
np.mean(self.agent_infinite.history["mNrm"], axis=1)[100],
-29.140261331951606,
)
# simulation test -- seed/generator specific
Might keep in with high tolerance
@sbenthall this looks like a move in the direction we should be heading. It removes a bunch of the tests that are RNG-sensitive.
There is the question of whether we want to keep the ones that check for a moment (e.g. the mean) of a large number of draws, which should be much less sensitive to RNG. I see arguments both for and against them.
@Mv77 thanks for looking over all of this. To elaborate a bit on the history, @sbenthall and I had some past disagreements about what should be tested, until finally I realized that his point was that the traditional purpose of such tests was to discover places where a change just caused code to stop working in even the most elemental sense. What I had wanted was to test whether the code produced the "right answer" in a substantive sense (or at least the same answer to a substantive question that had been produced by previous versions of the code).

We now have some tests of one kind and some of the other, and perhaps we ought to try to distinguish them more explicitly so that any software engineers who come to the project will understand which tests are of the "does it run" kind and which are of the "is it right" kind. (I can see how "is it right" doesn't make sense for many software projects, like, say, a word processor or whatever.)

In any case, for the "is it right" kinds of tests my sense is that the appropriate choice is probably to set some threshold of the kind you propose -- is the new answer within x percent of the old answer. But I think "x" should probably be something like 0.1 percent, not 15 percent, and not the 12 digits or 8 digits of floating-point precision we had used in some of the "is it right" tests before.
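A rough sketch of how the two kinds of tests might be made explicit in the suite. The test names and the EXPECTED_C_AT_10 benchmark are hypothetical, and the 0.1% relative tolerance is the threshold proposed above:

```python
import numpy as np

def test_does_it_run(self):
    # "Does it run" test: exercise the pipeline end to end with no
    # assertions about the numbers it produces.
    self.agent.solve()
    self.agent.initialize_sim()
    self.agent.simulate()

def test_is_it_right(self):
    # "Is it right" test: compare against the answer recorded from a
    # previous version of the code, within 0.1% relative tolerance.
    self.agent.solve()
    np.testing.assert_allclose(
        self.agent.solution[0].cFunc(10.0),
        EXPECTED_C_AT_10,  # hypothetical recorded benchmark value
        rtol=1e-3,
    )
```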
Yes, I remember I was part of some of those conversations and I remember them generating some disagreement. I see the value of having both kinds of tests. I trust @sbenthall's wisdom that perhaps a unit test is not the right way to check whether the stochastic properties of objects "behave well." Maybe we should figure out a more fitting way to do that, raising warnings or who knows what. But that's one for the future, I think. With this PR Seb clears a bunch of tests that we have agreed definitely should not be there, and that is a very good thing. So I'd be very happy to merge this in and postpone the useful discussion of how to do the other type of tests.
Go ahead and merge. (My message was partly just to codify our past conversations for future contributors.)
I will wait for Seb to reply to the review comments.
Hello, thanks for the review @Mv77. I've removed/repaired tests per your recommendations. As for testing sampled moments, I see it like this:
Yup, I accept your wisdom about false positives, and your optimism about the future! This is not a concern that blocks this PR, but a question instead, which could be placed somewhere else for continued discussion: how does the comp-sci/software-development community deal with "testing" the robustness of things that should always be true in a fuzzy sense about the software? Like the accuracy of a numerical method: you know that changes to the algorithm can introduce small changes in the exact output, but you'd like to have a check that lets you know if the output moves away from a known theoretical answer. I trust your advice that unit tests are not the place to do this. But is there some other way to do this? How do Matlab and Mathematica check that their differential equation solvers do not break when they make a marginal improvement to their matrix multiplication algorithm?
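Not an answer, but one common pattern is an accuracy check against a known closed-form solution, with a tolerance loose enough to survive marginal algorithmic changes. A generic sketch, using SciPy rather than anything HARK-specific:

```python
import numpy as np
from scipy.integrate import solve_ivp

def check_solver_accuracy():
    # dy/dt = -y with y(0) = 1 has the closed form y(t) = exp(-t).
    # The solver's internals may change, but its output should stay within
    # a loose tolerance of the theoretical answer.
    t_eval = np.linspace(0.0, 5.0, 50)
    sol = solve_ivp(lambda t, y: -y, (0.0, 5.0), [1.0], t_eval=t_eval)
    np.testing.assert_allclose(sol.y[0], np.exp(-t_eval), rtol=1e-2, atol=1e-4)
```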
My sense is that the way to proceed on all of this is to set fairly loose tests regarding the final results, and successively tighter tests for the components that lead to the final result. If the existing code says that the value of But I would not agree with the proposition that we should not test for This lets us build our testing apparatus as needed, and prevents us from writing tests for cases where in practice there is never a problem.
I assume that you mean probabilistic, rather than fuzzy; fuzzy logic being a rather different beast, which allows for continuously valued truth values. It's a good question, and not one I've looked into carefully before. But after doing a little searching and intuiting... it looks like there are at least a couple of things we haven't considered:
These two articles (from 2016 and 2018) are near-top Google hits on the topic, and indicate that this is an active research area and unsolved problem. "Probabilistic programming" is a relatively recent research area which has yet to find mainstream uptake and applications.
https://alexey.radul.name/ideas/2016/on-testing-probabilistic-programs/
https://www.cs.cornell.edu/~legunsen/pubs/DuttaETAL18ProbFuzz.pdf
I think I meant fuzzy in the fuzzy logic sense. The tests we are talking about would not be of the type "Is A==B" which is either true or false. We'd like to have an answer to "Is The fuzzy truth value of the answer to that question is, as you said, stochastic. |
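One way to make that stochastic truth value explicit is to phrase the check as a hypothesis test, failing only when a sample moment is statistically implausible under the null that the code is correct. A sketch with hypothetical names:

```python
import numpy as np

def check_simulated_mean(history, expected_mean, n_sigma=5.0):
    # Fail only if the sample mean is implausibly far from its expected
    # value, measured in standard errors -- a probabilistic assertion
    # rather than an exact equality.
    sample = np.asarray(history, dtype=float).ravel()
    std_err = sample.std(ddof=1) / np.sqrt(sample.size)
    z = abs(sample.mean() - expected_mean) / std_err
    assert z < n_sigma, f"simulated mean is {z:.1f} standard errors from target"
```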
Addresses #1105 by replacing tests on simulation results that target specific values with tests that target relations between values, based on the transition equations of the model.
It may also introduce another way to test simulations.