-
-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ConditionalDistribution #1018
ConditionalDistribution #1018
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1018 +/- ##
==========================================
+ Coverage 72.24% 72.43% +0.19%
==========================================
Files 66 66
Lines 9832 9871 +39
==========================================
+ Hits 7103 7150 +47
+ Misses 2729 2721 -8
Continue to review full report at Codecov.
|
I am checking this now. A first point is that I am not sure whether I agree with the use of "Conditional" here. My reasoning is as follows. I believe you envision this machinery being used, for instance, to represent shocks with distributions that vary over time, like PermShk_t and PermShk_{t+1}. The thing is that I see PermShk_t and PermShk_{t+1} as two different random variables, not the same variable with a distribution that is conditioned on different events (reaching t or reaching t+1). That might not be its only use, however, and then the use of "Conditional" might be appropriate and the example I've just used might just be thought of as something that behaves (but is not) a conditional distribution and therefore can take advantage of the functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good to me. I believe that it will simplify substantial portions of HARK. There is a lot of potential for simplifying things like construct_lognormal_income_process_unemployment
.
I'd only debate the name of the machinery, and question whether it has any adverse effect on RNG through its use of seeds (I guess this worry would be dismissed if it passes tests).
self.shocks['Adjust'] = ConditionalDistribution( | ||
Bernoulli, | ||
{'p' : self.AdjustPrb}, | ||
seed=self.RNG.randint(0, 2 ** 31 - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the code right, you are using a single seed to draw all the shocks. The previous implementation seems to draw a different seed for every "t".
Does this have any perverse RNG implication?
if type(item0) is list: | ||
cond = {key : val[y] for (key, val) in self.conditional.items()} | ||
return self.engine( | ||
seed=self.RNG.randint(0, 2 ** 31 - 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: the previous comment
I see that might not be the case. The seed you pass gets used to draw the (index-varying) seeds of every item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that might not be the case. The seed you pass gets used to draw the (index-varying) seeds of every item.
I'm glad you are sensitive to this. I'm not 100% sure I have done this correctly.
I was trying to adapt to the seed-handling techniques that @mnwhite has coded into HARK.
I'm somewhat surprised that I got the tests to pass without modifying them, to be honest, but it appears I guessed well as to how to adapt it over.
Maybe a different, more extensive test could verify that there is not a problem here.
Maybe: test the conditional distribution with two cases, but where the numerical values of the inputs in each case (say, mu and sigma), are equal. Then test to make sure samples drawn from the two cases are different? I'll code that up now and add it.
Thanks for this, @Mv77
I suppose I can't be too partial to the naming. I suppose I see a distribution as more abstract than a random variable. Multiple random variables can be drawn from the same distribution. I.e, you could say, \theta_t ~ LogNormal(mu, sigma), and that also \theta_{t+1} ~ LogNormal(mu,sigma). Consider the case where the 'time-varying' variables are used, but rather than being age-varying, they are "season" varying. So, it's an infinite horizon model where unemployment probability (or whatever) is conditional on it being summer or winter. It would be wasteful to define a separate distribution for each age's summer unemployment probability. We need only a distribution conditional on the season. I suppose this extended case is not so apparent given the current uses to which this has been applied, but I'm building it out in anticipation of other kinds of models that have not yet been built with HARK yet. |
Rename ConditionalDistribution to IndexedDistribution |
Bernoulli, | ||
{'p' : self.AdjustPrb}, | ||
seed=self.RNG.randint(0, 2 ** 31 - 1) | ||
).draw(self.t_cycle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbenthall thinking about indices more carefully now, I think that according to the previous code and time indexing of solutions vs simulations this should be .draw(self.t_cycle - 1)
, shouldn't it?
There is still the question of why would newborns use the final parameters (0-1 = -1
) but that might be a discussion for another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's right.
In python, an index of -1 gets you the last element of a list/collection.
so correcting this means working in the extra 'newborn' logic.
I think it would be better to figure out how to fix #1022 across the board.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well yes, newborns are a separate issue.
But the .draw(self.t_cycle)
is still a bug that should be fixed by turning it into .draw(self.t_cycle - 1)
, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.
Prior to this PR:
AdjustPrb = self.AdjustPrb[t - 1]
Adjust[these] = Bernoulli(
AdjustPrb, seed=self.RNG.randint(0, 2 ** 31 - 1)
).draw(N)
For agents of age t, use AdjustPr[t -1]. If t = 0, this would use AdjustPrb[-1], the last AdjustPrb.
This is a bug. It never surfaced because there is no test case for varying AdjustPrbs?
After #1018 merge:
(IndexDistribution of AdjustPrbs).draw(self.t_cycle)
Now the draws are "off by one".
Either way, there is a bug! But this was a case that nobody bothered writing a test for.
Whoever wrote the original AdjustPrb simulation code forgot to resolve the newborn case.
Now the question is: should we put the special newborn case in everywhere that it's been left out?
My preference is to fix #1022 instead. But your point is well-taken that this is a problem with the implementation at the moment. I'll make a separate issue for it.
Mateo,
Can you join our 11:15 ARK call to discuss this set of issues? Maybe more
efficient than email; and I have some input that is quicker to explain
verbally.
…On Thu, Jun 24, 2021 at 8:20 AM Sebastian Benthall ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In HARK/ConsumptionSaving/ConsRiskyAssetModel.py
<#1018 (comment)>:
> - Adjust = np.zeros(self.AgentCount) # Initialize shock array
- for t in range(self.T_cycle):
- these = t == self.t_cycle
- N = np.sum(these)
- if N > 0:
- AdjustPrb = self.AdjustPrb[t - 1]
- Adjust[these] = Bernoulli(
- AdjustPrb, seed=self.RNG.randint(0, 2 ** 31 - 1)
- ).draw(N)
-
- self.shocks["Adjust"] = Adjust
+ self.shocks['Adjust'] = IndexDistribution(
+ Bernoulli,
+ {'p' : self.AdjustPrb},
+ seed=self.RNG.randint(0, 2 ** 31 - 1)
+ ).draw(self.t_cycle)
Hmm.
Prior to this PR:
AdjustPrb = self.AdjustPrb[t - 1]
Adjust[these] = Bernoulli(
AdjustPrb, seed=self.RNG.randint(0, 2 ** 31 - 1)
).draw(N)
For agents of age t, use AdjustPr[t -1]. If t = 0, this would use
AdjustPrb[-1], the *last* AdjustPrb.
This is a bug. It never surfaced because there is no test case for varying
AdjustPrbs?
After #1018 <#1018> merge:
(IndexDistribution of AdjustPrbs).draw(self.t_cycle)
Now the draws are "off by one".
Either way, there is a bug! But this was a case that nobody bothered
writing a test for.
Whoever wrote the original AdjustPrb simulation code forgot to resolve the
newborn case.
Now the question is: should we put the special newborn case in everywhere
that it's been left out?
My preference is to fix #1022
<#1022> instead. But your point is
well-taken that this is a problem with the implementation at the moment.
I'll make a separate issue for it.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1018 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKCK76BX5PI3O5O7E7L5CLTUMPJ7ANCNFSM462G2GTA>
.
--
- Chris Carroll
|
Sure thing. See you then. |
This PR introduces a
ConditionalDistribution
class indistribution.py
This class represents a distribution that is conditional on some input value. The imaged use case for this includes the age-varying distributions discussed in #1015 and #1017, which are used widely throughout the HARK library.
As a test of integration into the library, I've worked the new class into a couple of the shock computations in the RiskyAsset model.
The current version is tailor to be a replacement of the logic that's currently repeated throughout the model classes: the ConditionalDistribution is configured with lists of values if it represents a "time varying" distribution whose parameters are conditional on an integer index.
Requesting a review from @Mv77