ConditionalDistribution #1018

sbenthall · 2021-06-16T20:34:02Z

This PR introduces a ConditionalDistribution class in distribution.py

This class represents a distribution that is conditional on some input value. The imaged use case for this includes the age-varying distributions discussed in #1015 and #1017, which are used widely throughout the HARK library.

As a test of integration into the library, I've worked the new class into a couple of the shock computations in the RiskyAsset model.

The current version is tailor to be a replacement of the logic that's currently repeated throughout the model classes: the ConditionalDistribution is configured with lists of values if it represents a "time varying" distribution whose parameters are conditional on an integer index.

Requesting a review from @Mv77

Tests for new functionality/models or Tests to reproduce the bug-fix in code.
Updated documentation of features that add new functionality.
Update CHANGELOG.md with major/minor changes.

…to RiskyAssetModel

codecov-commenter · 2021-06-16T22:08:52Z

Codecov Report

Merging #1018 (2f11c4d) into master (27b4567) will increase coverage by 0.19%.
The diff coverage is 94.23%.

@@            Coverage Diff             @@
##           master    #1018      +/-   ##
==========================================
+ Coverage   72.24%   72.43%   +0.19%     
==========================================
  Files          66       66              
  Lines        9832     9871      +39     
==========================================
+ Hits         7103     7150      +47     
+ Misses       2729     2721       -8

Impacted Files	Coverage Δ
HARK/ConsumptionSaving/ConsRiskyAssetModel.py	`86.76% <50.00%> (+10.81%)`	⬆️
HARK/distribution.py	`80.60% <93.75%> (+1.25%)`	⬆️
HARK/tests/test_distribution.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27b4567...2f11c4d. Read the comment docs.

Mv77 · 2021-06-17T00:47:07Z

I am checking this now.

A first point is that I am not sure whether I agree with the use of "Conditional" here. My reasoning is as follows.

I believe you envision this machinery being used, for instance, to represent shocks with distributions that vary over time, like PermShk_t and PermShk_{t+1}. The thing is that I see PermShk_t and PermShk_{t+1} as two different random variables, not the same variable with a distribution that is conditioned on different events (reaching t or reaching t+1).

That might not be its only use, however, and then the use of "Conditional" might be appropriate and the example I've just used might just be thought of as something that behaves (but is not) a conditional distribution and therefore can take advantage of the functionality.

Mv77

This looks very good to me. I believe that it will simplify substantial portions of HARK. There is a lot of potential for simplifying things like construct_lognormal_income_process_unemployment.

I'd only debate the name of the machinery, and question whether it has any adverse effect on RNG through its use of seeds (I guess this worry would be dismissed if it passes tests).

Mv77 · 2021-06-17T00:53:24Z

HARK/ConsumptionSaving/ConsRiskyAssetModel.py

+        self.shocks['Adjust'] = ConditionalDistribution(
+            Bernoulli,
+            {'p' : self.AdjustPrb},
+            seed=self.RNG.randint(0, 2 ** 31 - 1)


If I understand the code right, you are using a single seed to draw all the shocks. The previous implementation seems to draw a different seed for every "t".

Does this have any perverse RNG implication?

Mv77 · 2021-06-17T00:56:00Z

HARK/distribution.py

+        if type(item0) is list:
+            cond = {key : val[y] for (key, val) in self.conditional.items()}
+            return self.engine(
+                    seed=self.RNG.randint(0, 2 ** 31 - 1),


Re: the previous comment

I see that might not be the case. The seed you pass gets used to draw the (index-varying) seeds of every item.

I see that might not be the case. The seed you pass gets used to draw the (index-varying) seeds of every item.

I'm glad you are sensitive to this. I'm not 100% sure I have done this correctly.

I was trying to adapt to the seed-handling techniques that @mnwhite has coded into HARK.

I'm somewhat surprised that I got the tests to pass without modifying them, to be honest, but it appears I guessed well as to how to adapt it over.

Maybe a different, more extensive test could verify that there is not a problem here.

Maybe: test the conditional distribution with two cases, but where the numerical values of the inputs in each case (say, mu and sigma), are equal. Then test to make sure samples drawn from the two cases are different? I'll code that up now and add it.

sbenthall · 2021-06-17T15:03:41Z

Thanks for this, @Mv77

I believe you envision this machinery being used, for instance, to represent shocks with distributions that vary over time, like PermShk_t and PermShk_{t+1}. The thing is that I see PermShk_t and PermShk_{t+1} as two different random variables, not the same variable with a distribution that is conditioned on different events (reaching t or reaching t+1).

That might not be its only use, however, and then the use of "Conditional" might be appropriate and the example I've just used might just be thought of as something that behaves (but is not) a conditional distribution and therefore can take advantage of the functionality.

I suppose I can't be too partial to the naming.

I suppose I see a distribution as more abstract than a random variable. Multiple random variables can be drawn from the same distribution.

I.e, you could say, \theta_t ~ LogNormal(mu, sigma), and that also \theta_{t+1} ~ LogNormal(mu,sigma).
That says that each random variable is sampled from a particular distribution.
This class is to allow the user to define a distribution that is conditional on some other value.

Consider the case where the 'time-varying' variables are used, but rather than being age-varying, they are "season" varying. So, it's an infinite horizon model where unemployment probability (or whatever) is conditional on it being summer or winter. It would be wasteful to define a separate distribution for each age's summer unemployment probability. We need only a distribution conditional on the season.

I suppose this extended case is not so apparent given the current uses to which this has been applied, but I'm building it out in anticipation of other kinds of models that have not yet been built with HARK yet.

sbenthall · 2021-06-17T16:08:39Z

Rename ConditionalDistribution to IndexedDistribution

Mv77 · 2021-06-23T16:52:21Z

HARK/ConsumptionSaving/ConsRiskyAssetModel.py

+            Bernoulli,
+            {'p' : self.AdjustPrb},
+            seed=self.RNG.randint(0, 2 ** 31 - 1)
+            ).draw(self.t_cycle)


@sbenthall thinking about indices more carefully now, I think that according to the previous code and time indexing of solutions vs simulations this should be .draw(self.t_cycle - 1), shouldn't it?

There is still the question of why would newborns use the final parameters (0-1 = -1) but that might be a discussion for another time.

yes, that's right.

In python, an index of -1 gets you the last element of a list/collection.

so correcting this means working in the extra 'newborn' logic.

I think it would be better to figure out how to fix #1022 across the board.

Well yes, newborns are a separate issue.

But the .draw(self.t_cycle) is still a bug that should be fixed by turning it into .draw(self.t_cycle - 1), isn't it?

Hmm.

Prior to this PR:

AdjustPrb = self.AdjustPrb[t - 1] Adjust[these] = Bernoulli( AdjustPrb, seed=self.RNG.randint(0, 2 ** 31 - 1) ).draw(N)

For agents of age t, use AdjustPr[t -1]. If t = 0, this would use AdjustPrb[-1], the last AdjustPrb.
This is a bug. It never surfaced because there is no test case for varying AdjustPrbs?

After #1018 merge:

(IndexDistribution of AdjustPrbs).draw(self.t_cycle)

Now the draws are "off by one".

Either way, there is a bug! But this was a case that nobody bothered writing a test for.

Whoever wrote the original AdjustPrb simulation code forgot to resolve the newborn case.

Now the question is: should we put the special newborn case in everywhere that it's been left out?

My preference is to fix #1022 instead. But your point is well-taken that this is a problem with the implementation at the moment. I'll make a separate issue for it.

llorracc · 2021-06-24T13:49:30Z

Mateo, Can you join our 11:15 ARK call to discuss this set of issues? Maybe more efficient than email; and I have some input that is quicker to explain verbally.

…

On Thu, Jun 24, 2021 at 8:20 AM Sebastian Benthall ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In HARK/ConsumptionSaving/ConsRiskyAssetModel.py <#1018 (comment)>: > - Adjust = np.zeros(self.AgentCount) # Initialize shock array - for t in range(self.T_cycle): - these = t == self.t_cycle - N = np.sum(these) - if N > 0: - AdjustPrb = self.AdjustPrb[t - 1] - Adjust[these] = Bernoulli( - AdjustPrb, seed=self.RNG.randint(0, 2 ** 31 - 1) - ).draw(N) - - self.shocks["Adjust"] = Adjust + self.shocks['Adjust'] = IndexDistribution( + Bernoulli, + {'p' : self.AdjustPrb}, + seed=self.RNG.randint(0, 2 ** 31 - 1) + ).draw(self.t_cycle) Hmm. Prior to this PR: AdjustPrb = self.AdjustPrb[t - 1] Adjust[these] = Bernoulli( AdjustPrb, seed=self.RNG.randint(0, 2 ** 31 - 1) ).draw(N) For agents of age t, use AdjustPr[t -1]. If t = 0, this would use AdjustPrb[-1], the *last* AdjustPrb. This is a bug. It never surfaced because there is no test case for varying AdjustPrbs? After #1018 <#1018> merge: (IndexDistribution of AdjustPrbs).draw(self.t_cycle) Now the draws are "off by one". Either way, there is a bug! But this was a case that nobody bothered writing a test for. Whoever wrote the original AdjustPrb simulation code forgot to resolve the newborn case. Now the question is: should we put the special newborn case in everywhere that it's been left out? My preference is to fix #1022 <#1022> instead. But your point is well-taken that this is a problem with the implementation at the moment. I'll make a separate issue for it. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1018 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAKCK76BX5PI3O5O7E7L5CLTUMPJ7ANCNFSM462G2GTA> .

-- - Chris Carroll

Mv77 · 2021-06-24T14:14:07Z

Sure thing. See you then.

sbenthall added 2 commits June 16, 2021 16:21

ConditionalDistribution class, with automated tests, and some work in…

e8fbd69

…to RiskyAssetModel

changelog for the ConditionalDistribution

8a8d67f

sbenthall requested a review from Mv77 June 16, 2021 20:34

sbenthall added this to the 1.0.0 milestone Jun 16, 2021

correct argument names for Lognormal.from_mean_std

1a9b874

Mv77 reviewed Jun 17, 2021

View reviewed changes

adding test to make sure random seeds are not too duplicated?

81e44a8

Merge branch 'master' into i1017

2d03dec

sbenthall added 3 commits June 17, 2021 12:09

Merge branch 'master' of github.com:econ-ark/HARK into i1017

68fdcbb

change ConditionalDistribution to IndexDistribution

a64b64e

Merge branch 'i1017' of github.com:sbenthall/HARK into i1017

2f11c4d

sbenthall added the Ready-To-Merge Has been reviewed and when branch is updated and checks pass it should be merged label Jun 17, 2021

sbenthall merged commit 085ea3a into econ-ark:master Jun 17, 2021

This was referenced Jun 17, 2021

age-varying distributions for shocks #1017

Closed

Allow time-varying UnempPrb #1015

Closed

Mv77 reviewed Jun 23, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ConditionalDistribution #1018

ConditionalDistribution #1018

sbenthall commented Jun 16, 2021

codecov-commenter commented Jun 16, 2021 •

edited

Loading

Mv77 commented Jun 17, 2021

Mv77 left a comment

Mv77 Jun 17, 2021

Mv77 Jun 17, 2021

sbenthall Jun 17, 2021

sbenthall commented Jun 17, 2021

sbenthall commented Jun 17, 2021

Mv77 Jun 23, 2021

sbenthall Jun 23, 2021

Mv77 Jun 23, 2021

sbenthall Jun 24, 2021

llorracc commented Jun 24, 2021 via email

Mv77 commented Jun 24, 2021

ConditionalDistribution #1018

ConditionalDistribution #1018

Conversation

sbenthall commented Jun 16, 2021

codecov-commenter commented Jun 16, 2021 • edited Loading

Codecov Report

Mv77 commented Jun 17, 2021

Mv77 left a comment

Choose a reason for hiding this comment

Mv77 Jun 17, 2021

Choose a reason for hiding this comment

Mv77 Jun 17, 2021

Choose a reason for hiding this comment

sbenthall Jun 17, 2021

Choose a reason for hiding this comment

sbenthall commented Jun 17, 2021

sbenthall commented Jun 17, 2021

Mv77 Jun 23, 2021

Choose a reason for hiding this comment

sbenthall Jun 23, 2021

Choose a reason for hiding this comment

Mv77 Jun 23, 2021

Choose a reason for hiding this comment

sbenthall Jun 24, 2021

Choose a reason for hiding this comment

llorracc commented Jun 24, 2021 via email

Mv77 commented Jun 24, 2021

codecov-commenter commented Jun 16, 2021 •

edited

Loading