Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConditionalDistribution #1018

Merged
merged 8 commits into from
Jun 17, 2021
Merged

ConditionalDistribution #1018

merged 8 commits into from
Jun 17, 2021

Conversation

sbenthall
Copy link
Contributor

This PR introduces a ConditionalDistribution class in distribution.py

This class represents a distribution that is conditional on some input value. The imaged use case for this includes the age-varying distributions discussed in #1015 and #1017, which are used widely throughout the HARK library.

As a test of integration into the library, I've worked the new class into a couple of the shock computations in the RiskyAsset model.

The current version is tailor to be a replacement of the logic that's currently repeated throughout the model classes: the ConditionalDistribution is configured with lists of values if it represents a "time varying" distribution whose parameters are conditional on an integer index.

Requesting a review from @Mv77

  • Tests for new functionality/models or Tests to reproduce the bug-fix in code.
  • Updated documentation of features that add new functionality.
  • Update CHANGELOG.md with major/minor changes.

@sbenthall sbenthall requested a review from Mv77 June 16, 2021 20:34
@sbenthall sbenthall added this to the 1.0.0 milestone Jun 16, 2021
@codecov-commenter
Copy link

codecov-commenter commented Jun 16, 2021

Codecov Report

Merging #1018 (2f11c4d) into master (27b4567) will increase coverage by 0.19%.
The diff coverage is 94.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1018      +/-   ##
==========================================
+ Coverage   72.24%   72.43%   +0.19%     
==========================================
  Files          66       66              
  Lines        9832     9871      +39     
==========================================
+ Hits         7103     7150      +47     
+ Misses       2729     2721       -8     
Impacted Files Coverage Δ
HARK/ConsumptionSaving/ConsRiskyAssetModel.py 86.76% <50.00%> (+10.81%) ⬆️
HARK/distribution.py 80.60% <93.75%> (+1.25%) ⬆️
HARK/tests/test_distribution.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27b4567...2f11c4d. Read the comment docs.

@Mv77
Copy link
Contributor

Mv77 commented Jun 17, 2021

I am checking this now.

A first point is that I am not sure whether I agree with the use of "Conditional" here. My reasoning is as follows.

I believe you envision this machinery being used, for instance, to represent shocks with distributions that vary over time, like PermShk_t and PermShk_{t+1}. The thing is that I see PermShk_t and PermShk_{t+1} as two different random variables, not the same variable with a distribution that is conditioned on different events (reaching t or reaching t+1).

That might not be its only use, however, and then the use of "Conditional" might be appropriate and the example I've just used might just be thought of as something that behaves (but is not) a conditional distribution and therefore can take advantage of the functionality.

Copy link
Contributor

@Mv77 Mv77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good to me. I believe that it will simplify substantial portions of HARK. There is a lot of potential for simplifying things like construct_lognormal_income_process_unemployment.

I'd only debate the name of the machinery, and question whether it has any adverse effect on RNG through its use of seeds (I guess this worry would be dismissed if it passes tests).

self.shocks['Adjust'] = ConditionalDistribution(
Bernoulli,
{'p' : self.AdjustPrb},
seed=self.RNG.randint(0, 2 ** 31 - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the code right, you are using a single seed to draw all the shocks. The previous implementation seems to draw a different seed for every "t".

Does this have any perverse RNG implication?

if type(item0) is list:
cond = {key : val[y] for (key, val) in self.conditional.items()}
return self.engine(
seed=self.RNG.randint(0, 2 ** 31 - 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: the previous comment

I see that might not be the case. The seed you pass gets used to draw the (index-varying) seeds of every item.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that might not be the case. The seed you pass gets used to draw the (index-varying) seeds of every item.

I'm glad you are sensitive to this. I'm not 100% sure I have done this correctly.

I was trying to adapt to the seed-handling techniques that @mnwhite has coded into HARK.

I'm somewhat surprised that I got the tests to pass without modifying them, to be honest, but it appears I guessed well as to how to adapt it over.

Maybe a different, more extensive test could verify that there is not a problem here.

Maybe: test the conditional distribution with two cases, but where the numerical values of the inputs in each case (say, mu and sigma), are equal. Then test to make sure samples drawn from the two cases are different? I'll code that up now and add it.

@sbenthall
Copy link
Contributor Author

Thanks for this, @Mv77

I believe you envision this machinery being used, for instance, to represent shocks with distributions that vary over time, like PermShk_t and PermShk_{t+1}. The thing is that I see PermShk_t and PermShk_{t+1} as two different random variables, not the same variable with a distribution that is conditioned on different events (reaching t or reaching t+1).

That might not be its only use, however, and then the use of "Conditional" might be appropriate and the example I've just used might just be thought of as something that behaves (but is not) a conditional distribution and therefore can take advantage of the functionality.

I suppose I can't be too partial to the naming.

I suppose I see a distribution as more abstract than a random variable. Multiple random variables can be drawn from the same distribution.

I.e, you could say, \theta_t ~ LogNormal(mu, sigma), and that also \theta_{t+1} ~ LogNormal(mu,sigma).
That says that each random variable is sampled from a particular distribution.
This class is to allow the user to define a distribution that is conditional on some other value.

Consider the case where the 'time-varying' variables are used, but rather than being age-varying, they are "season" varying. So, it's an infinite horizon model where unemployment probability (or whatever) is conditional on it being summer or winter. It would be wasteful to define a separate distribution for each age's summer unemployment probability. We need only a distribution conditional on the season.

I suppose this extended case is not so apparent given the current uses to which this has been applied, but I'm building it out in anticipation of other kinds of models that have not yet been built with HARK yet.

@sbenthall
Copy link
Contributor Author

Rename ConditionalDistribution to IndexedDistribution

@sbenthall sbenthall added the Ready-To-Merge Has been reviewed and when branch is updated and checks pass it should be merged label Jun 17, 2021
@sbenthall sbenthall merged commit 085ea3a into econ-ark:master Jun 17, 2021
Bernoulli,
{'p' : self.AdjustPrb},
seed=self.RNG.randint(0, 2 ** 31 - 1)
).draw(self.t_cycle)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbenthall thinking about indices more carefully now, I think that according to the previous code and time indexing of solutions vs simulations this should be .draw(self.t_cycle - 1), shouldn't it?

There is still the question of why would newborns use the final parameters (0-1 = -1) but that might be a discussion for another time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's right.

In python, an index of -1 gets you the last element of a list/collection.

so correcting this means working in the extra 'newborn' logic.

I think it would be better to figure out how to fix #1022 across the board.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yes, newborns are a separate issue.

But the .draw(self.t_cycle) is still a bug that should be fixed by turning it into .draw(self.t_cycle - 1), isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.

Prior to this PR:

AdjustPrb = self.AdjustPrb[t - 1]
                    Adjust[these] = Bernoulli(
                        AdjustPrb, seed=self.RNG.randint(0, 2 ** 31 - 1)
                    ).draw(N)

For agents of age t, use AdjustPr[t -1]. If t = 0, this would use AdjustPrb[-1], the last AdjustPrb.
This is a bug. It never surfaced because there is no test case for varying AdjustPrbs?

After #1018 merge:

(IndexDistribution of AdjustPrbs).draw(self.t_cycle)

Now the draws are "off by one".

Either way, there is a bug! But this was a case that nobody bothered writing a test for.

Whoever wrote the original AdjustPrb simulation code forgot to resolve the newborn case.

Now the question is: should we put the special newborn case in everywhere that it's been left out?

My preference is to fix #1022 instead. But your point is well-taken that this is a problem with the implementation at the moment. I'll make a separate issue for it.

@llorracc
Copy link
Collaborator

llorracc commented Jun 24, 2021 via email

@Mv77
Copy link
Contributor

Mv77 commented Jun 24, 2021

Sure thing. See you then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready-To-Merge Has been reviewed and when branch is updated and checks pass it should be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants