-
-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] cleaning up strange newborn handling in ConsIndShock model #1021
base: master
Are you sure you want to change the base?
Conversation
This pertains to the broader issue of how time is handled in HARK. The long
term goal needs to be for the income (and other) shocks that an agent would
experience as a newborn to be defined as an ordinary part of the
specification, just not used during the solution method (because they're
irrelevant).
…On Mon, Jun 21, 2021 at 11:58 AM Sebastian Benthall < ***@***.***> wrote:
The current handling of newborn income shocks is convoluted and prevents
extensibility.
It is also apparently motivated by the improper inclusion of a hack needed
for particular use case, StickyE, into the general library. See #326
<#326>
This PR introduces some numpy array logic that cuts several confusing
lines of code as well as correcting for this issue #326
<#326>
I would actually advocate for further streamlining this code before
merging.
However, tampering with this part of code should get some discussion and
review. There's no way to make these fixes without changing the way in
which the income process distributions are being sampled. That means that
*all* the numerical values used in the tests for models that inherit from
IndShockConsumerType will be thrown off by these changes.
Correcting those numerical values is something I'm happy to do to improve
the code clarity.
- Tests for new functionality/models or Tests to reproduce the bug-fix
in code.
- Updated documentation of features that add new functionality.
- Update CHANGELOG.md with major/minor changes.
------------------------------
You can view, comment on, or merge this pull request online at:
#1021
Commit Summary
- cleaning up strange newborn handling in ConsIndShock model
File Changes
- *M* HARK/ConsumptionSaving/ConsIndShockModel.py
<https://github.com/econ-ark/HARK/pull/1021/files#diff-7f09d28a3d4136ae35d8835cd1352f060da6e0114fe617c34374c8b57d41a57e>
(26)
Patch Links:
- https://github.com/econ-ark/HARK/pull/1021.patch
- https://github.com/econ-ark/HARK/pull/1021.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1021>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADKRAFKYPQOTTKGY2GVO4WTTT5OSBANCNFSM47B3WBPA>
.
|
Glad you're available to weigh in on this @mnwhite Why isn't HARK handling newborn shocks through the "birth" process? Any good reason? Another puzzling thing about this code is that for agents aged These two aspects seem related. |
I feel like we've discussed this before, a few times. Maybe?
The income shocks pull from the t-1 values because of the way the data is
structured. We specified everything in terms of "what information needs to
be known at period t to *solve* that period's problem". Anything that fits
that description has a subscript of [t] in an AgentType's attributes. So
when solving the period t problem, the solver needs to know what the
distribution of shocks that arrive at the start of t+1 are; for the
consumption-saving class of problems, that's the income distribution.
So when you *simulate* a model, and you're at the beginning of period t,
you need to look at the [t-1] income distribution to find out what might
happen to these agents right now. But this is unnatural labeling: it looks
weird in code to grab from t-1 when simulating, and on paper we would write
about shock distributions that realize in period t *as being in period t*.
What should have been done, and maybe the best fix now, is to have
something that indicates which variables/attributes are "time-offset" so
that you really want the [t+1] value in the solver. Then specifying an
agent's data would be more natural, the simulation code would look right,
and we wouldn't have the wonky newborn issue.
Income shocks for newborns aren't handled entirely by the birth process
because that happens *before* period t shocks realize. The "interface"
between periods are the model's "post-state variables": those that need to
get carried from t-1 to t in order for period t states to properly exist
once the period t shocks realize. So when an agent dies and is replaced,
it's those "post-states" that get initialized. You *could* have the
simBirth() process that fills in those initial post-state values (for
a_{t-1} and p_{t-1}) also include the income shocks, but then you'd still
need special code that makes sure the income given to newborns in the
*ordinary* income shock step is zero.
…On Mon, Jun 21, 2021 at 1:05 PM Sebastian Benthall ***@***.***> wrote:
Glad you're available to weigh in on this @mnwhite
<https://github.com/mnwhite>
Why isn't HARK handling newborn shocks through the "birth" process? Any
good reason?
That seems where other cases like this are handled.
Another puzzling thing about this code is that for agents aged t, it uses
the income shock distribution of the *previous* period t - 1.
These two aspects seem related.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1021 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADKRAFKKLASZ35NRMFZ4OELTT5WORANCNFSM47B3WBPA>
.
|
) # permanent "shock" includes expected growth | ||
TranShkNow[these] = IncShkDstnNow.X[1][EventDraws] | ||
# PermShkNow[newborn] = 1.0 | ||
TranShkNow[newborn] = 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line might be one of the sources of failing tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2206 is the line I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new structure much better.
However tests are failing and we need to figure out why. One of the ones I saw asserted that, in simulations, "mNrm" did not match what it expected. I highlighted a line that might cause this, but I'm not sure whether it is the only one.
@Mv77 that the tests won't pass is a known issue. See issue #326 why forcing the TranShk for newborns may be a kludge that shouldn't be in the library. @mnwhite Just reading your explanation of how things are currently structured gives me a headache. Maybe this is a carry-over from when there was explicit "time-switching" between forward and backward time? Let me see if I understand what you are saying: So you start with parameters like this:
So the 0th TransShkStd corresponds to the 0th IncShkDstn. It sounds like you are saying that in the current code this 0th IncShkDstn is supposed to be used to solve the 0th period problem, and so actually corresponds to the 1th period's income shock in forward simulation? And because the 0th-period shock in forward simulation "doesn't matter", you just populate it with something ... in this case, a repeat of the period 1 shock (which is in the 0th place in the parameter list).
Yes, that sounds much better. I think it's a real problem that the library has been written as if the forward simulations "don't matter", when that's an important part of any modeling library. And the way things are currently is so confusing that after working on it for a year and a half I still didn't realize that this is how it worked. I would have run into an error on this and been very confused at some point. In my book, that's a bug.
Ok, this explanation and what you said above make it clear to me why adding the shocks to the birth process is not the right solution. Thanks! |
You have the interpretation of the meaning of the time subscripts correct.
This issue isn't related to time-flipping. It's probably more related to
trying to make early HARK look a lot like CDC's legacy code, which uses the
same wonky time subscripting.
It's not so much that it was written as if simulation "doesn't matter",
more that it's definitely the much, much easier part to program. In my
graduate course, I don't even teach how to program simulating a model that
you've solved, because it's essentially just "program the equations of the
model exactly as written". I discuss RNG and how to structure that, and
what can go wrong if you *don't* put some thought into where your random
draws are coming from... but simulation itself is borderline trivial. A
complication like "make sure IncomeDstn is coming from the right period" is
small beans compared to what's handled in the solvers.
…On Mon, Jun 21, 2021 at 3:29 PM Sebastian Benthall ***@***.***> wrote:
@Mv77 <https://github.com/Mv77> that the tests won't pass is a known
issue. See issue #326 <#326> why
forcing the TranShk for newborns may be a kludge that shouldn't be in the
library.
@mnwhite <https://github.com/mnwhite> Just reading your explanation of
how things are currently structured gives me a headache. Maybe this is a
carry-over from when there was explicit "time-switching" between forward
and backward time?
Let me see if I understand what you are saying:
So you start with parameters like this:
# Parameters that specify the income distribution over the lifecycle
"PermShkStd" : [0.1,0.2,0.1,0.2,0.1,0.2,0.1,0,0,0],
"PermShkCount" : 7, # Number of points in discrete approximation to permanent income shocks
"TranShkStd" : [0.3,0.2,0.1,0.3,0.2,0.1,0.3,0,0,0],
"TranShkCount" : 7, # Number of points in discrete approximation to transitory income shocks
So the 0th TransShkStd corresponds to the 0th IncShkDstn.
https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsIndShockModel.py#L2709-L2731
It sounds like you are saying that in the current code this 0th IncShkDstn
is supposed to be used to *solve* the 0th period problem, and so actually
corresponds to the 1th period's income shock in forward simulation?
And because the 0th-period shock in forward simulation "doesn't matter",
you just populate it with something ... in this case, a repeat of the
period 1 shock (which is in the 0th place in the parameter list).
What should have been done, and maybe the best fix now, is to have
something that indicates which variables/attributes are "time-offset" so
that you really want the [t+1] value in the solver. Then specifying an
agent's data would be more natural, the simulation code would look right,
and we wouldn't have the wonky newborn issue.
Yes, that sounds much better.
I think it's a real problem that the library has been written as if the
forward simulations "don't matter", when that's an important part of any
modeling library. And the way things are currently is so confusing that
after working on it for a year and a half I still didn't realize that this
is how it worked. I would have run into an error on this and been very
confused at some point. In my book, that's a *bug*.
Income shocks for newborns aren't handled entirely by the birth process
because that happens *before* period t shocks realize.
Ok, this explanation and what you said above make it clear to me why
adding the shocks to the birth process is not the right solution. Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1021 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADKRAFLDQ7DYN56QPHLNF23TT6HLHANCNFSM47B3WBPA>
.
|
Re: the timing issues An idea that I have entertained when I have dealt with this section of HARK is: In the
This would amount to making The advantages that I see to this approach are:
This is just a thought I've entertained and I just wanted to write it out in case it helps when you re-consider timing. I might be wrong and have not thought too hard about the drawbacks. |
The downside there is that we usually write shocks that the agent learns
about in period t (i.e. information that becomes available to them for
their period t control decision) as belonging to period t. I would prefer
it if "code math" and "paper math" aligned better.
My intent with the design of the timing was that agents would never have
"partial" periods: the death-and-replacement event is the *very first*
thing that happens at the beginning of the period, or the interface between
two periods. With the timing structure you're describing, when there's a
death-and-replacement event in period t, *some* of the period t history
variables belong to agent i, but others belong to the agent who replaced
him later in the period.
I think it's better/cleaner to fix the timing issue with an explicit tag /
something that says "when solving the model at t, we need this attribute,
but the t+1 index of it rather than t". Stata has notation for leads and
lags, and on paper we write down the distribution of next period's income
as F_{t+1}; I think something like that should have an analogous notation
in HARK.
As for how other people specify models, I've seen some sloppiness about
timing of birth and death, seemingly for the sake of more parsimonious
writing.
…On Mon, Jun 21, 2021 at 3:47 PM Mateo Velásquez-Giraldo < ***@***.***> wrote:
Re: the timing issues
An idea that I have entertained when I have dealt with this section of
HARK is:
What if shocks were not the first thing that happens in a period, but the
last instead?
In the ConsIndShock model, for instance, the income shocks that alter
Y[t+1] occur:
- In the *model*: at some point between the times at which the t
decisions and the t+1 decisions are made. It does not matter too much
whether they are the last thing to happen at t or the first to happen
at t+1.
- In the *code*: their realizations are the first thing that happens
in period t+1. What I am proposing is to consider making them the last
thing that happens at t.
This would amount to making get_shocks() the last thing that is done in
the sim_one_period() sequence (it is currently the first).
The advantages that I see to this approach are:
- The information needed by the solution and simulation methods for
period t would become more aligned. What I mean is that the
distribution needed to form expectations in solve and to draw shocks
in sim_one_period would be the same, both with index t. This might be
useful for the compartmentalization of periods and stages. A period would
only need to know its states---no matter where they came from---whereas
now, it needs to know last period's post-states, its own distribution for
shocks, and how to combine them (transitions?) which are also needed by
last period's solver. There is "redundance" or "communication" in the sense
that the same distributions and transitions go into two different periods.
- It would simplify the handling of newborn's states. Because of
shocks being the first thing that happens in a period, we are forced to
initialize newborns not with their states ([m_{0}], P_{0}]) but with
their last period's post-state and the distribution of their shocks [a_{-1},
P_{-1}, PermGroFac_{-1,0}, PermShk_{0}, TransShk_{0}] that are used by
get_states() to produce [m_{0}], P_{0}]. These pre-birth objects are
not the way people usually specify models, and therefore this results in
lines like the block that starts with
https://github.com/econ-ark/HARK/blob/b1e8c29dc90d06e64bad2a3f4e44f3e732b51e8c/HARK/ConsumptionSaving/ConsIndShockModel.py#L2194
Which (I think) is assuming that the distribution that applies from -1
to 0 is the same as the one that the user specifies from 0 to 1, and that
there is no transitory income shock.
This is just a thought I've entertained and I just wanted to write it out
in case it helps when you re-consider timing. I might be wrong and have not
thought too hard about the drawbacks.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1021 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADKRAFNPBRBXVCBVVZU3JMDTT6JMRANCNFSM47B3WBPA>
.
|
Also: I find models where agents pop into existence with some m_t > 0 to
beg the question of *where that money came from*. If it represents them
being born with zero assets and then earning some labor income in their
newborn period, *that should be the model*, both on paper and in code.
…On Mon, Jun 21, 2021 at 3:59 PM Matthew White ***@***.***> wrote:
The downside there is that we usually write shocks that the agent learns
about in period t (i.e. information that becomes available to them for
their period t control decision) as belonging to period t. I would prefer
it if "code math" and "paper math" aligned better.
My intent with the design of the timing was that agents would never have
"partial" periods: the death-and-replacement event is the *very first*
thing that happens at the beginning of the period, or the interface between
two periods. With the timing structure you're describing, when there's a
death-and-replacement event in period t, *some* of the period t history
variables belong to agent i, but others belong to the agent who replaced
him later in the period.
I think it's better/cleaner to fix the timing issue with an explicit tag /
something that says "when solving the model at t, we need this attribute,
but the t+1 index of it rather than t". Stata has notation for leads and
lags, and on paper we write down the distribution of next period's income
as F_{t+1}; I think something like that should have an analogous notation
in HARK.
As for how other people specify models, I've seen some sloppiness about
timing of birth and death, seemingly for the sake of more parsimonious
writing.
On Mon, Jun 21, 2021 at 3:47 PM Mateo Velásquez-Giraldo <
***@***.***> wrote:
> Re: the timing issues
>
> An idea that I have entertained when I have dealt with this section of
> HARK is:
> What if shocks were not the first thing that happens in a period, but the
> last instead?
>
> In the ConsIndShock model, for instance, the income shocks that alter
> Y[t+1] occur:
>
> - In the *model*: at some point between the times at which the t
> decisions and the t+1 decisions are made. It does not matter too much
> whether they are the last thing to happen at t or the first to happen
> at t+1.
> - In the *code*: their realizations are the first thing that happens
> in period t+1. What I am proposing is to consider making them the
> last thing that happens at t.
>
> This would amount to making get_shocks() the last thing that is done in
> the sim_one_period() sequence (it is currently the first).
>
> The advantages that I see to this approach are:
>
> - The information needed by the solution and simulation methods for
> period t would become more aligned. What I mean is that the
> distribution needed to form expectations in solve and to draw shocks
> in sim_one_period would be the same, both with index t. This might be
> useful for the compartmentalization of periods and stages. A period would
> only need to know its states---no matter where they came from---whereas
> now, it needs to know last period's post-states, its own distribution for
> shocks, and how to combine them (transitions?) which are also needed by
> last period's solver. There is "redundance" or "communication" in the sense
> that the same distributions and transitions go into two different periods.
> - It would simplify the handling of newborn's states. Because of
> shocks being the first thing that happens in a period, we are forced to
> initialize newborns not with their states ([m_{0}], P_{0}]) but with
> their last period's post-state and the distribution of their shocks [a_{-1},
> P_{-1}, PermGroFac_{-1,0}, PermShk_{0}, TransShk_{0}] that are used
> by get_states() to produce [m_{0}], P_{0}]. These pre-birth objects
> are not the way people usually specify models, and therefore this results
> in lines like the block that starts with
>
> https://github.com/econ-ark/HARK/blob/b1e8c29dc90d06e64bad2a3f4e44f3e732b51e8c/HARK/ConsumptionSaving/ConsIndShockModel.py#L2194
> Which (I think) is assuming that the distribution that applies from
> -1 to 0 is the same as the one that the user specifies from 0 to 1, and
> that there is no transitory income shock.
>
> This is just a thought I've entertained and I just wanted to write it out
> in case it helps when you re-consider timing. I might be wrong and have not
> thought too hard about the drawbacks.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#1021 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ADKRAFNPBRBXVCBVVZU3JMDTT6JMRANCNFSM47B3WBPA>
> .
>
|
It is in fact so trivial that nobody should be writing a new program to do the simulation. Rather, the user of the software should provide the information that defines the model, then let the simulation machinery run that code intelligently. The reason for caring about the simulation is to make sure the model is being defined cleanly using a general interface; that will also make it possible to write general solution methods. |
draw_age[draw_age < 0] = 0 | ||
|
||
for t in np.unique(draw_age): | ||
these = t == draw_age |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now realizing that draw_age
already contains the -1 index shift. Therefore, I think indices inside the cycle should be t
, not t-1
. Does that sound right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you, that's right.
Blocked on #1105 |
The current handling of newborn income shocks is convoluted and prevents extensibility.
It is also apparently motivated by the improper inclusion of a hack needed for particular use case, StickyE, into the general library. See #326
This PR introduces some numpy array logic that cuts several confusing lines of code as well as correcting for this issue #326
I would actually advocate for further streamlining this code before merging.
However, tampering with this part of code should get some discussion and review. There's no way to make these fixes without changing the way in which the income process distributions are being sampled. That means that all the numerical values used in the tests for models that inherit from IndShockConsumerType will be thrown off by these changes.
Correcting those numerical values is something I'm happy to do to improve the code clarity.