[WIP] cleaning up strange newborn handling in ConsIndShock model #1021

sbenthall · 2021-06-21T15:58:07Z

The current handling of newborn income shocks is convoluted and prevents extensibility.

It is also apparently motivated by the improper inclusion of a hack needed for a particular use case, StickyE, into the general library. See #326

This PR introduces some numpy array logic that cuts several confusing lines of code as well as correcting for this issue #326

I would actually advocate for further streamlining this code before merging.

However, tampering with this part of code should get some discussion and review. There's no way to make these fixes without changing the way in which the income process distributions are being sampled. That means that all the numerical values used in the tests for models that inherit from IndShockConsumerType will be thrown off by these changes.
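A toy illustration of why the hard-coded test values shift: with a fixed seed, the same generator hands different numbers to the same agents as soon as the order or grouping of the draws changes. This is a NumPy-only sketch; the function names and the newborn/non-newborn split are made up for illustration and are not HARK code.

```python
import numpy as np

def draw_in_one_pass(seed, n_agents):
    """Draw one shock per agent in a single call."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=n_agents)

def draw_by_group(seed, is_newborn):
    """Draw shocks group by group: non-newborns first, then newborns."""
    rng = np.random.default_rng(seed)
    shocks = np.empty(is_newborn.size)
    shocks[~is_newborn] = rng.normal(size=int((~is_newborn).sum()))
    shocks[is_newborn] = rng.normal(size=int(is_newborn.sum()))
    return shocks

is_newborn = np.array([True, False, False, True, False])
a = draw_in_one_pass(0, is_newborn.size)
b = draw_by_group(0, is_newborn)
# Same seed, same distribution, but each agent receives a different value.
print(np.allclose(a, b))
```

Any simulated moment that a test pins to many decimal places will move in the same way, even though the model being simulated is statistically unchanged.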

Correcting those numerical values is something I'm happy to do to improve the code clarity.

Tests for new functionality/models or Tests to reproduce the bug-fix in code.
Updated documentation of features that add new functionality.
Update CHANGELOG.md with major/minor changes.

mnwhite · 2021-06-21T16:28:38Z

This pertains to the broader issue of how time is handled in HARK. The long term goal needs to be for the income (and other) shocks that an agent would experience as a newborn to be defined as an ordinary part of the specification, just not used during the solution method (because they're irrelevant).


sbenthall · 2021-06-21T17:05:29Z

Glad you're available to weigh in on this @mnwhite

Why isn't HARK handling newborn shocks through the "birth" process? Any good reason?
That seems where other cases like this are handled.

Another puzzling thing about this code is that for agents aged t, it uses the income shock distribution of the previous period t - 1.

These two aspects seem related.

mnwhite · 2021-06-21T18:11:11Z

I feel like we've discussed this before, a few times. Maybe?

The income shocks pull from the t-1 values because of the way the data is structured. We specified everything in terms of "what information needs to be known at period t to *solve* that period's problem". Anything that fits that description has a subscript of [t] in an AgentType's attributes. So when solving the period t problem, the solver needs to know what the distribution of shocks that arrive at the start of t+1 are; for the consumption-saving class of problems, that's the income distribution. So when you *simulate* a model, and you're at the beginning of period t, you need to look at the [t-1] income distribution to find out what might happen to these agents right now. But this is unnatural labeling: it looks weird in code to grab from t-1 when simulating, and on paper we would write about shock distributions that realize in period t *as being in period t*.

What should have been done, and maybe the best fix now, is to have something that indicates which variables/attributes are "time-offset" so that you really want the [t+1] value in the solver. Then specifying an agent's data would be more natural, the simulation code would look right, and we wouldn't have the wonky newborn issue.

Income shocks for newborns aren't handled entirely by the birth process because that happens *before* period t shocks realize. The "interface" between periods are the model's "post-state variables": those that need to get carried from t-1 to t in order for period t states to properly exist once the period t shocks realize. So when an agent dies and is replaced, it's those "post-states" that get initialized. You *could* have the simBirth() process that fills in those initial post-state values (for a_{t-1} and p_{t-1}) also include the income shocks, but then you'd still need special code that makes sure the income given to newborns in the *ordinary* income shock step is zero.
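To make the sequencing concrete, here is a minimal sketch of one simulated period under the timing described above: replacement fills in post-states, shocks are read from the [t-1] entry, and only then are the period-t states formed. The function and argument names are hypothetical, the lognormal draw is a stand-in for HARK's discretized income distribution, and permanent shocks are switched off for brevity, so this should be read as an illustration rather than the library's code.

```python
import numpy as np

def sim_one_period_sketch(a_prev, p_prev, t_age, tran_shk_std, Rfree, rng):
    """One period for a panel of agents; t_age == 0 marks newborns.

    a_prev and p_prev are post-states carried over from t-1 (or set at birth).
    tran_shk_std[t] is labeled by the period whose *solver* needs it, so the
    shock realized at age t is read from index t-1.
    """
    newborn = t_age == 0
    idx = np.maximum(t_age - 1, 0)        # newborns have no [t-1] entry
    sigma = np.asarray(tran_shk_std)[idx]
    # Mean-one lognormal transitory shock (stand-in for a discretized draw)
    tran_shk = np.exp(rng.normal(-0.5 * sigma**2, sigma))
    tran_shk[newborn] = 1.0               # the newborn special case under discussion
    perm_shk = np.ones_like(a_prev)       # permanent shocks omitted in this sketch
    p_now = p_prev * perm_shk
    m_now = Rfree / perm_shk * a_prev + tran_shk
    return m_now, p_now

rng = np.random.default_rng(0)
t_age = np.array([0, 1, 2, 0])
a_prev = np.array([0.0, 1.0, 2.0, 0.0])   # newborns start with zero assets
p_prev = np.ones(4)
m_now, p_now = sim_one_period_sketch(a_prev, p_prev, t_age, [0.3, 0.2, 0.1], 1.03, rng)
print(m_now)
```

The two newborns end the step with market resources equal to their forced transitory shock, which is exactly the special-casing this PR is trying to untangle.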


Mv77 · 2021-06-21T19:05:57Z

HARK/ConsumptionSaving/ConsIndShockModel.py

-            )  # permanent "shock" includes expected growth
-            TranShkNow[these] = IncShkDstnNow.X[1][EventDraws]
-        #        PermShkNow[newborn] = 1.0
-        TranShkNow[newborn] = 1.0


This line might be one of the sources of failing tests.

2206 is the line I mean.

Mv77

I like the new structure much better.

However, tests are failing and we need to figure out why. One of the ones I saw asserted that, in simulations, "mNrm" did not match what it expected. I highlighted a line that might cause this, but I'm not sure whether it is the only one.

sbenthall · 2021-06-21T19:29:40Z

@Mv77 that the tests won't pass is a known issue. See issue #326 for why forcing the TranShk for newborns may be a kludge that shouldn't be in the library.

@mnwhite Just reading your explanation of how things are currently structured gives me a headache. Maybe this is a carry-over from when there was explicit "time-switching" between forward and backward time?

Let me see if I understand what you are saying:

So you start with parameters like this:

# Parameters that specify the income distribution over the lifecycle
    "PermShkStd" : [0.1,0.2,0.1,0.2,0.1,0.2,0.1,0,0,0],
    "PermShkCount" : 7,                    # Number of points in discrete approximation to permanent income shocks
    "TranShkStd" : [0.3,0.2,0.1,0.3,0.2,0.1,0.3,0,0,0],
    "TranShkCount" : 7,                    # Number of points in discrete approximation to transitory income shocks

So the 0th TranShkStd corresponds to the 0th IncShkDstn.

https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsIndShockModel.py#L2709-L2731

It sounds like you are saying that in the current code this 0th IncShkDstn is supposed to be used to solve the 0th period problem, and so actually corresponds to the 1th period's income shock in forward simulation?

And because the 0th-period shock in forward simulation "doesn't matter", you just populate it with something ... in this case, a repeat of the period 1 shock (which is in the 0th place in the parameter list).

> What should have been done, and maybe the best fix now, is to have
> something that indicates which variables/attributes are "time-offset" so
> that you really want the [t+1] value in the solver. Then specifying an
> agent's data would be more natural, the simulation code would look right,
> and we wouldn't have the wonky newborn issue.

Yes, that sounds much better.

I think it's a real problem that the library has been written as if the forward simulations "don't matter", when that's an important part of any modeling library. And the way things are currently is so confusing that after working on it for a year and a half I still didn't realize that this is how it worked. I would have run into an error on this and been very confused at some point. In my book, that's a bug.

> Income shocks for newborns aren't handled entirely by the birth process
> because that happens before period t shocks realize.

Ok, this explanation and what you said above make it clear to me why adding the shocks to the birth process is not the right solution. Thanks!

mnwhite · 2021-06-21T19:41:32Z

You have the interpretation of the meaning of the time subscripts correct. This issue isn't related to time-flipping. It's probably more related to trying to make early HARK look a lot like CDC's legacy code, which uses the same wonky time subscripting. It's not so much that it was written as if simulation "doesn't matter", more that it's definitely the much, much easier part to program. In my graduate course, I don't even teach how to program simulating a model that you've solved, because it's essentially just "program the equations of the model exactly as written". I discuss RNG and how to structure that, and what can go wrong if you *don't* put some thought into where your random draws are coming from... but simulation itself is borderline trivial. A complication like "make sure IncomeDstn is coming from the right period" is small beans compared to what's handled in the solvers.


Mv77 · 2021-06-21T19:47:05Z

Re: the timing issues

An idea that I have entertained when I have dealt with this section of HARK is:
What if shocks were not the first thing that happens in a period, but the last instead?

In the ConsIndShock model, for instance, the income shocks that alter Y[t+1] occur:

- In the model: at some point between the times at which the t decisions and the t+1 decisions are made. It does not matter too much whether they are the last thing to happen at t or the first to happen at t+1.
- In the code: their realizations are the first thing that happens in period t+1. What I am proposing is to consider making them the last thing that happens at t.

This would amount to making get_shocks() the last thing that is done in the sim_one_period() sequence (it is currently the first).
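The two orderings can be compared in a toy deterministic model. This is only a sketch: the shock values are fixed in advance, the consumption rule is a placeholder, and none of the function names exist in HARK. `shocks[k]` is labeled the way the solver labels it, i.e. it is the shock that separates the period-k decision from the period-k+1 decision.

```python
def simulate_shocks_first(a_init, shocks, R, n_periods):
    """Current ordering: period t opens with the shock stored at index t-1."""
    a, m_path = a_init, []
    for t in range(1, n_periods + 1):
        m = R * a + shocks[t - 1]   # get_shocks, then get_states
        c = 0.5 * m                 # placeholder consumption rule
        a = m - c                   # post-state carried into t+1
        m_path.append(m)
    return m_path

def simulate_shocks_last(m_init, shocks, R, n_periods):
    """Proposed ordering: period t closes with the shock stored at index t."""
    m, m_path = m_init, []
    for t in range(n_periods):
        c = 0.5 * m
        a = m - c
        m = R * a + shocks[t]       # shock realized at the end of period t
        m_path.append(m)
    return m_path

shocks = [1.1, 0.9, 1.0]
m_init = 2.0
a_init = m_init - 0.5 * m_init      # post-state implied by m_init
path_first = simulate_shocks_first(a_init, shocks, 1.03, 3)
path_last = simulate_shocks_last(m_init, shocks, 1.03, 3)
print(path_first == path_last)
```

Both loops trace out the same path of market resources. What differs is the index used to read the shock and what has to be supplied at initialization: a post-state in the first case, a state in the second.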

The advantages that I see to this approach are:

- The information needed by the solution and simulation methods for period t would become more aligned. What I mean is that the distribution needed to form expectations in solve and to draw shocks in sim_one_period would be the same, both with index t. This might be useful for the compartmentalization of periods and stages. A period would only need to know its states, no matter where they came from, whereas now it needs to know last period's post-states, its own distribution for shocks, and how to combine them (transitions?), which are also needed by last period's solver. There is "redundancy" or "communication" in the sense that the same distributions and transitions go into two different periods.
- It would simplify the handling of newborns' states. Because shocks are the first thing that happens in a period, we are forced to initialize newborns not with their states [m_{0}, P_{0}] but with their last period's post-states and the distribution of their shocks [a_{-1}, P_{-1}, PermGroFac_{-1,0}, PermShk_{0}, TranShk_{0}], which are used by get_states() to produce [m_{0}, P_{0}]. These pre-birth objects are not the way people usually specify models, and therefore this results in lines like the block that starts with

HARK/HARK/ConsumptionSaving/ConsIndShockModel.py

Line 2194 in b1e8c29

if N > 0:

Which (I think) is assuming that the distribution that applies from -1 to 0 is the same as the one that the user specifies from 0 to 1, and that there is no transitory income shock.

This is just a thought I've entertained and I just wanted to write it out in case it helps when you re-consider timing. I might be wrong and have not thought too hard about the drawbacks.

mnwhite · 2021-06-21T19:59:35Z

The downside there is that we usually write shocks that the agent learns about in period t (i.e. information that becomes available to them for their period t control decision) as belonging to period t. I would prefer it if "code math" and "paper math" aligned better.

My intent with the design of the timing was that agents would never have "partial" periods: the death-and-replacement event is the *very first* thing that happens at the beginning of the period, or the interface between two periods. With the timing structure you're describing, when there's a death-and-replacement event in period t, *some* of the period t history variables belong to agent i, but others belong to the agent who replaced him later in the period.

I think it's better/cleaner to fix the timing issue with an explicit tag / something that says "when solving the model at t, we need this attribute, but the t+1 index of it rather than t". Stata has notation for leads and lags, and on paper we write down the distribution of next period's income as F_{t+1}; I think something like that should have an analogous notation in HARK.

As for how other people specify models, I've seen some sloppiness about timing of birth and death, seemingly for the sake of more parsimonious writing.
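One way to picture such a tag, purely as a sketch: store every time-varying attribute under the period in which it applies, and keep a separate set naming the attributes that the solver should read with a one-period lead. The class, its fields, and both accessor methods are hypothetical and do not exist in HARK.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpecSketch:
    """Time-varying attributes labeled by the period in which they apply."""
    time_vary: dict = field(default_factory=dict)
    lead_in_solver: set = field(default_factory=set)  # names the solver reads at t+1

    def for_simulation(self, name, t):
        # Simulation reads the value for the period it is currently in.
        return self.time_vary[name][t]

    def for_solver(self, name, t):
        # The solver applies a one-period lead to tagged attributes only.
        offset = 1 if name in self.lead_in_solver else 0
        return self.time_vary[name][t + offset]

spec = AgentSpecSketch(
    time_vary={"TranShkStd": [0.0, 0.3, 0.2], "DiscFac": [0.96, 0.95, 0.94]},
    lead_in_solver={"TranShkStd"},
)
print(spec.for_simulation("TranShkStd", 1))  # shock spread realized in period 1
print(spec.for_solver("TranShkStd", 0))      # same entry, seen from the period-0 problem
print(spec.for_solver("DiscFac", 0))         # untagged attribute, no lead
```

With this labeling the first entry of a tagged list describes what newborns face in their first period, so no special case is needed for them. A real implementation would still have to decide what the solver does in the terminal period, where the t+1 entry does not exist.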


mnwhite · 2021-06-21T20:02:09Z

Also: I find models where agents pop into existence with some m_t > 0 to beg the question of *where that money came from*. If it represents them being born with zero assets and then earning some labor income in their newborn period, *that should be the model*, both on paper and in code.


sbenthall · 2021-06-22T02:29:20Z

> more that it's definitely the much, much easier part to program. In my graduate course, I don't even teach how to program simulating a model that you've solved, because it's essentially just "program the equations of the model exactly as written".

It is in fact so trivial that nobody should be writing a new program to do the simulation.

Rather, the user of the software should provide the information that defines the model, then let the simulation machinery run that code intelligently.

The reason for caring about the simulation is to make sure the model is being defined cleanly using a general interface; that will also make it possible to write general solution methods.

Mv77 · 2021-06-23T16:45:20Z

HARK/ConsumptionSaving/ConsIndShockModel.py

+        draw_age[draw_age < 0] = 0
+
+        for t in np.unique(draw_age):
+            these = t == draw_age


I'm now realizing that draw_age already contains the -1 index shift. Therefore, I think indices inside the cycle should be t, not t-1. Does that sound right?

Yes, thank you, that's right.
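For reference, a self-contained sketch of the indexing point settled in this review comment. The names `draw_age` and `these` come from the diff above; everything else (the function, the list-of-lists representation of the distributions, and `rng.choice` as the sampling call) is assumed for illustration and stands in for HARK's own distribution objects.

```python
import numpy as np

def draw_tran_shocks(t_age, shk_values, shk_probs, rng):
    """Draw one transitory shock per agent from per-period discrete distributions.

    shk_values[t] and shk_probs[t] describe the shock realized at age t+1
    (the offset labeling discussed in this thread).
    """
    draw_age = t_age - 1            # shift once, up front (new array, t_age untouched)
    draw_age[draw_age < 0] = 0      # newborns reuse the first distribution
    shocks = np.empty(t_age.size)
    for t in np.unique(draw_age):
        these = t == draw_age
        # Index with t, not t-1: draw_age already carries the shift.
        shocks[these] = rng.choice(shk_values[t], size=these.sum(), p=shk_probs[t])
    return shocks

# Degenerate distributions make the index that was used visible in the output.
shk_values = [[1.0], [2.0], [3.0]]
shk_probs = [[1.0], [1.0], [1.0]]
t_age = np.array([0, 1, 2, 3])
shocks = draw_tran_shocks(t_age, shk_values, shk_probs, np.random.default_rng(0))
print(shocks)
```

Indexing with `t - 1` inside the loop would apply the shift twice, and for `t == 0` it would silently wrap around to the last distribution in the list.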

sbenthall · 2022-04-06T16:06:04Z

Blocked on #1105

cleaning up strange newborn handling in ConsIndShock model

0868e36

sbenthall requested a review from Mv77 June 21, 2021 15:58

Mv77 reviewed Jun 21, 2021

sbenthall mentioned this pull request Jun 22, 2021

Consistent t-index for time varying shock parameters. #1022

Closed

Mv77 reviewed Jun 23, 2021

sbenthall mentioned this pull request Jun 25, 2021

[WIP] Consistent t-indexing for time varying parameters #1027

Closed

3 tasks

sbenthall mentioned this pull request Sep 2, 2021

PortfolioFrameModel and FrameAgentType #865

Merged

3 tasks
