Time Varying Covariate #111

bcallaway11 · 2021-12-20T21:20:18Z

bcallaway11
Dec 20, 2021
Maintainer

We should think some about adding options related to time-varying covariates.

From @jtorcasso #71, which I think is a representative example:

Would it be possible to include support for time-varying covariates as a new feature?

Let's say we are evaluating a small program that rolls out in different zips across the US. We expect that the program won't affect things like state population or the unemployment rate, but that these could confound our estimate of the treatment effect. So we'd like to control for these time-varying covariates.

Within the existing package, we could estimate the change in the unemployment rate between two fixed time points and include this as a time-invariant control, but with staggered rollout, a fixed time point may not line up well with each group's treatment timing. We could then include changes from other time points, but this doesn't seem very parsimonious.

Instead, if the software supported time-varying controls separate from time-invariant controls, each 2x2 DID could calculate the change in the time-varying control (between the reference period and current time period) and then account for this covariate similarly to how the software handles time-invariant controls.

Would this be possible/useful?

Current Functionality
We would take the covariates from the base period for both the treated group and the untreated group. So, in the above example, we would condition on pre-treatment unemployment levels in different zip codes. In my view, this is probably better than traditional regression DIDs because (i) it probably allows for some forms of the treatment affecting the covariates, and (ii) it more obviously compares zip codes with similar unemployment rates. On the second point, regressions are probably either comparing locations with the same change in unemployment (which seems awkward) or are likely to depend much more heavily on the functional form being correctly specified.

That being said...
It is at least worth thinking about allowing the user some control over this. I think it would be moderately difficult to let the user decide whether to condition on (i) pre-treatment covariates, (ii) changes in covariates over time, or (iii) both.

I'm open to any feedback/user-comments on how useful this would be. Thoughts @jtorcasso, @pedrohcgs?

jtorcasso · 2021-12-21T01:24:52Z

jtorcasso
Dec 21, 2021

@bcallaway11: I agree with (i). In most cases I'm wary of controlling for covariates that change after treatment, in case some of the treatment effect operates through the covariates. In that case, I wouldn't want to control for the change.

But I'm thinking of a situation where you may have a treatment that could not affect the covariate. For instance, let's say I wanted to determine the effect of a nation-wide, county-level apprenticeship program for persons with disabilities. The program probably doesn't affect regional unemployment rates, but changes to the regional economy (unemployment rates) could affect the unemployment rate of persons with disabilities. I could mistake changes in the regional labor market for impacts of the program. In this case, it may be useful to control for the change in regional unemployment, in case counties tended to adopt the program during times of economic expansion.

To your point (ii), I'm wondering if, within this example, controlling for changes also "seems awkward," or if you think there is a different solution within your current framework.

To add on to your proposal: I would second allowing the user to specify both types of controls for covariates--both changes and the (base period) level. May also be useful to allow controls for changes AND percent changes.
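To make the three candidate controls concrete, here is a minimal Python sketch (illustrative only, not code from the did package). The function name and the unemployment figures are hypothetical; it simply computes the base-period level, the change, and the percent change for one covariate.

```python
def level_change_and_pct_change(x_base, x_current):
    """Return (base level, change, percent change) for one covariate.

    Hypothetical helper for illustration; not part of any package.
    """
    change = x_current - x_base
    # Percent change is undefined when the base level is zero.
    pct_change = None if x_base == 0 else 100.0 * change / x_base
    return x_base, change, pct_change


# Hypothetical regional unemployment rates (in percent) for two counties.
county_1 = level_change_and_pct_change(8.0, 6.0)
county_2 = level_change_and_pct_change(4.0, 2.0)
print(county_1)
print(county_2)
```

The two counties share the same change in percentage points but differ in both base level and percent change, which is why a user might want to choose among these controls.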


bcallaway11 · 2021-12-21T16:26:36Z

bcallaway11
Dec 21, 2021
Maintainer Author

Yeah, this is really interesting I think. I totally agree with you that there are lots of cases where the treatment wouldn't affect the covariates.

I also think you have a good example about the unemployment rate. In that example, we would currently just control for pre-treatment unemployment rates. What seems weird to me about only controlling for the change in unemployment rates is that regions whose unemployment rate went from, say, 9% to 7% could serve as comparison units for regions that went from 3% to 1%. That said, I think you could make a strong case that it would make sense to control for both the pre-treatment level and the change over time.
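The point about comparison units can be checked with a few lines of Python. This is only an illustrative sketch: the Euclidean distance is an assumption standing in for whatever notion of "similar covariates" an estimator actually uses.

```python
import math

# Hypothetical unemployment rates in percent: (pre-treatment, later period).
treated = (3.0, 1.0)
comparison = (9.0, 7.0)


def change(rates):
    """Change in the rate between the pre-treatment and later period."""
    pre, post = rates
    return post - pre


# Covariate distance when only the change is used.
dist_change_only = abs(change(treated) - change(comparison))

# Covariate distance when both the pre-treatment level and the change are used.
dist_level_and_change = math.hypot(
    treated[0] - comparison[0],
    change(treated) - change(comparison),
)

print(dist_change_only, dist_level_and_change)
```

With the change alone, the two regions look identical; adding the pre-treatment level separates them by six percentage points.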

I don't really think that there is a big conceptual hurdle to implementing this either. It would just amount to allowing users to include \Delta X as a covariate (not by constructing it manually; we would compute it behind the scenes if they ask for it).

I'm leaning towards implementing this, but I'll probably need some time....


jtorcasso · 2021-12-22T15:18:22Z

jtorcasso
Dec 22, 2021

Sounds good. Do you think a good UX would be two separate x formulas? I'd offer to help, but I'm not sure how localized the change would be, or if it would require changing many functions. The biggest difficulty I foresee is that this new X, "delta X", would have to be defined at the time of estimation, since it depends on g and t. So it's not like you can just define new Xs and carry them around business as usual.
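A small Python sketch of why "delta X" cannot be precomputed once per unit. It assumes, for illustration only, that the base period for group g is g - 1; the function name and the covariate path are hypothetical.

```python
# Hypothetical covariate path for one unit: period -> unemployment rate (percent).
x_by_period = {1: 5.0, 2: 5.5, 3: 6.0, 4: 4.0}


def delta_x(x_by_period, g, t):
    """Change in the covariate between the base period and period t.

    Assumption (for illustration): the base period is g - 1, the last
    period before group g is first treated.
    """
    base_period = g - 1
    return x_by_period[t] - x_by_period[base_period]


# The same unit contributes a different delta X to each (g, t) cell.
deltas = {
    (g, t): delta_x(x_by_period, g, t)
    for g in (2, 3)
    for t in (2, 3, 4)
    if t >= g
}
for cell, value in sorted(deltas.items()):
    print(cell, value)
```

Because the value changes with both g and t, it has to be built inside each 2x2 comparison and cannot be stored as a single extra column.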


bcallaway11 · 2021-12-22T15:43:46Z

bcallaway11
Dec 22, 2021
Maintainer Author

Yes, getting the right interface might require some thought. I'm not dead-set on this, but I'm kind of disinclined to have a second x formula for time varying covariates. Perhaps we could add an extra argument called time_varying_covs that can take the following values:

"pre" - and just go with the existing behavior
"change" - only use the change in covariates over time
"both" - include both the pre-treatment level and change

This doesn't give tight control over specifying different combinations, though I don't expect that to be a common need. We could also allow users to pass covariates in by name here.
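The three options could behave roughly as in the Python sketch below. This is an illustration of the proposed semantics, not the package's implementation: the function name, the `d_` prefix for changes, and the simplification that every covariate passed in is time-varying are all assumptions.

```python
def build_covariates(x_base, x_current, time_varying_covs="pre"):
    """Covariates to condition on for one unit in one 2x2 comparison.

    x_base, x_current: dicts mapping covariate name -> value in the base
    period and in the current period. Hypothetical helper; it assumes all
    covariates passed in vary over time.
    """
    if time_varying_covs not in ("pre", "change", "both"):
        raise ValueError("time_varying_covs must be 'pre', 'change', or 'both'")
    out = {}
    for name, base_value in x_base.items():
        if time_varying_covs in ("pre", "both"):
            # Existing behavior: condition on the base-period level.
            out[name] = base_value
        if time_varying_covs in ("change", "both"):
            # Proposed addition: condition on the change over time.
            out["d_" + name] = x_current[name] - base_value
    return out


base = {"unemp": 5.0}
current = {"unemp": 6.5}
for mode in ("pre", "change", "both"):
    print(mode, build_covariates(base, current, mode))
```

A single string argument keeps the interface small; passing covariate names instead would need a second argument or a richer type.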

I think the only place where the code would need to be adjusted is here. At that line, disdat is two periods of panel data. You would only need to figure out which covariates vary over time and include them in "x" going forward.
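As a rough illustration of "figure out which covariates vary over time", the Python sketch below scans two periods of balanced panel data. The list of dicts is a hypothetical stand-in for disdat, and the function and column names are assumptions.

```python
def time_varying_covariates(rows, id_col, covariates):
    """Return the names of covariates that differ across periods for any unit.

    rows: list of dicts with two rows per unit (one per period), a
    hypothetical stand-in for two periods of balanced panel data.
    """
    first_seen = {}
    varying = set()
    for row in rows:
        unit = row[id_col]
        if unit not in first_seen:
            # First period observed for this unit: remember it.
            first_seen[unit] = row
            continue
        # Second period: flag any covariate whose value changed.
        for name in covariates:
            if row[name] != first_seen[unit][name]:
                varying.add(name)
    return sorted(varying)


two_period_rows = [
    {"id": 1, "period": 1, "unemp": 5.0, "coastal": 1},
    {"id": 1, "period": 2, "unemp": 6.5, "coastal": 1},
    {"id": 2, "period": 1, "unemp": 4.0, "coastal": 0},
    {"id": 2, "period": 2, "unemp": 4.0, "coastal": 0},
]
varying = time_varying_covariates(two_period_rows, "id", ["unemp", "coastal"])
print(varying)
```

A covariate counts as time-varying here if it changes for at least one unit, so covariates that are constant for everyone stay in levels only.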

Some more things that are worth thinking about though:

None of this is feasible with repeated cross sections, since the same units are not observed in both periods and unit-level changes cannot be computed.
The solution above is not going to work with unbalanced panel data either (this might be a much harder case).


YutingYale · 2024-09-03T05:10:52Z

YutingYale
Sep 3, 2024

Can I ask a follow-up question on the time-varying covariates? I’m considering including some time-varying covariates specific to the year of the treatment (year 0) in the X formula. The idea is that covariates in year 0 (the year of the treatment) wouldn’t be affected by the treatment yet, but they might have specific trends that could confound the effect of the treatment on outcomes.

In order to do this, I created a set of covariates specific to year 0 and treated them as time-invariant. Since the did package only allows controlling for covariates in the base period (in this case year -1), would the above approach be a feasible way to make IPW within the package take the year 0 covariates into account? Those covariates in year 0 may also predict the treatment status.

