Skip to content

generic tools for causal inference with panel data

Notifications You must be signed in to change notification settings

bcallaway11/ptetools

Repository files navigation

Panel Treatment Effects (pte) Package pte

The pte package compartmentalizes the steps needed to implement estimators of group-time average treatment effects (and their aggregations) in order to make it easier to apply the same sorts of arguments outside of their “birthplace” in the literature on difference-in-differences.

This code is lightweight, only works for balanced panels, and has minimal error checking. That said, it should be useful projects that build on top of group-time average treatment effects in order to deliver estimates of causal effects in panel data settings.

The main function is called pte. The most important paramters that it takes in are subset_fun and attgt_fun. These are functions that the user should pass to pte.

subset_fun takes in the overall data, a group, a time period, and possibly other arguments and returns a data.frame containing the relevant subset of the data, an outcome, and whether or not a unit should be considered to be in the treated or comparison group for that group/time. There is one example of a relevant subset function provided in the package: the two_by_two_subset function. This function takes an original dataset, subsets it into pre- and post-treatment periods and denotes treated and untreated units. This particular subset is perhaps the most common/important one for thinking about treatment effects with panel data.

The other main function is attgt_fun. This function should be able to take in the correct subset of data, possibly along with other arguments to the function, and report an ATT for that subset. With minor modification, this function should be availble for most any sort of treatment effects application — for example, if you can solve the baseline 2x2 case in difference in differences, you should use that function here, and the pte package will take care of dealing with the variation in treatment timing.

If attgt_fun returns an influence function, then the pte package will also conduct inference using the multiplier bootstrap (which is fast) and produce uniform confidence bands (which adjust for multiple testing).

The default output of pte is an overall treatment effect on the treated (i.e., across all groups that participate in the treatment in any time period) and dynamic effects (i.e., event studies). More aggregations are possible, but these seem to be the leading cases; aggregations of group-time average treatment effects are discussed at length in Callaway and Sant’Anna (2021).

Here are a few examples:

Example 1: Difference in differences

The did package, which is based on Callaway and Sant’Anna (2021), includes estimates of group-time average treatment effects, ATT(g,t), based on a difference in differences identification strategy. The following example demonstrates that it is easy to compute group-time average treatment effects using difference in differences using the pte package. [Note: This is definitely not the recommended way of doing this as there is very little error handling, etc. here, but it is rather a proof of concept. You should use the did package for this case.]

This example reproduces DID estimates of the effect of the minimum wage on employment using data from the did package.

library(did)
data(mpdta)
did_res <- pte(yname="lemp",
               gname="first.treat",
               tname="year",
               idname="countyreal",
               data=mpdta,
               setup_pte_fun=setup_pte,
               subset_fun=two_by_two_subset,
               attgt_fun=did_attgt,
               xformla=~lpop) 

summary(did_res)
#> 
#> Overall ATT:  
#>      ATT    Std. Error     [ 95%  Conf. Int.]  
#>  -0.0323        0.0126    -0.0569     -0.0076 *
#> 
#> 
#> Dynamic Effects:
#>  Event Time Estimate Std. Error   [95%  Conf. Band]  
#>          -3   0.0269     0.0168 -0.0159      0.0697  
#>          -2  -0.0050     0.0125 -0.0368      0.0269  
#>          -1  -0.0229     0.0126 -0.0550      0.0093  
#>           0  -0.0201     0.0131 -0.0534      0.0132  
#>           1  -0.0547     0.0179 -0.1003     -0.0091 *
#>           2  -0.1382     0.0409 -0.2422     -0.0341 *
#>           3  -0.1069     0.0341 -0.1936     -0.0202 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
ggpte(did_res)

What’s most interesting here, is that the only “new” code that needs to be writte is in the did_attgt function. You will see that this is a very small amount of code.

Example 2: Policy Evaluation during a Pandemic

As a next example, consider trying to estimate effects of Covid-19 related policies during a pandemic. The estimates below are for the effects of state-leve shelter-in-place orders during the early part of the pandemic.

Callaway and Li (2021) argue that a particular unconfoundedness-type strategy is more appropriate in this context than DID-type strategies due to the spread of Covid-19 cases being highly nonlinear. However, they still deal with the challenge of variation in treatment timing. Therefore, it is still useful to think about group-time average treatment effects, but the DID strategy should be replaced with their particular unconfoundedness type assumption.

The pte package is particularly useful here.

# formula for covariates
xformla <- ~ current + I(current^2) + region + totalTestResults
covid_res <- pte(yname="positive",
                 gname="group",
                 tname="time.period",
                 idname="state_id",
                 data=covid_data2,
                 setup_pte_fun=setup_pte_basic,
                 subset_fun=two_by_two_subset,
                 attgt_fun=covid_attgt,
                 xformla=xformla,
                 max_e=21,
                 min_e=-10) 
#> Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
#> min_e, : Simultaneous conf. band is somehow smaller than pointwise one using
#> normal approximation. Since this is unusual, we are reporting pointwise
#> confidence intervals

summary(covid_res)
#> 
#> Overall ATT:  
#>      ATT    Std. Error     [ 95%  Conf. Int.] 
#>  14.8882       58.1647   -99.1125    128.8888 
#> 
#> 
#> Dynamic Effects:
#>  Event Time Estimate Std. Error     [95%  Conf. Band] 
#>         -10  -3.7266     3.3841  -13.4426      5.9893 
#>          -9   2.6607     1.5017   -1.6509      6.9722 
#>          -8   0.8290     2.1465   -5.3339      6.9919 
#>          -7   5.2843     1.9978   -0.4516     11.0203 
#>          -6   2.8555     1.8211   -2.3730      8.0841 
#>          -5   1.3589     4.0879  -10.3778     13.0956 
#>          -4   0.3294     3.9431  -10.9916     11.6504 
#>          -3  -4.2227     6.9336  -24.1295     15.6841 
#>          -2  -3.8447     2.4340  -10.8329      3.1434 
#>          -1  -0.2234     3.3015   -9.7021      9.2553 
#>           0 -10.8156     8.8849  -36.3249     14.6936 
#>           1 -13.7998    12.2977  -49.1073     21.5077 
#>           2  -7.8432    11.9865  -42.2575     26.5711 
#>           3  -4.5541    11.7315  -38.2362     29.1279 
#>           4  -3.5368    12.7993  -40.2847     33.2110 
#>           5   8.5221    11.1569  -23.5102     40.5544 
#>           6   1.1140    17.8634  -50.1732     52.4012 
#>           7   6.6384    24.7105  -64.3072     77.5841 
#>           8   7.1288    23.8552  -61.3613     75.6189 
#>           9  10.8758    30.7177  -77.3170     99.0687 
#>          10  17.5057    34.2477  -80.8221    115.8335 
#>          11  40.8318    46.6352  -93.0614    174.7250 
#>          12  48.6134    57.4288 -116.2692    213.4960 
#>          13  52.4228    64.1772 -131.8350    236.6806 
#>          14  50.2000    47.2508  -85.4608    185.8608 
#>          15  68.2960    66.0155 -121.2395    257.8315 
#>          16  44.7305    85.7855 -201.5661    291.0272 
#>          17  61.4670    76.4826 -158.1205    281.0544 
#>          18  50.4635   114.1011 -277.1294    378.0564 
#>          19  47.3392   129.9716 -325.8193    420.4976 
#>          20  28.6326   121.7077 -320.7994    378.0646 
#>          21   4.3445   133.7291 -379.6020    388.2910 
#> ---
#> Signif. codes: `*' confidence band does not cover 0
ggpte(covid_res) + ylim(c(-1000,1000))

What’s most interesting is just how little code needs to be written here. The only new code required is the ppe::covid_attgt function which is available here, and, as you can see, this is very simple.

Example 3: Empirical Bootstrap

The code above used the multiplier bootstrap. The great thing about the multiplier bootstrap is that it’s fast. But in order to use it, you have to work out the influence function for the estimator of ATT(g,t). Although I pretty much always end up doing this, it can be tedious, and it can be nice to get a working version of the code for a project going before working out the details on the influence function.

The pte package can be used with the empirical bootstrap. There are a few limitations. First, it’s going to be substantially slower. Second, this code just reports pointwise confidence intervals. However, this basically is set up to fit into my typical workflow, and I see this as a way to get preliminary results.

Let’s demonstrate it. To do this, consider the same setup as in Example 1, but where no influence function is returned. Let’s write the code for this:

# did with no influence function
did_attgt_noif <- function(gt_data, xformla, ...) {

  # call original function
  did_gt <- did_attgt(gt_data, xformla, ...)

  # remove influence function
  did_gt$inf_func <- NULL

  did_gt
}

Now, we can show the same sorts of results as above

did_res_noif <- pte(yname="lemp",
                    gname="first.treat",
                    tname="year",
                    idname="countyreal",
                    data=mpdta,
                    setup_pte_fun=setup_pte,
                    subset_fun=two_by_two_subset,
                    attgt_fun=did_attgt_noif, #this is only diff.
                    xformla=~lpop) 

summary(did_res_noif)
#> 
#> Overall ATT:  
#>      ATT    Std. Error     [ 95%  Conf. Int.]  
#>  -0.0323        0.0107    -0.0533     -0.0113 *
#> 
#> 
#> Dynamic Effects:
#>  Event Time Estimate Std. Error [95% Pointwise  Conf. Band]  
#>          -3   0.0269     0.0151         -0.0026      0.0565  
#>          -2  -0.0050     0.0115         -0.0275      0.0176  
#>          -1  -0.0229     0.0153         -0.0529      0.0072  
#>           0  -0.0201     0.0102         -0.0400     -0.0002 *
#>           1  -0.0547     0.0172         -0.0885     -0.0210 *
#>           2  -0.1382     0.0391         -0.2148     -0.0616 *
#>           3  -0.1069     0.0379         -0.1813     -0.0325 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
ggpte(did_res_noif)

What’s exciting about this is just how little new code needs to be written.

About

generic tools for causal inference with panel data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages