Add support for randomization inference #431

Merged
s3alfisc merged 44 commits into master from ritest
May 25, 2024

s3alfisc
Member

@s3alfisc s3alfisc commented May 5, 2024

This PR adds support for randomization inference via a ritest method for Feols.

@s3alfisc s3alfisc linked an issue May 5, 2024 that may be closed by this pull request
@apoorvalal
Member

Hey Alex,

(possibly unsolicited) metrics advice on this PR: I think using the studentized statistic (where you calculate the t-stat as $\hat{\tau}/\sqrt{\hat{V}}$ for each permutation draw) has better properties [in both the Fisherian and Neymanian sense] than the simple approach of constructing the randomization distribution from the point estimate alone. It shouldn't be a major change; one would presumably change _get_ritest_coefs to _get_ritest_studentized (or simply add an option so that, instead of returning the point estimate, you return the t-stat).
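To make the distinction concrete, here is a minimal NumPy sketch (not the PR's code) that builds both randomization distributions for a simple two-arm experiment: one from the point estimate and one from the studentized statistic. The function names `ols_coef_and_t` and `ritest_sketch`, the HC1 variance estimator, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def ols_coef_and_t(y, d):
    """Slope and HC1 t-stat from regressing y on an intercept and d."""
    X = np.column_stack([np.ones_like(d, dtype=float), d])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # HC1 sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1} * n / (n - k)
    meat = (X * resid[:, None] ** 2).T @ X
    V = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta[1], beta[1] / np.sqrt(V[1, 1])


def ritest_sketch(y, d, reps=999, seed=0):
    """Two-sided RI p-values based on the coefficient and on the t-stat."""
    rng = np.random.default_rng(seed)
    obs_c, obs_t = ols_coef_and_t(y, d)
    # Re-estimate both statistics under each permuted treatment vector
    perm = np.array([ols_coef_and_t(y, rng.permutation(d)) for _ in range(reps)])
    p_c = np.mean(np.abs(perm[:, 0]) >= abs(obs_c))  # point-estimate based
    p_t = np.mean(np.abs(perm[:, 1]) >= abs(obs_t))  # studentized
    return p_c, p_t


# Simulated example data (assumption: true effect of 1.0, unit-variance noise)
rng = np.random.default_rng(42)
d = rng.permutation(np.repeat([0.0, 1.0], 100))
y = 1.0 * d + rng.normal(size=200)
p_c, p_t = ritest_sketch(y, d, reps=499, seed=1)
print(p_c, p_t)
```

The only difference between the two variants is which column of the permutation draws is compared with its observed counterpart, which is why returning the t-stat instead of the coefficient is a small change.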

Ref: chap 7 of Peng Ding's book [based on this 2021 paper]

@s3alfisc
Member Author

s3alfisc commented May 6, 2024

It's very much appreciated! The more feedback the better =) I actually started out with a t-percentile implementation, but then checked out Grant's version, which worked on the fitted betas. I'll switch back to the t-stats as the default and will make it possible to run tests on both t-stats and coefficients. I vaguely recall that Alwyn Young recommends using the betas and not the t-stats for the IV wild bootstrap - you haven't happened to see a similar result for RI? Also, great that you pointed me to Ding's book, I've been looking for a good write-up on RI =) Thanks!

@s3alfisc
Member Author

I have now implemented two algos - one is "fast" and the other is "slow". Both so far only work for iid sampling.

The "slow" one simply loops over calls of "feols" or "fepois" and hence works for OLS, IV and Poisson regression. You can choose between two variants, randomization-c and randomization-t, following the naming conventions introduced by Young.

The "fast" algorithm only works for OLS and the "randomization-c" variant at the moment. It's vectorized and employs the FWL theorem; going forward, some speedups should be possible by JIT compiling it via numba. Users can choose "how much" they want to vectorize (as creating an N x reps matrix can be costly if either N or reps is large). To support "randomization-t", I will have to slightly rework the functions implemented in the vcov method / make them more "generically available".

Here's a code example:

%load_ext autoreload
%autoreload 2

import pyfixest as pf
import numpy as np
data = pf.get_data(N = 10_000)

fml = "Y ~ X1*X2*f2 |f1 + f3"

fit = pf.feols(fml, data=data)
fit.tidy().head()

rng = np.random.default_rng(1234)
fit.ritest(
    resampvar="X2",
    reps = 10_000,
    rng = rng,
    type = "randomization-c",
    choose_algorithm = "fast",
    algo_iterations = 1000,  # number of for-loop iterations: each loop draws reps / algo_iterations permutations
    include_plot = True
)

image
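The idea behind the "fast" algorithm can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the code in this PR: controls `W` are partialled out of the outcome once (FWL), treatment permutations are drawn as columns of an N x chunk matrix, and all randomization-c coefficients in a chunk are computed at once. The function name `ritest_fast_sketch` and the `chunk` argument (the number of permutations per loop) are hypothetical.

```python
import numpy as np


def ritest_fast_sketch(y, d, W, reps=1000, chunk=250, seed=0):
    """Vectorized randomization-c test for the coefficient on d in y ~ d + W."""
    rng = np.random.default_rng(seed)

    def resid(v):
        # Residualize v (vector or matrix of columns) on the controls W
        coef, *_ = np.linalg.lstsq(W, v, rcond=None)
        return v - W @ coef

    y_t = resid(y)
    d_t = resid(d)
    obs = (d_t @ y_t) / (d_t @ d_t)  # FWL: coefficient on d in the full model

    draws = []
    for start in range(0, reps, chunk):
        b = min(chunk, reps - start)
        # N x b matrix; each column is an independent permutation of d
        D = rng.permuted(np.tile(d[:, None], (1, b)), axis=0)
        D_t = resid(D)  # the permuted treatment must be residualized again
        draws.append((D_t * y_t[:, None]).sum(axis=0) / (D_t ** 2).sum(axis=0))
    draws = np.concatenate(draws)

    pval = np.mean(np.abs(draws) >= abs(obs))
    return obs, draws, pval


# Simulated example data (assumption: one continuous control plus an intercept)
rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
W = np.column_stack([np.ones(n), x])
d = rng.permutation(np.repeat([0.0, 1.0], n // 2))
y = 0.5 * d + x + rng.normal(size=n)
obs, draws, pval = ritest_fast_sketch(y, d, W, reps=1000, chunk=250)
print(obs, pval)
```

A smaller `chunk` lowers peak memory (the largest array is N x chunk) at the cost of more loop iterations, which is the trade-off described above.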

To Do's:

  • defensive programming, type checks, assertions, etc
  • tests
  • cluster sampling
  • stratified sampling
  • CIs by test inversion
  • non-standard null hypotheses, one sided testing, etc

Overall, more work than I expected!

@s3alfisc
Member Author

Hi @apoorvalal - one question on the "randomization-t" variant: Which defaults should I set for the computation of the vcov?

Should I default to the vcov type set in the "feols" call? Then it could in principle happen that ritest computes iid inference even under cluster random assignment, which would not be in the spirit of Athey et al.

Here's an example:

fit = pf.feols("Y ~ X1 | f1", data = data, vcov = "iid")   # iid ses
fit.ritest(resampvar = "X1", cluster = "f1")               # cluster random assignment; ses should be CRV1-f1

Under the proposed solution, the vcov matrix in each RI iteration would be computed as iid, despite the cluster random assignment. If I go with this solution, should I at least add a warning message?

Alternatively, I could default to computing CRV variance matrices on the level of cluster assignment and overwrite the vcov type of the feols call.

Do you have any thoughts on this? I hope you could follow =D
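For readers less familiar with the setup, cluster random assignment means that treatment is re-drawn at the cluster level, so every unit in a cluster shares the same permuted value. A minimal sketch follows; the helper name `permute_clusters` is hypothetical (not part of pyfixest), and it assumes treatment is constant within each cluster.

```python
import numpy as np


def permute_clusters(d, cluster, rng):
    """Permute treatment across clusters, keeping it constant within clusters."""
    # first_idx: first row of each cluster; inverse: cluster index of each row
    _, first_idx, inverse = np.unique(
        cluster, return_index=True, return_inverse=True
    )
    d_cluster = d[first_idx]  # one treatment value per cluster
    return rng.permutation(d_cluster)[inverse]


# Example: 6 clusters of 4 units each, 3 of them treated
cluster = np.repeat(np.arange(6), 4)
d = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])[cluster]
rng = np.random.default_rng(1234)
d_perm = permute_clusters(d, cluster, rng)
print(d_perm.reshape(6, 4))
```

Because the permuted treatment only varies across clusters, a variance estimator that treats observations as independent ignores the assignment mechanism, which is the concern raised above.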

@s3alfisc
Member Author

TODO:

  • stratified sampling
  • CIs by test inversion
  • non-standard null hypotheses, one sided testing, etc
  • type hints, docstring everywhere

@apoorvalal
Member

Hi Alex,
That's an excellent question; I'm not sure I know the answer off the top of my head. I understand the behaviour of randomization-t in a pure randomized trial without noncompliance, but RI is much less clear to me in settings with noncompliance [Young's paper doesn't motivate it from potential outcomes, so I don't really know how to reconcile it with Abadie et al. and/or Ding's papers/book].

Aronow, Chang, and Lopatto put out an interesting looking paper a couple of weeks ago that might be worth looking at as well.

@s3alfisc
Member Author

Thanks Apoorva - for now, I have deleted the vcov arg to ritest() and by default compute the vcov as follows: iid if there is individual-level sampling and no controls; HC1 if there is individual-level sampling and controls; and CRV1 for cluster sampling. I think that's a sensible choice, hopefully you agree? 😅
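Read as a decision rule, the default could be written roughly as below. This is a sketch of one reading of the rule (iid without controls, HC1 with controls, CRV1 whenever treatment is assigned at the cluster level), not the code merged in this PR; the helper name `default_ritest_vcov` and its arguments are hypothetical.

```python
from typing import Dict, Optional, Union


def default_ritest_vcov(
    cluster: Optional[str], has_controls: bool
) -> Union[str, Dict[str, str]]:
    """Pick a vcov type from the sampling design (hypothetical helper)."""
    if cluster is not None:
        # Cluster sampling: cluster-robust variance at the assignment level
        return {"CRV1": cluster}
    # Individual-level sampling: robust SEs only when controls are present
    return "HC1" if has_controls else "iid"


print(default_ritest_vcov(None, has_controls=False))
print(default_ritest_vcov(None, has_controls=True))
print(default_ritest_vcov("f1", has_controls=True))
```

The return values mirror the forms that the vcov argument of feols accepts (a string such as "iid" or "HC1", or a dict such as {"CRV1": "f1"}).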

@s3alfisc
Member Author

Open to-do's:

  • stratified sampling
  • support for IV
  • SEs, confidence intervals
  • tests for hypotheses other than "beta = 0" vs "beta <> 0"

@s3alfisc s3alfisc merged commit 2e02e0b into master May 25, 2024
7 checks passed
@s3alfisc s3alfisc deleted the ritest branch May 25, 2024 15:08
Successfully merging this pull request may close these issues.

Support for Randomization Inference
2 participants