Add support for randomization inference #431
Conversation
Hey Alex, (possibly unsolicited) metrics advice on this PR: I think using the studentized statistic (where you calculate the t-stat as the coefficient divided by its standard error in each randomization draw) is preferable to permuting the raw coefficients. Ref: chap 7 of Peng Ding's book [based on this 2021 paper]
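For context, a minimal sketch of what a studentized Fisher randomization test looks like (illustration only; it uses statsmodels rather than pyfixest's internals, and `frt_studentized` and its arguments are hypothetical names):

```python
import numpy as np
import statsmodels.api as sm

def frt_studentized(y, d, X, reps=2000, rng=None):
    """Fisher randomization test using the studentized statistic (t-stat)."""
    rng = np.random.default_rng() if rng is None else rng
    Z = sm.add_constant(np.column_stack([d, X]))           # constant, treatment, controls
    t_obs = sm.OLS(y, Z).fit(cov_type="HC1").tvalues[1]     # observed t-stat on d
    t_perm = np.empty(reps)
    for b in range(reps):
        d_b = rng.permutation(d)                            # re-randomize treatment
        Z_b = sm.add_constant(np.column_stack([d_b, X]))
        t_perm[b] = sm.OLS(y, Z_b).fit(cov_type="HC1").tvalues[1]
    return np.mean(np.abs(t_perm) >= np.abs(t_obs))         # two-sided RI p-value
```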
It's very much appreciated! The more feedback the better =) I actually started out with a t-percentile implementation, but then checked out Grant's version, which worked on the fitted betas. I'll switch back to the t-stats as a default and will enable running tests on both t-stats and coefficients. I vaguely recall that Alwyn Young recommends using the betas and not the t-stats for the IV wild bootstrap - have you happened to see similar results for RI? Also, great that you point me to Ding's book, I've been looking for a good write-up on RI =) Thanks!
I have now implemented two algos - one is "fast" and the other is "slow". Both so far only work for iid sampling.

The "slow" one simply loops over calls of "feols" or "fepois" and hence works for OLS, IV and Poisson regression. You can choose to run different variants of the test statistic, the "randomization-c" (based on coefficients) and the "randomization-t" (based on t-statistics).

The "fast" algorithm only works for OLS and the "randomization-c" at the moment. It's vectorized and employs the FWL theorem; going forward, some speed-ups should be possible by JIT compiling it via numba. Users can choose "how much" they want to vectorize (as creating a N x reps matrix can be costly if either N or reps are large); see the sketch below. To support the "randomization-t", I will have to slightly rework the functions implemented in the vcov method / make them more "generically available".

Here's a code example:

```python
%load_ext autoreload
%autoreload 2

import pyfixest as pf
import numpy as np

data = pf.get_data(N = 10_000)
fml = "Y ~ X1*X2*f2 |f1 + f3"

fit = pf.feols(fml, data=data)
fit.tidy().head()

rng = np.random.default_rng(1234)

fit.ritest(
    resampvar="X2",
    reps = 10_000,
    rng = rng,
    type = "randomization-c",
    choose_algorithm = "fast",
    algo_iterations = 1000,  # choose the number of for loops: draws reps / algo_iterations per loop
    include_plot = True
)
```

To Do's:
Overall, more work than I expected!
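For illustration, here is a minimal sketch of the FWL-based vectorization idea described above (helper name and shapes are assumptions, not pyfixest's actual internals): after residualizing the outcome and each resampled treatment draw on the remaining covariates and fixed effects, every randomization-c coefficient is a ratio of dot products, so all draws can be computed at once.

```python
# Hedged sketch of the FWL-based "fast" idea, not the actual pyfixest implementation.
import numpy as np

def fast_ri_coefficients(y_resid, D_resid):
    """Compute randomization-c coefficients for all resampled draws at once.

    y_resid : (N,) outcome residualized on controls / fixed effects.
    D_resid : (N, reps) resampled treatment draws, residualized the same way.
    Returns a (reps,) array of coefficients, one per randomization draw.
    """
    num = D_resid.T @ y_resid                      # D_b' y for every draw b
    den = np.einsum("ij,ij->j", D_resid, D_resid)  # D_b' D_b for every draw b
    return num / den

# Chunking over draws (as with `algo_iterations` above) keeps the N x reps
# matrix from blowing up memory when N or reps are large.
```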
Hi @apoorvalal - one question on the "randomization-t" variant: Which defaults should I set for the computation of the vcov? Should I default to the vcov type set in the "feols" call? Then it could in principle happen that the vcov type does not match the randomization design. Here's an example:

```python
fit = pf.feols("Y ~ X1 | f1", vcov = "iid")    # iid ses
fit.ritest(resampvar = "X1", cluster = "f1")   # cluster random assignment; ses should be CRV1-f1
```

Under the proposed solution, the vcov matrix in each RI iteration would be computed as iid, despite the cluster random assignment. With this solution, I should at least add a warning message? Alternatively, I could default to computing CRV variance matrices on the level of cluster assignment and overwrite the vcov type set in the "feols" call for the RI iterations.

Do you have any thoughts on this? I hope you could follow =D
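To make the two options concrete, a hypothetical sketch of the decision logic (the function name and signature are made up for illustration and are not part of the PR or the pyfixest API):

```python
import warnings

def choose_ri_vcov(fit_vcov, ri_cluster, inherit=True):
    """Pick the vcov used inside each randomization-t iteration (illustrative only).

    fit_vcov   : vcov of the original feols call, e.g. "iid" or {"CRV1": "f1"}.
    ri_cluster : cluster variable of the random assignment, or None.
    inherit    : True  = Option A (inherit the feols vcov, warn on mismatch),
                 False = Option B (override with CRV1 at the assignment level).
    """
    if inherit:
        if ri_cluster is not None and fit_vcov == "iid":
            warnings.warn(
                f"iid vcov combined with cluster random assignment on '{ri_cluster}'; "
                "consider clustering the RI vcov at the assignment level."
            )
        return fit_vcov
    return {"CRV1": ri_cluster} if ri_cluster is not None else fit_vcov
```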
TODO:
Hi Alex, Aronow, Chang, and Lopatto put out an interesting-looking paper a couple of weeks ago that might be worth looking at as well.
Thanks Apoorva - for now, I have deleted the
Open to-do's:
This PR adds support for randomization inference via a `ritest` method for `Feols`.