fwildclusterboot presubmission #542

Closed
1 of 20 tasks
s3alfisc opened this issue Jun 10, 2022 · 6 comments

Comments

@s3alfisc

s3alfisc commented Jun 10, 2022

Submitting Author Name: Alexander Fischer
Submitting Author Github Handle: @s3alfisc
Other Package Authors Github handles: @droodman
Repository: https://github.com/s3alfisc/fwildclusterboot
Submission type: Pre-submission
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Description: Implementation of the fast algorithm for wild cluster bootstrap 
             inference developed in Roodman et al (2019, STATA Journal) for 
             linear regression models <doi:10.1177/1536867X19830877>, 
             which makes it feasible to quickly calculate bootstrap test 
             statistics based on a large number of bootstrap draws even for 
             large samples. Multiway clustering, regression weights, 
             bootstrap weights, fixed effects and subcluster bootstrapping
             are supported. Further, both restricted (WCR) and unrestricted
             (WCU) bootstrap are supported. Methods are provided for a variety 
             of fitted models, including 'lm()', 'feols()' 
             (from package 'fixest') and 'felm()' (from package 'lfe'). 
             Additionally implements a heteroskedasticity-robust (HC1) wild 
             bootstrap.
             Further, the package provides an R binding to 'WildBootTests.jl',
             which provides additional speed gains and functionality, 
             including the 'WRE' bootstrap for instrumental variable models 
             (based on models of type 'ivreg()' from package 'ivreg')
             and hypotheses with q > 1.

Scope

  • Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):

    Data Lifecycle Packages

    • data retrieval
    • data extraction
    • database access
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • database software bindings
    • geospatial data
    • text data

    Statistical Packages

    • Bayesian and Monte Carlo Routines
    • Dimensionality Reduction, Clustering, and Unsupervised Learning
    • Machine Learning
    • Regression and Supervised Learning
    • Exploratory Data Analysis (EDA) and Summary Statistics
    • Spatial Analyses
    • Time Series Analyses
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

fwildclusterboot conducts inference for (linear) regression models via a wild (cluster) bootstrap. It further serves as an R binding of the WildBootTests.jl library.

Yes, I have worked with the srr package and have a draft available (but it is currently not in the main branch).

  • Who is the target audience and what are scientific applications of this package?

The target audience is academic social scientists (economics, political science, sociology). fwildclusterboot should be used whenever regression errors are "clustered" into few groups, in which case inference based on asymptotic approximations might fail.
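To make the idea concrete, here is a minimal sketch of a restricted wild cluster bootstrap-t (WCR) with Rademacher weights for the slope of a simple regression. It is not the fast algorithm of Roodman et al. and it is not how fwildclusterboot or WildBootTests.jl are implemented: it is a naive loop, written in pure Python for illustration only. The function names `cluster_t_stat` and `wild_cluster_boot_p` are hypothetical, and small-sample corrections to the cluster-robust variance are omitted (a constant factor would rescale every t-statistic equally and leave the p-value unchanged).

```python
import random
from collections import defaultdict


def cluster_t_stat(x, y, cluster):
    """t-statistic for the slope in y = a + b*x + e, using a
    cluster-robust standard error (no small-sample correction)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]  # demeaned regressor
    s_xx = sum(d * d for d in xd)
    slope = sum(d * (yi - y_bar) for d, yi in zip(xd, y)) / s_xx
    intercept = y_bar - slope * x_bar
    # Sum the scores (demeaned x times residual) within each cluster.
    scores = defaultdict(float)
    for d, xi, yi, g in zip(xd, x, y, cluster):
        scores[g] += d * (yi - intercept - slope * xi)
    se = sum(s * s for s in scores.values()) ** 0.5 / s_xx
    return slope / se


def wild_cluster_boot_p(x, y, cluster, n_boot=999, seed=1):
    """Two-sided bootstrap p-value for H0: slope = 0, imposing the
    null when generating bootstrap samples (restricted bootstrap)."""
    rng = random.Random(seed)
    t_obs = cluster_t_stat(x, y, cluster)
    # Under H0 the restricted model is y = a + e, so a_hat = mean(y).
    y_bar = sum(y) / len(y)
    resid_r = [yi - y_bar for yi in y]
    groups = sorted(set(cluster))
    exceed = 0
    for _ in range(n_boot):
        # One Rademacher weight (+1 or -1) per cluster, shared by all
        # observations in that cluster.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [y_bar + w[g] * r for g, r in zip(cluster, resid_r)]
        if abs(cluster_t_stat(x, y_star, cluster)) >= abs(t_obs):
            exceed += 1
    return exceed / n_boot


if __name__ == "__main__":
    # Simulated data (an assumption for illustration): 8 clusters of
    # 20 observations, a cluster-level shock, and a true slope of 2.
    rng = random.Random(0)
    x, y, cluster = [], [], []
    for g in range(8):
        shock = rng.gauss(0, 0.5)
        for _ in range(20):
            xi = rng.gauss(0, 1)
            x.append(xi)
            y.append(1.0 + 2.0 * xi + shock + rng.gauss(0, 0.5))
            cluster.append(g)
    print(wild_cluster_boot_p(x, y, cluster))
```

With only a handful of clusters there are few distinct weight vectors (2^G for G clusters), which is why the bootstrap distribution, rather than an asymptotic approximation, is used to judge the observed t-statistic.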

Other R packages that implement the wild cluster bootstrap are sandwich, via its vcovBS function, and the clusterSEs package. fwildclusterboot implements a significantly faster algorithm. Further, fwildclusterboot offers additional functionality, e.g. the subcluster bootstrap. Through WildBootTests.jl, it also allows users to run a highly optimized version of the WRE bootstrap for IV regressions (Davidson & MacKinnon, 2010), which is not available in any other R package.

fwildclusterboot implements the "fast" wild cluster bootstrap in R, but also allows users to call WildBootTests.jl via the JuliaConnectoR package. It is therefore (also) a wrapper package, so you might consider it to be out of scope?

@emilyriederer

Hi @s3alfisc - thanks so much for submitting your package. I especially appreciate all of the details (and impressive benchmarking results!) in the best-in-class answer.

As a general matter, this seems in-scope for a regression package. Based on your work with the statistical standards, could you please comment on whether you believe that the package is on track to meet at least half of the general + category-specific standards?

Thanks also for the call-out on the optional functionality to call the Julia implementation. We are also planning a statistical wrapper package category, but do not yet have standards for it. A member of the statistics peer review team may comment further on whether or not this package could fit that category also.

@s3alfisc
Author

Hi @emilyriederer, thanks for your feedback! I have uploaded my comments based on the statistical software roclets in a separate branch here.

@mpadge
Member

mpadge commented Jun 17, 2022

@s3alfisc @emilyriederer I've had a look through the code, and do not think this package should really be considered a statistical "wrapper" package, as it only constructs a single external call to one Julia package. The Julia connection is entirely optional for package functionality, and in terms of code and algorithms represents only a very small portion of the code. I suggest the review process can proceed under the single category nominated above.

@s3alfisc I note that your current version documents compliance with 59 / 115 standards, which is > 50%, so okay to proceed. The srr_report() nevertheless notes that standard G2.15 appears to be missing. Could you please check and rectify if possible? Note also that our automated check system currently works on the GitHub default branch only, but we're happy to use your submission to develop an appropriate workflow for non-default branches. Until then, the checks might have to be generated for the default branch, after which I'll manually remove that comment and re-generate them for your "ropensci" branch. Thanks!

@s3alfisc
Author

Hi @mpadge, thanks for your feedback! I will spend some time cleaning up the package over the next few days (documenting all srr roclets, adding G2.15, and merging everything into the main branch) and then I will submit fwildclusterboot! =)

@mpadge
Member

mpadge commented Jun 17, 2022

@s3alfisc No need to merge if you'd rather not. We do want our system to one day work on non-default branches, so, as I said, we are happy to use your submission to test that, if that's easier for you. That said, we do generally advise against this, because then you'll be stuck implementing changes from reviews in your non-default branch, which may make your own workflow less robust. Up to you.

@emilyriederer

Thanks @s3alfisc and @mpadge for the conversation. It sounds like we reached a great resolution on where the package fits and look forward to the full submission. I'll close this presubmission inquiry in the meantime.
