estimates() #907

Closed
wants to merge 2 commits into from

vincentarelbundock
Owner

No description provided.

@vincentarelbundock force-pushed the estimates branch 2 times, most recently from 20b05b2 to 253a9ba (September 20, 2023 15:10)
@vincentarelbundock
Owner Author

vincentarelbundock commented Sep 20, 2023

I’m curious what @ngreifer and @andrewheiss think about this PR (#907). Two observations:

  1. One of the many reasons why tidymodels is so popular is that users can change a single argument in their pipeline to switch the “engine” that fits the model.
  2. Years ago, many teachers seemed very excited about Zelig, because it made it easy for students to fit a bunch of models using a consistent user interface.

It occurs to me that marginaleffects is really just one step away from that old-school Zelig vision. This morning I wrote a new estimates() function.

Call it without any arguments and you get a list of available models:

library(marginaleffects)

estimates()
#           Model                                  Description         Package  Function
# 1          beta                              Beta Regression         betareg   betareg
# 2  betabinomial                                Beta-Binomial             aod   betabin
# 3           cox                     Cox Proportional Hazards        survival     coxph
# 4    firthlogit                             Firth Logitistic         logistf   logistf
# 5     firthflac        Firth Logitistic with Added Covariate         logistf      flac
# 6     firthflic   Firth Logitistic with Intercept Correction         logistf      flac
# 7         feglm                            Fixed Effects GLM          fixest     feglm
# 8          felm                   Fixed Effects Linear Model          fixest     feols
# 9     fepoisson                        Fixed Effects Poisson          fixest    fepois
# 10          gam                   Generalized Additive Model            mgcv       gam
# 11          glm                     Generalized Linear Model           stats       glm
# 12      heckman Heckman-Style Selection and Treatment Effect sampleSelection selection
# 13       heckit Heckman-Style Selection and Treatment Effect sampleSelection    heckit
# 14           lm                                 Linear Model           stats        lm
# 15        logit                                     Logistic           stats       glm
# 16        meglm             Mixed-Effects Generalized Linear            lme4     glmer
# 17         melm                         Mixed-Effects Linear            lme4      lmer
# 18      melogit                          Mixed-Effects Logit            lme4     glmer
# 19    mepoisson                        Mixed-Effects Poisson            lme4     glmer
# 20     meprobit                         Mixed-Effects Probit            lme4     glmer
# 21     multinom                       Multinomial Log-Linear            nnet  multinom
# 22      neg_bin                            Negative Binomial            MASS    glm.nb
# 23          nls                      Nonlinear Least Squares           stats       nls
# 24     ocloglog                Ordered Complementary Log-Log            MASS      polr
# 25      ologlog                              Ordered Log-Log            MASS      polr
# 26     ocauchit                              Ordered Log-Log            MASS      polr
# 27       ologit                             Ordered Logistic            MASS      polr
# 28      oprobit                               Ordered Probit            MASS      polr
# 29      poisson                                      Poisson           stats       glm
# 30       probit                                       Probit           stats       glm
# 31     quantile                          Quantile Regression        quantreg        rq
# 32 quasipoisson                                Quasi-Poisson           stats       glm
# 33   robust_glm                    Robust Generalized Linear      robustbase    glmrob
# 34    robust_lm                                Robust Linear      robustbase     lmrob
# 35        trunc                  Truncated Gaussian Response        truncreg  truncreg
# 36         2sls                      Two-Stage Least Squares           ivreg     ivreg
# 37  2sls_robust      Two-Stage Least Squares with Robust SEs        estimatr iv_robust
# 38   0geometric                      Zero-Inflated Geometric            pscl  zeroinfl
# 39      0negbin              Zero-Inflated Negative Binomial            pscl  zeroinfl
# 40     0poisson                        Zero-Inflated Poisson            pscl  zeroinfl

If you specify the model argument but nothing else, you get an informative printout:

estimates(model = "ologit")
# 
# Model: Ordered Logistic
# Package: MASS
# Function: polr
# Documentation: ?MASS::polr
# Arguments: formula, data, model, weights, start, subset, na.action, contrasts, Hess, method

Finally, you can estimate a bunch of models super easily, and feed them all to any marginaleffects function:

estimates(gear ~ mpg + hp, mtcars, model = "ologit") |> avg_slopes()
# 
#  Group Term Estimate Std. Error     z Pr(>|z|)    S     2.5 %   97.5 %
#      3  hp  -0.00377   0.001514 -2.49  0.01285  6.3 -0.006735 -0.00080
#      3  mpg -0.07014   0.015484 -4.53  < 0.001 17.4 -0.100488 -0.03979
#      4  hp   0.00201   0.000958  2.10  0.03555  4.8  0.000136  0.00389
#      4  mpg  0.03747   0.013861  2.70  0.00687  7.2  0.010303  0.06464
#      5  hp   0.00175   0.000833  2.11  0.03519  4.8  0.000122  0.00339
#      5  mpg  0.03267   0.009571  3.41  < 0.001 10.6  0.013909  0.05143
# 
# Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
# Type:  probs

estimates(gear ~ mpg + hp, mtcars, model = "poisson") |> avg_slopes()
# 
#  Term Estimate Std. Error    z Pr(>|z|)   S    2.5 % 97.5 %
#   hp   0.00644    0.00775 0.83    0.406 1.3 -0.00876 0.0216
#   mpg  0.11296    0.08782 1.29    0.198 2.3 -0.05916 0.2851
# 
# Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
# Type:  response

estimates(gear ~ mpg + hp + (1 | cyl), mtcars, model = "mepoisson") |> avg_slopes()
# 
#  Term Estimate Std. Error    z Pr(>|z|)   S    2.5 % 97.5 %
#   hp   0.00644    0.00776 0.83    0.407 1.3 -0.00877 0.0216
#   mpg  0.11296    0.08787 1.29    0.199 2.3 -0.05926 0.2852
# 
# Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
# Type:  response

What do you think? Is this worth including in the package? It’s really just a thin wrapper, so maintenance costs are pretty small…
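
To make the "thin wrapper" idea concrete, here is a minimal sketch of how a name-based dispatcher could work. It is not the code in this PR: `registry` and `fit_by_name()` are hypothetical names introduced for illustration, and only a few base `stats` models are covered.

```r
# Hypothetical registry (an assumption, not the PR's code): each short model
# name maps to a fitting function plus the arguments fixed for that model.
registry <- list(
  lm      = list(fun = stats::lm,  args = list()),
  logit   = list(fun = stats::glm, args = list(family = stats::binomial("logit"))),
  probit  = list(fun = stats::glm, args = list(family = stats::binomial("probit"))),
  poisson = list(fun = stats::glm, args = list(family = stats::poisson()))
)

# Hypothetical wrapper: look up the entry, merge the fixed arguments with the
# user's arguments, and call the underlying fitting function.
fit_by_name <- function(formula, data, model, ...) {
  entry <- registry[[model]]
  if (is.null(entry)) {
    stop("Unknown model: ", model, call. = FALSE)
  }
  args <- c(list(formula = formula, data = data), entry$args, list(...))
  do.call(entry$fun, args)
}

fit <- fit_by_name(gear ~ mpg + hp, mtcars, model = "poisson")
coef(fit)
```

The returned object is an ordinary `glm` fit, so anything that accepts a `glm` would accept it too. Every row added to such a registry is also a dependency on another package's interface.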

@ngreifer
Contributor

I beg of you not to do this. You think the maintenance costs are low, but they are not. This is exactly why Zelig fell; it tried to do too much. Keep things modular by supporting other packages rather than trying to be a one-stop shop. This will inadvertently require you to be a maintainer of all the packages supported. It is better to provide a vignette explaining how to run each of these models using different packages (which you would likely have to do in a help file for estimates(), anyway) than to try to implement them in the package. At best, this should be a separate package that interfaces with marginaleffects, not a part of the package. marginaleffects is too important a package to be tethered by this massive weight.
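
As a rough illustration of what such a vignette entry might show, the sketch below fits two of the models from the list directly with their own packages and hands the results to avg_slopes(). It assumes the MASS and marginaleffects packages are installed. The factor() conversion is an assumption about how the ordered outcome has to be prepared when calling MASS::polr() directly.

```r
library(marginaleffects)

# Poisson regression fitted directly with stats::glm()
mod_pois <- glm(gear ~ mpg + hp, data = mtcars, family = poisson)
avg_slopes(mod_pois)

# Ordered logit fitted directly with MASS::polr(); polr() needs a factor
# response, and Hess = TRUE stores the Hessian used for standard errors.
mod_ologit <- MASS::polr(factor(gear) ~ mpg + hp, data = mtcars, Hess = TRUE)
avg_slopes(mod_ologit)
```

Each call exposes the package's own arguments (family, Hess, and so on), which is the model-specific interface a wrapper would hide.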

@andrewheiss
Contributor

I'd agree. I've had to teach {tidymodels} for a couple of different workshops, and learners have found that it's often too abstract to just change the engine argument. It hides too much of the model-specific interface. Where {tidymodels} shines is in production pipelines where people don't care so much about model debugging and tinkering. When just running a handful of models for a report or manuscript or whatever, the universal {tidymodels} interface is clunky, overkill, and hard to work with, and it's better to just use the regular glm(), brm(), ologit() or whatever functions directly. I'd see similar issues with a {Zelig}-like interface: possibly too abstract.

@vincentarelbundock
Owner Author

Very glad I asked you. Thanks both for your input.

I'm convinced and will drop this.

@andrewheiss
Contributor

andrewheiss commented Sep 20, 2023

It might be useful to go the other way—let {tidymodels} handle all the universal interface heavy lifting and just make sure that slopes(), comparisons(), and predictions() can fit in a pipeline. Julia Silge has an example here of using predictions() in a pipeline.

Basically let {tidymodels} be the magical universal interface since they have a whole paid team of developers for that, but ensure that {marginaleffects} can fit in their ecosystem? (I think it already does, since it works with broom)

@vincentarelbundock
Owner Author

Right. I did write some code to facilitate that interaction. In principle, any tidymodels object produced by an engine that is supported by marginaleffects should also be supported out of the box, although I haven't tested that extensively.

3 participants