Working on caret based models #445

Closed · asheetal opened this issue May 4, 2022 · 3 comments

asheetal commented May 4, 2022

Repeat of question: I want to use model-agnostic caret objects to extract effect sizes, and I'm wondering how to go about that.

mattansb (Member) commented May 4, 2022

Can you give more information, perhaps an example of what you are looking for?

Non-parametric models (like xgboost) don't really lend themselves to predictor-specific effect sizes. There are variable importance measures you might want to look at:
https://topepo.github.io/caret/variable-importance.html
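For example (a minimal sketch, assuming only the caret package; the linear model on mtcars is purely illustrative - varImp() works the same way on other trained models):

library(caret)

# Fit a caret model; an lm on mtcars is used only for illustration
fit <- train(mpg ~ cyl + am + hp, data = mtcars, method = "lm")

# Scaled variable importance scores for the fitted model
varImp(fit)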

asheetal (Author) commented May 4, 2022

Yes, variable importance is part of the solution. Conceptually, though, if an R-squared of 100% describes a model that perfectly explains the dependent variable, then an R-squared of 10% means 90% of the effect is unexplained. So I'm hoping for a way to get a model-agnostic distribution of effects across the predictors.

mattansb (Member) commented May 4, 2022

You can get model-wise diagnostics.
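For instance (a minimal sketch using the performance package; support for caret's train objects may vary, so a plain lm is shown):

# One row of model-level fit indices (AIC, R2, RMSE, ...)
m0 <- lm(mpg ~ cyl + am + hp, data = mtcars)
performance::model_performance(m0)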

But predictor-wise diagnostics are much harder, and in any case would not be model-agnostic.

As an example, here is a simple linear model. We can compare the unique contribution of each predictor by leaving it out and looking at the change in R-squared. And yet...

# Full model with three predictors
m <- lm(mpg ~ cyl + am + hp,
        data = mtcars)

# Refit the model, dropping one predictor at a time
m_drop_cyl <- update(m, formula. = . ~ . - cyl)
m_drop_am <- update(m, formula. = . ~ . - am)
m_drop_hp <- update(m, formula. = . ~ . - hp)

# R-squared of the full model
R2_total <- performance::r2(m)[[1]]

The difference should be the unique contribution of each predictor:

R2_delta_cyl <- R2_total - performance::r2(m_drop_cyl)[[1]]
R2_delta_am <- R2_total - performance::r2(m_drop_am)[[1]]
R2_delta_hp <- R2_total - performance::r2(m_drop_hp)[[1]]

c(R2_total = R2_total,
  Sum_unique_R2 = R2_delta_cyl + R2_delta_am + R2_delta_hp)
#>      R2_total.R2 Sum_unique_R2.R2 
#>        0.8041352        0.1306490

Created on 2022-05-04 by the reprex package (v2.0.1)

Note that the unique contributions sum to far less than the total R-squared, because the predictors are correlated and share explained variance. In more complex models this becomes even more pronounced - e.g., in tree-based models or KNN, predictors interact in complex ways (see the sketch below)...
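To see why (a hedged sketch only, not a recommended method: it applies the same leave-one-out idea to a caret KNN model, using the resampled R-squared from caret's getTrainPerf()):

library(caret)

ctrl <- trainControl(method = "cv", number = 5)

# Same seed before each call so both models see identical CV folds
set.seed(1)
fit_full <- train(mpg ~ cyl + am + hp, data = mtcars,
                  method = "knn", trControl = ctrl)
set.seed(1)
fit_drop_cyl <- train(mpg ~ am + hp, data = mtcars,
                      method = "knn", trControl = ctrl)

# "Unique contribution" of cyl by analogy - unstable, resampling-dependent,
# and not comparable across model types
getTrainPerf(fit_full)$TrainRsquared - getTrainPerf(fit_drop_cyl)$TrainRsquared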

@easystats easystats locked and limited conversation to collaborators May 4, 2022
@mattansb mattansb converted this issue into discussion #446 May 4, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
