-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No support for weights #136
Comments
I'm guessing it's getting silently ignored. This can be surprising for sure. @DavisVaughan What about using |
It's definitely getting silently ignored. Regarding the actual problem of adding x <-c(rnorm(10))
df <- data.frame(y=1+2*x+rnorm(10)/2, x=x, wght1=1:10)
library(parsnip)
linear_reg() %>%
set_engine("lm", weights = .dat()$wght1) %>%
fit(y ~ x, data = df)
#> Warning: The following arguments cannot be manually modified and were
#> removed: weights
#> parsnip model object
#>
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Coefficients:
#> (Intercept) x
#> 0.776 1.992 Created on 2019-01-25 by the reprex package (v0.2.1.9000) but I forgot we "protect" the |
Ellipsis is on CRAN! https://cran.r-project.org/web/packages/ellipsis/index.html |
Ah, the readme lies! |
We will support case weights (soon) but it doesn't quite work yet. |
Hello, is there any update on this issue? Is it still the plan to include case weights? |
Up. Was looking for the same recently :) |
@topepo @DavisVaughan Would either of you mind providing an update here about current status? I perused through commits looking for any movement here, but did not spot it. Was hoping to supply case.weights to ranger but ran into this warning: |
Running into this issue with logistic regression as well. I tried to set it during |
We are focused on documentation for the next 1-3 months. After that, we'll bump up the priority on this. |
I would be really interested to help with this weight feature. Please let me known if you are interested. |
+1 for implementing case weights. |
Is there a technical challenge that needs to be hurdled to allow weighting in various models? I would have interest in submitting a PR to close this issue, but I'm not sure I understand what @DavisVaughan is referring to by weights being a protected argument. This singular issue is preventing me from deploying a tidymodels workflow to our production environment. |
It's not really a singular issue. There are a several things that need to get done and some are not straightforward. For For For
In |
@topepo I appreciate you laying that out for us; by singular issue, I just meant this one feature (weights) is blocking my personal progress with tidymodels. But, with all of the uncertainty/API decisions it doesn't sound like this is a place for me to contribute. I'll keep waiting patiently for you and the team to make those decisions. My personal status quo is fine for the time being. |
Some other things that I've thought about since the last response (collected here only as a record)... I see two different use cases here:
We tend to think of case weights only in terms of the model but, depending on which of these use cases you are in, it could/should impact other computations, such as:
|
I would be perfectly happy with "just adding options to parsnip", since for my purposes it's not nearly as important for data splitting or preprocessing to respect weights, and I presume (?) I can roll my own weighted metric. So is there a workaround for using weights (or offsets, for that matter) at all? Is it even possible through just writing a new engine, or does this require a more fundamental change to parsnip? I tried to write an engine for weighted logistic regression and failed in all kinds of ways, then gave up, but if I knew it were at least possible, I'd persevere. In any case, +1 for implementing case weights and/or offsets. |
I'll add an example from my own work, where we run models using survey results and weights are needed for Max's Case 2, where we try to adjust the survey to align with our our target population. |
Do you have any idea when this feature (weights) will be implemented? (we want to switch from caret to tidymodels. but this is a blocking issue for us) |
This is one of our top 3 priorities for this year. At least 6 months for all of the packages to be updated. But, for example, if you just want Just to reiterate, it is a fairly big change and I'm not sure that everyone understands how this affects things. I think that we are used to just using |
We are in the same situation, that we would like to switch our modeling to tidymodels. But we use offsets and/or weights in almost all our models, therefore, this issue is (currently) a dealbraker for us. Having said that, is there anything we, as the community can do to help implementing weights (and offsets) into tidymodels? I may not have deep insights into the inner workings of the packages, but would be willing to help with documentation, examples, ideas, or code where possible. |
I suppose this issue is related, not only to parameter weights in lm, but also to other algorithms such as:
Is there any update on this issue? Is it still the plan to include sample / case weights? +1 for implementing this issue. |
In my work, the data comes with the time exposure (i.e. weights and offset), which enters the model as a separate parameter, while X takes matrix form (in the xgboost model setup). It seems the pre-processing step cannot separate the weights from the remaining predictors before the matrix conversion. In any case, +1 for implementing case weights and/or offsets. |
I really like the tidymodel philosophy. But, like others in this thread, I'm unable to utilize it day to day because of weights not being supported. I work in the insurance industry and much of our models use case weights. While waiting for case weight support in parsnip, I developed a DYI solution that works for my purposes (GLMs and lightGBM). It uses some tidymodels packages (not parsnip) and tries to follow tidymodel and functional programming principles. The objective of the code is to carry out nested cross validation to properly assess automatic tuning parameter protocol, without leakage. For anyone interested, a demo of the script can be found here: https://github.com/edreddick/nested-cv Looking forward to weight being supported by tidymodels, and thank you for building and maintaining tidymodels! |
Is this just influencing case-weights or also class-weights ? While tidymodels seems to be great not being able to use class weights would make it pretty much unusable for every classification task I´ve ever worked with. Hope that´s not the case. For the boosting parsnip class weights are not even mentioned. |
Is there any update on this feature? Like some of the commenters above, this is a must-have for my team (and anyone in the insurance business). The rest of the 'verse is excellent, so it would be great to know if there's a potential release date. Thanks! |
This is the next big feature that we are working on. I'm hoping to start before the end of the year.
|
Tidymodels is attempting to be a paradigm shifting suite for data science and machine learning tools. All of the buzz is about the "tidyverse". Academia--in the social sciences--is slow to adopt any of these tools because they do not readily provide the basics. Unfortunately, data weighted in a manner to make population statements is a basic in the social sciences. I have stalled my research agenda since January based on topepo's January 14th comment of "at least 6 months but if only wanted in parsnip sooner". Then, I see where others involved in the tidyverse project are still gauging interest in using weights to make population statements. I just have to wonder if this type of development is actually taking place or if "tidyverse" is only being using in the corporate world were these types of population statements are less important. I recognize this has to be done correctly for efficiency and accuracy. I, also, wonder what the preparation work was done before "tidyverse" was launched. Seems some simple focus groups with researchers would have uncovered this issue before the major push. I keep saying major push because many of the better developers have left there own creations to join in the development of tidyverse. My chief frustration lies in others in my field are publishing using weighted data to make population statements via tidyverse and other machine learning methods (e.g.,H2o), and claiming to be using weights. I cannot see where this is viable with tidyverse nor with H2o. Thus, I am wondering when or if this issue will be fixed? I need to get moving with my research agenda. Thank you. |
It will be and we will release it when it is ready; it isn't vaporware. There are several public roadmaps/specifications of what has to be done for case weights. As you can see from the comments I've made above, it isn't just a single issue. AFAIK, not other R (or python) based modeling framework does a comprehensive job of handling case weights at every step of the process. I'm sorry if the timeline is not conducive to your work. We are a handful of people working hard to create a lot of free modeling software. We are balancing many new features. In our developer survey from last year, only a single person indicated that they wanted us to prioritize case weights. Despite this, we have multiple people working on it now across multiple packages. |
I am glad to see that you are monitoring this thread as you have since January 2019 (your posts n=6 during this time). I only say this because this is a indication of the OUTCRY for this type of development in tidymodels. After reviewing the results of your developer survey--thank you for the thread, it is unclear to me as to the industries the 300 respondents represented. Perhaps, the respondents are not truly representative of the population of actual users. This thread seems to demonstrate a larger need for this type of incorporation of case weights. Oh before I close, I was only quoting YOUR timeline. Of course, it will be ready whenever you all say that it will be ready. I appreciate you and your work in this area. |
Thank you so much for working to integrate case weights in Tidymodels. Just echoing how helpful that would be. I'm a public health researcher and using weights is critical to allow us to make population-level inferences. I mostly work with categorical data and having support for weights in applying algorithms like recursive partitioning trees and ranger would be really helpful. Appreciate you all making this a priority! |
Please see https://www.tidyverse.org/blog/2022/05/case-weights/ and add comments there. I'll lock this thread now. |
The
weights
argument infit
does nothing. Most modeling packages allow for case weights, so is this going to be supported in the future?The text was updated successfully, but these errors were encountered: