-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3181][MLLIB]: Add Robust Regression Algorithm with Huber Estimator #8013
Conversation
Test build #40111 has finished for PR 8013 at commit
|
Still a lot of duplication. We're adding new features into LiR now, and it will be hard to maintain. Is it possible that you just add the objective function, and use Params to switch between different objective function? Thanks. |
Test build #40222 has finished for PR 8013 at commit
|
Test build #40223 has finished for PR 8013 at commit
|
@dbtsai ust added the objective function, and use Params to switch between different objective function. Thanks! |
@@ -325,4 +325,21 @@ private[ml] trait HasStepSize extends Params { | |||
/** @group getParam */ | |||
final def getStepSize: Double = $(stepSize) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sharedParams.scala
can not be edited directly. Please look at SharedParamsCodeGen.scala
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, make HasRobust
as HasRobustRegression
in SharedParamsCodeGen.scala
.
Test build #40530 has finished for PR 8013 at commit
|
This class was not added by me. I didn't touch PySpark. |
Test build #41422 has finished for PR 8013 at commit
|
Test build #41421 has finished for PR 8013 at commit
|
Test build #41423 has finished for PR 8013 at commit
|
@@ -65,6 +65,10 @@ private[shared] object SharedParamsCodeGen { | |||
isValid = "ParamValidators.inArray(Array(\"skip\", \"error\"))"), | |||
ParamDesc[Boolean]("standardization", "whether to standardize the training features" + | |||
" before fitting the model.", Some("true")), | |||
ParamDesc[Boolean]("robustRegression", "whether to use robust Huber Cost Function", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to introduce a "costFunction" param which defaults to "LeastSquares" and pattern match in LinearRegression#L195 since that will force mutual exclusivity when more than two cost functions are possible
Test build #41662 has finished for PR 8013 at commit
|
@@ -97,6 +97,22 @@ class LinearRegression(override val uid: String) | |||
setDefault(standardization -> true) | |||
|
|||
/** | |||
* Set the robust Option to determine whether to use robust Huber Cost Function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is not really an "Option", can we just make this say " Set whether to use robust Huber Cost Function"
There is a lot of code repetition between this and #2096, perhaps you can make the |
Hello, robust tuning parameter k should not be a constant as you implemented. |
add the objective function, and use Params to switch edit to pass scala style tests make HasRobustRegression in SharedParamsCodeGen.scala, Make the document more explicitly and make k tunable and default to 1.345 by having another param UnitTests with Outliers UnitTests with Outliers Edit HuberAggregator scala codestyle Update LinearRegression.scala
e447623
to
01601ee
Compare
Test build #49555 has finished for PR 8013 at commit
|
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket. @dbtsai there are a few pull requests that were waiting on your review. Can you revisit them even if they are closed? |
@rxin @mengxr I'm back to US from a leave. Going to revisit PRs under me. I had worked with @MechCoder to implement Huber estimator in python scikit scikit-learn/scikit-learn#5291 which had been merged. @fjiang6, @MechCoder, @sethah, are you interested in porting this feature to Spark which should be fairly straightforward? Thanks. |
@dbtsai I'm interested in porting Huber estimator to Spark. If you did not start it, I can send a PR in a few days. Thanks! |
@yanboliang Sounds great! Thanks. |
I'll be happy to review it. |
Huber Robust Regression under spark/ml/regression
Unit Tests