[SPARK-19234][MLLib] AFTSurvivalRegression should fail fast when any labels are zero #16652
Conversation
@@ -18,20 +18,26 @@
package org.apache.spark.ml.regression

import scala.util.Random
(Leave the blank)
This is still causing a style check failure @admackin
@@ -415,4 +421,40 @@ object AFTSurvivalRegressionSuite {
    "maxIter" -> 2,
    "tol" -> 0.01
  )

  private[AFTSurvivalRegressionSuite] def checkNumericTypes[M <: Model[M], T <: Estimator[M]](
This is being copied in from MLUtils? Why is it necessary?
Yes, the version in MLUtils had labels of zero in its test cases, so it was causing test cases to fail after my patch. It didn't look like there was a way to fix this, so I thought it better to make a patch that didn't affect potentially dozens of other packages. Any other thoughts on how to achieve this? I could add a 'minLabel' param to the MLUtils methods, but that seems overly specific for this one package.
I imagine there's a better way to do this without copying code. Do you mean the common test code assumes 0 labels are permitted? Then maybe it should just not do that, because it's just using any old value to test. If it doesn't actually assume 0 labels are permitted, then its logic should still work. It's just that this test would need some additional logic to verify that 0 labels cause an exception.
Yes, that's right: in MLUtils it supplies zero labels as test cases (i.e. it assumes they're allowed, which would be true for other regression algorithms). More than happy to patch that instead if you think that's OK; it'll just affect presumably a lot of other test cases.
Verifying that zero labels cause exceptions isn't yet covered, but probably should be (separately from MLUtils). (I'll just need to find out how to assert thrown errors in Spark's tests.)
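A minimal sketch of how such an assertion could look with ScalaTest's intercept and withClue; the suite name, session setup, and data below are illustrative assumptions, not the test that was eventually added:

```scala
import org.apache.spark.SparkException
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.AFTSurvivalRegression
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite

// Hypothetical standalone test; Spark's own suites extend SparkFunSuite
// with MLlibTestSparkContext instead of creating a session directly.
class ZeroLabelAFTSuite extends FunSuite {

  test("zero-valued labels should fail fast") {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("aft-zero-label-sketch")
      .getOrCreate()
    import spark.implicits._

    // The first row has an invalid label of 0.0.
    val dataset = Seq(
      (0.0, 0.0, Vectors.dense(0.346, 2.158)),
      (4.199, 0.0, Vectors.dense(0.795, -0.226))
    ).toDF("label", "censor", "features")

    // withClue attaches a message to any failure report; intercept asserts
    // that fitting throws a SparkException.
    withClue("label of AFTPoint must be positive") {
      intercept[SparkException] {
        new AFTSurvivalRegression().fit(dataset)
      }
    }

    spark.stop()
  }
}
```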
It looks like there is no particular reason that
Test build #3544 has finished for PR 16652 at commit
I've addressed all the problems, I think: code style is now fixed, MLTestingUtils is patched (and I verified all MLlib test cases still pass), and I've added a test case for zero-valued labels.
This is looking OK to me, but it needs a rebase now (and, optionally, a squash) before it can be tested again.
…ed in test cases as they now throw errors
Test build #3551 has finished for PR 16652 at commit
      (0.000, 0.0, Vectors.dense(0.346, 2.158)), // ← generates error; zero labels invalid
      (4.199, 0.0, Vectors.dense(0.795, -0.226)))).toDF("label", "censor", "features")
    val aft = new AFTSurvivalRegression()
    intercept[SparkException] {
It's recommended to verify the error message using withClue, e.g.:
withClue("label of AFTPoint must be positive") {
  intercept[SparkException] {
    aft.fit(dataset)
  }
}
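As a small editorial follow-up to this suggestion (not part of the original review), intercept also returns the caught exception, so the message can be asserted on directly. This reuses aft and dataset from the snippet above and assumes the task-side requirement text is propagated into the driver-side SparkException message:

```scala
val e = intercept[SparkException] {
  aft.fit(dataset)
}
// Check that the wrapped failure mentions the invalid label; how the cause
// text surfaces in the driver-side message is an assumption.
assert(e.getMessage.contains("label of AFTPoint must be positive"))
```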
Looks good to me too. I just added a small suggestion. Thanks!
Test build #3552 has finished for PR 16652 at commit
Test build #3554 has finished for PR 16652 at commit
The core change is looking OK @admackin, but it seems like it fails the Python tests? If you have a moment to look at that, it could be all that's needed to get this over the line.
gentle ping @admackin
What changes were proposed in this pull request?
If any labels of 0.0 (which are invalid) are supplied, AFTSurvivalRegression now fails immediately with an error, rather than emitting hard-to-interpret warnings and producing zero-valued coefficients in the output.
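As a rough illustration of the fail-fast behaviour described here, a label check at the point where training rows become AFT points could look like the sketch below. The requirement message matches the clue suggested in review, but the field order and exact placement in the patch are assumptions:

```scala
import org.apache.spark.ml.linalg.Vector

// Sketch only: reject invalid rows as soon as the training point is built,
// so a 0.0 label surfaces as an error during fit() instead of warnings and
// zero-valued coefficients later.
case class AFTPoint(features: Vector, label: Double, censor: Double) {
  require(label > 0.0, "label of AFTPoint must be positive")
  require(censor == 1.0 || censor == 0.0,
    "censor of class AFTPoint must be 1.0 or 0.0")
}
```

Because fit() evaluates the data inside a Spark job, a failed require typically reaches the caller wrapped in a SparkException, which is why the test above intercepts SparkException rather than IllegalArgumentException.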
How was this patch tested?
Verified against the current test suite. (One test needed to be updated because it supplied zero-valued labels and so failed after this patch.)