Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-15892][ML] Incorrectly merged AFTAggregator with zero total count #13619

Closed
wants to merge 5 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jun 11, 2016

What changes were proposed in this pull request?

Currently, AFTAggregator is not being merged correctly. For example, if there is any single empty partition in the data, this creates an AFTAggregator with zero total count which causes the exception below:

IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'

Please see AFTSurvivalRegression.scala#L573-L575 as well.

Just to be clear, the python example aft_survival_regression.py seems using 5 rows. So, if there exist partitions more than 5, it throws the exception above since it contains empty partitions which results in an incorrectly merged AFTAggregator.

Executing bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py on a machine with CPUs more than 5 is being failed because it creates tasks with some empty partitions with defualt configurations (AFAIK, it sets the parallelism level to the number of CPU cores).

How was this patch tested?

An unit test in AFTSurvivalRegressionSuite.scala and manually tested by bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jun 11, 2016

cc @jkbradley Could you please take a look?

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jun 11, 2016

FYI, the example was being passed in branch-1.6 because it does not check whether the count is 0 or not. This check was introduced in 101663f.

@SparkQA
Copy link

SparkQA commented Jun 11, 2016

Test build #60345 has finished for PR 13619 at commit fb16b71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

@HyukjinKwon Thanks! The fix looks correct. Could you please add a tiny unit test which fails before your fix & works afterwards?

@HyukjinKwon
Copy link
Member Author

@jkbradley Sure!

@SparkQA
Copy link

SparkQA commented Jun 12, 2016

Test build #60353 has finished for PR 13619 at commit 4447d0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2016

Test build #60357 has finished for PR 13619 at commit c86ede8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2016

Test build #60359 has finished for PR 13619 at commit 2c8adbf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM
Merging with master and branch-2.0
Thank you!

asfgit pushed a commit that referenced this pull request Jun 12, 2016
## What changes were proposed in this pull request?

Currently, `AFTAggregator` is not being merged correctly. For example, if there is any single empty partition in the data, this creates an `AFTAggregator` with zero total count which causes the exception below:

```
IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'
```

Please see [AFTSurvivalRegression.scala#L573-L575](https://github.com/apache/spark/blob/6ecedf39b44c9acd58cdddf1a31cf11e8e24428c/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala#L573-L575) as well.

Just to be clear, the python example `aft_survival_regression.py` seems using 5 rows. So, if there exist partitions more than 5, it throws the exception above since it contains empty partitions which results in an incorrectly merged `AFTAggregator`.

Executing `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py` on a machine with CPUs more than 5 is being failed because it creates tasks with some empty partitions with defualt  configurations (AFAIK, it sets the parallelism level to the number of CPU cores).

## How was this patch tested?

An unit test in `AFTSurvivalRegressionSuite.scala` and manually tested by `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py`.

Author: hyukjinkwon <[email protected]>
Author: Hyukjin Kwon <[email protected]>

Closes #13619 from HyukjinKwon/SPARK-15892.

(cherry picked from commit e355460)
Signed-off-by: Joseph K. Bradley <[email protected]>
asfgit pushed a commit that referenced this pull request Jun 12, 2016
## What changes were proposed in this pull request?

Currently, `AFTAggregator` is not being merged correctly. For example, if there is any single empty partition in the data, this creates an `AFTAggregator` with zero total count which causes the exception below:

```
IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'
```

Please see [AFTSurvivalRegression.scala#L573-L575](https://github.com/apache/spark/blob/6ecedf39b44c9acd58cdddf1a31cf11e8e24428c/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala#L573-L575) as well.

Just to be clear, the python example `aft_survival_regression.py` seems using 5 rows. So, if there exist partitions more than 5, it throws the exception above since it contains empty partitions which results in an incorrectly merged `AFTAggregator`.

Executing `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py` on a machine with CPUs more than 5 is being failed because it creates tasks with some empty partitions with defualt  configurations (AFAIK, it sets the parallelism level to the number of CPU cores).

## How was this patch tested?

An unit test in `AFTSurvivalRegressionSuite.scala` and manually tested by `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py`.

Author: hyukjinkwon <[email protected]>
Author: Hyukjin Kwon <[email protected]>

Closes #13619 from HyukjinKwon/SPARK-15892.

(cherry picked from commit e355460)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit asfgit closed this in e355460 Jun 12, 2016
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jun 13, 2016
## What changes were proposed in this pull request?

Currently, `AFTAggregator` is not being merged correctly. For example, if there is any single empty partition in the data, this creates an `AFTAggregator` with zero total count which causes the exception below:

```
IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'
```

Please see [AFTSurvivalRegression.scala#L573-L575](https://github.com/apache/spark/blob/6ecedf39b44c9acd58cdddf1a31cf11e8e24428c/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala#L573-L575) as well.

Just to be clear, the python example `aft_survival_regression.py` seems using 5 rows. So, if there exist partitions more than 5, it throws the exception above since it contains empty partitions which results in an incorrectly merged `AFTAggregator`.

Executing `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py` on a machine with CPUs more than 5 is being failed because it creates tasks with some empty partitions with defualt  configurations (AFAIK, it sets the parallelism level to the number of CPU cores).

## How was this patch tested?

An unit test in `AFTSurvivalRegressionSuite.scala` and manually tested by `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py`.

Author: hyukjinkwon <[email protected]>
Author: Hyukjin Kwon <[email protected]>

Closes apache#13619 from HyukjinKwon/SPARK-15892.

(cherry picked from commit e355460)
Signed-off-by: Joseph K. Bradley <[email protected]>
(cherry picked from commit be3c41b)
// the parallelism is bigger than that. Because the issue was about `AFTAggregator`s
// being merged incorrectly when it has an empty partition, running the codes below
// should not throw an exception.
val dataset = spark.createDataFrame(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon value spark is not found here, it should be sqlContext, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compile it in branch-1.6.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with branch-2.0, it is OK , i think that this pr should not be merged into branch-1.6 directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it seems this is merged into branch-1.6 too. Yes, it should be sqlContext for branch-1.6.

@HyukjinKwon
Copy link
Member Author

@zzcclp I made a PR against branch-1.6 here, #13630. Thank you for pointing this out quickly.

@zzcclp
Copy link
Contributor

zzcclp commented Jun 13, 2016

@HyukjinKwon OK, but this pr should be reverted first.
@jkbradley could you revert this pr for branch-1.6 ?

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jun 13, 2016

Yes, it seems it is still failing. I can change the PR to revert this if @jkbradley is busy for now. Otherwise, I will close mine.

@zzcclp
Copy link
Contributor

zzcclp commented Jun 14, 2016

@HyukjinKwon , could you revert this pr for branch-1.6 ?

@HyukjinKwon
Copy link
Member Author

@zzcclp I just did here #13630

@zzcclp
Copy link
Contributor

zzcclp commented Jun 14, 2016

@HyukjinKwon , just see it, thanks.

@jkbradley
Copy link
Member

My apologies for the slow response. I'll revert it now. Thank you for pinging!

@jkbradley
Copy link
Member

Reverted commit to branch-1.6 in commit 2f3e327

asfgit pushed a commit that referenced this pull request Jun 18, 2016
… 1.6

## What changes were proposed in this pull request?

This PR backports #13619.

The original test added in branch-2.0 was failed in branch-1.6.

This seems because the behaviour was changed in 101663f. This was failure while calculating Euler's number which ends up with a infinity regardless of this path.

So, I brought the dataset from `AFTSurvivalRegressionExample` to make sure this is working and then wrote the test.

I ran the test before/after creating empty partitions. `model.scale` becomes `1.0` with empty partitions and becames `1.547` without them.

After this patch, this becomes always `1.547`.

## How was this patch tested?

Unit test in `AFTSurvivalRegressionSuite`.

Author: hyukjinkwon <[email protected]>

Closes #13725 from HyukjinKwon/SPARK-15892-1-6.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jun 20, 2016
… 1.6

## What changes were proposed in this pull request?

This PR backports apache#13619.

The original test added in branch-2.0 was failed in branch-1.6.

This seems because the behaviour was changed in apache@101663f. This was failure while calculating Euler's number which ends up with a infinity regardless of this path.

So, I brought the dataset from `AFTSurvivalRegressionExample` to make sure this is working and then wrote the test.

I ran the test before/after creating empty partitions. `model.scale` becomes `1.0` with empty partitions and becames `1.547` without them.

After this patch, this becomes always `1.547`.

## How was this patch tested?

Unit test in `AFTSurvivalRegressionSuite`.

Author: hyukjinkwon <[email protected]>

Closes apache#13725 from HyukjinKwon/SPARK-15892-1-6.

(cherry picked from commit fd05389)
@HyukjinKwon HyukjinKwon deleted the SPARK-15892 branch January 2, 2018 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants