[SPARK-19066][SPARKR][Backport-2.1] LDA doesn't set optimizer correctly #16623
Conversation
Could you change the PR title to
Changed. Thanks!
@yanboliang ported it to 2.1. Thanks!
Test build #71527 has finished for PR 16623 at commit
Test build #71542 has finished for PR 16623 at commit
LGTM
merged to 2.1. thanks!
## What changes were proposed in this pull request?

Back port the fix to SPARK-19066 to 2.1 branch.

## How was this patch tested?

Unit tests

Author: [email protected] <[email protected]>

Closes #16623 from wangmiao1981/bugport.
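The fix being backported is a single line. As an illustration only (this is a hypothetical Python sketch, not Spark source code), the failure mode behind SPARK-19066 can be pictured as a front-end wrapper that silently drops a user-supplied `optimizer` setting instead of forwarding it to the backing LDA implementation:

```python
# Hypothetical sketch of the SPARK-19066 failure mode; names are illustrative,
# not the actual SparkR or MLlib APIs.

class BackendLDA:
    """Stand-in for the backend LDA estimator."""
    def __init__(self):
        self.optimizer = "online"  # backend default

    def set_optimizer(self, value):
        if value not in ("online", "em"):
            raise ValueError("optimizer must be 'online' or 'em'")
        self.optimizer = value


def fit_lda_buggy(optimizer="online"):
    model = BackendLDA()
    # Bug: the user's choice is never forwarded, so the backend
    # always trains with its default ("online") optimizer.
    return model


def fit_lda_fixed(optimizer="online"):
    model = BackendLDA()
    model.set_optimizer(optimizer)  # the one-line fix: forward the setting
    return model


print(fit_lda_buggy(optimizer="em").optimizer)  # "online" - the bug
print(fit_lda_fixed(optimizer="em").optimizer)  # "em", as requested
```

This also shows why the later discussion treats the change as both a bug fix and a behavior change: workloads that passed `optimizer = "em"` were silently trained with the online optimizer before the fix, and switch to EM after it.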
@wangmiao1981 could you close this PR - backport doesn't get closed automatically.
Close this PR as it has been merged to 2.1. Thanks!
@felixcheung Just saw this was backported to 2.1.1. Since this is a fairly significant behavioral change, I recommend we revert this backport. I could imagine workloads working with EM but failing with online LDA, and we should be careful about having such failures in patch versions (2.1.0 -> 2.1.1). Let's wait until the next major release (2.2.0) instead.

I've also seen other PRs adding public APIs to SparkR for 2.1.1. I understand that SparkR is still fairly experimental, but I recommend we start treating it like other parts of Spark in terms of API stability. We can talk about exceptions when they are really necessary.

Can you please revert this patch? Thanks!
Surely. This is a small enough change (one line) that we could revert easily.
But just so I understand, this isn't just a behavior change per se - this is the API not doing what is documented. Isn't this something we should fix in a patch release?
Hm, true, this is a weird case, where it is somewhere between a behavior change and a bug fix. You're right; let's not revert this patch. I do worry about other patches in SparkR, though; I'll try to be more vigilant about API stability for SparkR in future patches. Thanks!
@jkbradley Sure - I understand your perspective on the API state, and we share the same view. To be clear, we should avoid breaking API changes, and new APIs should go into minor releases, not patch releases. But there need to be exceptions to the rule - SparkR is nowhere close to ready or stable, and there are many gaps we need to fill as soon as possible. Or we might as well abandon it.

Here's the full list of every single commit in R in branch-2.1 since its release:
- Blocker for R package release
- Blocker, API doesn't work
- Minor changes, usability on running SparkR as a package
- Gap - no way to manage partitions, no API to do that
- Gap - usability, very un-R like
- "New API" - operators are already there, we are extending them
- Yes, somewhat questionable, a new parameter but fixing an issue
Thanks for cc'ing me on this - I think @jkbradley has a good point that we should be a bit more explicit in discussing when / why we backport changes in SparkR. While we have not declared it a stable interface, I think the number of users who are using it is large enough now that there is some expectation of stability.

I think we can easily separate the fixes we backport into a couple of buckets. The first group are changes like 7763b0b, which are internal changes that don't affect the API, or bug fixes correcting the behavior of an existing API. My reading of our versioning policy is that these are expected in a feature release. The second group are changes that either add or modify an API (we should never backport something that removes an API). For the modifications, let's explicitly discuss in the JIRA / GitHub PR why the change is necessary / a good idea, and also whether it's source compatible (e.g., 06e77e0 is source compatible).

The other thing we could do is add JIRA labels like regression / bug-fix / api-modify to each JIRA that gets backported. Does that sound like something that would be useful @jkbradley @felixcheung ?
Thanks for the comments. I definitely agree with many of your combined statements:
If I'm a user of SparkR, I agree I'd expect it to be less stable than the rest of Spark. But apparent instability from changing APIs can also scare users away from putting SparkR into actual use. It's a balance, but I'd prefer to err on the side of stability. I do like the idea of tagging backports, but it's OK with me not to, as long as there is a proper explanation of the reason within the JIRA.