Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19066][SPARKR][Backport-2.1]:LDA doesn't set optimizer correctly #16623

Closed
wants to merge 2 commits into from

Conversation

wangmiao1981
Copy link
Contributor

What changes were proposed in this pull request?

Back port the fix to SPARK-19066 to 2.1 branch.

How was this patch tested?

Unit tests

@gatorsmile
Copy link
Member

Could you change the PR title to

[SPARK-19066][SPARKR][Backport-2.1]LDA doesn't set optimizer correctly

@wangmiao1981 wangmiao1981 changed the title [SPARK-19066][SPARKR]:Back port bug fix to 2.1 branch [SPARK-19066][SPARKR][Backport-2.1]:LDA doesn't set optimizer correctly Jan 17, 2017
@wangmiao1981
Copy link
Contributor Author

Changed. Thanks!

@wangmiao1981
Copy link
Contributor Author

@yanboliang ported it to 2.1. Thanks!

@SparkQA
Copy link

SparkQA commented Jan 17, 2017

Test build #71527 has finished for PR 16623 at commit bdc7e4a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2017

Test build #71542 has finished for PR 16623 at commit 3b17c3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

LGTM

@felixcheung
Copy link
Member

merged to 2.1. thanks!

asfgit pushed a commit that referenced this pull request Jan 18, 2017
## What changes were proposed in this pull request?
Back port the fix to SPARK-19066 to 2.1 branch.

## How was this patch tested?
Unit tests

Author: [email protected] <[email protected]>

Closes #16623 from wangmiao1981/bugport.
@felixcheung
Copy link
Member

@wangmiao1981 could you close this PR - backport doesn't get closed automatically.

@wangmiao1981
Copy link
Contributor Author

Close this PR as it has been merged to 2.1. Thanks!

@jkbradley
Copy link
Member

@felixcheung Just saw this was backported to 2.1.1. Since this is a fairly significant behavioral change, I recommend we revert this backport. I could imagine workloads working with EM but failing with online LDA, and we should be careful about having such failures in patch versions (2.1.0->2.1.1). Let's wait until the next major release (2.2.0) instead.

I've also seen other PRs adding public APIs to SparkR for 2.1.1. I understand that SparkR is still fairly experimental, but I recommend we start treating it like other parts of Spark in terms of API stability. We can talk about exceptions when they are really necessary.

Can you please revert this patch? Thanks!

@felixcheung
Copy link
Member

felixcheung commented Feb 16, 2017 via email

@jkbradley
Copy link
Member

Hm, true, this is a weird case, where it is somewhere between a behavior change and a bug fix. You're right---let's not revert this patch.

I do worry about other patches in SparkR; I'll try be more vigilant about API stability for SparkR for future patches.

Thanks!

@felixcheung
Copy link
Member

felixcheung commented Feb 17, 2017

@jkbradley Sure - I understand you perspective on the API state, and we share the same view. To be clear we should avoid breaking API changes and new API should be in minor release and not patch release.

But there needs to be exception to the rule - SparkR is no where close to ready or stable and there are many gaps we need to fill as soon as possible. Or we might as well abandon it.

Here's the full list of every single commit in R in branch-2.1 since its release.

Blocker for R package release
7763b0b
#16330
#16290

Blocker, API doesn't work
1022049
(& this PR)
173c238

Minor changes, usability on running SparkR as a package
77202a6
9758905
9a49f9a
80a3e13 (yes new API - there is no way to access the Spark UI url)

Gap - no way to manage partition, no API to do that
ee3642f
ba2a5ad
6c35399

Gap - usability, very un-R like - "new API" operators are already there, we are extending them
82fcc13
9c04e42

Yes, somewhat questionable, new parameter but fixing an issue
06e77e0, which is discussed here #16761

@shivaram

@shivaram
Copy link
Contributor

Thanks for cc'ing me on this - I think @jkbradley has a good point that we should be a bit more explicit in discussing when / why we backport changes in SparkR. While we have not declared it a stable interface, I think the number of users who are using it is large enough now that there is some expectation of stability.

I think we can easily separate out the fixes we backport into a couple of buckets -- changes like 7763b0b which are internal changes and don't affect the API / are bug fixes fixing the behavior of an existing API. My reading of our versioning policy is that these are expected in a feature release.

The second group are changes that either add or modify an API (we should never backport something that removes an API). For the modifications, lets explicitly discuss in the JIRA / Github PR as to why this is necessary / a good idea and also if its source compatible (e.g., 06e77e0 is source compatible)

The other thing we could do is add JIRA labels like regression / bug-fix / api-modify to each JIRA that gets backported. Does that sound like something that would be useful @jkbradley @felixcheung ?

@jkbradley
Copy link
Member

Thanks for the comments. I definitely agree with many of your combined statements:

  • R has not been declared stable. (Though where in the docs is this even stated? I was unable to find anything saying it is an alpha component.)
  • Some changes are necessary and OK, such as APIs which simply do not work.
  • We could still be stricter about some API changes.

If I'm a user of SparkR, I agree I'd expect it to be less stable than the rest of Spark. But apparent instability from changing APIs can also scare away users from putting SparkR into actual use. It's a balance, but I'd prefer to err on the side of stability.

I do like the idea of tagging backports, but it's Ok with me not to as long as there is proper explanation of the reason within the JIRA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants