[SPARK-16495] [MLlib]Add ADMM optimizer in mllib package #14473

ZunwenYou · 2016-08-03T02:56:58Z

Alternating Direction Method of Multipliers (ADMM) is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Details can be found in S. Boyd's paper.
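For readers unfamiliar with the method, here is a minimal sketch of consensus ADMM on a toy problem. It is not the code in this patch. It assumes each worker `i` holds a vector `a_i` and the goal is to minimize `sum_i 0.5 * ||x - a_i||^2 + lam * ||x||_1`; the function names and the toy objective are illustrative only.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def consensus_admm(a, lam, rho=1.0, iters=200):
    """Scaled-form consensus ADMM for the toy objective
    sum_i 0.5 * ||x - a[i]||^2 + lam * ||x||_1.

    a    : list of per-worker vectors (hypothetical local data)
    lam  : L1 regularization weight
    rho  : ADMM penalty parameter
    """
    n, d = len(a), len(a[0])
    x = [[0.0] * d for _ in range(n)]  # local primal variables
    u = [[0.0] * d for _ in range(n)]  # scaled dual variables
    z = [0.0] * d                      # global consensus variable
    for _ in range(iters):
        # Local x-updates: closed form for the quadratic local loss.
        # In a distributed setting each worker would do this in parallel.
        for i in range(n):
            for j in range(d):
                x[i][j] = (a[i][j] + rho * (z[j] - u[i][j])) / (1.0 + rho)
        # Global z-update: average the local results, then soft-threshold.
        for j in range(d):
            avg = sum(x[i][j] + u[i][j] for i in range(n)) / n
            z[j] = soft_threshold(avg, lam / (n * rho))
        # Dual updates: accumulate each worker's disagreement with z.
        for i in range(n):
            for j in range(d):
                u[i][j] += x[i][j] - z[j]
    return z
```

For this toy objective the exact minimizer is the soft-thresholded mean of the `a_i` with threshold `lam / n`, so `consensus_admm([[1.0, -0.2], [3.0, 0.1], [2.0, 0.4]], lam=0.6)` should approach `[1.8, 0.0]`. A real logistic-regression variant would replace the closed-form local update with an iterative local solve.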

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-16495

AmplabJenkins · 2016-08-03T02:57:15Z

Can one of the admins verify this patch?

ZunwenYou · 2016-08-04T03:59:17Z

@MLnick please have a look at this.

sethah · 2016-08-05T04:11:08Z

@ZunwenYou Would you mind addressing the comments in the JIRA first? Adding a new optimization algorithm to an API that is now deprecated definitely warrants more high level discussion before code review should proceed, IMO.

MLnick · 2016-08-05T09:06:09Z

@ZunwenYou sorry if I was not clear on the JIRA. I said there that this should probably be done as a Spark package external to the core initially. That way you can gather some user feedback and performance numbers etc.

If this is to be implemented within Spark then as Seth mentions it makes more sense to go into the ml API. In this case we need more work and discussion to see how to add different optimizers and make them available to the different models with a unified API.

It would also need some performance comparisons to the current optimizer options.

ZunwenYou · 2016-08-05T11:14:24Z

@MLnick You are right. We have applied ADMM to sparse logistic regression with an L1 norm in some CTR applications; the data sets in these applications mostly have around 10 million dimensions and 100 million samples. In our experience, ADMM-based LR is faster than SGD on large-scale data.

I can provide some performance comparisons against SGD or OWLQN on our data set, but I wonder whether they would be convincing, because our data set is private. Do you have any suggestions for a data set to use for performance comparisons?

MLnick · 2016-08-05T11:48:41Z

I'd recommend (a) generate some data; and/or (b) take a look at some larger public datasets (or samples thereof) such as Criteo (https://www.kaggle.com/c/criteo-display-ad-challenge/data) or Avito (https://www.kaggle.com/c/avito-context-ad-clicks/data)
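As an illustration of option (a), here is a rough sketch of generating synthetic sparse binary-classification data in the style of CTR data. Everything here is an assumption made for illustration: the function name, the parameters, and the choice of binary indicator features with a sparse ground-truth weight vector.

```python
import math
import random


def make_sparse_logistic_data(n_samples, n_features, n_active, nnz_per_row, seed=0):
    """Generate (label, sparse_features) rows from a sparse logistic model.

    n_active    : number of nonzero ground-truth weights
    nnz_per_row : number of active (value 1.0) features per sample
    """
    rng = random.Random(seed)
    # Sparse ground-truth weights: only n_active coordinates are nonzero.
    weights = {j: rng.gauss(0.0, 1.0)
               for j in rng.sample(range(n_features), n_active)}
    rows = []
    for _ in range(n_samples):
        # Each sample activates a few randomly chosen indicator features.
        idx = sorted(rng.sample(range(n_features), nnz_per_row))
        features = {j: 1.0 for j in idx}
        # Label is drawn from the logistic model of the margin.
        margin = sum(weights.get(j, 0.0) for j in idx)
        p = 1.0 / (1.0 + math.exp(-margin))
        label = 1 if rng.random() < p else 0
        rows.append((label, features))
    return weights, rows
```

Because the true weights are known, a comparison can report both wall-clock time and how well each optimizer recovers the sparse support. Each row maps naturally onto a sparse vector format such as LIBSVM.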

debasish83 · 2016-12-26T05:22:19Z

ADMM is already available as a breeze solver (BFGS, OWLQN, NonlinearMinimizer), which is integrated with ml/mllib. It would be great if you could look into it; let me know if you need pointers on running comparisons with OWLQN:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala
This is implemented based on the paper you cited.

This pr proposed to close stale PRs. Currently, we have 400+ open PRs and there are some stale PRs whose JIRA tickets have been already closed and whose JIRA tickets does not exist (also, they seem not to be minor issues).

// Open PRs whose JIRA tickets have been already closed
Closes apache#11785 Closes apache#13027 Closes apache#13614 Closes apache#13761 Closes apache#15197 Closes apache#14006 Closes apache#12576 Closes apache#15447 Closes apache#13259 Closes apache#15616 Closes apache#14473 Closes apache#16638 Closes apache#16146 Closes apache#17269 Closes apache#17313 Closes apache#17418 Closes apache#17485 Closes apache#17551 Closes apache#17463 Closes apache#17625

// Open PRs whose JIRA tickets does not exist and they are not minor issues
Closes apache#10739 Closes apache#15193 Closes apache#15344 Closes apache#14804 Closes apache#16993 Closes apache#17040 Closes apache#15180 Closes apache#17238

N/A

Author: Takeshi Yamamuro

Closes apache#17734 from maropu/resolved_pr.

Change-Id: Id2e590aa7283fe5ac01424d30a40df06da6098b5

add ADMM and ADMMSuite

190520f

ZunwenYou changed the title ~~[SPARK-16495] Add ADMM optimizer in mllib package~~ [SPARK-16495] [MLlib]Add ADMM optimizer in mllib package Aug 3, 2016

maropu mentioned this pull request Apr 23, 2017

[BUILD] Close stale PRs #17734

Closed

asfgit closed this in e9f9715 Apr 24, 2017
