Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10757] [MLlib] Java friendly constructor for distributed matrices #9159

Closed

Conversation

yzhliu
Copy link
Member

@yzhliu yzhliu commented Oct 18, 2015

This patch allows java developers to construct BlockMatrix and RowMatrix more naturally.

Take BlockMatrix as an example, the type of 'blocks' in current constructor is RDD[((Int, Int), Matrix)], while more precisely it should be RDD[((Int, Int), T<:Matrix)]. This could cause some problem. If we add a Java-friendly constructor

def this(blocks: JavaRDD[((Integer, Integer), Matrix)],
    rowsPerBlock: Int,
    colsPerBlock: Int)

and then construct a BlockMatrix with RDD[(Int, Int), DenseMatrix], the complier will complain a conflict, like:

method constructor BlockMatrix with alternatives:
(blocks: org.apache.spark.api.java.JavaRDD[((Integer, Integer), org.apache.spark.mllib.linalg.Matrix)],rowsPerBlock: Int,colsPerBlock: Int)org.apache.spark.mllib.linalg.distributed.BlockMatrix <and>
(blocks: org.apache.spark.rdd.RDD[((Int, Int), org.apache.spark.mllib.linalg.Matrix)],rowsPerBlock: Int,colsPerBlock: Int)org.apache.spark.mllib.linalg.distributed.BlockMatrix
cannot be applied to (org.apache.spark.rdd.RDD[((Int, Int), org.apache.spark.mllib.linalg.DenseMatrix)], Int, Int)

On the other hand, BlockMatrix should hide the underlying implementation of sub-matrix, no matter it is dense or sparse. Thus, a class level covariant for Matrix is improper. That's why I add a companion object method to construct BlockMatrix/RowMatrix

For IndexedRowMatrix/CoordinateMatrix, since the RDD parameters for them are really simple and contain no type inheritance, I don't see necessity to add additional Java friendly apis for those two. Or do we need them to let java users construct IndexedRowMatrix/CoordinateMatrix in the same way as what they do with BlockMatrix/RowMatrix?

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@yanboliang
Copy link
Contributor

@mengxr Could you add @Javelinjs to whitelist?

@yanboliang
Copy link
Contributor

@Javelinjs Could you add unit test cases for this function? Then Jenkins could run tests for you patch after @mengxr added you to whitelist.

@yzhliu
Copy link
Member Author

yzhliu commented Oct 26, 2015

@yanboliang Sure I'll work on it soon. Thank you.

@yanboliang
Copy link
Contributor

It looks well, but I curious about whether we can figure out a more elegant way like this:

class BlockMatrix[M <: Matrix] @Since("1.3.0") (
    @Since("1.3.0") val blocks: RDD[((Int, Int), M)],
    @Since("1.3.0") val rowsPerBlock: Int,
    @Since("1.3.0") val colsPerBlock: Int,
    private var nRows: Long,
    private var nCols: Long) extends DistributedMatrix with Logging {
}

Can we embrace all these cases together and make an unified constructor?
I'm not sure it can work, just a proposal. Please let me know if it's unrealistic.

@yzhliu
Copy link
Member Author

yzhliu commented Oct 26, 2015

@yanboliang Your constructor can build. But this implies exposing [M <: Matrix] to users, we will see something like bm: BlockMatrix[SparseMatrix] here and there, and old codes might cause type mismatch like Type mismatch, expected: BlockMatrix, actual: BlockMatrix[SparseMatrix]

In my opinion, I think BlockMatrix should hide the underlying implementation of its sub-matrix. Please let me know if my idea has any problem.

@yzhliu yzhliu force-pushed the i10757-java-friendly-dist-matrices branch from 8dbd4c5 to 7260df6 Compare October 27, 2015 14:59
@yzhliu
Copy link
Member Author

yzhliu commented Oct 27, 2015

@yanboliang @mengxr I've added some test cases.

@rxin
Copy link
Contributor

rxin commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.

@asfgit asfgit closed this in 1a33f2e Jun 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants