
[SPARK-8514] LU factorization on BlockMatrix #8563

Closed
wants to merge 22 commits into from

Conversation


@nilmeier nilmeier commented Sep 2, 2015

@mengxr and/or @shivaram may be reviewing.

@nilmeier
Author

nilmeier commented Sep 2, 2015

I have been corresponding with Xiangrui Meng on this. (I'm with IBM
[email protected]). I also sent him an email just now. Cheers, Jerome


@mengxr
Contributor

mengxr commented Sep 2, 2015

add to whitelist

@mengxr
Contributor

mengxr commented Sep 2, 2015

ok to test

@nilmeier
Author

nilmeier commented Sep 2, 2015

Thanks Xiangrui! Cheers, J


@SparkQA

SparkQA commented Sep 2, 2015

Test build #41914 has finished for PR 8563 at commit 2f62cca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):

@shivaram
Contributor

shivaram commented Sep 2, 2015

@nilmeier Do you have a reference to a paper which analyses the running time and communication costs for the algorithm implemented here?

@nilmeier
Author

nilmeier commented Sep 2, 2015

This approach has some similarity to the CALU paper that you posted, and follows what
that paper describes as "classic right-looking algorithms" (p. 3). There are some
differences in our approach, which I discuss in a "How it Works" section in the
documentation. We don't have a publication for this work as of yet.

In terms of running time, I have some single-node (i7 MacBook) data (attached). The
scaling here for the LU calculation appears to be n^3.5, where n is the number of
rowBlocks; n^3 is the best this approach could achieve. We're running timings on a
10-node (24 cores each) cluster and should have more comprehensive data for you
shortly. Please let me know if I can provide anything else in the meantime, or if
you'd like to meet to discuss.
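
For reference, here is a minimal local sketch of the right-looking recursion described
above. It is an unpivoted, single-machine illustration of the structure only, not the
distributed implementation in this patch, and all names in it are illustrative:

object BlockLUSketch {
  type M = Array[Array[Double]]

  // naive dense helpers
  def mul(a: M, b: M): M =
    Array.tabulate(a.length, b(0).length)((i, j) => b.indices.map(k => a(i)(k) * b(k)(j)).sum)

  def sub(a: M, b: M): M =
    Array.tabulate(a.length, a(0).length)((i, j) => a(i)(j) - b(i)(j))

  def transpose(a: M): M =
    Array.tabulate(a(0).length, a.length)((i, j) => a(j)(i))

  // Unpivoted LU of the small leading block, computed locally; this is the piece
  // that is assumed to fit in memory.
  def localLU(a: M): (M, M) = {
    val n = a.length
    val u = a.map(_.clone)
    val l = Array.tabulate(n, n)((i, j) => if (i == j) 1.0 else 0.0)
    for (k <- 0 until n; i <- k + 1 until n) {
      l(i)(k) = u(i)(k) / u(k)(k)
      for (j <- k until n) u(i)(j) -= l(i)(k) * u(k)(j)
    }
    (l, u)
  }

  // Forward substitution: solves L * X = B for lower-triangular L.
  def solveLower(l: M, b: M): M = {
    val x = Array.ofDim[Double](l.length, b(0).length)
    for (j <- b(0).indices; i <- l.indices) {
      val s = (0 until i).map(k => l(i)(k) * x(k)(j)).sum
      x(i)(j) = (b(i)(j) - s) / l(i)(i)
    }
    x
  }

  // One right-looking step on A = [[A11, A12], [A21, A22]]:
  //   A11 = L11 * U11,  U12 = L11^{-1} * A12,  L21 = A21 * U11^{-1},
  //   S   = A22 - L21 * U12   (the Schur complement, which the algorithm recurses on)
  def rightLookingStep(a11: M, a12: M, a21: M, a22: M): (M, M, M, M, M) = {
    val (l11, u11) = localLU(a11)
    val u12 = solveLower(l11, a12)                                   // L11 * U12 = A12
    val l21 = transpose(solveLower(transpose(u11), transpose(a21)))  // L21 * U11 = A21
    val schur = sub(a22, mul(l21, u12))
    (l11, u11, l21, u12, schur)
  }
}

The distributed code applies the same step blockwise, with the leading block factored
locally and the Schur complement update carried out on BlockMatrix data.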

Sincerely, Jerome


@nilmeier
Author

nilmeier commented Sep 7, 2015

Hello @shivaram:

Here are some more timings from a 10-node cluster. npb in the legend is the number of
entries per block, and the x-axis is the number of blocks.

For these cases I am seeing better than n^3 scaling, which suggests that other, slower
processes are also showing up in these measurements.

We run into stack overflow errors for a large number of blocks; if we adjust the stack
size, we can run with larger block counts. Here, we report the 'out of the box' settings.

I'm using a cluster that is dedicated to another project for these timings,
so it is kind of hard to get a lot of data, but we can continue to work
through timings if you like.

I have attached the scripts used to generate the data. Please forgive the
fact that these are very ad hoc, but you can see how the timings are
carried out.
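
To give a flavor of how these numbers are produced, here is a rough sketch of the
timing harness (the attached scripts are the real thing; blockLU below stands in for
the method added by this pull request, and its exact name and signature are assumed):

import java.util.Random

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

// n   = number of row/column blocks (the x-axis in the plots)
// npb = per-block size, taken here as the block dimension (as in the legend)
def timeBlockLU(sc: SparkContext, n: Int, npb: Int, seed: Long = 42L): Double = {
  val blocks = sc.parallelize(
    for (i <- 0 until n; j <- 0 until n) yield
      ((i, j), Matrices.rand(npb, npb, new Random(seed + i * n + j)): Matrix))
  val a = new BlockMatrix(blocks, npb, npb).cache()
  a.validate()                 // materialize the input before starting the clock
  val t0 = System.nanoTime()
  val factors = a.blockLU      // method under review; the call here is illustrative
  // the returned factors must also be materialized (an action on their block RDDs)
  // before stopping the clock, since BlockMatrix operations are lazy
  (System.nanoTime() - t0) / 1e9
}

From the spark-shell this can be invoked as, e.g., timeBlockLU(sc, 8, 1000).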

Please let me know if these are helpful for you, or if you need anything
else.

Sincerely, Jerome


@shivaram
Contributor

shivaram commented Sep 8, 2015

@nilmeier Could you post a link to the timing graph? I can't seem to find it on the JIRA or here on GitHub.

@nilmeier
Author

nilmeier commented Sep 8, 2015

Hello @shivaram

Sorry for the delay. Here are some first-pass timings on our 10-node cluster:
https://drive.google.com/folderview?id=0B4hDkPpoay0gMTlZV3QwMW9LZU0&usp=sharing

There is a README in this drive directory. Please feel free to contact me
with more questions.

Sincerely, Jerome


@nilmeier
Author

nilmeier commented Sep 9, 2015

Hi @shivaram

Some of my colleagues here at IBM have noted that some of the BlockMatrix
constructions in the blockLU routine could be improved. I incorporated a similar
improvement in the testImplementation.scala script, which seemed to help. I thought it
might be best, however, to wait until the review has been completed before
incorporating such an improvement into the blockLU routine. Please let me know your
thoughts on the matter when you have time.

Sincerely, @nilmeier


@nilmeier
Author

Hello @shivaram and @mengxr

Over the next week I will be working through some updates to this pull request in preparation for an internal code review; these will mostly be minor changes.

I will also be adding a BlockMatrix.solve method that will use LU to solve AX=B for X. Should I include all of these updates in the current pull request?
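
As a sketch of what that solve would compute: with P*A = L*U, the system A*X = B
becomes L*(U*X) = P*B, so the work is one forward substitution against L followed by
one back substitution against U per column of B. A minimal local illustration (not the
proposed distributed API; names here are illustrative):

type M = Array[Array[Double]]

def mul(a: M, b: M): M =
  Array.tabulate(a.length, b(0).length)((i, j) => b.indices.map(k => a(i)(k) * b(k)(j)).sum)

// forward substitution: solves L * Y = B for lower-triangular L
def forwardSub(l: M, b: M): M = {
  val x = Array.ofDim[Double](l.length, b(0).length)
  for (j <- b(0).indices; i <- l.indices)
    x(i)(j) = (b(i)(j) - (0 until i).map(k => l(i)(k) * x(k)(j)).sum) / l(i)(i)
  x
}

// back substitution: solves U * X = B for upper-triangular U
def backSub(u: M, b: M): M = {
  val n = u.length
  val x = Array.ofDim[Double](n, b(0).length)
  for (j <- b(0).indices; i <- (n - 1) to 0 by -1)
    x(i)(j) = (b(i)(j) - (i + 1 until n).map(k => u(i)(k) * x(k)(j)).sum) / u(i)(i)
  x
}

// solve A * X = B given the factorization P * A = L * U
def luSolve(p: M, l: M, u: M, b: M): M = backSub(u, forwardSub(l, mul(p, b)))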

Cheers, Jerome

@SparkQA

SparkQA commented Sep 24, 2015

Test build #42966 has finished for PR 8563 at commit 2651b83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@frreiss
Contributor

frreiss commented Sep 25, 2015

Review comments:

--> Try making the portion of the examples with input data more condensed; perhaps
by reading the matrices from string constants.

--> There's an orphan/duplicate ScalaDoc comment at line 356 of BlockMatrix.scala

--> Remove carriage return on lines 370, 481, 569, 732

--> Recommend adding a call to the new subtract method to the MLlib programmer's guide

--> New API calls to BlockMatrix should have corresponding PySpark APIs

--> Error message at line 394 should print out the block sizes that don't match

--> The code at line 384 should multiply every element of b.head by -1 as far
as I can see

--> Lines 456 and 465-471 have wrong indentation

--> Scaladoc at 474 should state that blockRowRange and blockColRange are block
indexes, not row/column indexes

--> In lines 460-463, consider making a single pass over the blocks instead of
4 passes

--> Add a note to SchurComplement that the current implementation assumes that
a11 fits into memory on the driver

--> Might want to use a case class as the return type of blockLUtoSolver (see the sketch after this list)

--> Take a close look at the performance impact of the chains of
multiplications at line 811 when there are many levels of recursion

--> In recursiveSequencesBuild, you may want to break off more than one block
from the upper left corner of the matrix; in many cases, the available memory
on the driver can comfortably hold, say 10x10 blocks. You should be able to
query the SparkContext's memory information to determine how much heap
space is available for the in-memory portion of the computation. On a
side note, it would be good to produce a user-friendly error if it looks
like there is not enough local heap to process one block's data locally.

--> Might want to reuse buffers for the local data allocated at lines 623-629
to avoid triggering global GC at each level of recursion
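
For the return-type item above, one possible shape, as a sketch only (the field names
follow the PLU class that later commits on this PR introduce; the blockLUtoSolver call
shown is hypothetical):

import org.apache.spark.mllib.linalg.distributed.BlockMatrix

case class PLU(p: BlockMatrix, l: BlockMatrix, u: BlockMatrix)

// callers can then use named fields or pattern matching instead of tuple positions:
// val PLU(p, l, u) = a.blockLUtoSolver(...)   // hypothetical call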

@nilmeier
Author

Thank you @frreiss for the review! I will address these issues and update the pull request in the next few days. Cheers, J

@nilmeier
Author

nilmeier commented Oct 5, 2015

I addressed Fred's review comments in the most recent update. I was not able to fully address all of the issues raised; the remaining items are described below:

Review comments:

  1. --> Try making the portion of the examples with input data more condensed; perhaps
    by reading the matrices from string constants
    X Done.

  2. --> There's an orphan/duplicate ScalaDoc comment at line 356 of BlockMatrix.scala
    X Done.

  3. --> Remove carriage return on lines 370, 481, 569, 732
    X Done.

  4. --> Recommend adding a call to the new subtract method to the MLLib
    programmer's guide
    X Done.

  5. --> New API calls to BlockMatrix should have corresponding PySpark APIs
    O Mike Dusenberry has an open JIRA for this; I will be meeting with him, as he has
    done Python API work before.

  6. --> Error message at line 394 should print out the block sizes that don't match
    X added error message to .add and .subtract

  7. --> The code at line 384 should multiply every element of b.head by -1 as far
    as I can see
    X Fixed this error.

  8. --> Line 456 and 465-471 have wrong indentation
    X Done.

  9. --> Scaladoc at 474 should state that blockRowRange and blockColRange are block
    indexes, not row/column indexes
    X Done.

  10. --> In lines 460-463, consider making a single pass over the blocks instead of
    4 passes
    O I couldn't see an easy way to do this without introducing additional layers of
    recursion that might impact performance. I would like to address this in a later
    revision, as the recursive building procedures are a general design issue to consider here.

  11. --> Add a note to SchurComplement that the current implementation assumes that
    a11 fits into memory on the driver
    X Done.

  12. --> Might want to use a case class in return type of blockLUtoSolver
    O Will try to do this in the near future, but I didn't include the changes in this revision.

  13. --> Take a close look at the performance impact of the chains of
    multiplications at line 811 when there are many levels of recursion

    O We did some studies, and the code would benefit from refactoring to remove recursion.
    It would require a significant rewrite, and might be beyond the scope of the initial
    submission.

  14. --> In recursiveSequencesBuild, you may want to break off more than one block
    from the upper left corner of the matrix; in many cases, the available memory
    on the driver can comfortably hold, say 10x10 blocks. You should be able to
    query the SparkContext's memory information to determine how much heap
    space is available for the in-memory portion of the computation. On a
    side note, it would be good to produce a user-friendly error if it looks
    like there is not enough local heap to process one block's data locally.

    O This may require a significant rewrite to handle correctly. I would like to try this
    for the next revision (a rough sketch of the suggested driver-side heap check appears
    after this list).

  15. --> Might want to reuse buffers for the local data allocated at lines 623-629
    to avoid triggering global GC at each level of recursion.

    O I would like to explore this as well in the future, but I didn't address it in this update.
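
For item 14, a rough sketch of the kind of driver-side heap check that could produce a
user-friendly error. This is an illustration built on assumptions only, not code in
this patch:

def checkBlockFitsOnDriver(rowsPerBlock: Int, colsPerBlock: Int): Unit = {
  val rt = Runtime.getRuntime
  val freeHeap = rt.maxMemory - (rt.totalMemory - rt.freeMemory)  // bytes currently available
  val blockBytes = rowsPerBlock.toLong * colsPerBlock * 8L        // one dense block of doubles
  require(blockBytes * 2 < freeHeap,                              // factor of 2 as headroom for copies
    s"A single $rowsPerBlock x $colsPerBlock block (~$blockBytes bytes) is unlikely to fit " +
      s"in the available driver heap (~$freeHeap bytes); use smaller blocks or a larger driver.")
}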




/** Computes the LU Decomposition of a Square Matrix. For a matrix A of size (n x n)
Member


For Scaladoc, we use

/**
 * Start from here
 * End here
 */

Author


Added in a second commit.

Author


Corrected in a second commit.

@SparkQA

SparkQA commented Oct 11, 2015

Test build #43537 has finished for PR 8563 at commit 032805d.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LUSequences(p: RDD[((Int, Int), Matrix)], l: RDD[((Int, Int), Matrix)],

@SparkQA

SparkQA commented Oct 11, 2015

Test build #43539 has finished for PR 8563 at commit 2984c64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LUSequences(p: RDD[((Int, Int), Matrix)], l: RDD[((Int, Int), Matrix)],

@SparkQA

SparkQA commented Oct 11, 2015

Test build #43545 has finished for PR 8563 at commit be23343.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LUSequences(p: RDD[((Int, Int), Matrix)], l: RDD[((Int, Int), Matrix)],

@SparkQA

SparkQA commented Oct 11, 2015

Test build #43550 has finished for PR 8563 at commit 0909197.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LUSequences(p: RDD[((Int, Int), Matrix)], l: RDD[((Int, Int), Matrix)],

@SparkQA

SparkQA commented Oct 11, 2015

Test build #43549 has finished for PR 8563 at commit 3038bfd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PLU(p: BlockMatrix, l: BlockMatrix, u: BlockMatrix )
    • class PLUandInverses(p: BlockMatrix, l: BlockMatrix, u: BlockMatrix,
    • class LUSequences(p: RDD[((Int, Int), Matrix)], l: RDD[((Int, Int), Matrix)],

@@ -312,7 +313,7 @@ class BlockMatrix @Since("1.3.0") (
}

/** Collects data and assembles a local dense breeze matrix (for test only). */
- private[mllib] def toBreeze(): BDM[Double] = {
+ private [mllib] def toBreeze(): BDM[Double] = {
Contributor


remove a space after private

@nilmeier
Author

nilmeier commented Nov 6, 2015

Thank you for the comments, Yu. I'll update these in the next few days. Best, Jerome

@yu-iskw
Contributor

yu-iskw commented Nov 6, 2015

@nilmeier thanks. This document would help you with contributing to Spark.
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

@nilmeier
Author

nilmeier commented Nov 7, 2015 via email

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46398 has finished for PR 8563 at commit ac0f3cd.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LUSequences(

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46395 has finished for PR 8563 at commit 66cc11d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LUSequences(p: RDD[((Int, Int), Matrix)], l: RDD[((Int, Int), Matrix)],

@nilmeier
Author

@mengxr and/or @shivaram: can you advise on the status of this pull request? Best, @nilmeier

@rxin
Contributor

rxin commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.

@dbtsai there are a few pull requests that were waiting on your review. Can you revisit them even if they are closed?

@asfgit asfgit closed this in 1a33f2e Jun 15, 2016
@nilmeier
Author

Yes, I hadn't heard back from anyone on this in some time. Was this reviewed?


@nilmeier
Author

@dbtsai Was this algorithm in okay shape? We can probably resubmit this if needed, but
we can also just develop an offline version of it if that is easier. Cheers, J

