[SPARK-1957] [WIP] Pluggable Diskstore for BlockManager #907
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15271/
Merged build triggered.
Merged build started.
@colorant
Merged build finished. All automated tests passed.
All automated tests passed.
Hi @witgo, this aims to align with the Spark pluggable storage roadmap, i.e. supporting different storage media across existing and emerging hardware. (For now, say, you could put data on an SSD for performance and fall back to HDD when the SSD is full, etc.) I talked with @pwendell a little about this before; he suggested that they (Databricks) will come up with an API redesign for pluggable storage. In the meantime we can add a disk store layer first to verify some of the ideas.
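To illustrate the SSD-to-HDD fallback idea, here is a minimal Scala sketch; the trait and class names are hypothetical, not the actual code in this PR:

```scala
import java.io.File

// Hypothetical interface: one tier of pluggable block storage (e.g. an SSD path).
trait BlockStoreTier {
  /** Try to place a block of roughly `sizeHint` bytes; None if this tier is full. */
  def allocate(blockId: String, sizeHint: Long): Option[File]
}

// Falls back through tiers in order: try the SSD tier first,
// and spill to the HDD tier only once the SSD tier reports it is full.
class TieredDiskStore(tiers: Seq[BlockStoreTier]) {
  def allocate(blockId: String, sizeHint: Long): Option[File] =
    tiers.view.flatMap(_.allocate(blockId, sizeHint)).headOption
}
```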
Hi, JIRA here: https://issues.apache.org/jira/browse/SPARK-1957
The implementation here tries to minimize the impact on the current Spark code base, especially the shuffle part of the code, as described in the JIRA; that part might be improved later together with the shuffle framework itself. For now, it just tries to make the wheel turn. There are improvements needed in the quota control logic, e.g. enforcing the quota per path/disk instead of per store as a whole, to reduce the chance of out-of-disk-space issues (which can occur even when the store's overall quota is not yet exceeded), and it would benefit from catching the various out-of-disk-space exceptions. However, if disk space is not an issue, I think this one just works fine. I have also tried to make sure that if no extra diskstore config is set, the general data flow path is almost identical to the current approach, and the overhead is negligible.
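As a rough illustration of the per-path quota control mentioned above (hypothetical names, not this PR's code), each data directory could track its own reservations instead of accounting only at the whole-store level:

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical per-path quota tracker: reserve space before writing so a single
// full disk is detected early, rather than only noticing at the store level.
class PathQuota(quotaBytes: Long) {
  private val used = new AtomicLong(0L)

  /** Atomically reserve `bytes`; false if the reservation would exceed the quota. */
  def tryReserve(bytes: Long): Boolean = {
    val after = used.addAndGet(bytes)
    if (after > quotaBytes) { used.addAndGet(-bytes); false } else true
  }

  /** Release a previous reservation, e.g. after a failed or aborted write. */
  def release(bytes: Long): Unit = used.addAndGet(-bytes)
}
```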
@pwendell would you mind taking a look at this one ;)
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15748/
Hi @andrewor14, besides #1209, this one is also related to the BlockManager. Could you also take a look at the general idea? I know the code needs a rebase onto the latest code, but I am seeking general feedback on the ideas ;)
@andrewor14, the JIRA ticket is https://issues.apache.org/jira/browse/SPARK-1957. And #1209 is about making the data format more general for the BlockManager, which also touches a lot of the files modified in this PR. Though, yes, since rxin is pushing ManagedBuffer for data transfer, that PR might not be necessary; I will close it if that turns out to be true.
Can you add it to the title? |
This is useful as a prototype, but I'd prefer to close this issue rather than keep it under active review. We can use it as a starting point if we revisit the internal interfaces here.
Hi @pwendell, we (@colorant and I) are still following this issue. It would be nice for Spark to support pluggable storage, since it could take advantage of the whole memory hierarchy (2LM, NVM, SSD, HDD, etc.). @colorant talked with you about the API design for pluggable storage, and we'd like to know your point of view on this and what we can do to keep it moving. We can discuss it on JIRA. Thanks.