[SPARK-1957] [WIP] Pluggable Diskstore for BlockManager #907
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15271/
Merged build triggered.
Merged build started.
@colorant
Merged build finished. All automated tests passed.
All automated tests passed.
Hi @witgo, this aims to align with the Spark pluggable storage roadmap, i.e. supporting different storage media across existing and emerging hardware. (For now, say, you could put data on an SSD for performance and fall back to HDD when the SSD is full, etc.) I talked with @pwendell a little about this before; he suggested that they (Databricks) will come up with an API redesign for pluggable storage. In the meantime we can add a disk store layer first to verify some of the ideas.
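To illustrate the SSD-to-HDD fallback idea, here is a minimal Scala sketch; the trait and class names are hypothetical, not the actual code in this PR:

```scala
import java.io.File

// Hypothetical interface: one tier of pluggable block storage (e.g. an SSD path).
trait BlockStoreTier {
  /** Try to place a block of roughly `sizeHint` bytes; None if this tier is full. */
  def allocate(blockId: String, sizeHint: Long): Option[File]
}

// Falls back through tiers in order: try the SSD tier first,
// and spill to the HDD tier only once the SSD tier reports it is full.
class TieredDiskStore(tiers: Seq[BlockStoreTier]) {
  def allocate(blockId: String, sizeHint: Long): Option[File] =
    tiers.view.flatMap(_.allocate(blockId, sizeHint)).headOption
}
```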
Hi, JIRA here: https://issues.apache.org/jira/browse/SPARK-1957
The implementation here tries to minimize the impact on the current Spark code base, especially the shuffle part of the code, as described in the JIRA; that part might be improved later together with the shuffle framework itself. For now, it just tries to make the wheel turn. There are improvements needed in the quota control logic, e.g. enforcing the quota per path/disk instead of per store as a whole, to reduce the chance of out-of-disk-space issues (which can occur even when the store's overall quota is not yet exceeded), and it would benefit from catching the various out-of-disk-space exceptions. However, if disk space is not an issue, I think this one just works fine. I have also tried to make sure that if no extra diskstore config is set, the general data flow path is almost identical to the current approach, and the overhead is negligible.
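As a rough illustration of the per-path quota control mentioned above (hypothetical names, not this PR's code), each data directory could track its own reservations instead of accounting only at the whole-store level:

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical per-path quota tracker: reserve space before writing so a single
// full disk is detected early, rather than only noticing at the store level.
class PathQuota(quotaBytes: Long) {
  private val used = new AtomicLong(0L)

  /** Atomically reserve `bytes`; false if the reservation would exceed the quota. */
  def tryReserve(bytes: Long): Boolean = {
    val after = used.addAndGet(bytes)
    if (after > quotaBytes) { used.addAndGet(-bytes); false } else true
  }

  /** Release a previous reservation, e.g. after a failed or aborted write. */
  def release(bytes: Long): Unit = used.addAndGet(-bytes)
}
```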
@pwendell would you mind taking a look at this one ;)
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15748/
Hi @andrewor14, besides #1209, this one is also related to the BlockManager. Could you also take a look at the general idea? I know the code needs a rebase onto the latest code, but I am seeking general feedback on the ideas ;)
@andrewor14, the JIRA ticket is https://issues.apache.org/jira/browse/SPARK-1957. And #1209 is about making the data format more general for the BlockManager, which also touches a lot of the files modified in this PR. Though, yes, since rxin is pushing ManagedBuffer for data transfer, that PR might not be necessary; I will close it if that turns out to be true.
Can you add it to the title? |
This is useful as a prototype, but I'd prefer to close this issue rather than keep it under active review. We can use it as a starting point if we revisit the internal interfaces here.
Hi @pwendell, we (@colorant and I) are still following this issue. It would be nice for Spark to support pluggable storage, since it could take advantage of the whole memory hierarchy (2LM, NVM, SSD, HDD, etc.). @colorant talked with you about the API design for pluggable storage, and we'd like to know your point of view on this and what we can do to keep it moving. We can discuss it on JIRA. Thanks.