
[SPARK-1957] [WIP] Pluggable Diskstore for BlockManager #907

Closed
wants to merge 1 commit into from

Conversation

colorant
Contributor

No description provided.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15271/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@witgo
Contributor

witgo commented May 29, 2014

@colorant
This is a big change. Can you explain the reason for it?

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15274/

@colorant
Contributor Author

Hi @witgo, this aims to align with the Spark pluggable storage roadmap, i.e. supporting different storage media across existing and emerging hardware. (For example, you could put data on SSD for performance and fall back to HDD when the SSD is full.)

I talked with @pwendell a little about this before. He suggested that they (Databricks) will come up with an API redesign for pluggable storage. In the meantime, we can add a disk store layer first to try out some of the ideas.
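
Roughly, the SSD-with-HDD-fallback idea could be sketched like this (a minimal sketch with hypothetical names; the PR's actual interfaces may differ):

```scala
// Hypothetical sketch of SSD->HDD fallback for a pluggable disk store.
// StorageTier and TieredDiskStore are illustrative names, not Spark's API.
case class StorageTier(name: String, capacityBytes: Long, var usedBytes: Long = 0L) {
  def freeBytes: Long = capacityBytes - usedBytes
}

class TieredDiskStore(tiers: Seq[StorageTier]) {
  // Put a block on the first tier (e.g. SSD before HDD) with enough room;
  // fall back down the list when a faster tier is full.
  def put(blockSize: Long): Option[String] =
    tiers.find(_.freeBytes >= blockSize).map { t =>
      t.usedBytes += blockSize
      t.name
    }
}

object TierDemo {
  def main(args: Array[String]): Unit = {
    val store = new TieredDiskStore(Seq(
      StorageTier("ssd", capacityBytes = 100L),
      StorageTier("hdd", capacityBytes = 1000L)
    ))
    println(store.put(80L)) // Some(ssd): fits on the SSD
    println(store.put(50L)) // Some(hdd): SSD has only 20 left, fall back
  }
}
```

The tier list is ordered by preference, so the general data path is unchanged when only one tier is configured.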

@colorant
Contributor Author

The implementation here tries to minimize the impact on the current Spark code base, especially the shuffle part of the code, as described in the JIRA; that part might be improved later together with the shuffle framework itself. For now, the goal is just to get the wheel turning.

There are also improvements needed in the quota-control logic, e.g. enforcing quotas per path/disk instead of on the store as a whole, to reduce the chance of out-of-disk-space errors (which can occur on one disk even when the store's overall quota is not exceeded). It would also benefit from catching the various out-of-disk-space exceptions.

However, if disk space is not an issue, I think this works fine as is. I have also tried to make sure that if no extra disk store configuration is done, the general data flow path is almost identical to the current approach, and the overhead is negligible.
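
The per-path quota idea could be sketched as follows (illustrative names only, not the PR's code):

```scala
// Hypothetical per-path quota sketch (illustrative names, not the PR's code).
// Each directory/disk has its own quota instead of one store-wide limit, so a
// write is redirected as soon as its target path is full, even if the store's
// total quota would still allow it.
case class PathQuota(path: String, quotaBytes: Long, var usedBytes: Long = 0L)

class QuotaManager(paths: Seq[PathQuota]) {
  // Reserve space on the first path whose own quota still has room.
  def reserve(size: Long): Option[String] =
    paths.find(p => p.usedBytes + size <= p.quotaBytes).map { p =>
      p.usedBytes += size
      p.path
    }

  def totalFree: Long = paths.map(p => p.quotaBytes - p.usedBytes).sum
}

object QuotaDemo {
  def main(args: Array[String]): Unit = {
    val qm = new QuotaManager(Seq(
      PathQuota("/mnt/ssd/spark", quotaBytes = 100L),
      PathQuota("/mnt/hdd/spark", quotaBytes = 100L)
    ))
    qm.reserve(90L) // lands on /mnt/ssd/spark
    // A 50-byte write fails on the SSD path (only 10 left) and moves on,
    // even though the store as a whole still has 110 bytes free:
    println(qm.reserve(50L)) // Some(/mnt/hdd/spark)
  }
}
```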

@colorant
Contributor Author

colorant commented Jun 4, 2014

@pwendell would you mind taking a look at this one? ;)

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15748/

@colorant
Contributor Author

Hi @andrewor14, besides #1209, this one is also related to the BlockManager. Could you take a look at the general idea? I know the code needs a rebase onto the latest master, but I am seeking general feedback on the idea first ;)

@andrewor14
Contributor

@colorant Can you file a JIRA to describe what you're trying to achieve here? Also, how is this related to #1209? If this is still relevant, can you merge this up to master and resolve the conflicts?

@colorant
Contributor Author

colorant commented Sep 4, 2014

@andrewor14, the JIRA ticket is https://issues.apache.org/jira/browse/SPARK-1957. #1209 is about making the data format more general for the BlockManager, and it also modifies a lot of the files touched by this PR. That said, since rxin is pushing ManagedBuffer for data transfer, that PR might not be necessary; I will close it if that turns out to be the case.

@andrewor14
Contributor

Can you add it to the title?

@colorant colorant changed the title Pluggable Diskstore for BlockManager [SPARK-1957] Pluggable Diskstore for BlockManager Sep 4, 2014
@colorant colorant changed the title [SPARK-1957] Pluggable Diskstore for BlockManager [SPARK-1957] [WIP] Pluggable Diskstore for BlockManager Sep 5, 2014
@pwendell
Contributor

This is useful as a prototype, but I'd prefer to close this issue as an active review. We can use it as a starting point if we revisit the internal interfaces here.

@asfgit asfgit closed this in f73b56f Nov 10, 2014
@liyezhang556520
Contributor

Hi @pwendell, we (@colorant and I) are still following this issue. It would be nice for Spark to support pluggable storage, since it can take advantage of the full memory hierarchy (2LM, NVM, SSD, HDD, etc.). @colorant talked with you about the API design for pluggable storage; we'd like to know your point of view on this and what we can do to keep it moving. We can continue the discussion on JIRA. Thanks.

Agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022
…pache#913)

* MapR [SPARK-953] Investigate and add all needed changes for Spark services (apache#905)

* [EZSPA-347] Find a way to pass sensitive configs in secure manner (apache#907)

* MapR [SPARK-961] Spark job can't be properly killed using yarn API or CLI (apache#908)

* MapR [SPARK-962] MSSQL can not handle SQL syntax which is used in Spark (apache#909)

* MapR [SPARK-963] select from hbase table which was created via hive fails (apache#910)

Co-authored-by: Dmitry Popkov <[email protected]>
Co-authored-by: Andrew Khalymon <[email protected]>
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
…pache#913)
