This repository has been archived by the owner on Dec 17, 2024. It is now read-only.

Commit

check-in remote shuffle based on DAOS Object API (#5)
* reconstruct project and add new shuffle-daos plugin

Signed-off-by: jiafu zhang <[email protected]>

* corrected location of scalastyle-config

Signed-off-by: jiafu zhang <[email protected]>

* use daos-java version of 1.1.4

Signed-off-by: jiafu zhang <[email protected]>
jiafuzha authored Feb 1, 2021
1 parent 292d5bd commit 99e53f0
Showing 79 changed files with 7,513 additions and 285 deletions.
38 changes: 34 additions & 4 deletions README.md
@@ -1,4 +1,4 @@
-# Remote Shuffle
+# Remote Shuffle Plugins

## Online Documentation

@@ -9,10 +9,18 @@ You can find all the PMem Spill documents on the [project web page](https://
- [User Guide](#userguide)

## Introduction
-Remote Shuffle is a Spark* ShuffleManager plugin, shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-disks.
+Remote Shuffle is a Spark* ShuffleManager plugin, shuffling data through a remote datastore, as opposed to vanilla Spark's local disks.

This is an essential part of enabling Spark on disaggregated compute and storage architecture.

There are two shuffle plugins in this project.
- shuffle-hadoop, a remote shuffle plugin based on the Hadoop filesystem interface.
This plugin can work with any remote filesystem compatible with Hadoop, like HDFS, AWS S3 and [DAOS](https://github.com/daos-stack/daos).
- shuffle-daos
Unlike the general plugin above, which goes through the Hadoop Filesystem interface, this plugin is based on the DAOS Object API.
Thanks to the DAOS Distribution Key (dkey) and Attribute Key (akey), we can improve performance by constructing shuffle output as
shown below.
![](./shuffle-daos/images/shuffle.png)
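The idea behind keying shuffle output by dkey and akey can be illustrated with a toy in-memory model. This is a hypothetical sketch for intuition only, not the plugin's actual layout or code: it assumes one dkey per reduce partition and one akey per map task, so a reducer fetches all of its blocks with a single dkey lookup.

```python
class FakeDaosObject:
    """Toy in-memory stand-in for a DAOS object (hypothetical, for illustration)."""

    def __init__(self):
        self.store = {}  # (dkey, akey) -> bytes

    def update(self, dkey, akey, value):
        self.store[(dkey, akey)] = value

    def fetch_dkey(self, dkey):
        # All akeys (map outputs) that belong to one reduce partition.
        return {a: v for (d, a), v in self.store.items() if d == dkey}


obj = FakeDaosObject()
num_reducers = 2
for map_id in range(3):                    # 3 map tasks
    for reduce_id in range(num_reducers):  # each writes one block per reducer
        obj.update(dkey=f"reduce-{reduce_id}",
                   akey=f"map-{map_id}",
                   value=f"records({map_id}->{reduce_id})".encode())

# Reducer 0 reads its whole partition with a single dkey lookup.
blocks = obj.fetch_dkey("reduce-0")
```

Grouping a reducer's input under one dkey is what avoids scanning unrelated map outputs on the read path.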

### Installation
We have provided a Conda package that automatically installs the dependencies needed by OAP; refer to [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md) for more information. If you have finished the [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md), you can find the compiled OAP jars in `$HOME/miniconda2/envs/oapenv/oap_jars/`.
@@ -23,12 +31,12 @@
We have provided a Conda package that automatically installs the dependencies needed by OAP; refer to [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md) for more information. If you have finished the [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md), you can find the compiled remote shuffle jars under `$HOME/miniconda2/envs/oapenv/oap_jars`.
In that case, skip this section and jump to [User Guide](#user-guide).

-Build this module using the following command in `OAP/oap-shuffle/remote-shuffle` folder. This file needs to be deployed on every compute node that runs Spark. Manually place it on all nodes or let resource manager do the work.
+Build using the following command in the `OAP/remote-shuffle` folder. The resulting jar needs to be deployed on every compute node that runs Spark. Manually place it on all nodes or let the resource manager do the work.

```
mvn -DskipTests clean package
```
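With the standard Maven layout, each module's jar should land under that module's `target/` directory after the build; the paths below are assumptions based on the project's module names, not verified output:

```
ls shuffle-hadoop/target/
ls shuffle-daos/target/
```

These are the artifacts to distribute to the compute nodes.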
-## User Guide
+## <a name="g1"></a>User Guide (shuffle-hadoop)
### Enable Remote Shuffle

Add the `.jar` files to the classpath of the Spark driver and executors: Put the
@@ -212,3 +220,25 @@ shuffle reader:
shuffle storage daos://default:1
shuffle folder: /tmp/shuffle
```

## User Guide (shuffle-daos)

Most of the [User Guide (shuffle-hadoop)](#g1) applies to shuffle-daos as well. We will not repeat it here; only the
differences are shown.

### Shuffle Manager

```
spark.shuffle.manager org.apache.spark.shuffle.daos.DaosShuffleManager
```
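The same property can also be supplied at submit time instead of in `spark-defaults.conf`; a minimal sketch (the application jar name is hypothetical):

```
spark-submit \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.daos.DaosShuffleManager \
  your-app.jar
```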

### Classpath

```
spark.executor.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/daos-java-<version>.jar
$HOME/miniconda2/envs/oapenv/oap_jars/hadoop-daos-<version>.jar
$HOME/miniconda2/envs/oapenv/oap_jars/shuffle-daos-<version>.jar
spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/daos-java-<version>.jar
$HOME/miniconda2/envs/oapenv/oap_jars/hadoop-daos-<version>.jar
$HOME/miniconda2/envs/oapenv/oap_jars/shuffle-daos-<version>.jar
```
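For scripted deployments, the three jar paths can be assembled into a single colon-separated `extraClassPath` value. A minimal Python sketch; the directory comes from the guide above, while the version string is an assumption (the commit message pins daos-java at 1.1.4, but the other jars' versions may differ):

```python
import os


def build_classpath(jar_dir, names, version):
    """Join <name>-<version>.jar paths with ':' as extraClassPath expects on Linux."""
    jars = [os.path.join(jar_dir, f"{n}-{version}.jar") for n in names]
    return ":".join(jars)


cp = build_classpath(
    "$HOME/miniconda2/envs/oapenv/oap_jars",
    ["daos-java", "hadoop-daos", "shuffle-daos"],
    "1.1.4",
)
```

The resulting string can be set for both `spark.executor.extraClassPath` and `spark.driver.extraClassPath`.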
