[ML-137] [Core] Multiple improvements for build & deploy and integrate oneAPI 2021.4 (#139)

* Move libs to resources and use oneCCL from the oneAPI Toolkit (need to patch the soname of libfabric.so.1; see the sketch after this list)

* Add prepare build resources, workaround CCL_ROOT parsing bug for 2021.4

* nit

* Add output version for prepare-build-deps.sh

* Add dev/build-maven-local-repo.sh

* Add dal 2021.4 deps from central instead of local, clean assembly.xml, update build & test scripts

* update scripts

* update ci

* Add vscode settings
Add RELEASE and revise env.sh for test-cluster
Add exe mode to build.sh

* nit

* add lib as empty dir

* set log4j to WARN, update doc and code comments

* nit

* update env.sh & template

* update HOST_NAME

* add trap to capture script error

* Prepare hdfs data

* update scripts

* nit
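
For context on the libfabric note in the first bullet: below is a hypothetical sketch of how such a SONAME patch could be applied with `patchelf`. The actual logic lives in `dev/prepare-build-deps.sh`; the library path and the chosen SONAME here are assumptions for illustration only.

```bash
# Hypothetical illustration -- the real steps are in dev/prepare-build-deps.sh.
# Copy the libfabric shipped with the oneAPI MPI component and rewrite its SONAME
# so the copy bundled into the jar resources is the one resolved at load time.
cp "$I_MPI_ROOT/libfabric/lib/libfabric.so.1" ./libfabric.so.1   # source path is an assumption
patchelf --print-soname ./libfabric.so.1                         # inspect the current SONAME
patchelf --set-soname libfabric.so.1 ./libfabric.so.1            # set the desired SONAME
```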
xwu99 authored Nov 4, 2021
1 parent 9871ab2 commit 0de0516
Showing 36 changed files with 581 additions and 349 deletions.
6 changes: 6 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE
@@ -1,3 +1,9 @@
## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## Does this PR also require the following changes?

- CI
- Documentation
- Example
2 changes: 1 addition & 1 deletion .github/workflows/oap-mllib-ci.yml
@@ -20,7 +20,7 @@ jobs:
~/.m2/repository
/opt/intel/oneapi
~/opt
key: ${{ runner.os }}_spark-3.1.1_hadoop-3.2.0_oneapi-2021.3.0
key: ${{ runner.os }}_spark-3.1.1_hadoop-3.2.0_oneapi-2021.4.0
restore-keys: |
${{ runner.os }}-
- name: Set up environments
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,7 +1,7 @@
*.o
*.log
.vscode
*.iml
.vscode/
target/
.idea/
.idea_modules/
19 changes: 19 additions & 0 deletions .vscode/c_cpp_properties.json
@@ -0,0 +1,19 @@
{
"configurations": [
{
"name": "Linux",
"includePath": [
"${workspaceFolder}/mllib-dal/src/main/native/**",
"${CCL_ROOT}/include/**",
"${DAALROOT}/include/**",
"${JAVA_HOME}/include/**"
],
"defines": [],
"compilerPath": "${CMPLR_ROOT}/linux/bin/clang",
"cStandard": "c17",
"cppStandard": "c++14",
"intelliSenseMode": "clang-x64"
}
],
"version": 4
}
37 changes: 37 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,37 @@
{
"files.associations": {
"*.tcc": "cpp",
"cctype": "cpp",
"chrono": "cpp",
"cstdint": "cpp",
"ctime": "cpp",
"cwchar": "cpp",
"exception": "cpp",
"initializer_list": "cpp",
"iosfwd": "cpp",
"iostream": "cpp",
"istream": "cpp",
"limits": "cpp",
"ostream": "cpp",
"ratio": "cpp",
"string_view": "cpp",
"type_traits": "cpp",
"clocale": "cpp",
"streambuf": "cpp",
"algorithm": "cpp",
"cstdarg": "cpp",
"cstddef": "cpp",
"cstdio": "cpp",
"deque": "cpp",
"vector": "cpp",
"functional": "cpp",
"memory_resource": "cpp",
"string": "cpp",
"utility": "cpp",
"fstream": "cpp",
"iomanip": "cpp",
"new": "cpp",
"sstream": "cpp",
"*.template": "shellscript"
}
}
79 changes: 29 additions & 50 deletions README.md
@@ -10,11 +10,11 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in

## Compatibility

OAP MLlib maintains the same API interfaces as Spark MLlib. That means applications built with Spark MLlib can run directly with minimal configuration.

Most of the algorithms produce results identical to Spark MLlib's. However, due to the nature of distributed floating-point operations, there may be small deviations from the original results; we make sure the error stays within an acceptable range and the accuracy is on par with Spark MLlib.

For algorithms that are not accelerated by OAP MLlib, the original Spark MLlib implementation is used.

## Online Documentation

@@ -55,7 +55,7 @@ Intel® oneAPI Toolkits components used by the project are already included into
#### General Configuration

##### YARN Cluster Manager
Users usually run Spark applications on __YARN__ in __client__ mode. In that case, you only need to add the following configurations to `spark-defaults.conf` or to the `spark-submit` command line before running.

```
# absolute path of the jar for uploading
@@ -85,22 +85,22 @@ OAP MLlib expects 1 executor acts as 1 oneCCL rank for compute. As `spark.shuffl
### Sanity Check

#### Setup `env.sh`
```
```bash
$ cd conf
$ cp env.sh.template env.sh
```
Edit the related variables in the "`Minimum Settings`" section of `env.sh`.

#### Upload example data files to HDFS
```
```bash
$ cd examples
$ hadoop fs -mkdir -p /user/$USER
$ hadoop fs -copyFromLocal data
$ hadoop fs -ls data
```
#### Run K-means

```
```bash
$ cd examples/kmeans
$ ./build.sh
$ ./run.sh
@@ -119,45 +119,27 @@ We use [Apache Maven](https://maven.apache.org/) to manage and build source code
* JDK 8.0+
* Apache Maven 3.6.2+
* GNU GCC 4.8.5+
* Intel® oneAPI Toolkits 2021.3.0 Components:
* Intel® oneAPI Base Toolkit (>=2021.4.0) Components:
- DPC++/C++ Compiler (dpcpp/clang++)
- Data Analytics Library (oneDAL)
- Threading Building Blocks (oneTBB)
* [Open Source Intel® oneAPI Collective Communications Library (oneCCL)](https://github.com/oneapi-src/oneCCL)

Intel® oneAPI Toolkits and their components can be downloaded and installed from [here](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html). The installation process for oneAPI using package managers (YUM (DNF), APT, and ZYPPER) is also available. Generally, you only need to install the oneAPI Base Toolkit for Linux with all or selected components mentioned above. Instead of using the oneCCL included in Intel® oneAPI Toolkits, we prefer to build from open source oneCCL to resolve some bugs.
- Collective Communications Library (oneCCL)

More details about oneAPI can be found [here](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html).
Generally you only need to install __Intel® oneAPI Base Toolkit for Linux__ with all or selected components mentioned above. Intel® oneAPI Base Toolkit can be downloaded and installed from [here](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html). Installation process for oneAPI using Package Managers (YUM (DNF), APT, and ZYPPER) is also available. More details about oneAPI can be found [here](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html).
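
For reference, below is a minimal sketch of an APT-based install of the components listed above, mirroring [dev/install-build-deps-ubuntu.sh](dev/install-build-deps-ubuntu.sh); the GPG key step follows Intel's standard repository setup and is an assumption here.

```bash
# Add Intel's oneAPI APT repository and install the 2021.4.0 components used by this project.
wget -qO - https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | sudo apt-key add -
echo "deb https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt-get update
sudo apt-get install -y intel-oneapi-dpcpp-cpp-2021.4.0 intel-oneapi-dal-devel-2021.4.0 \
    intel-oneapi-tbb-devel-2021.4.0 intel-oneapi-ccl-devel-2021.4.0 intel-oneapi-mpi-devel-2021.4.0
```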

Scala and Java dependency descriptions are already included in the Maven POM file.

***Note:*** You can refer to [this script](dev/install-build-deps-centos.sh) to install the correct dependencies: DPC++/C++, oneDAL, oneTBB, oneCCL.

### Build

#### Building oneCCL

To clone and build from open source oneCCL, run the following commands:
```
$ git clone https://github.com/oneapi-src/oneCCL
$ cd oneCCL
$ git checkout 2021.2.1
$ mkdir build && cd build
$ cmake ..
$ make -j install
```

The generated files will be placed in `/your/oneCCL_source_code/build/_install`

#### Building OAP MLlib

To clone and checkout source code, run the following commands:
```
$ git clone https://github.com/oap-project/oap-mllib.git
```bash
$ git clone https://github.com/oap-project/oap-mllib.git
```
__Optional__ to checkout specific release branch:
```
$ cd oap-mllib && git checkout ${version}
```bash
$ cd oap-mllib && git checkout ${version}
```

We rely on environment variables to find required toolchains and libraries. Please make sure the following environment variables are set for building:
@@ -171,25 +153,22 @@ CCL_ROOT | Path to oneCCL home directory

We suggest you source the `setvars.sh` script in your current shell to set up the build environment as follows:

```
```bash
$ source /opt/intel/oneapi/setvars.sh
$ source /your/oneCCL_source_code/build/_install/env/setvars.sh
```
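
If you prefer to set these locations manually instead of sourcing `setvars.sh`, a quick sanity check of the variables might look like the sketch below; beyond `CCL_ROOT`, `DAALROOT`, `CMPLR_ROOT` and `JAVA_HOME` (which appear elsewhere in this repository), the exact set of variables is an assumption.

```bash
# Illustrative check that the toolchain variables the build expects are defined.
for v in JAVA_HOME CCL_ROOT DAALROOT TBBROOT CMPLR_ROOT; do
    if [[ -z "${!v}" ]]; then
        echo "$v is not set!"
    else
        echo "$v=${!v}"
    fi
done
```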

__Note that we are using our own oneCCL build instead, so we should source oneCCL's `setvars.sh` to override the one from oneAPI.__

You can also refer to [this CI script](dev/ci-test.sh) to setup the building environments.

If you prefer to build your own open source [oneDAL](https://github.com/oneapi-src/oneDAL), [oneTBB](https://github.com/oneapi-src/oneTBB) versions rather than use the ones included in oneAPI Toolkits, you can refer to the related build instructions and manually source `setvars.sh` accordingly.
If you prefer to build your own open source [oneDAL](https://github.com/oneapi-src/oneDAL), [oneTBB](https://github.com/oneapi-src/oneTBB), [oneCCL](https://github.com/oneapi-src/oneCCL) versions rather than use the ones included in oneAPI Base Toolkit, you can refer to the related build instructions and manually source `setvars.sh` accordingly.

To build, run the following commands:
```
To build, run the following commands:
```bash
$ cd mllib-dal
$ ./build.sh
```

If no parameter is given, Spark version __3.1.1__ will be activated by default. You can also specify a different Spark version with the option `-p spark-x.x.x`. For example:
```
```bash
$ ./build.sh -p spark-3.0.0
```

@@ -206,6 +185,7 @@ pca | PCA example for Scala
als | ALS example for Scala
naive-bayes | Naive Bayes example for Scala
linear-regression | Linear Regression example for Scala
correlation | Correlation example for Scala

### Python Examples

@@ -217,12 +197,11 @@ als-pyspark | ALS example for PySpark

## List of Accelerated Algorithms

Algorithm | Category | Maturity
------------------|----------|-------------
K-Means | CPU | Stable
K-Means | GPU | Experimental
PCA | CPU | Stable
PCA | GPU | Experimental
ALS | CPU | Stable
Naive Bayes | CPU | Experimental
Linear Regression | CPU | Experimental
Algorithm | CPU | GPU | Maturity
------------------|-----|-----|---------
K-Means | X | X | Stable
PCA | X | X | Stable
ALS | X | | Experimental
Naive Bayes | X | | Stable
Linear Regression | X | | Stable
Correlation | X | X | Experimental
1 change: 1 addition & 0 deletions RELEASE
@@ -0,0 +1 @@
OAP_MLLIB_VERSION=1.2.0
7 changes: 4 additions & 3 deletions conf/env.sh.template
@@ -2,8 +2,6 @@

# ============== Minimum Settings ============= #

# Set OAP MLlib version (e.g. 1.1.0)
OAP_MLLIB_VERSION=x.x.x
# Set Spark master
SPARK_MASTER=yarn
# Set Hadoop home path
@@ -17,6 +15,9 @@ export OAP_MLLIB_ROOT=/path/to/oap-mllib/home

# ============================================= #

# Import RELEASE envs
source $OAP_MLLIB_ROOT/RELEASE
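# (RELEASE defines OAP_MLLIB_VERSION, e.g. OAP_MLLIB_VERSION=1.2.0, replacing the manual
#  setting that used to live in the Minimum Settings section above.)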

# Set HADOOP_CONF_DIR for Spark
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

@@ -42,7 +43,7 @@ SPARK_TOTAL_CORES=$((SPARK_NUM_EXECUTORS * SPARK_EXECUTOR_CORES))
SPARK_DEFAULT_PARALLELISM=$((SPARK_TOTAL_CORES * 2))
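# Illustrative example with hypothetical values: SPARK_NUM_EXECUTORS=4 and
# SPARK_EXECUTOR_CORES=8 give SPARK_TOTAL_CORES=32 and SPARK_DEFAULT_PARALLELISM=64.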

# Checks
for dir in $SPARK_HOME $HADOOP_HOME $OAP_MLLIB_JAR
do
if [[ ! -e $dir ]]; then
echo $dir does not exist!
30 changes: 30 additions & 0 deletions dev/build-maven-local-repo.sh
@@ -0,0 +1,30 @@
#!/usr/bin/env bash

if [[ -z $DAALROOT ]]; then
echo DAALROOT not defined!
exit 1
fi

echo "Building Maven Repo for oneDAL ..."

mkdir maven-repository
mvn deploy:deploy-file -Dfile=$DAALROOT/lib/onedal.jar -DgroupId=com.intel.onedal -Dversion=2021.4.0 -Dpackaging=jar -Durl=file:./maven-repository -DrepositoryId=maven-repository -DupdateReleaseInfo=true

echo "DONE"

find ./maven-repository

# Add the following into pom.xml:

# <repositories>
# <repository>
# <id>maven-repository</id>
# <url>file:///${project.basedir}/maven-repository</url>
# </repository>
# </repositories>

# <dependency>
# <groupId>com.intel.dal</groupId>
# <artifactId>dal</artifactId>
# <version>2021.4.0</version>
# </dependency>
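
A possible way to use this script (a sketch; where to run it from is an assumption based on the `${project.basedir}` reference in the snippet above):

```bash
# Make DAALROOT available, then build the local repo next to the pom.xml that will use it.
source /opt/intel/oneapi/setvars.sh     # defines DAALROOT
cd mllib-dal                            # assumed location of the consuming pom.xml
../dev/build-maven-local-repo.sh        # creates ./maven-repository and deploys onedal.jar into it
```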
16 changes: 14 additions & 2 deletions dev/ci-test.sh
@@ -1,18 +1,30 @@
#!/usr/bin/env bash

# exit when any command fails
set -e

# keep track of the last executed command
trap 'last_command=$current_command; current_command=$BASH_COMMAND' DEBUG
# echo an error message before exiting
trap 'echo "\"${last_command}\" command failed with exit code $?."' EXIT

# Setup building envs
source /opt/intel/oneapi/setvars.sh
source /tmp/oneCCL/build/_install/env/setvars.sh

SupportedSparkVersions=("spark-3.0.0" "spark-3.0.1" "spark-3.0.2" "spark-3.1.1")
# Prepare lib resources
cd $GITHUB_WORKSPACE/mllib-dal
../dev/prepare-build-deps.sh

# Test for all versions
SupportedSparkVersions=("spark-3.0.0" "spark-3.0.1" "spark-3.0.2" "spark-3.1.1")
for SparkVer in ${SupportedSparkVersions[*]}; do
echo
echo "========================================"
echo "Testing with Spark Version: $SparkVer"
echo "========================================"
echo
cd $GITHUB_WORKSPACE/mllib-dal
./build.sh -q
./test.sh -q -p $SparkVer
done
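
# (To run a single Spark version locally, the same pair of commands can be used directly,
#  e.g.: cd mllib-dal && ./build.sh -q && ./test.sh -q -p spark-3.1.1)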

12 changes: 1 addition & 11 deletions dev/install-build-deps-centos.sh
@@ -15,17 +15,7 @@ EOF
sudo mv /tmp/oneAPI.repo /etc/yum.repos.d
# sudo yum groupinstall -y "Development Tools"
# sudo yum install -y cmake
sudo yum install -y intel-oneapi-dpcpp-cpp-2021.3.0 intel-oneapi-dal-devel-2021.3.0 intel-oneapi-tbb-devel-2021.3.0
sudo yum install -y intel-oneapi-dpcpp-cpp-2021.4.0 intel-oneapi-dal-devel-2021.4.0 intel-oneapi-tbb-devel-2021.4.0 intel-oneapi-ccl-devel-2021.4.0 intel-oneapi-mpi-devel-2021.4.0
else
echo "oneAPI components already installed!"
fi

echo "Building oneCCL ..."
cd /tmp
rm -rf oneCCL
git clone https://github.com/oneapi-src/oneCCL
cd oneCCL
git checkout 2021.2.1
mkdir build && cd build
cmake ..
make -j 2 install
12 changes: 1 addition & 11 deletions dev/install-build-deps-ubuntu.sh
@@ -9,17 +9,7 @@ if [ ! -d /opt/intel/oneapi ]; then
echo "deb https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt-get update
# sudo apt-get install -y build-essential cmake
sudo apt-get install -y intel-oneapi-dpcpp-cpp-2021.3.0 intel-oneapi-dal-devel-2021.3.0 intel-oneapi-tbb-devel-2021.3.0
sudo apt-get install -y intel-oneapi-dpcpp-cpp-2021.4.0 intel-oneapi-dal-devel-2021.4.0 intel-oneapi-tbb-devel-2021.4.0 intel-oneapi-ccl-devel-2021.4.0 intel-oneapi-mpi-devel-2021.4.0
else
echo "oneAPI components already installed!"
fi

echo "Building oneCCL ..."
cd /tmp
rm -rf oneCCL
git clone https://github.com/oneapi-src/oneCCL
cd oneCCL
git checkout 2021.2.1
mkdir build && cd build
cmake ..
make -j 2 install
