forked from oap-project/cloudtik
Commit
Add doc for ml runtime examples and spark optimizations (oap-project#924)
1 parent 628c3a3 · commit fafab98

Showing 10 changed files with 146 additions and 36 deletions.

8 changes: 8 additions & 0 deletions
docs/source/UserGuide/RunningOptimizedAI/running-ml-examples.md
# Running Machine Learning Examples

CloudTik provides ready-to-run examples demonstrating
how distributed machine learning and deep learning jobs can be implemented
on a CloudTik Spark and ML runtime cluster.

Refer to [Distributed Machine Learning and Deep Learning Examples](https://github.com/oap-project/cloudtik/tree/main/example/ml)
for a detailed step-by-step guide.

9 changes: 9 additions & 0 deletions
docs/source/UserGuide/RunningOptimizedAnalytics/running-analytics-benchmarks.md
# Running Analytics Benchmarks

CloudTik provides ready-to-use tools for running the TPC-DS benchmark
on a CloudTik Spark runtime cluster.

Refer to [Run TPC-DS performance benchmark for Spark](https://github.com/oap-project/cloudtik/tree/main/tools/benchmarks/spark)
for a detailed step-by-step guide.

40 changes: 40 additions & 0 deletions
docs/source/UserGuide/RunningOptimizedAnalytics/spark-optimizations.md
# Spark Optimizations
CloudTik Spark can run on both upstream Spark and a CloudTik-optimized Spark.
The CloudTik-optimized Spark implements quite a few important optimizations on top of
upstream Spark and thus provides better performance.

- [Runtime Filter Optimization](#runtime-filter-optimization)
- [Top N Optimization](#top-n-optimization)
- [Size-based Join Reorder Optimization](#size-based-join-reorder-optimization)
- [Distinct Before Intersect Optimization](#distinct-before-intersect-optimization)
- [Flatten Scalar Subquery Optimization](#flatten-scalar-subquery-optimization)
- [Flatten Single Row Aggregate Optimization](#flatten-single-row-aggregate-optimization)

## Runtime Filter Optimization
Row-level runtime filters can improve the performance of some joins by pre-filtering one side (the filter application side)
of a join using a Bloom filter or semi-join filter generated from the values on the other side (the filter creation side) of the join.

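As a minimal sketch of the kind of query this helps (the property name here follows upstream Spark 3.3 and may differ in the CloudTik build; table and column names are hypothetical):

```
# Hypothetical example: the filtered dimension table is the filter creation side;
# a Bloom filter built from its date_sk values pre-filters the large fact table.
spark-sql --conf spark.sql.optimizer.runtime.bloomFilter.enabled=true -e "
SELECT f.item_sk, SUM(f.amount)
FROM fact_sales f JOIN dim_date d ON f.date_sk = d.date_sk
WHERE d.year = 2022
GROUP BY f.item_sk"
```
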
## Top N Optimization
For the rank functions (row_number | rank | dense_rank),
the rank of a key computed on a partial dataset is always <= its final rank computed on the whole dataset,
so it is safe to discard rows whose partial rank > k: select the local top-k records within each partition,
then compute the global top-k. This helps reduce the amount of data shuffled.
We introduce a new plan node, RankLimit, to filter out unnecessary rows based on the rank computed on the partial dataset.
You can enable this feature by setting spark.sql.rankLimit.enabled to true.

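For illustration, a sketch of a per-group top-k query this optimization targets (table and column names are hypothetical):

```
# Hypothetical example: with rank limit enabled, each partition keeps only its
# local top 10 rows by rank before shuffling for the global result.
spark-sql --conf spark.sql.rankLimit.enabled=true -e "
SELECT * FROM (
  SELECT category, item, sales,
         row_number() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
  FROM item_sales
) t WHERE t.rn <= 10"
```
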
## Size-based Join Reorder Optimization
The default behavior in Spark is to join tables from left to right, as listed in the query.
Query performance can often be improved by reordering joins that involve tables with filters.
You can enable this feature by setting the Spark configuration parameter spark.sql.optimizer.sizeBasedJoinReorder.enabled to true.

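A minimal sketch, assuming hypothetical TPC-DS-style table names:

```
# Hypothetical example: with size-based join reorder enabled, the optimizer may
# join the filtered date_dim table first rather than strictly left to right.
spark-sql --conf spark.sql.optimizer.sizeBasedJoinReorder.enabled=true -e "
SELECT c.customer_id, SUM(s.net_paid)
FROM store_sales s
JOIN customer c ON s.customer_sk = c.customer_sk
JOIN date_dim d ON s.date_sk = d.date_sk
WHERE d.year = 2022
GROUP BY c.customer_id"
```
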
## Distinct Before Intersect Optimization
This optimization improves joins when using INTERSECT.
Queries using INTERSECT are automatically converted to use a left-semi join.
When this optimization is enabled, the query optimizer estimates whether pushing a DISTINCT operator
down to the children of INTERSECT is beneficial, based on the amount of duplication in the left and right tables.
You can enable it by setting the Spark property spark.sql.optimizer.distinctBeforeIntersect.enabled to true.

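A minimal sketch (hypothetical table names):

```
# Hypothetical example: INTERSECT is rewritten to a left-semi join; with the
# optimization enabled, DISTINCT may be pushed below the join when the inputs
# contain many duplicate rows.
spark-sql --conf spark.sql.optimizer.distinctBeforeIntersect.enabled=true -e "
SELECT customer_sk FROM store_sales
INTERSECT
SELECT customer_sk FROM web_sales"
```
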
## Flatten Scalar Subquery Optimization

## Flatten Single Row Aggregate Optimization

Running Optimized AI
====================

.. toctree::
   :maxdepth: 1

   RunningOptimizedAI/running-ml-examples.md

Running Optimized Analytics with Spark
======================================

.. toctree::
   :maxdepth: 1

   RunningOptimizedAnalytics/spark-optimizations.md
   RunningOptimizedAnalytics/running-analytics-benchmarks.md

# Distributed Machine Learning and Deep Learning Examples

Here we provide a guide for running ML/DL examples based on the CloudTik ML runtime,
which includes selected ML/DL frameworks and libraries.

We provide these examples in two forms:
1. Python jobs
2. Jupyter notebooks

## Running the examples using Python jobs

### Running Spark Distributed Deep Learning example
This example runs Spark distributed deep learning with Hyperopt, Horovod and TensorFlow.
It trains a simple ConvNet on the MNIST dataset using Keras + Horovod on the CloudTik Spark runtime.

Download [Spark Deep Learning with Horovod and Tensorflow](jobs/spark-mlflow-hyperopt-horovod-tensorflow.py)
and execute:
```
cloudtik submit /path/to/your-cluster-config.yaml local-download-path/spark-mlflow-hyperopt-horovod-tensorflow.py -f "your-cloud-storage-fsdir"
```

Replace the cloud storage fsdir with the workspace cloud storage URI or HDFS directory. For example, for S3: "s3a://cloudtik-workspace-bucket"
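
For instance, with the S3 example above substituted in (the bucket name is illustrative):

```
cloudtik submit /path/to/your-cluster-config.yaml local-download-path/spark-mlflow-hyperopt-horovod-tensorflow.py -f "s3a://cloudtik-workspace-bucket"
```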

### Running Spark Distributed Machine Learning example
This example runs Spark distributed training using scikit-learn.
It illustrates a complete end-to-end example of loading data, training a model, distributed hyperparameter tuning, and model inference
on a CloudTik Spark cluster. It also illustrates how to use MLflow and the Model Registry.

Download [Spark Machine Learning with Scikit-Learn](jobs/spark-mlflow-hyperopt-scikit.py)
and execute:
```
cloudtik submit /path/to/your-cluster-config.yaml local-download-path/spark-mlflow-hyperopt-scikit.py -f "your-cloud-storage-fsdir"
```

Replace the cloud storage fsdir with the workspace cloud storage URI or HDFS directory. For example, for S3: "s3a://cloudtik-workspace-bucket"

## Running the examples using Jupyter notebooks

### Running Spark Distributed Deep Learning example

This notebook example runs Spark distributed deep learning with Hyperopt, Horovod and TensorFlow.
It trains a simple ConvNet on the MNIST dataset using Keras + Horovod on the CloudTik Spark runtime.

1. Upload the notebook [Spark Deep Learning with Horovod and Tensorflow](notebooks/spark-mlflow-hyperopt-horovod-tensorflow.ipynb) to JupyterLab.
You can also download it and use cloudtik rsync-up to copy it to ~/jupyter on the cluster head:

```
cloudtik rsync-up /path/to/your-cluster-config.yaml local-download-path/spark-mlflow-hyperopt-horovod-tensorflow.ipynb ~/jupyter/spark-mlflow-hyperopt-horovod-tensorflow.ipynb
```

2. Open this notebook in JupyterLab, and choose the Python 3 kernel to run the notebook.

3. Optionally, you can check the training experiments and the model registry through the MLflow Web UI after the notebook finishes.
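
The earlier revision of this guide gave the MLflow server address as below; the port may differ in your deployment:

```
http://<head_IP>:5001
```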

### Running Spark Distributed Machine Learning example

This notebook example runs Spark distributed training using scikit-learn.
It illustrates a complete end-to-end example of loading data, training a model, distributed hyperparameter tuning, and model inference
on a CloudTik Spark cluster. It also illustrates how to use MLflow and the Model Registry.

1. Upload the notebook [Spark Machine Learning with Scikit-Learn](notebooks/spark-mlflow-hyperopt-scikit.ipynb) to JupyterLab.
You can also download it and use cloudtik rsync-up to copy it to ~/jupyter on the cluster head:

```
cloudtik rsync-up /path/to/your-cluster-config.yaml local-download-path/spark-mlflow-hyperopt-scikit.ipynb ~/jupyter/spark-mlflow-hyperopt-scikit.ipynb
```

2. Open this notebook in JupyterLab, and choose the Python 3 kernel to run the notebook.

3. Optionally, you can check the training experiments and the model registry through the MLflow Web UI after the notebook finishes.