[ML-85][Doc] Add explanation for spark.shuffle.reduceLocality.enabled
xwu99 authored Jul 27, 2021
1 parent 6739b0e commit eddc4c4
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
@@ -77,6 +77,8 @@ spark.executor.extraClassPath /path/to/oap-mllib-x.x.x.jar

OAP MLlib adopted oneDAL as implementation backend. oneDAL requires enough native memory allocated for each executor. For large dataset, depending on algorithms, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
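The "dataset size / executor number" starting point above can be sketched as a small helper (a hypothetical function for illustration only, not part of OAP MLlib):

```python
import math

def suggested_memory_overhead_mb(dataset_size_mb: int, num_executors: int) -> int:
    """Return a starting value for spark.executor.memoryOverhead (in MB):
    at least the dataset size divided by the executor count."""
    return math.ceil(dataset_size_mb / num_executors)

# e.g. a 10 GB (10240 MB) dataset on 8 executors -> at least 1280 MB each
print(suggested_memory_overhead_mb(10240, 8))
```

Depending on the algorithm, the actual requirement may be higher, so treat the result as a lower bound and increase it if executors still fail with native out-of-memory errors.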

OAP MLlib expects each executor to act as one oneCCL rank for compute. Since the `spark.shuffle.reduceLocality.enabled` option is `true` by default, an unevenly distributed dataset may cause more than one rank to be assigned to a single executor, making tasks fail. This error can be fixed by setting `spark.shuffle.reduceLocality.enabled` to `false`.
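For example, both settings can be passed directly to `spark-submit` (a sketch; the overhead value and application jar are placeholders to adjust for your cluster):

```shell
spark-submit \
  --conf spark.executor.memoryOverhead=4g \
  --conf spark.shuffle.reduceLocality.enabled=false \
  your-app.jar
```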

### Sanity Check

#### Setup `env.sh`
@@ -218,4 +220,4 @@ K-Means | CPU, GPU | Experimental
PCA | CPU | Experimental
ALS | CPU | Experimental
Naive Bayes | CPU | Experimental
Linear Regression | CPU | Experimental
Linear Regression | CPU | Experimental
