[ML-85][Doc] Add explanation for spark.shuffle.reduceLocality.enabled
xwu99 authored Jul 27, 2021
1 parent 6739b0e commit eddc4c4
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
@@ -77,6 +77,8 @@ spark.executor.extraClassPath /path/to/oap-mllib-x.x.x.jar

OAP MLlib adopted oneDAL as implementation backend. oneDAL requires enough native memory allocated for each executor. For large dataset, depending on algorithms, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
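The "dataset size / executor number" starting point above can be sketched as a small helper (a hypothetical function for illustration only, not part of OAP MLlib):

```python
import math

def suggested_memory_overhead_mb(dataset_size_mb: int, num_executors: int) -> int:
    """Return a starting value for spark.executor.memoryOverhead (in MB):
    at least the dataset size divided by the executor count."""
    return math.ceil(dataset_size_mb / num_executors)

# e.g. a 10 GB (10240 MB) dataset on 8 executors -> at least 1280 MB each
print(suggested_memory_overhead_mb(10240, 8))
```

Depending on the algorithm, the actual requirement may be higher, so treat the result as a lower bound and increase it if executors still fail with native out-of-memory errors.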

OAP MLlib expects each executor to act as one oneCCL rank for compute. Since the `spark.shuffle.reduceLocality.enabled` option is `true` by default, an unevenly distributed dataset may cause more than one rank to be assigned to a single executor, making tasks fail. This error can be fixed by setting `spark.shuffle.reduceLocality.enabled` to `false`.
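For example, both settings can be passed directly to `spark-submit` (a sketch; the overhead value and application jar are placeholders to adjust for your cluster):

```shell
spark-submit \
  --conf spark.executor.memoryOverhead=4g \
  --conf spark.shuffle.reduceLocality.enabled=false \
  your-app.jar
```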

### Sanity Check

#### Setup `env.sh`
@@ -218,4 +220,4 @@ K-Means | CPU, GPU | Experimental
PCA | CPU | Experimental
ALS | CPU | Experimental
Naive Bayes | CPU | Experimental
Linear Regression | CPU | Experimental
Linear Regression | CPU | Experimental
