Add RF Classifier predict accuracy issue as a known limitation in docs (#3776)

This small PR adds details regarding the accuracy issue detailed [here](#3764) as a known limitation for users of Random Forest Classifier.

Authors:
  - Venkat (https://github.com/venkywonka)

Approvers:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3776
rapidsai · Apr 22, 2021 · 0a21e87
1 parent d4d2a81
Showing 1 changed file with 13 additions and 0 deletions.
diff --git a/python/cuml/ensemble/randomforestclassifier.pyx b/python/cuml/ensemble/randomforestclassifier.pyx
@@ -145,6 +145,19 @@ class RandomForestClassifier(BaseRandomForestModel,
         reduce memory consumption.
       * While training the model for multi class classification problems,
         using deep trees or `max_features=1.0` provides better performance.
+      * Prediction of classes currently differs from how scikit-learn
+        predicts:
+          * scikit-learn's random forest classifier obtains class
+            probabilities from each component tree, averages these class
+            probabilities over all the ensemble members, and finally
+            resolves to the label with the highest averaged probability.
+          * cuML's random forest classifier instead has each component
+            tree generate a label rather than class probabilities; the
+            most frequent label over all the trees (the statistical
+            mode) is resolved as the prediction.
+        The above differences might cause marginal variations in accuracy
+        in exchange for better performance.
+        See: https://github.com/rapidsai/cuml/issues/3764
 
     Examples
     --------
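
The two prediction schemes described in the added docstring can be illustrated with a minimal pure-Python sketch. This is not the cuML or scikit-learn implementation: `soft_vote`, `hard_vote`, and the per-tree probabilities are hypothetical names and made-up values, chosen so that the two schemes disagree on a single sample.

```python
from collections import Counter


def soft_vote(tree_probs):
    """Average per-tree class probabilities, then pick the most probable class
    (the scheme the docstring attributes to scikit-learn)."""
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    avg = [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


def hard_vote(tree_probs):
    """Let each tree emit its own label, then take the most frequent label
    (the statistical mode, the scheme the docstring attributes to cuML).
    Tie-breaking here is arbitrary and is an assumption of this sketch."""
    labels = [max(range(len(p)), key=lambda c: p[c]) for p in tree_probs]
    return Counter(labels).most_common(1)[0][0]


# Hypothetical class probabilities from three trees for one sample.
# One tree is very confident in class 0; two trees weakly prefer class 1.
tree_probs = [
    [0.9, 0.1],
    [0.4, 0.6],
    [0.4, 0.6],
]

print(soft_vote(tree_probs))  # 0: averaged probabilities are about [0.57, 0.43]
print(hard_vote(tree_probs))  # 1: per-tree labels are [0, 1, 1]
```

Samples like this one, where a confident minority of trees outweighs a weakly decided majority, are where the two schemes can produce different labels and therefore slightly different accuracy.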