diff --git a/python/cuml/ensemble/randomforestclassifier.pyx b/python/cuml/ensemble/randomforestclassifier.pyx
index 78e8fd63b8..1772b9678e 100644
--- a/python/cuml/ensemble/randomforestclassifier.pyx
+++ b/python/cuml/ensemble/randomforestclassifier.pyx
@@ -145,6 +145,19 @@ class RandomForestClassifier(BaseRandomForestModel,
       reduce memory consumption.
     * While training the model for multi class classification problems,
       using deep trees or `max_features=1.0` provides better performance.
+    * Prediction of classes is currently different from how scikit-learn
+      predicts:
+        * scikit-learn predicts random forest classifiers by obtaining class
+          probabilities from each component tree, then averaging these class
+          probabilities over all the ensemble members, and finally resolving
+          to the label with highest probability as prediction.
+        * cuml random forest classifier prediction differs in that, each
+          component tree generates labels instead of class probabilities;
+          with the most frequent label over all the trees (the statistical
+          mode) resolved as prediction.
+      The above differences might cause marginal variations in accuracy in
+      tradeoff to better performance.
+      See: https://github.com/rapidsai/cuml/issues/3764
 
     Examples
     --------
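The two prediction rules described in the added docstring note can be compared with a small NumPy sketch. This is only an illustration, not the scikit-learn or cuml implementation: the function names `soft_vote` and `hard_vote`, the probability values, and the tie-breaking rule (lowest label wins) are all assumptions made for the example.

```python
import numpy as np


def soft_vote(tree_probas):
    """Average per-tree class probabilities, then pick the most probable class."""
    # tree_probas has shape (n_trees, n_samples, n_classes).
    return np.mean(tree_probas, axis=0).argmax(axis=1)


def hard_vote(tree_probas):
    """Let each tree emit a label, then pick the most frequent label (the mode)."""
    n_classes = tree_probas.shape[2]
    tree_labels = tree_probas.argmax(axis=2)  # shape (n_trees, n_samples)
    # Count votes per class for every sample. Ties go to the lowest label,
    # which is an assumption of this sketch.
    counts = np.stack(
        [(tree_labels == c).sum(axis=0) for c in range(n_classes)], axis=1
    )
    return counts.argmax(axis=1)


# Hypothetical probabilities from three trees for one sample and two classes.
tree_probas = np.array([
    [[0.9, 0.1]],  # tree 0 is confident in class 0
    [[0.4, 0.6]],  # tree 1 leans slightly towards class 1
    [[0.4, 0.6]],  # tree 2 leans slightly towards class 1
])

soft = soft_vote(tree_probas)  # averaged probabilities favour class 0
hard = hard_vote(tree_probas)  # two of three trees vote for class 1
print("soft vote:", soft, "hard vote:", hard)
```

In this made-up case the averaged probabilities favour class 0, while two of the three trees vote for class 1, so the two rules return different labels for the same sample. Disagreements of this kind are what the note refers to as marginal variations in accuracy.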