Add RF Classifier predict accuracy issue as a known limitation in docs (#3776)

This small PR documents the prediction accuracy issue described [here](#3764) as a known limitation for users of Random Forest Classifier.

Authors:
  - Venkat (https://github.com/venkywonka)

Approvers:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3776
venkywonka authored Apr 22, 2021
1 parent d4d2a81 commit 0a21e87
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions python/cuml/ensemble/randomforestclassifier.pyx
@@ -145,6 +145,19 @@ class RandomForestClassifier(BaseRandomForestModel,
    reduce memory consumption.
  * While training the model for multi class classification problems,
    using deep trees or `max_features=1.0` provides better performance.
  * Prediction of classes is currently different from how scikit-learn
    predicts:
    * scikit-learn predicts random forest classifiers by obtaining class
      probabilities from each component tree, then averaging these class
      probabilities over all the ensemble members, and finally resolving
      to the label with highest probability as prediction.
    * cuml random forest classifier prediction differs in that, each
      component tree generates labels instead of class probabilities;
      with the most frequent label over all the trees (the statistical
      mode) resolved as prediction.
    The above differences might cause marginal variations in accuracy in
    tradeoff to better performance.
    See: https://github.com/rapidsai/cuml/issues/3764
Examples
--------
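
The two prediction strategies described in the added note can disagree on the same forest. The following is a minimal pure-Python sketch of that difference, not code from cuML or scikit-learn: the function names `soft_vote` and `hard_vote`, the toy per-tree probabilities, and the tie-breaking behaviour (first class wins) are all assumptions made for illustration.

```python
from collections import Counter


def soft_vote(tree_probas):
    """Average per-tree class probabilities, then pick the most probable class.

    Mirrors the scikit-learn style described in the note (illustrative only).
    """
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    mean_proba = [
        sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: mean_proba[c])


def hard_vote(tree_probas):
    """Let each tree emit its own label, then return the most frequent label.

    Mirrors the label-mode style described in the note (illustrative only).
    """
    labels = [max(range(len(p)), key=lambda c: p[c]) for p in tree_probas]
    return Counter(labels).most_common(1)[0][0]


# Hypothetical forest of three trees on a two-class problem:
# one tree is very confident in class 0, two trees lean weakly toward class 1.
tree_probas = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]

soft_label = soft_vote(tree_probas)  # mean probabilities favour class 0
hard_label = hard_vote(tree_probas)  # two of three trees vote for class 1
print(soft_label, hard_label)
```

In this toy case the averaged probabilities favour class 0 while the majority of per-tree labels is class 1, which is the kind of marginal disagreement the note warns about.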