Faster model import for sklearn tree models #264
Approved.
Possibly for a different pull request: would it be possible to import other Scikit-Learn tree-based models from https://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble as well?
In general, this looks great! My only significant questions here are around how we're doing type inspection on the Python side.
Codecov Report

Coverage diff, mainline vs. #264:

| Metric | mainline | #264 | +/- |
|---|---|---|---|
| Coverage | 83.62% | 84.18% | +0.56% |
| Complexity | 44 | 44 | |
| Files | 93 | 95 | +2 |
| Lines | 6937 | 7228 | +291 |
| Branches | 42 | 42 | |
| Hits | 5801 | 6085 | +284 |
| Misses | 1112 | 1119 | +7 |
| Partials | 24 | 24 | |
Upgrade to Treelite 1.3.0 to take advantage of the following new features:

* Faster model import for scikit-learn tree models (dmlc/treelite#264). Fixes rapidsai#3768
* Binary serializer to a file stream (dmlc/treelite#270, dmlc/treelite#273)
* [EXPERIMENTAL] Add GTIL, reference inference backend (dmlc/treelite#274)

Make progress towards rapidsai#3853

Depends on rapidsai/integration#270

Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
- William Hicks (https://github.com/wphicks)
- AJ Schmidt (https://github.com/ajschmidt8)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3855
Closes #263
Implement `treelite.sklearn.import_model_v2`, a faster method for importing scikit-learn models. The method is much faster than its predecessor (`treelite.sklearn.import_model`) because the `treelite::Tree` object is constructed directly in C++.

Preliminary result: the benchmark snippet for this PR completes in 1.1 sec. Since the same model would take 8 hours to load using the previous method, we have achieved a 26122x speedup.
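As a rough sanity check on the speedup figure, the sketch below recomputes the ratio from the two quoted timings. It assumes the old import took exactly 8 hours, which is an assumption: the quoted times are presumably rounded, so the computed ratio differs slightly from the reported 26122x. All variable names are illustrative.

```python
# Recompute the speedup from the timings quoted in the PR description.
# Assumption: "8 hours" is treated as exactly 8 * 3600 seconds.
old_import_seconds = 8 * 3600      # previous method (treelite.sklearn.import_model)
new_import_seconds = 1.1           # new method (treelite.sklearn.import_model_v2)

speedup = old_import_seconds / new_import_seconds
print(f"speedup with exactly 8 hours: {speedup:.0f}x")

# Working backwards: the old-method time implied by the reported 26122x figure.
reported_speedup = 26122
implied_old_hours = reported_speedup * new_import_seconds / 3600
print(f"old import time implied by 26122x: {implied_old_hours:.2f} hours")
```

Under that assumption the ratio is close to, but not exactly, the reported figure, which is consistent with "8 hours" being a rounded measurement.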
TODOs:

* RandomForestRegressor
* RandomForestClassifier
* GradientBoostingRegressor
* GradientBoostingClassifier
* ExtraTreesRegressor
* ExtraTreesClassifier
cc @wphicks @canonizer @JohnZed