Faster model import for sklearn tree models #264

Merged: hcho3 merged 20 commits into dmlc:mainline on Apr 28, 2021.

hcho3 (Collaborator) commented Apr 10, 2021:

Closes #263

Implement treelite.sklearn.import_model_v2, a faster method for importing sklearn models. The method is much faster than its predecessor (treelite.sklearn.import_model) because:

  • The new method eschews the Model Builder API and builds a treelite::Tree object directly in C++.
  • The new method avoids the use of recursive function calls when parsing the sklearn tree model. Instead, it takes in 7 flat arrays per tree.
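
The second point can be illustrated with a minimal pure-Python sketch (not Treelite's C++ implementation) that walks one tree stored as flat, parallel arrays using a loop and an explicit stack instead of recursion. It assumes scikit-learn's tree_ conventions: a child index of -1 marks a leaf, and a sample goes left when x[feature] <= threshold. The toy arrays and the helpers iter_nodes and predict_one are hypothetical, and value is simplified to one scalar per node.

```python
# Hypothetical sketch: parse a tree stored as flat parallel arrays
# (mirroring scikit-learn's tree_ attributes) without recursive calls.

TREE_LEAF = -1  # scikit-learn marks a missing child with -1

# Toy tree with 5 nodes:
#   node 0: x[0] <= 0.5 ? node 1 : node 2
#   node 2: x[1] <= 1.5 ? node 3 : node 4
#   nodes 1, 3, 4 are leaves
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [0, -2, 1, -2, -2]             # -2 = undefined (leaf)
threshold = [0.5, -2.0, 1.5, -2.0, -2.0]
value = [0.0, 10.0, 0.0, 20.0, 30.0]     # leaf outputs; internal entries unused


def iter_nodes(children_left, children_right):
    """Yield (node_id, depth) in depth-first order using an explicit stack."""
    stack = [(0, 0)]  # start at the root
    while stack:
        node, depth = stack.pop()
        yield node, depth
        if children_left[node] != TREE_LEAF:
            # Push the right child first so the left child is visited first.
            stack.append((children_right[node], depth + 1))
            stack.append((children_left[node], depth + 1))


def predict_one(x):
    """Follow split decisions from the root to a leaf with a plain loop."""
    node = 0
    while children_left[node] != TREE_LEAF:
        if x[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
    return value[node]


order = list(iter_nodes(children_left, children_right))
print(order)
print(predict_one([0.2, 5.0]), predict_one([0.9, 1.0]), predict_one([0.9, 2.0]))
```

Because every node is addressed by an integer index into the same arrays, the walk needs no call stack proportional to tree depth, which is what makes deep random-forest trees cheap to traverse.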

Preliminary result: the following code snippet completes in 1.1 sec. Since the same model would take 8 hours to load with the previous method, this is a 26122x speedup.

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from treelite.sklearn import import_model_v2

# Train a 100-tree random forest on synthetic data
X, y = make_regression(n_samples=100000, n_features=100, n_informative=100)
clf = RandomForestRegressor(n_estimators=100, n_jobs=-1)
clf.fit(X, y)

# Time only the conversion into a Treelite model
tstart = time.perf_counter()
import_model_v2(clf)
tend = time.perf_counter()

print(f'Time elapsed = {tend - tstart} sec')
```
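
As a quick check on the quoted figure, converting the 8-hour baseline to seconds and dividing by the stated speedup recovers the measured run time:

```latex
8\,\text{h} = 8 \times 3600\,\text{s} = 28800\,\text{s}, \qquad
\frac{28800\,\text{s}}{26122} \approx 1.10\,\text{s}
```

So the 26122x figure corresponds to a measured time of about 1.10 s, consistent with the rounded 1.1 sec above.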

TODOs:

  • Implement RandomForestRegressor
  • Implement RandomForestClassifier
  • Implement GradientBoostingRegressor
  • Implement GradientBoostingClassifier
  • Implement ExtraTreesRegressor
  • Implement ExtraTreesClassifier
  • Add tests
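
Each estimator on the list combines its trees differently, which is one reason each needs its own import path. Below is a minimal sketch of scikit-learn's documented aggregation rules, written against plain per-tree outputs rather than Treelite's API. The function names are hypothetical, and details such as multi-output targets and GradientBoostingClassifier's link function are deliberately omitted.

```python
# Hypothetical sketch of how scikit-learn ensembles combine per-tree outputs.
from typing import List, Sequence


def forest_regressor(tree_outputs: Sequence[float]) -> float:
    """RandomForestRegressor / ExtraTreesRegressor: average of the trees."""
    return sum(tree_outputs) / len(tree_outputs)


def forest_classifier(leaf_values: Sequence[Sequence[float]]) -> List[float]:
    """RandomForestClassifier / ExtraTreesClassifier: average of per-tree
    class probabilities, each obtained by normalizing the leaf's class values."""
    n_classes = len(leaf_values[0])
    probs = [0.0] * n_classes
    for counts in leaf_values:
        total = sum(counts)
        for k in range(n_classes):
            probs[k] += counts[k] / total
    return [p / len(leaf_values) for p in probs]


def gradient_boosting_regressor(init: float, learning_rate: float,
                                tree_outputs: Sequence[float]) -> float:
    """GradientBoostingRegressor: initial estimate plus shrunken sum of trees."""
    return init + learning_rate * sum(tree_outputs)


print(forest_regressor([1.0, 2.0, 3.0]))
print(forest_classifier([[3.0, 1.0], [1.0, 1.0]]))
print(gradient_boosting_regressor(10.0, 0.5, [2.0, -1.0, 4.0]))
```

Random forests and extra-trees share the averaging rule, while gradient boosting adds a shrunken sum to an initial estimate, so an importer has to record both the per-leaf values and the ensemble-level combination.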

cc @wphicks @canonizer @JohnZed

hcho3 changed the title from "[WIP] Faster model import for sklearn tree models" to "Faster model import for sklearn tree models" on Apr 21, 2021.

canonizer (Contributor) left a comment:

Approved.

Possibly for a different pull request: would it be possible to import other Scikit-Learn tree-based models from https://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble as well?

wphicks (Contributor) left a comment:

In general, this looks great! My only significant questions here are around how we're doing type inspection on the Python side.
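
One way to keep Python-side type inspection explicit and subclass-aware is to dispatch on the estimator's class rather than comparing class-name strings. The sketch below uses the standard library's functools.singledispatch; the stand-in classes are hypothetical placeholders for scikit-learn estimators, dispatch_import is a hypothetical name, and this is not necessarily how the review was resolved in the PR.

```python
# Hypothetical sketch: type-based dispatch for a model importer.
from functools import singledispatch


class RandomForestRegressor:   # placeholder for sklearn.ensemble.RandomForestRegressor
    pass


class RandomForestClassifier:  # placeholder for sklearn.ensemble.RandomForestClassifier
    pass


class MyCustomForest(RandomForestRegressor):  # a user-defined subclass
    pass


@singledispatch
def dispatch_import(sklearn_model):
    # Fallback for any type without a registered implementation.
    raise TypeError(f"Unsupported model type: {type(sklearn_model).__name__}")


@dispatch_import.register
def _(sklearn_model: RandomForestRegressor):
    return "regressor path"


@dispatch_import.register
def _(sklearn_model: RandomForestClassifier):
    return "classifier path"


print(dispatch_import(MyCustomForest()))
```

Because singledispatch follows the method resolution order, MyCustomForest is routed to the regressor implementation, whereas a check like type(model).__name__ == 'RandomForestRegressor' would reject it.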

codecov-commenter commented Apr 27, 2021:

Codecov Report

Merging #264 (a5b249d) into mainline (0d98548) will increase coverage by 0.56%.
The diff coverage is 97.01%.


```diff
@@              Coverage Diff               @@
##             mainline     #264      +/-   ##
==============================================
+ Coverage       83.62%   84.18%   +0.56%     
  Complexity         44       44              
==============================================
  Files              93       95       +2     
  Lines            6937     7228     +291     
  Branches           42       42              
==============================================
+ Hits             5801     6085     +284     
- Misses           1112     1119       +7     
  Partials           24       24              
```
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| include/treelite/frontend.h | 90.00% <ø> (ø) | 0.00 <0.00> (ø) |
| python/treelite/sklearn/__init__.py | 90.38% <84.21%> (-0.53%) | 0.00 <0.00> (ø) |
| python/treelite/sklearn/importer.py | 92.20% <92.20%> (ø) | 0.00 <0.00> (?) |
| src/c_api/c_api.cc | 96.32% <100.00%> (+0.42%) | 0.00 <0.00> (ø) |
| src/frontend/sklearn.cc | 100.00% <100.00%> (ø) | 0.00 <0.00> (?) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d98548...a5b249d.
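
The coverage figures in the report can be recomputed from the hit, miss, and partial counts. A minimal sketch, assuming Codecov's default settings (two decimal places, rounded down) and that coverage is hits divided by tracked lines, where lines = hits + misses + partials as the table shows; the helper names are hypothetical.

```python
# Sketch: recompute the coverage figures from the hit/miss/partial counts,
# assuming Codecov's default precision (2 decimals) and rounding (down).


def coverage_hundredths(hits: int, misses: int, partials: int) -> int:
    """Coverage in hundredths of a percent, truncated (rounded down)."""
    lines = hits + misses + partials  # 6937 before, 7228 after
    return hits * 10000 // lines


def fmt(hundredths: int) -> str:
    """Format a non-negative value in hundredths of a percent, e.g. 8362 -> '83.62%'."""
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


before = coverage_hundredths(5801, 1112, 24)  # mainline (0d98548)
after = coverage_hundredths(6085, 1119, 24)   # PR head (a5b249d)
print(fmt(before), fmt(after), fmt(after - before))
```

Integer arithmetic avoids floating-point surprises at the rounding boundary: 6085/7228 is about 84.1865%, which truncates to the reported 84.18% rather than rounding to 84.19%.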

hcho3 merged commit 7c5e38d into dmlc:mainline on Apr 28, 2021.
hcho3 deleted the sklearn_import_model_v2 branch on April 28, 2021 19:08.
This was referenced May 4, 2021
rapids-bot (bot) pushed a commit to rapidsai/cuml that referenced this pull request on May 14, 2021:
Upgrade to Treelite 1.3.0 to take advantage of the following new features:

* Faster model import for scikit-learn tree models (dmlc/treelite#264). Fixes #3768
* Binary serializer to a file stream (dmlc/treelite#270, dmlc/treelite#273)
* [EXPERIMENTAL] Add GTIL, reference inference backend (dmlc/treelite#274)

Make progress towards #3853

Depends on rapidsai/integration#270

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3855
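
Downstream code that, like cuML here, upgrades to pick up the new importer may still need to run against older Treelite releases. A minimal sketch, assuming the caller passes in the treelite.sklearn module (or any object with the same attributes); pick_importer is a hypothetical helper, and it relies only on the two function names given in the PR description (import_model_v2 and its predecessor import_model).

```python
# Hypothetical helper: prefer import_model_v2 when the installed Treelite
# provides it, otherwise fall back to the older import_model.
from types import ModuleType, SimpleNamespace
from typing import Callable, Union


def pick_importer(sklearn_module: Union[ModuleType, SimpleNamespace]) -> Callable:
    importer = getattr(sklearn_module, "import_model_v2", None)
    if importer is None:
        importer = sklearn_module.import_model  # older, slower path
    return importer


# Usage with stand-in modules (a real caller would pass treelite.sklearn):
new = SimpleNamespace(import_model=lambda m: "v1", import_model_v2=lambda m: "v2")
old = SimpleNamespace(import_model=lambda m: "v1")
print(pick_importer(new)(None), pick_importer(old)(None))
```

Feature detection with getattr avoids parsing version strings and keeps working if the fast path is backported or renamed only in later releases.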
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Development: successfully merging this pull request may close the issue "Speed up loading time for sklearn RF models".
4 participants