[REVIEW] Sklearn meta-estimators into namespace #3493
Conversation
This looks great! It's a straightforward change that will help make the transition easier for people currently using an sklearn-only workflow.
A few thoughts on this:
- We should probably delay merging this until [REVIEW] Ensure global_output_type is thread-safe #3497 is in, since our estimators currently do not play nice with sklearn's `GridSearchCV` in any sort of multithreaded environment.
- We should probably do something to explicitly and unambiguously give the fine folks of scikit-learn credit for these features in our docs. I could imagine doing that a couple of different ways. One would be to edit the docstring at import time, adding a line that clarifies where these classes come from. The other would be to wrap them in a class that provides a docstring which links directly to the scikit-learn documentation.
- Since we're bringing these features into our namespace, do we want to do something to test them as well? I don't think there's any need to duplicate sklearn's tests, but perhaps we could explicitly use these features with one of our estimators, which would open up the possibility of detecting bugs like the one addressed by [REVIEW] Ensure global_output_type is thread-safe #3497.
Updating the docstrings seems appealing... if we import 4 classes, we can iterate over them and append something vaguely like this: """This code is developed and maintained by scikit-learn (insert proper citation here) and imported by cuML to maintain the familiar sklearn.* namespace structure. cuML includes tests to ensure full compatibility of these wrappers with CUDA-based data and cuML estimators, but all of the underlying code is due to the scikit-learn developers."""
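The import-time docstring edit suggested above might look roughly like this sketch. The class bodies and note text here are illustrative stand-ins, not cuML's actual implementation; in the real module the loop would run over the four sklearn imports (e.g. `Pipeline`, `GridSearchCV`).

```python
# Sketch of appending an attribution note at import time. The classes
# below are stand-ins for the real sklearn imports; the note text is
# illustrative, not cuML's actual wording.

_SKLEARN_NOTE = (
    "\n\nThis code is developed and maintained by scikit-learn and "
    "imported by cuML to maintain the familiar sklearn.* namespace "
    "structure."
)

class Pipeline:
    """Stand-in for sklearn.pipeline.Pipeline."""

class GridSearchCV:
    """Stand-in for sklearn.model_selection.GridSearchCV."""

# Iterate over the re-exported classes and append the note once each.
for _cls in (Pipeline, GridSearchCV):
    _cls.__doc__ = (_cls.__doc__ or "") + _SKLEARN_NOTE
```

Assigning to `__doc__` works here because these are ordinary Python-defined classes, so the docstring can be mutated after import.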
Looks great... I just have one question/topic regarding tests.
GridSearchCV.__doc__ = """
ah, bummer that we have to copy-paste since these are in separate files (pipeline and model_selection) - if it ever goes to a third file I think we should break it out in a shared way
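Breaking it out in a shared way could be as small as a helper like this sketch; the function name, module placement, and note text are all illustrative, not part of the PR.

```python
# Hypothetical shared helper that pipeline.py and model_selection.py
# could both import instead of duplicating the docstring patch.
# Name and note text are illustrative.

_NOTE = (
    "\n\nThis class is developed and maintained by scikit-learn and "
    "re-exported by cuML for namespace convenience."
)

def patch_sklearn_doc(cls, note=_NOTE):
    """Append an attribution note to cls.__doc__, tolerating None."""
    cls.__doc__ = (cls.__doc__ or "") + note
    return cls
```

Each file would then call something like `GridSearchCV = patch_sklearn_doc(GridSearchCV)` right after its sklearn import, keeping the attribution text in one place.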
from cuml.svm import SVC

def test_pipeline():
Should we test that every model is compatible? Seems like we could do this with about 10 rows of data and make it fast. We could use the same approach as test_pickle to grab at least all regression and classification models pretty easily. This is an open question, so LMK if you think it's a bad idea...
It might also be nice to confirm that the output type doesn't get messed up. Basically, predict should preserve the on-GPU-ness of the data even through the pipeline (predict + check output class type). Not sure the scoring is even necessary, since this is more a question of whether the pipeline is flowing...
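The output-type assertion suggested here could follow a pattern like this sketch. The fake classes stand in for on-GPU data (cupy/cudf) and a cuml estimator, since the real test would build sklearn's Pipeline around something like `cuml.svm.SVC` with GPU input.

```python
# Sketch of the suggested output-type check, with stand-ins for on-GPU
# data and a cuml estimator. The real test would use cupy/cudf arrays
# and a cuml model inside sklearn's Pipeline.

class FakeGpuArray(list):
    """Stand-in for a cupy/cudf on-GPU container."""

class FakeCumlEstimator:
    """Stand-in estimator that mirrors its input container type."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        # Mirror the input type, as cuml output-type handling should.
        return type(X)(0 for _ in X)

def test_predict_preserves_output_type():
    X = FakeGpuArray([1.0, 2.0, 3.0])
    y = FakeGpuArray([0, 1, 0])
    preds = FakeCumlEstimator().fit(X, y).predict(X)
    # Core assertion: output class matches the on-GPU input class.
    assert isinstance(preds, FakeGpuArray)

test_predict_preserves_output_type()
```

The point is the final `isinstance` check: it catches the "pipeline silently converts to host arrays" failure mode without needing any scoring.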
This looks good to me. I know you're still tweaking some things, but wanted to go ahead and approve what I've seen so far since I'll be out next week.
Looks great!
rerun tests
Force-pushed from 8699bad to d53f13a
rerun tests
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #3493 +/- ##
===============================================
+ Coverage 79.21% 80.84% +1.62%
===============================================
Files 226 228 +2
Lines 17900 17742 -158
===============================================
+ Hits 14180 14343 +163
+ Misses 3720 3399 -321
@gpucibot merge
Closes #3484
Imports sklearn's Pipeline and GridSearchCV meta-estimators into the cuML namespace for ease of use.