ENH: ColumnTransformer #315
+1 for opening an issue on scikit-learn and discussing your suggestion (or even opening a pull request).
If you do a quick sklearn PR it could be part of the 0.20 release.
Will do, quickly.
(Replying by email to Olivier Grisel's comment of Thu, Jul 26, 2018 at 11:21 AM.)
This lets subclasses re-use more of sklearn.compose._column_transformer. xref dask/dask-ml#315
Scikit-learn PR at scikit-learn/scikit-learn#11689. This passes locally for me, but won't pass here until dask/dask#3212 and scikit-learn/scikit-learn#11689 are merged.
commit 764872c, Tom Augspurger, Mon Jul 30 14:59:12 2018 -0500: Handle ndarrays gracefully
commit 3f9ba71, Tom Augspurger, Mon Jul 30 14:37:00 2018 -0500: Removed ndarray special casing
commit ce632b7, Tom Augspurger, Mon Jul 30 14:20:09 2018 -0500: fix shape
commit e570321, Tom Augspurger, Mon Jul 30 14:12:10 2018 -0500: fix shape
The upstream PRs are in. Merging later today.
Ignoring the coverage failure, since coverage isn't run against sklearn dev.
This PR implements a daskified column transformer.

There are two issues preventing us from just using scikit-learn's ColumnTransformer:

1. .shape: dask DataFrame and Series don't have a shape property yet (dask/dask#3212, "Add (lazy) shape property to dataframe and series").
2. sklearn.compose._column_transformer._hstack doesn't handle dask objects (or pandas DataFrames), only sparse matrices and ndarrays. The _hstack implemented here handles arrays (dask or NumPy), DataFrames (dask or pandas), and sparse matrices.

Long-term, it'd be nice to remove this class entirely, but that will probably require a lot of work upstream (SciPy adopting pydata/sparse, NumPy implementing __array_function__, and libraries adopting it).

Medium-term, _hstack could become a staticmethod on ColumnTransformer. Then this subclass would only need to override _hstack, and everything else could be removed.

cc @jorisvandenbossche @ogrisel for that last point. Should I open an issue on scikit-learn to discuss that further?
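The medium-term idea (make _hstack a staticmethod hook and have the subclass override only that) can be illustrated with a toy class hierarchy. Both classes here are hypothetical, heavily simplified stand-ins, not scikit-learn's ColumnTransformer or dask-ml's subclass: rows are plain lists and each transformer is a (function, column_indices) pair. The point is only that fit_transform calls self._hstack, so a subclass that redefines the hook inherits everything else unchanged.

```python
class BaseColumnTransformer:
    """Hypothetical, simplified stand-in for a ColumnTransformer-like base class.

    Rows are plain lists; each transformer is a (function, column_indices) pair.
    """

    def __init__(self, transformers):
        self.transformers = transformers

    @staticmethod
    def _hstack(Xs):
        # Default stacking hook: join each row's per-transformer pieces into one list.
        return [sum(pieces, []) for pieces in zip(*Xs)]

    def fit_transform(self, X):
        # Apply each transformer to its columns, producing one block per transformer.
        Xs = [
            [[func(row[i]) for i in cols] for row in X]
            for func, cols in self.transformers
        ]
        # Looked up on self, so a subclass's _hstack wins when it defines one.
        return self._hstack(Xs)


class TupleColumnTransformer(BaseColumnTransformer):
    """Hypothetical subclass: overrides only the stacking hook, inherits the rest."""

    @staticmethod
    def _hstack(Xs):
        # Reuse the base joining logic, then change the output container.
        return [tuple(row) for row in BaseColumnTransformer._hstack(Xs)]


X = [[1, 2, 3], [4, 5, 6]]
spec = [(lambda v: v * 10, [0]), (str, [1, 2])]
print(TupleColumnTransformer(spec).fit_transform(X))  # [(10, '2', '3'), (40, '5', '6')]
```

In the real setting the override would presumably be a dask-aware stack along the lines of point 2, with column selection, fitting, and validation all inherited from scikit-learn.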