-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT use composition in TableVectorizer #675
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PoC, the implementation looks good!
skrub/_table_vectorizer.py
Outdated
@property | ||
def named_transformers_(self): | ||
return self._column_transformer.named_transformers_ | ||
|
||
@property | ||
def sparse_output_(self): | ||
return self._column_transformer.sparse_output_ | ||
|
||
@property | ||
def output_indices_(self): | ||
return self._column_transformer.output_indices_ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need tests to check those attributes.
I think you need to merge with main for the tests to run :) |
…ve_table_vectorizer # Conflicts: # skrub/_table_vectorizer.py # skrub/tests/test_table_vectorizer.py
# TODO: _check_feature_names raises a warning when fitting on dataframe | ||
# but transforming on a numpy array. | ||
# In practice, this looks error-prone and we need to discuss | ||
# whether to raise an error instead. | ||
# | ||
# Note that when fitting on a dataframe and transforming on | ||
# the same dataframe with different column names, | ||
# _check_feature_names will raise an error. | ||
self._check_feature_names(X, reset=reset) | ||
feature_names = _get_feature_names(X) | ||
feature_names_in = getattr(self, "feature_names_in_", None) | ||
if feature_names is None and feature_names_in is not None: | ||
X.columns = feature_names_in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following our IRL discussion, @glemaitre
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LeoGrin and @glemaitre this should fix #709! This is very similar to what is done in ColumnTransformer
Merging #592 makes this design more complex. I will probably restart from scratch since we need to handle the split/merge of the transformer that are parallelized. |
Should we close this PR @glemaitre? |
It will be automatically close when #761 will be merged. |
closes #660
This is a POC for #660 that uses internally a
ColumnTransformer
instead of inheriting from it.