Make PandasTypeSelector selector dataframe-agnostic #670
I am very excited about this one! Left a few comments and considerations here and there but I think we are going to merge it soon 😁
pyproject.toml
Outdated
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "scikit-lego"
-version = "0.8.2"
+version = "0.8.13"
Was line 23 the intended target?
yeah i probably shouldn't make commits in a hurry whilst on a train sorry
@@ -173,12 +222,18 @@ def _check_column_names(self, X):


class PandasTypeSelector(BaseEstimator, TransformerMixin):
-    """The `PandasTypeSelector` transformer allows to select columns in a pandas DataFrame based on their type.
+    """The `PandasTypeSelector` transformer allows to select columns in a DataFrame based on their type.
Considering its name, we could do the following:
class PandasTypeSelector(BaseEstimator, TransformerMixin):
    def __init__(self, include=None, exclude=None):
        warn(
            "Please use `TypeSelector` instead of `PandasTypeSelector`, `PandasTypeSelector` will be deprecated in future versions",
            DeprecationWarning,
        )
        return TypeSelector(include, exclude)
and then
class TypeSelector(BaseEstimator, TransformerMixin):
    ...
    !!! info "New in version 0.9.0"
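One caveat with the snippet above: `__init__` must return `None`, so returning a `TypeSelector` from it would raise a `TypeError` when the class is instantiated. A minimal sketch of one way around that, assuming the deprecated class can simply subclass the new one. The plain-Python `TypeSelector` here is a simplified stand-in for illustration, not the scikit-lego implementation:

```python
import warnings


class TypeSelector:
    """Stand-in for the renamed transformer (hypothetical, simplified)."""

    def __init__(self, include=None, exclude=None):
        self.include = include
        self.exclude = exclude


class PandasTypeSelector(TypeSelector):
    """Deprecated alias kept so existing imports keep working."""

    def __init__(self, include=None, exclude=None):
        # Warn at construction time, then defer to the new class.
        warnings.warn(
            "Please use `TypeSelector` instead of `PandasTypeSelector`, "
            "`PandasTypeSelector` will be deprecated in future versions",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(include=include, exclude=exclude)
```

With this shape, `isinstance(PandasTypeSelector(), TypeSelector)` holds, so code that checks for the new class also accepts the old name.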
True, and I think the whole pandastransformers.py module needs renaming
OK to do it all in one go in a separate PR, so that all the ones in pandastransformers.py point to the equivalent one in, say, dataframe_transformers.py?
EDIT: I noticed that this is already exported from sklego.preprocessing, and that that's the path the examples use. I've renamed and deprecated as part of this PR then
The contribution.md page still shows PandasTypeSelector, but that page already looks out-of-date anyway and probably needs a revamp - will address that separately (something about Narwhals probably needs mentioning too, as it's used internally in quite a few places)
Yes, we can rename it to have a more intuitive naming path, but as you spotted, it shouldn't matter too much as they are exported into preprocessing.
except ValueError as e:
    raise ValueError("Columns were not equal during fit and transform") from e
Can this happen?
yup, the last test in tests/test_preprocessing/test_pandastypeselector.py goes there
I've unified the messages and included the error message in the test
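For illustration, the re-raise pattern and a check on its message could look roughly like the sketch below. The `_select` and `transform` helpers and the dict-of-columns input are hypothetical simplifications, not the code in this PR; only the error message is taken from the diff:

```python
MESSAGE = "Columns were not equal during fit and transform"


def _select(frame, columns):
    """Hypothetical helper: pick `columns` out of a dict of column lists."""
    missing = [name for name in columns if name not in frame]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    return {name: frame[name] for name in columns}


def transform(fitted_columns, frame):
    """Select the columns seen during fit, with one unified error message."""
    try:
        return _select(frame, fitted_columns)
    except ValueError as e:
        # Chain the original error so the underlying cause stays visible.
        raise ValueError(MESSAGE) from e


# A test can then assert on the message as well as the exception type.
try:
    transform(["a", "b"], {"a": [1, 2]})
except ValueError as err:
    assert str(err) == MESSAGE
```

Asserting on the message text, not just on `ValueError`, is what keeps the test from passing on an unrelated error raised elsewhere in the transform.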
Co-authored-by: Francesco Bruzzesi <[email protected]>
* placeholder to develop narwhals features
* feat: make `ColumnDropper` dataframe-agnostic (#655)
* feat: make ColumnDropped dataframe-agnostic
* use narwhals[polars] in pyproject.toml, link to list of supported libraries
* note that narwhals is used for cross-dataframe support
* test refactor
* docstrings
---------
Co-authored-by: FBruzzesi <[email protected]>
* feat: make ColumnSelector dataframe-agnostic (#659)
* columnselector with test rufformatted
* adding whitespace
* fixed the fit and transform
* removed intendation in examples
* font:false
* feat: make `add_lags` dataframe-agnostic (#661)
* make add_lags dataframe-agnostic
* try getting tests to run?
* patch: cvxpy 1.5.0 support (#663)
---------
Co-authored-by: Francesco Bruzzesi <[email protected]>
* Make `RegressionOutlier` dataframe-agnostic (#665)
* make regression outlier df-agnostic
* need to use eager-only for this one
* pass native to check_array
* remove cudf, link to check_X_y
* feat: Make InformationFilter dataframe-agnostic
* Make Timegapsplit dataframe-agnostic (#668)
* make timegapsplit dataframe-agnostic
* actually, include cuDF
* feat: make FairClassifier data-agnostic (#669)
* start all over
* fixture working
* wip
* passing tests - again
* pre-commit complaining
* changed fixture on test_demographic_parity
* feat: Make PandasTypeSelector selector dataframe-agnostic (#670)
* make pandas dtype selector df-agnostic
* bump version
* 3.8 compat
* Update sklego/preprocessing/pandastransformers.py
Co-authored-by: Francesco Bruzzesi <[email protected]>
* fixup pyproject.toml
* unify (and test!) error message
* deprecate
* update readme
* undo contribution.md change
---------
Co-authored-by: Francesco Bruzzesi <[email protected]>
* format typeselector and bump version
* feat: Make grouped and hierarchical dataframe-agnostic (#667)
* feat: make grouped and hierarchical dataframe-agnostic
* add pyarrow
* narwhals grouped_transformer
* grouped transformer eureka
* hierarchical narwhalified
* so close but so far
* return series instead of DataFrame for y
* grouped WIP
* merge branch and fix grouped
* future annotations
* format
* handling negative indices
* solve conflicts
* hacking C
* fairness: change C values in tests
---------
Co-authored-by: Marco Edward Gorelli <[email protected]>
Co-authored-by: Magdalena Anopsy <[email protected]>
Co-authored-by: Dea María Léon <[email protected]>
Description
Towards #658