-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Timegapsplit dataframe-agnostic #668
Make Timegapsplit dataframe-agnostic #668
Conversation
4eda210
to
25b04f9
Compare
X_train_df = X_index_df[ | ||
(X_index_df["__date__"] >= start_date) & (X_index_df["__date__"] < current_date + self.train_duration) | ||
] | ||
X_valid_df = X_index_df[ | ||
(X_index_df["__date__"] >= current_date + self.train_duration + self.gap_duration) | ||
& ( | ||
X_index_df["__date__"] | ||
< current_date + self.train_duration + self.valid_duration + self.gap_duration | ||
) | ||
] | ||
X_train_df = X_index_df.filter( | ||
nw.col("__date__") >= start_date, nw.col("__date__") < current_date + self.train_duration | ||
) | ||
X_valid_df = X_index_df.filter( | ||
nw.col("__date__") >= current_date + self.train_duration + self.gap_duration, | ||
nw.col("__date__") < current_date + self.train_duration + self.valid_duration + self.gap_duration, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this diff so pleasing 😍
if not date_serie.index.is_unique: | ||
raise ValueError("date_serie doesn't have a unique index") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is checked as part of nw.maybe_align_index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another very satisfying migration 🙌🏼
I left only one pedantic question 🙃
sklego/model_selection.py
Outdated
- Modin | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is CuDF breaking for some reason? It seems we are not doing any checking on X (check_array
I am looking at you)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, thanks for asking - indeed, cuDF should be in the list, as to_numpy()
is called explicitly here, and they do support that https://docs.rapids.ai/api/cudf/nightly/user_guide/api_docs/api/cudf.dataframe.to_numpy/
it's __array__
(which gets triggered when you call np.asarray
on an object) which they don't support they don't support #665 (comment) . So, indeed, they should be included here
* placeholder to develop narwhals features * feat: make `ColumnDropper` dataframe-agnostic (#655) * feat: make ColumnDropped dataframe-agnostic * use narwhals[polars] in pyproject.toml, link to list of supported libraries * note that narwhals is used for cross-dataframe support * test refactor * docstrings --------- Co-authored-by: FBruzzesi <[email protected]> * feat: make ColumnSelector dataframe-agnostic (#659) * columnselector with test rufformatted * adding whitespace * fixed the fit and transform * removed intendation in examples * font:false * feat: make `add_lags` dataframe-agnostic (#661) * make add_lags dataframe-agnostic * try getting tests to run? * patch: cvxpy 1.5.0 support (#663) --------- Co-authored-by: Francesco Bruzzesi <[email protected]> * Make `RegressionOutlier` dataframe-agnostic (#665) * make regression outlier df-agnostic * need to use eager-only for this one * pass native to check_array * remove cudf, link to check_X_y * feat: Make InformationFilter dataframe-agnostic * Make Timegapsplit dataframe-agnostic (#668) * make timegapsplit dataframe-agnostic * actually, include cuDF * feat: make FairClassifier data-agnostic (#669) * start all over * fixture working * wip * passing tests - again * pre-commit complaining * changed fixture on test_demographic_parity * feat: Make PandasTypeSelector selector dataframe-agnostic (#670) * make pandas dtype selector df-agnostic * bump version * 3.8 compat * Update sklego/preprocessing/pandastransformers.py Co-authored-by: Francesco Bruzzesi <[email protected]> * fixup pyproject.toml * unify (and test!) error message * deprecate * update readme * undo contribution.md change --------- Co-authored-by: Francesco Bruzzesi <[email protected]> * format typeselector and bump version * feat: Make grouped and hierarchical dataframe-agnostic (#667) * feat: make grouped and hierarchical dataframe-agnostic * add pyarrow * narwhals grouped_transformer * grouped transformer eureka * hierarchical narwhalified * so close but so far * return series instead of DataFrame for y * grouped WIP * merge branch and fix grouped * future annotations * format * handling negative indices * solve conflicts * hacking C * fairness: change C values in tests --------- Co-authored-by: Marco Edward Gorelli <[email protected]> Co-authored-by: Magdalena Anopsy <[email protected]> Co-authored-by: Dea María Léon <[email protected]>
Description
This was quite a fun one! As a precursor, I added
nw.maybe_align_index
andnw.maybe_set_index
, so that the automated index alignment can be preserved for pandas users, but Polars can still be supported (so long asdate_serie
andX
/y
are the same length), as discussed hereAnother step towards #658
Type of change
Checklist: