-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
patch: scikit-learn 1.6 compatibility #726
Conversation
X = check_array(X, estimator=self, dtype=FLOAT_DTYPES) | ||
_check_n_features(self, X, reset=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you probably want to call validate_data
instead of check_array
, check_X_y
, and _check_n_features
. In sklearn itself, we rarely call these functions. It's almost always validate_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adrinjalali ! I will have a deeper look.
If you look at the first 7 commits of this PR, that was exactly how I approached it - i.e. moving from check_array
and check_X_y
to validate_data
. However, many tests were failing (especially for meta estimators), thus I move forward with the safe path of adjusting what we already have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to have a look if you point me to some errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it was a skill issue of mine! Should be fixed now 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the biggest fan of a big vendor, but I guess it is the most pragmatic way to go about it for now. I guess time will tell if this becomes a burden but I guess by that time we can just assume folks use scikit-learn >> 1.6.
@FBruzzesi well done! This was a lot of work!
Description
Overseeds #720 by using and adapting sklearn-compat
Tested on python 3.10 and 3.12 with scikit-learn 1.5 and 1.6: all tests are passing these local runs
Tip for review
I tried to carefully commit each module at the time, hence the best way forward would be to review one commit at the time.
With a big disclaimer, at the end I realized that wevalidate_data
was not gonna work in all situations, but an adaptation ofcheck_X_y
would, therefore I rolled back those entirely in another branch which was merged into this oneEdit: as suggested, using
validate_data
is the right way to go, and after some deeper dive, I was able to refactor the codebase using such function