Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch: scikit-learn 1.6 compatibility #726

Merged
merged 11 commits into from
Dec 17, 2024
Merged

Conversation

FBruzzesi
Copy link
Collaborator

@FBruzzesi FBruzzesi commented Dec 16, 2024

Description

Overseeds #720 by using and adapting sklearn-compat

Tested on python 3.10 and 3.12 with scikit-learn 1.5 and 1.6: all tests are passing these local runs

Tip for review

I tried to carefully commit each module at the time, hence the best way forward would be to review one commit at the time.

With a big disclaimer, at the end I realized that we validate_data was not gonna work in all situations, but an adaptation of check_X_y would, therefore I rolled back those entirely in another branch which was merged into this one

Edit: as suggested, using validate_data is the right way to go, and after some deeper dive, I was able to refactor the codebase using such function

sklego/_sklearn_compat.py Outdated Show resolved Hide resolved
Comment on lines 99 to 100
X = check_array(X, estimator=self, dtype=FLOAT_DTYPES)
_check_n_features(self, X, reset=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably want to call validate_data instead of check_array, check_X_y, and _check_n_features. In sklearn itself, we rarely call these functions. It's almost always validate_data

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adrinjalali ! I will have a deeper look.

If you look at the first 7 commits of this PR, that was exactly how I approached it - i.e. moving from check_array and check_X_y to validate_data. However, many tests were failing (especially for meta estimators), thus I move forward with the safe path of adjusting what we already have.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to have a look if you point me to some errors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it was a skill issue of mine! Should be fixed now 😇

Copy link
Owner

@koaning koaning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the biggest fan of a big vendor, but I guess it is the most pragmatic way to go about it for now. I guess time will tell if this becomes a burden but I guess by that time we can just assume folks use scikit-learn >> 1.6.

@FBruzzesi well done! This was a lot of work!

@koaning koaning merged commit c17cd27 into main Dec 17, 2024
15 checks passed
@koaning koaning deleted the patch/scikit-learn-16-compat branch December 17, 2024 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants