Compatible version of scikit-learn should be used for training and application of models #1333

Open
morcuended opened this issue Jan 10, 2025 · 4 comments

@morcuended
Member

From #1326

Models must be trained and applied to observed data with compatible scikit-learn versions. Most of the available standard models were produced with scikit-learn 1.2.X, while a fresh lstchain installation may install any 1.X version, which causes the incompatibility problem when running the DL1 to DL2 step. Pinning scikit-learn to 1.2.X in the lstchain requirements is not a final solution, since models can be trained and used with newer scikit-learn versions. The versions just have to be consistent.
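As a rough illustration of what "consistent" could mean here, the sketch below compares only the major.minor part of two version strings. That reading is an assumption taken from the "1.2.X" wording above, not a statement of scikit-learn's compatibility policy, and the helper names `minor_series` and `same_minor_series` are hypothetical.

```python
import re


def minor_series(version):
    """Return (major, minor) from a version string such as '1.2.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_minor_series(trained_with, installed):
    """True if both versions share major.minor (assumed notion of 'consistent')."""
    return minor_series(trained_with) == minor_series(installed)
```

With the versions from this issue, `same_minor_series("1.2.2", "1.4.2")` is `False`, while `same_minor_series("1.2.2", "1.2.0")` is `True`.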

@maxnoe
Member

maxnoe commented Jan 10, 2025

I think we should turn the warning by sklearn into an error that explains a bit more, something like this maybe?

from joblib import load
from sklearn import __version__ as sklearn_version
from sklearn.exceptions import InconsistentVersionWarning
import warnings


class WrongSklearnVersion(Exception):

    def __init__(self, path, e):
        msg = (
            f"Model {path} was trained using a different sklearn version"
            f" than is currently installed. Please install the correct sklearn version: {e}"
        )
        super().__init__(msg)


def load_model(path):
    # Escalate sklearn's version-mismatch warning to an exception while loading.
    with warnings.catch_warnings():
        warnings.simplefilter("error", InconsistentVersionWarning)

        # Open the requested file rather than a hard-coded path.
        with open(path, "rb") as f:
            try:
                clf = load(f)
                return clf
            except InconsistentVersionWarning as e:
                raise WrongSklearnVersion(path, e) from None

Results in:

WrongSklearnVersion: Model ./model.pkl was trained using a different sklearn version than is currently installed. Please install the correct sklearn version: Trying to unpickle estimator DecisionTreeClassifier from version 1.2.2 when using version 1.4.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations

@moralejo
Collaborator

I do not understand why fixing the version (to the one used for the recently produced batch of standard RFs with different NSB levels) is not a good solution. I do not know which version that is, and can't check right now. If for whatever reason someone needs to use non-standard RFs produced with a newer version, they can modify their own installation, right?

@maxnoe
Member

maxnoe commented Jan 10, 2025

Unfortunately, the "InconsistentVersionWarning" was only introduced after 1.2. In 1.2, the warning is a plain UserWarning, but it can be matched using its message.
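A minimal sketch of matching on the message instead of the warning class, using only the standard `warnings` module. It assumes the message starts with "Trying to unpickle estimator", as in the output quoted earlier in this thread, and that the newer warning class derives from UserWarning so the same filter covers it. The `loader` parameter is a hypothetical hook added for illustration; passing `joblib.load` would mirror the snippet above.

```python
import pickle
import warnings

# Start of the message quoted in the error output earlier in this thread.
MISMATCH_PREFIX = "Trying to unpickle estimator"


class WrongSklearnVersion(Exception):
    """Raised when a model was pickled with a different sklearn version."""


def load_model(path, loader=pickle.load):
    # `loader` is a hypothetical hook for illustration (e.g. joblib.load).
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "error", message=MISMATCH_PREFIX, category=UserWarning
        )
        with open(path, "rb") as f:
            try:
                return loader(f)
            except UserWarning as e:
                # Only translate the version mismatch; re-raise anything else.
                if not str(e).startswith(MISMATCH_PREFIX):
                    raise
                raise WrongSklearnVersion(
                    f"Model {path} was trained using a different sklearn"
                    f" version than is currently installed: {e}"
                ) from None
```

Filtering on the message keeps the check independent of whether the installed scikit-learn provides the dedicated warning class, at the cost of depending on the exact wording of the message.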

@maxnoe
Member

maxnoe commented Jan 10, 2025

> they can modify their own installation, right?

That's the point: they cannot. Pip will refuse to install lstchain alongside a different version of scikit-learn if lstchain itself pins the requirement to a fixed version.

OSA or lstmcpipe could/should pin the version for official productions.
