This repository has been archived by the owner on Sep 13, 2023. It is now read-only.

Do the requirements in the .mlem have to be installed in order to build it? #382

Closed
coltonflowers1 opened this issue Aug 18, 2022 · 7 comments · Fixed by #386
Labels
bug Something isn't working

Comments

coltonflowers1 commented Aug 18, 2022

Using the latest release 0.2.7, I have linked a model from another repo and named it mental_health_model. It has scikit-learn under the requirements section of its .mlem file.

When I run:

mlem build --conf package_name=mental_health_classification --conf target=build/ mental_health_model pip

I get:

⏳️ Loading model from .mlem/link/mental_health_model.mlem
🔗 Loading link to https://github.com/trend-community/mentalHealthClassification/tree/master/.mlem/model/classifier.mlem
❌ Unexpected error: No module named 'sklearn'
Please report it here: <https://github.com/iterative/mlem/issues>

Am I expected to already have the requirements of the linked model installed in order to build this?

mike0sv (Contributor) commented Aug 18, 2022

Can you confirm that if you install the module everything works?

coltonflowers1 (Author) commented

It does, but interestingly, there are additional dependencies in the .mlem requirements, like pandas, that aren't installed when building. This is what's in the requirements section of the .mlem file:

- module: sklearn
  version: 1.1.1
- module: scipy
  version: 1.8.1
- module: pandas
  version: 1.4.3
- module: numpy
  version: 1.23.1

Could it have something to do with the discrepancy between the module name (sklearn) and the package name (scikit-learn)?
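
For anyone wanting to check quickly which of the listed modules are importable in the current environment, here is a minimal sketch. It is not part of MLEM; `missing_modules` is a hypothetical helper, and the module names are simply copied from the requirements above. Note that the import name `sklearn` is provided by the PyPI package `scikit-learn`, so a missing `sklearn` is fixed with `pip install scikit-learn`.

```python
from importlib.util import find_spec

# Top-level module names copied from the requirements section quoted above.
required_modules = ["sklearn", "scipy", "pandas", "numpy"]


def missing_modules(modules):
    """Return the top-level modules that cannot be found in this environment.

    Hypothetical helper for illustration; it only checks importability,
    not the installed versions.
    """
    return [name for name in modules if find_spec(name) is None]


print(missing_modules(required_modules))
```

An empty list means every listed module can be imported; it says nothing about whether the pinned versions match.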

mike0sv (Contributor) commented Aug 18, 2022

This happens because building a pip package involves cloning the model, and cloning involves reading and deserializing its metadata. The model_type field of the model metadata tries to deserialize into SklearnModel, which is part of the mlem.contrib.sklearn extension, which in turn requires sklearn to be available.
This should not happen, since model_type is a lazy field (it is deserialized only on direct access) and cloning does not need direct access. I will investigate why it does not work as it should.
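
The intended lazy-field behavior can be sketched roughly as follows. This is only an illustration of the idea, not MLEM's actual code: `ModelMeta`, `model_type_raw`, and the placeholder module name are all hypothetical. The raw payload is kept as plain data, and the import only happens when `model_type` is accessed directly, so cloning should work without the extension's dependency installed.

```python
import copy
import importlib


class ModelMeta:
    """Hypothetical metadata object; NOT MLEM's real class."""

    def __init__(self, model_type_raw):
        # Plain dict as it would be read from a metadata file.
        self.model_type_raw = model_type_raw
        self._model_type = None

    @property
    def model_type(self):
        # Deserialize on first direct access only; the import happens here.
        if self._model_type is None:
            module = importlib.import_module(self.model_type_raw["module"])
            self._model_type = getattr(module, self.model_type_raw["cls"])
        return self._model_type

    def clone(self):
        # Copies the raw payload without touching the property,
        # so no import is triggered.
        return ModelMeta(copy.deepcopy(self.model_type_raw))


# Placeholder module name standing in for a dependency that is not installed.
meta = ModelMeta({"module": "module_that_is_not_installed", "cls": "SomeModel"})
cloned = meta.clone()  # succeeds: nothing has been deserialized

error = None
try:
    cloned.model_type  # direct access triggers the import and fails
except ModuleNotFoundError as exc:
    error = str(exc)

print(error)
```

If any step of cloning reads `model_type` instead of the raw payload, the import error surfaces during the build, which matches the symptom reported here.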

mike0sv added the bug label Aug 21, 2022

mike0sv (Contributor) commented Aug 21, 2022

Having trouble reproducing it, can you install from main and retry?

coltonflowers1 (Author) commented

Installed from main. I am using Python 3.10.6, and I created my virtual env with the following setup.py:

from setuptools import setup

setup(
    name="treNLP_model_registry",
    py_modules=[],
    install_requires=[
        "mlem",
        "gto",
        "dvc[s3]==2.9.5"
    ]
)

Not sure if it's relevant, but I am using an old release of DVC due to this issue. After running the same command as in the initial comment, I get the same output.

mike0sv (Contributor) commented Aug 23, 2022

Can you try this branch?

mike0sv added a commit that referenced this issue Aug 24, 2022
* Don't load model_type on pip build

closes #382

* remove print

* add dvc[s3] test dep

coltonflowers1 (Author) commented

That worked!
