
Add Checkpoint Loading from MLflow Model Registry #17618

Closed
sam-h-bean opened this issue Jun 9, 2022 · 5 comments

Comments

@sam-h-bean (Contributor)

Feature request

I would like the ability to pass a model URI that points to a model in an MLflow model registry and then load a HuggingFace transformer directly from the registry.
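To make the request concrete, here is a hypothetical sketch of the desired usage; this is not an existing Transformers API, and the registry URI and model name are placeholders:

from transformers import AutoModelForSequenceClassification

# Hypothetical: from_pretrained would resolve an MLflow registry URI directly.
model = AutoModelForSequenceClassification.from_pretrained("models:/my-model/1")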

Motivation

Model versioning and lifecycle management are common practices in MLOps, so I think it makes sense for this to be a first-class feature in HuggingFace.

Your contribution

I already have this functional but would like to contribute it back to the project so others can leverage the MLflow model registry to maintain their model lifecycles from development to production!

sam-h-bean (Contributor, Author) commented Jun 17, 2022

@sgugger this is linked to #17686

The code I have working currently looks like

import glob
import os

import mlflow
from transformers import AutoModelForSequenceClassification


def download_from_registry(src_path, dst_path):
    # Make sure the local destination directory exists.
    if not os.path.isdir(dst_path):
        os.mkdir(dst_path)

    # Downloading the registered model also pulls its logged artifacts,
    # including the Transformers checkpoint directory.
    mlflow.pyfunc.load_model(src_path, dst_path=dst_path)

    # Return the checkpoint directory that was logged as an artifact.
    return glob.glob(os.path.join(dst_path, "artifacts", "checkpoint-*"))[0]


model_path = download_from_registry("models:/my-model/1", "./my-model/")
model = AutoModelForSequenceClassification.from_pretrained(model_path)

and I'm wondering what you think this would look like contributed to the open-source project. The code above only works once you have logged the checkpoint as an artifact and registered that model in the MLflow registry. However, I think it could be refactored to also load the model from an MLflow run. wdyt?
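As a rough sketch of that idea: mlflow.pyfunc.load_model resolves runs:/ URIs the same way it resolves models:/ URIs, so the same helper could be reused; the run id and artifact path below are placeholders, and this assumes the checkpoint was logged under the run's "model" artifact path.

model_path = download_from_registry("runs:/<run_id>/model", "./my-model/")
model = AutoModelForSequenceClassification.from_pretrained(model_path)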

sgugger (Collaborator) commented Jun 17, 2022

Hi @sam-h-bean !

We do not plan on supporting model repositories other than our model Hub for the from_pretrained method in Transformers. You should build a bridge to upload those checkpoints from MLflow to the Hub and benefit from all the goodies we have, such as the inference widget, model cards, community PRs, etc. 😃

Your solution also works since we support local checkpoints, and only takes three lines of code as you demonstrated 😉
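A minimal sketch of such a bridge, reusing download_from_registry from the earlier snippet and assuming you are authenticated via huggingface-cli login; the repo id "my-org/my-model" is a placeholder:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Pull the checkpoint out of the MLflow registry, then re-upload it to the Hub.
model_path = download_from_registry("models:/my-model/1", "./my-model/")
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# private=True keeps the uploaded checkpoint restricted to your organization.
model.push_to_hub("my-org/my-model", private=True)
tokenizer.push_to_hub("my-org/my-model", private=True)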

@sam-h-bean (Contributor, Author)

Hey @sgugger, does the Hub have support for private models? These are proprietary models and thus cannot be made public. What is the suggested method for cases such as this? It seems like this is a case that will become more prevalent as more companies adopt large language models.

sgugger (Collaborator) commented Jun 20, 2022

Yes, you can have private models/datasets/spaces on the Hub. See the doc!
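A short sketch of loading such a private model, assuming you have authenticated with huggingface-cli login (or pass a token explicitly); the repo id is a placeholder:

from transformers import AutoModelForSequenceClassification

# use_auth_token=True reads the token stored by `huggingface-cli login`.
model = AutoModelForSequenceClassification.from_pretrained(
    "my-org/my-model", use_auth_token=True
)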

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
