Tools to export and import MLflow runs, experiments or registered models from one tracking server to another.
Some of the reasons to use MLflow Export Import:
- Backup your experiments
- Migrate experiments to another tracking server
- Disaster recovery
There are two ways to run MLflow Export Import:
- As a normal Python package - this page.
- Databricks notebooks.
Note that there is also a secondary "direct copy" feature with documented limitations.
- Export experiments to a directory.
- Import experiments from a directory.
- Export a run to a directory.
- Import a run from a directory or zip file.
- Export a registered model to a directory.
- Import a registered model from a directory.
- List all registered models.
- Nested runs are only supported when you import an experiment; support for importing a single run is still a TODO.
- The Databricks API does not support exporting or importing notebook revisions. The workspace/export API endpoint only exports a notebook representing the latest revision.
- When you import a run, the link to its source notebook revision ID will appear in the UI but you cannot reach that revision (link is dead).
- For convenience, the export tool exports the latest notebook revision for a notebook-based experiment but, again, it cannot be attached to a run when imported. It's stored as an artifact in the "notebooks" folder of the run's artifact root.
- When importing a run or experiment, for open source MLflow you can specify the user owner. For Databricks import you cannot - the owner will be based on the personal access token.
notebook-formats
- If exporting a Databricks experiment, the run's notebook (latest revision, not the revision associated with the run) can be saved in the specified formats (comma-delimited argument). Each format is saved in the notebooks folder of the run's artifact root directory as notebook.{format}. Supported formats are SOURCE, HTML, JUPYTER and DBC. See the Databricks Export Format documentation.
use-src-user-id
- Set the destination user ID to the source user ID. Source user ID is ignored when importing into Databricks since the user is automatically picked up from your Databricks access token.
export-metadata-tags
- Creates metadata tags (starting with mlflow_export_import.metadata) that contain export information. These are the source mlflow tags in addition to other information. This is useful for provenance and auditing purposes in regulated industries.
Name                                            Value
mlflow_export_import.metadata.timestamp         1551037752
mlflow_export_import.metadata.timestamp_nice    2019-02-24 19:49:12
mlflow_export_import.metadata.experiment_id     2
mlflow_export_import.metadata.experiment-name   sklearn_wine
mlflow_export_import.metadata.run-id            50fa90e751eb4b3f9ba9cef0efe8ea30
mlflow_export_import.metadata.tracking_uri      http://localhost:5000
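Because these are ordinary MLflow tags on the imported run, they can be read back for auditing with the standard MLflow client. A minimal sketch (the run ID is a placeholder):

from mlflow.tracking import MlflowClient

client = MlflowClient()
# Placeholder run ID - substitute the ID of a run you imported.
run = client.get_run("50fa90e751eb4b3f9ba9cef0efe8ea30")

# Print only the provenance tags written by the export tool.
for key, value in run.data.tags.items():
    if key.startswith("mlflow_export_import.metadata"):
        print(f"{key}: {value}")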
Supports Python 3.7.6 or above.
First create a virtual environment.
python -m venv mlflow-export-import
source mlflow-export-import/bin/activate
There are two different ways to install the package.
pip install git+https://github.com/amesar/mlflow-export-import/#egg=mlflow-export-import
git clone https://github.com/amesar/mlflow-export-import
cd mlflow-export-import
pip install -e .
There are two different ways to install the package in Databricks.
Install notebook-scoped libraries with %pip.
%pip install git+https://github.com/amesar/mlflow-export-import/#egg=mlflow-export-import
Build the wheel artifact, upload it to DBFS and then install it on your cluster.
python setup.py bdist_wheel
databricks fs cp dist/mlflow_export_import-1.0.0-py3-none-any.whl {MY_DBFS_PATH}
There are two main programs to export experiments:
- export_experiment - exports one experiment.
- export_experiment_list - exports a list of experiments.
Both accept either an experiment ID or name.
Export one experiment to a directory.
python -u -m mlflow_export_import.experiment.export_experiment --help
Options:
--experiment TEXT Experiment name or ID. [required]
--output-dir TEXT Output directory. [required]
--export-metadata-tags BOOLEAN Export source run metadata tags. [default: False]
--notebook-formats TEXT Notebook formats. Values are SOURCE, HTML,
JUPYTER or DBC. [default: ]
Export experiment by experiment ID.
python -u -m mlflow_export_import.experiment.export_experiment \
--experiment 2 \
--output-dir out
Export experiment by experiment name.
python -u -m mlflow_export_import.experiment.export_experiment \
--experiment sklearn-wine \
--output-dir out
See Access the MLflow tracking server from outside Databricks.
export MLFLOW_TRACKING_URI=databricks
export DATABRICKS_HOST=https://mycompany.cloud.databricks.com
export DATABRICKS_TOKEN=MY_TOKEN
python -u -m mlflow_export_import.experiment.export_experiment \
--experiment /Users/[email protected]/SklearnWine \
--output-dir out \
--notebook-formats DBC,SOURCE
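The same setup can be done from Python when scripting exports; a minimal sketch (host and token values are placeholders):

import os
import mlflow

# Equivalent of the shell exports above - values are placeholders.
os.environ["DATABRICKS_HOST"] = "https://mycompany.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "MY_TOKEN"
mlflow.set_tracking_uri("databricks")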
The output directory contains a manifest file and a subdirectory for each run (named by run ID). Each run directory contains a run.json file (see the OSS and Databricks samples) with run metadata, plus an artifact hierarchy.
+-manifest.json
+-441985c7a04b4736921daad29fd4589d/
| +-run.json
| +-artifacts/
| | +-plot.png
| | +-sklearn-model/
| | | +-model.pkl
| | | +-conda.yaml
| | | +-MLmodel
Export several (or all) experiments to a directory.
python -u -m mlflow_export_import.experiment.export_experiment_list --help
Options:
--experiments TEXT Experiment names or IDs (comma delimited).
'all' will export all experiments. [required]
--output-dir TEXT Output directory. [required]
--export-metadata-tags BOOLEAN Export source run metadata tags. [default: False]
--notebook-formats TEXT Notebook formats. Values are SOURCE, HTML,
JUPYTER or DBC. [default: ]
Export experiments by experiment ID.
python -u -m mlflow_export_import.experiment.export_experiment_list \
--experiments 2,3 --output-dir out
Export experiments by experiment name.
python -u -m mlflow_export_import.experiment.export_experiment_list \
--experiments sklearn,sparkml --output-dir out
Export all experiments.
python -u -m mlflow_export_import.experiment.export_experiment_list \
--experiments all --output-dir out
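The all option makes periodic backups (one of the motivating use cases above) straightforward. A minimal sketch that wraps the CLI in Python and writes each snapshot to a timestamped directory (the backups/ path is an assumption):

import subprocess
import time

# Timestamped snapshot directory, e.g. backups/2020-09-10-20-23-45.
output_dir = "backups/" + time.strftime("%Y-%m-%d-%H-%M-%S")

# Export every experiment on the current tracking server.
subprocess.run(
    ["python", "-u", "-m",
     "mlflow_export_import.experiment.export_experiment_list",
     "--experiments", "all",
     "--output-dir", output_dir],
    check=True)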
The output directory contains a manifest file and a subdirectory for each experiment (by experiment ID).
Each experiment subdirectory in turn contains its own manifest file and a subdirectory for each run. The run directory contains a run.json file containing run metadata and artifact directories.
In the example below we have two experiments - 1 and 7. Experiment 1 (sklearn) has two runs (f4eaa7ddbb7c41148fe03c530d9b486f and 5f80bb7cd0fc40038e0e17abe22b304c) whereas experiment 7 (sparkml) has one run (ffb7f72a8dfb46edb4b11aed21de444b).
+-manifest.json
+-1/
| +-manifest.json
| +-f4eaa7ddbb7c41148fe03c530d9b486f/
| | +-run.json
| | +-artifacts/
| | | +-plot.png
| | | +-sklearn-model/
| | | | +-model.pkl
| | | | +-conda.yaml
| | | | +-MLmodel
| | | +-onnx-model/
| | | | +-model.onnx
| | | | +-conda.yaml
| | | | +-MLmodel
| +-5f80bb7cd0fc40038e0e17abe22b304c/
| | +-run.json
| | +-artifacts/
| | | +-plot.png
| | | +-sklearn-model/
| | | | +-model.pkl
| | | | +-conda.yaml
| | | | +-MLmodel
| | | +-onnx-model/
| | | | +-model.onnx
| | | | +-conda.yaml
| | | | +-MLmodel
+-7/
| +-manifest.json
| +-ffb7f72a8dfb46edb4b11aed21de444b/
| | +-run.json
| | +-artifacts/
| | | +-spark-model/
| | | | +-sparkml/
| | | | | +-stages/
| | | | | +-metadata/
| | | +-mleap-model/
| | | | +-mleap/
| | | | | +-model/
Sample experiment list manifest.json.
{
"info": {
"mlflow_version": "1.11.0",
"mlflow_tracking_uri": "http://localhost:5000",
"export_time": "2020-09-10 20:23:45"
},
"experiments": [
{
"id": "1",
"name": "sklearn"
},
{
"id": "7",
"name": "sparkml"
}
]
}
Sample experiment manifest.json.
{
"experiment": {
"experiment_id": "1",
"name": "sklearn",
"artifact_location": "/opt/mlflow/server/mlruns/1",
"lifecycle_stage": "active"
},
"export_info": {
"export_time": "2020-09-10 20:23:45",
"num_runs": 2
},
"run-ids": [
"f4eaa7ddbb7c41148fe03c530d9b486f",
"f80bb7cd0fc40038e0e17abe22b304c"
],
"failed_run-ids": []
}
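Since both manifests are plain JSON, an export can be sanity-checked before import, for example by confirming that no runs failed. A minimal sketch assuming the layout shown above (out is the directory passed to --output-dir):

import json
import os

output_dir = "out"  # directory passed to --output-dir

# The top-level manifest lists the exported experiments.
with open(os.path.join(output_dir, "manifest.json")) as f:
    manifest = json.load(f)

# Each experiment subdirectory has its own manifest with run details.
for exp in manifest["experiments"]:
    with open(os.path.join(output_dir, exp["id"], "manifest.json")) as f:
        exp_manifest = json.load(f)
    num_runs = exp_manifest["export_info"]["num_runs"]
    failed = exp_manifest["failed_run-ids"]
    print(f"{exp['name']} (id={exp['id']}): {num_runs} runs, {len(failed)} failed")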
Import experiments from a directory. Reads the manifest file to import experiments and their runs.
The experiment will be created if it does not exist in the destination tracking server. If the experiment already exists, the source runs will be added to it.
There are two main programs to import experiments:
- import_experiment - imports one experiment
- import_experiment_list - imports a list of experiments
Imports one experiment.
python -u -m mlflow_export_import.experiment.import_experiment --help
Options:
--input-dir TEXT Input path - directory [required]
--experiment-name TEXT Destination experiment name [required]
--just-peek BOOLEAN Just display experiment metadata - do not import
--use-src-user-id BOOLEAN Set the destination user ID to the source
user ID. Source user ID is ignored when
importing into Databricks since setting it
is not allowed.
--import-mlflow-tags BOOLEAN Import mlflow tags
--import-metadata-tags BOOLEAN Import mlflow_export_import tags
python -u -m mlflow_export_import.experiment.import_experiment \
--experiment-name imported_sklearn \
--input-dir out
When importing into Databricks MLflow, make sure you set --import-mlflow-tags False since, unlike open source MLflow, Databricks does not allow you to set mlflow tags.
export MLFLOW_TRACKING_URI=databricks
python -u -m mlflow_export_import.experiment.import_experiment \
--experiment-name /Users/[email protected]/imported/SklearnWine \
--input-dir exported_experiments/3532228 \
--import-mlflow-tags False
Import a list of experiments.
python -m mlflow_export_import.experiment.import_experiment_list --help
Options:
--input-dir TEXT Input directory. [required]
--experiment-name-prefix TEXT If specified, added as prefix to experiment name.
--use-src-user-id BOOLEAN Set the destination user ID to the source
user ID. Source user ID is ignored when
importing into Databricks since setting it
is not allowed. [default: False]
--import-mlflow-tags BOOLEAN Import mlflow tags. [default: True]
--import-metadata-tags BOOLEAN Import mlflow_export_import tags. [default: False]
python -u -m mlflow_export_import.experiment.import_experiment_list \
--experiment-name-prefix imported_ \
--input-dir out
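The export and import programs can be chained into a single migration script (another of the motivating use cases) by switching MLFLOW_TRACKING_URI between invocations. A minimal sketch; the two server URIs and the experiment name are placeholders:

import os
import subprocess

SRC_URI = "http://source-server:5000"  # placeholder
DST_URI = "http://dest-server:5000"    # placeholder

def run_cli(module, args, tracking_uri):
    # Invoke an mlflow-export-import CLI module against the given server.
    env = dict(os.environ, MLFLOW_TRACKING_URI=tracking_uri)
    subprocess.run(["python", "-u", "-m", module] + args, env=env, check=True)

# Export from the source server, then import into the destination.
run_cli("mlflow_export_import.experiment.export_experiment",
        ["--experiment", "sklearn-wine", "--output-dir", "out"], SRC_URI)
run_cli("mlflow_export_import.experiment.import_experiment",
        ["--experiment-name", "sklearn-wine", "--input-dir", "out"], DST_URI)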
Export a run to a directory or zip file.
Usage
python -m mlflow_export_import.run.export_run --help
Options:
--run-id TEXT Run ID. [required]
--output TEXT Output directory or zip file. [required]
--export-metadata-tags BOOLEAN Export source run metadata tags. [default: False]
--notebook-formats TEXT Notebook formats. Values are SOURCE, HTML,
JUPYTER or DBC. [default: ]
Run examples
python -u -m mlflow_export_import.run.export_run \
--run-id 50fa90e751eb4b3f9ba9cef0efe8ea30 \
--output out
python -u -m mlflow_export_import.run.export_run \
--run-id 50fa90e751eb4b3f9ba9cef0efe8ea30 \
--output run.zip
Produces a directory with the following structure:
+-run.json
+-artifacts/
| +-plot.png
| +-sklearn-model/
| | +-MLmodel
| | +-conda.yaml
| | +-model.pkl
Sample run.json: OSS - Databricks.
{
"info": {
"run-id": "50fa90e751eb4b3f9ba9cef0efe8ea30",
"experiment_id": "2",
...
},
"params": {
"max_depth": "16",
"max_leaf_nodes": "32"
},
"metrics": {
"mae": 0.5845562996214364,
"r2": 0.28719674214710467,
},
"tags": {
"mlflow.source.git.commit": "a42b9682074f4f07f1cb2cf26afedee96f357f83",
"mlflow.runName": "demo.sh",
"run_origin": "demo.sh",
"mlflow.source.type": "LOCAL",
"mlflow_export_import.metadata.tracking_uri": "http://localhost:5000",
"mlflow_export_import.metadata.timestamp": 1563572639,
"mlflow_export_import.metadata.timestamp_nice": "2019-07-19 21:43:59",
"mlflow_export_import.metadata.run-id": "130bca8d75e54febb2bfa46875a03d59",
"mlflow_export_import.metadata.experiment_id": "2",
"mlflow_export_import.metadata.experiment-name": "sklearn_wine"
}
}
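Because run.json is plain JSON, an exported run can be inspected without any tracking server. A minimal sketch assuming the layout shown above:

import json

# run.json produced by export_run in the output directory.
with open("out/run.json") as f:
    run = json.load(f)

print("run-id: ", run["info"]["run-id"])
print("params: ", run["params"])
print("metrics:", run["metrics"])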
Imports a run from a directory.
python -m mlflow_export_import.run.import_run --help
Options:
--input TEXT Input path - directory. [required]
--experiment-name TEXT Destination experiment name. [required]
--use-src-user-id BOOLEAN Set the destination user ID to the source
user ID. Source user ID is ignored when
importing into Databricks since setting it
is not allowed. [default: False]
--import-mlflow-tags BOOLEAN Import mlflow tags. [default: True]
--import-metadata-tags BOOLEAN Import mlflow_export_import tags. [default: False]
Directory out is where you exported your run.
python -u -m mlflow_export_import.run.import_run \
--input out \
--experiment-name sklearn_wine_imported
When importing into Databricks MLflow, make sure you set --import-mlflow-tags False since, unlike open source MLflow, Databricks does not allow you to set mlflow tags.
export MLFLOW_TRACKING_URI=databricks
python -u -m mlflow_export_import.run.import_run \
--input out \
--experiment-name /Users/[email protected]/imported/SklearnWine \
--import-mlflow-tags False
Export a registered model to a directory. The default is to export all versions of a model, including those in the None and Archived stages. You can specify a list of stages to export.
Source: export_model.py.
Usage
python -m mlflow_export_import.model.export_model --help
Options:
--model TEXT Registered model name. [required]
--output-dir TEXT Output directory. [required]
--stages TEXT Stages to export (comma separated). Default is all stages.
--notebook-formats TEXT Notebook formats. Values are SOURCE, HTML,
JUPYTER or DBC. [default: ]
python -u -m mlflow_export_import.model.export_model \
--model sklearn_wine \
--output-dir out \
--stages Production,Staging
Found 6 versions
Exporting version 3 stage 'Production' with run_id 24aa9cce1388474e9f26d17100724cdd to out/24aa9cce1388474e9f26d17100724cdd
Exporting version 5 stage 'Staging' with run_id 8efd80f59b7946119d8f1838515eea25 to out/8efd80f59b7946119d8f1838515eea25
Output export directory example.
+-749930c36dee49b8aeb45ee9cdfe1abb/
| +-artifacts/
| | +-plot.png
| | +-sklearn-model/
| | | +-model.pkl
| | | +-conda.yaml
| | | +-MLmodel
+-model.json
Sample model.json: OSS - Databricks.
{
"registered_model": {
"name": "sklearn_wine",
"creation_timestamp": "1587517284168",
"last_updated_timestamp": "1587572072601",
"description": "hi my desc",
"latest_versions": [
{
"name": "sklearn_wine",
"version": "1",
"creation_timestamp": "1587517284216",
. . .
Import a registered model from a directory.
Source: import_model.py.
Usage
python -m mlflow_export_import.model.import_model --help
Options:
--input-dir TEXT Input directory produced by export_model.py.
[required]
--model TEXT New registered model name. [required]
--experiment-name TEXT Destination experiment name - will be created
if it does not exist. [required]
--delete-model BOOLEAN First delete the model if it exists and all
its versions. [default: False]
--await-creation-for INTEGER Await creation for specified seconds.
--verbose BOOLEAN Verbose. [default: False]
--help Show this message and exit.
python -u -m mlflow_export_import.model.import_model \
--model sklearn_wine \
--experiment-name sklearn_wine_imported \
--input-dir out \
--delete-model True
Model to import:
Name: sklearn_wine
Description: my model
2 latest versions
Deleting 1 versions for model 'sklearn_wine_imported'
version=2 status=READY stage=Production run-id=f93d5e4d182e4f0aba5493a0fa8d9eb6
Importing latest versions:
Version 1:
current_stage: None:
Run to import:
run-id: 749930c36dee49b8aeb45ee9cdfe1abb
artifact_uri: file:///opt/mlflow/server/mlruns/1/749930c36dee49b8aeb45ee9cdfe1abb/artifacts
source: file:///opt/mlflow/server/mlruns/1/749930c36dee49b8aeb45ee9cdfe1abb/artifacts/sklearn-model
model_path: sklearn-model
run-id: 749930c36dee49b8aeb45ee9cdfe1abb
Importing run into experiment 'scratch' from 'out/749930c36dee49b8aeb45ee9cdfe1abb'
Imported run:
run-id: 03d0cfae60774ec99f949c42e1575532
artifact_uri: file:///opt/mlflow/server/mlruns/13/03d0cfae60774ec99f949c42e1575532/artifacts
source: file:///opt/mlflow/server/mlruns/13/03d0cfae60774ec99f949c42e1575532/artifacts/sklearn-model
Version: id=1 status=READY state=None
Waited 0.01 seconds
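Once the import completes, the new version can be loaded through the standard models:/ URI to confirm it is usable. A minimal sketch assuming the model name and version from the example above:

import mlflow.pyfunc

# Model name and version from the import example above.
model = mlflow.pyfunc.load_model("models:/sklearn_wine/1")
# model.predict(...) can now be called with a pandas DataFrame.
print(model.metadata)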
Calls the registered-models/list API endpoint and creates the file registered_models.json.
python -u -m mlflow_export_import.model.list_registered_models
cat registered_models.json
{
"registered_models": [
{
"name": "keras_mnist",
"creation_timestamp": "1601399113433",
"last_updated_timestamp": "1601399504920",
"latest_versions": [
{
"name": "keras_mnist",
"version": "1",
"creation_timestamp": "1601399113486",
"last_updated_timestamp": "1601399504920",
"current_stage": "Archived",
"description": "",
"source": "file:///opt/mlflow/server/mlruns/1/9176458a78194d819e55247eee7531c3/artifacts/keras-model",
"run_id": "9176458a78194d819e55247eee7531c3",
"status": "READY",
"run_link": ""
},
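The resulting file can be post-processed like any JSON document, for example to summarize the latest version of each model. A minimal sketch assuming the layout shown above:

import json

with open("registered_models.json") as f:
    doc = json.load(f)

# One line per latest version of each registered model.
for model in doc["registered_models"]:
    for v in model.get("latest_versions", []):
        print(f"{model['name']} v{v['version']}: "
              f"stage={v['current_stage']} status={v['status']}")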