This repository has been archived by the owner on Oct 16, 2024. It is now read-only.

MLEM - DVC interaction update #211

Merged
merged 4 commits into from
Nov 3, 2022
Changes from 2 commits
49 changes: 30 additions & 19 deletions content/docs/user-guide/dvc.md
@@ -1,5 +1,17 @@
# Versioning MLEM objects with DVC

To use MLEM with Git and enable GitOps, we need to commit MLEM models to a Git
repository. While committing `.mlem` metafiles is easy, model binaries and
datasets are too heavy to store in Git. To fix that, we suggest using
[DVC](https://dvc.org). DVC
[stores objects in remote storages](https://dvc.org/doc/start/data-management/data-versioning),
allowing us to commit just pointers to them.

This page offers a small tutorial on how to use DVC with an existing MLEM
project. We will reorganize our example repo to showcase that.

## Setting things up

<details>

### ⚙️ Expand for setup instructions
@@ -10,7 +22,6 @@ If you want to follow along with this tutorial, you can use our
```shell
$ git clone https://github.com/iterative/example-mlem-get-started
$ cd example-mlem-get-started
$ git checkout 5-deploy-meta
```

Next let's create a Python virtual environment to cleanly install all the
@@ -24,33 +35,38 @@ $ pip install -r requirements.txt

</details>

Often it’s a bad idea to store binary files in Git, especially big ones. To
solve this MLEM can utilize [DVC](https://dvc.org/doc) capabilities to connect
external cloud storage for model and dataset versioning.

We will reorganize our example repo to use DVC.

## Setting up repo

First, let’s initialize DVC and add a remote (we will use a local one for easier
testing, but you can use whatever is available to you):
First, let’s initialize DVC and add a DVC remote (we will use a local one for
easier testing, but you can use whatever is available to you):

```cli
$ dvc init
$ dvc remote add myremote -d /tmp/dvcstore/
$ git add .dvc/config
```

[DVC Initialized](https://github.com/iterative/example-mlem-get-started/tree/7-dvc-dvc-init)

Now, we also need to set up MLEM so it knows to use DVC.

```cli
$ mlem config set core.storage.type dvc
✅ Set `storage.type` to `dvc` in repo .
```

After the initial configuration is done, we need to select how we're going to
use MLEM with DVC:

- We could use only DVC's ability to track binary files, manually adding model
  binaries to version control. This scenario is covered in the section
  [Versioning binaries manually](#versioning-binaries-manually) below (choose
  this option if you are new to DVC).
- We could use
  [DVC pipelines](https://dvc.org/doc/start/data-management/data-pipelines) to
  manage all stages of model creation (data cleaning, featurization, training,
  etc.). In this case, we may want DVC to automatically store binaries. This
  case is covered below under
  [Using MLEM in DVC Pipeline](#using-mlem-in-dvc-pipeline).

## Versioning binaries manually

Also, let’s add `.mlem` files to `.dvcignore` so that metafiles are ignored by
DVC.

@@ -65,11 +81,6 @@ Finally, we need to stop Git from keeping already indexed binaries.
$ git rm -r --cached models data
```

[Configured MLEM to work with DVC](https://github.com/iterative/example-mlem-get-started/tree/8-dvc-mlem-config)

## Saving objects

Next, let’s remove artifacts from Git and re-save them, so MLEM can use new
storage for them. You don't need to change a single line of code

35 changes: 35 additions & 0 deletions yarn.lock
@@ -2536,6 +2536,15 @@
"@sentry/utils" "7.13.0"
tslib "^1.9.3"

"@sentry/core@7.17.3":
version "7.17.3"
resolved "https://registry.yarnpkg.com/@sentry/core/-/core-7.17.3.tgz#2b45c0507f1ef7018335b9bb61ed6b3f16accfad"
integrity sha512-PSboa9aOVnvZU+C6/shKlHUA7zjAl6z5BKRHF8mEljEYql6bh0HfJJKXtBHMz1sWnmzMa/qABSKLpnP5ZQlJNw==
dependencies:
"@sentry/types" "7.17.3"
"@sentry/utils" "7.17.3"
tslib "^1.9.3"

"@sentry/gatsby@^7.13.0":
version "7.13.0"
resolved "https://registry.yarnpkg.com/@sentry/gatsby/-/gatsby-7.13.0.tgz#50965623d43dc437704660a6cbef8ac2927043fc"
@@ -2556,6 +2565,19 @@
"@sentry/utils" "7.13.0"
tslib "^1.9.3"

"@sentry/node@^7.12.1":
version "7.17.3"
resolved "https://registry.yarnpkg.com/@sentry/node/-/node-7.17.3.tgz#d817e9ca53331b3c192d997ce0e570fb51c1eda9"
integrity sha512-kBmj5GiE0BWQ1CqnJN3bOOmaNNvS+HKb9nPic+QloPnH6xDFVUcmx774s3qjtnyLOQTzPpy3vXCA15rYflNJBQ==
dependencies:
"@sentry/core" "7.17.3"
"@sentry/types" "7.17.3"
"@sentry/utils" "7.17.3"
cookie "^0.4.1"
https-proxy-agent "^5.0.0"
lru_map "^0.3.3"
tslib "^1.9.3"

"@sentry/react@7.13.0":
version "7.13.0"
resolved "https://registry.yarnpkg.com/@sentry/react/-/react-7.13.0.tgz#64fa5a2b944c977f75626c6208afa3478c13714c"
@@ -2582,6 +2604,11 @@
resolved "https://registry.yarnpkg.com/@sentry/types/-/types-7.13.0.tgz#398e33e5c92ea0ce91e2c86e3ab003fe00c471a2"
integrity sha512-ttckM1XaeyHRLMdr79wmGA5PFbTGx2jio9DCD/mkEpSfk6OGfqfC7gpwy7BNstDH/VKyQj/lDCJPnwvWqARMoQ==

"@sentry/types@7.17.3":
version "7.17.3"
resolved "https://registry.yarnpkg.com/@sentry/types/-/types-7.17.3.tgz#ec66ea7b6881ae243255546680722488e7ff23bf"
integrity sha512-+buEJo/4TKErjwF8Tq3XXKFZx4Utpvqs52e7i7Sur2qfyBNwRgBILceQvdnzw86JNZT2myeYmrfVbsaxAk7ilA==

"@sentry/utils@7.13.0":
version "7.13.0"
resolved "https://registry.yarnpkg.com/@sentry/utils/-/utils-7.13.0.tgz#0d47a9278806ece78ba3a83c7dbebce817462759"
@@ -2590,6 +2617,14 @@
"@sentry/types" "7.13.0"
tslib "^1.9.3"

"@sentry/utils@7.17.3":
version "7.17.3"
resolved "https://registry.yarnpkg.com/@sentry/utils/-/utils-7.17.3.tgz#aafa67ed372f00be2e1bb490fa62d9d2d06a4c2f"
integrity sha512-Sd7BwVn6IClvaXbZaj/LnEcrMm8yjQtZkTVSrM2Vlv1lLeaH61JxSAFU6QntF+f/cCfZ7wSdNhWOfW3qZJ7t3Q==
dependencies:
"@sentry/types" "7.17.3"
tslib "^1.9.3"

"@sentry/webpack-plugin@1.19.0":
version "1.19.0"
resolved "https://registry.yarnpkg.com/@sentry/webpack-plugin/-/webpack-plugin-1.19.0.tgz#2b134318f1552ba7f3e3f9c83c71a202095f7a44"