Adding artifacts section to docs #4481

Merged
merged 12 commits into from
Apr 25, 2023
6 changes: 5 additions & 1 deletion content/docs/dvclive/how-it-works.md
@@ -81,13 +81,17 @@ find but that don't clutter your Git history or create extra branches.
### Track large artifacts with DVC

Models and data are often large and aren't easily tracked in Git.
`Live.log_artifact("model.pt")` will
`Live.log_artifact("model.pt", type="model")` will
[cache](/doc/start/data-management/data-versioning) the `model.pt` file with DVC
and make Git ignore it. It will generate a `model.pt.dvc` metadata file, which
can be tracked in Git and becomes part of the experiment. With this metadata
file, you can [retrieve](/doc/start/data-management/data-versioning#retrieving)
the versioned artifact from the Git commit.

Passing `type="model"` or `type="data"` will add it to the `artifacts` section of
`dvc.yaml`, allowing DVC to understand what it is and show models in
[Studio Model Registry](/doc/use-cases/model-registry).
Contributor

This still feels a little confusing to me because:

  1. We aren't currently working on any Studio functionality for type=data (let's only talk about plans for things that are at least in progress already).
  2. It still sounds like it only adds to artifacts if type=model or type=data, which isn't true.

Contributor Author

Hmm, sorry - I thought I fixed that. Doing it now.

Contributor Author

Ok, did that. It's ready to merge I believe.


### Run with DVC

Experimenting in Python interactively (like in notebooks) is great for
26 changes: 24 additions & 2 deletions content/docs/dvclive/live/log_artifact.md
@@ -3,7 +3,14 @@
Tracks an existing directory or file with DVC.

```py
def log_artifact(path: Union[str, Path]):
def log_artifact(
    path: Union[str, Path],
    type: Optional[str] = None,
    name: Optional[str] = None,
    desc: Optional[str] = None,
    labels: Optional[List[str]] = None,
    meta: Optional[Dict[str, Any]] = None,
):
```

## Usage
@@ -16,7 +23,13 @@ from dvclive import Live
```py
Path("model.pt").write_text("weights")

with Live() as live:
    live.log_artifact("model.pt")
    live.log_artifact(
        "model.pt",
        type="model",
        name="mymodel",
        desc="Fine-tuned Resnet50",
        labels=["resnet", "imagenet"],
    )
```

## Description
@@ -26,6 +39,15 @@ Uses `dvc add` to track `path` with DVC, generating a `{path}.dvc` file.
When combined with `save_dvc_exp=True`, it will ensure that `{path}.dvc` is
included in the experiment.

If `Live` was initialized with `dvcyaml=True` (which is the default), it will
add an [artifact](/doc/user-guide/project-structure/dvcyaml-files#artifacts) and
all the metadata passed as arguments to the corresponding `dvc.yaml`. Passing
`type="model"` will mark it as a `model` for DVC and will make it appear in
Contributor

... for DVC and will also make Studio Model Registry support it (coming soon!).

Same as above

[Studio Model Registry](/doc/studio) (coming soon).

If `name` is not provided, the path stem (last part of the path without the file
extension) will be used as the artifact name.
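
To illustrate what "path stem" means here, the following is a minimal sketch built on Python's standard `pathlib`. The helper `default_artifact_name` is hypothetical and is not part of DVCLive; it only mirrors the fallback described above and makes no claim about how DVCLive implements it.

```python
from pathlib import Path
from typing import Optional, Union


def default_artifact_name(path: Union[str, Path], name: Optional[str] = None) -> str:
    """Hypothetical helper: return `name` if given, else the stem of `path`."""
    if name is not None:
        return name
    # Path.stem drops the parent directories and only the final suffix.
    return Path(path).stem


print(default_artifact_name("models/resnet.pt"))
print(default_artifact_name("model.pt", name="mymodel"))
```

Note that `pathlib` strips only the last suffix, so a path such as `data/archive.tar.gz` has the stem `archive.tar`. Whether DVCLive treats multi-suffix paths the same way is an assumption that should be checked against its source.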

## Parameters

- `path` - existing directory or file
12 changes: 8 additions & 4 deletions content/docs/use-cases/model-registry.md
@@ -41,17 +41,21 @@ See also [Data Registry](/doc/use-cases/data-registry).
</admon>

To make a Git-native registry (on top of DVC or not), one option is to use [GTO]
(Git Tag Ops). It tags ML model releases and promotions, and links them to
artifacts in the repo using versioned annotations. This creates abstractions for
your models, which lets you **manage their lifecycle** freely and directly from
Git.
(Git Tag Ops). It tags ML model releases and promotions. This creates
abstractions for your models, which lets you **manage their lifecycle** freely
and directly from Git.

And to **productionize** the models, you can save and package them with the
[MLEM] Python API or CLI, which automagically captures all the context needed to
distribute them. It can store model files on the cloud (by itself or with DVC),
list and transfer them within locations, wrap them as a local REST server, or
even containerize and deploy them to cloud providers!

To let your teams **collaborate** on projects more efficiently, you can use the
[Studio](/doc/studio/user-guide/model-registry/what-is-a-model-registry) web
application to organize, discover, version, and promote models to production,
and to track their lineage.

This ecosystem of tools from [Iterative](https://iterative.ai/) introduces
[GitOps] into your ML process. This means you can manage and deliver ML models
with software engineering methods such as continuous integration (CI/CD), which
32 changes: 29 additions & 3 deletions content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -2,9 +2,9 @@

You can configure machine learning projects in one or more `dvc.yaml` files. The
list of [`stages`](#stages) is typically the most important part of a `dvc.yaml`
file, though the file can also be used to configure [`metrics`](#metrics),
[`params`](#params), and [`plots`](#plots), either as part of a stage definition
or on their own.
file, though the file can also be used to configure [`artifacts`](#artifacts),
[`metrics`](#metrics), [`params`](#params), and [`plots`](#plots), either as
part of a stage definition or on their own.

`dvc.yaml` uses the [YAML 1.2](https://yaml.org/) format and a human-friendly
schema explained below. We encourage you to get familiar with it so you may
@@ -13,6 +13,32 @@ modify, write, or generate them by your own means.
`dvc.yaml` files are designed to be small enough so you can easily version them
with Git along with other <abbr>DVC files</abbr> and your project's code.

## Artifacts

This section allows you to declare structured metadata about your artifacts.
Although you can specify artifacts of any `type`, we are in the process of
building a DVC-based [model registry](/doc/use-cases/model-registry) that will
use any artifacts with `type: model`. Specifically, they will appear in
[Studio Model Registry](/doc/studio) (coming soon).

```yaml
artifacts:
  cv-classification: # artifact ID (name)
    path: models/resnet.pt
    type: model
    desc: 'CV classification model, ResNet50'
    labels:
      - resnet50
      - classification
    meta:
      framework: pytorch

Artifact IDs
[must](https://github.com/iterative/dvc/blob/main/dvc/repo/artifacts.py#L16)
consist of letters and numbers, and use '-' as separator (but not at the start
or end). The first character must be a letter.
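
As a rough illustration of the naming rule above, here is a minimal sketch of a validator. The pattern is an interpretation of the prose, not the regular expression DVC actually uses (the linked `artifacts.py` is authoritative). It assumes ASCII letters in either case, and assumes that "separator" means hyphens may only appear between non-empty alphanumeric groups. The names `ARTIFACT_ID_RE` and `looks_like_valid_artifact_id` are hypothetical.

```python
import re

# Assumed reading of the rule: starts with a letter, contains only letters
# and digits, and uses single hyphens between non-empty alphanumeric groups.
ARTIFACT_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*")


def looks_like_valid_artifact_id(artifact_id: str) -> bool:
    """Return True if `artifact_id` matches the assumed naming rule."""
    return ARTIFACT_ID_RE.fullmatch(artifact_id) is not None


for candidate in ["cv-classification", "resnet50", "-model", "model-", "1model"]:
    print(candidate, looks_like_valid_artifact_id(candidate))
```

Under these assumptions, `cv-classification` and `resnet50` pass, while IDs that start or end with `-`, or start with a digit, are rejected.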

## Metrics

The list of `metrics` contains one or more paths to <abbr>metrics</abbr> files.