Merge pull request #2218 from iterative/dvc-20-pre-release-fixes
2.0 pre-release blog fixes
dmpetrov authored Feb 19, 2021
2 parents 943e55f + e3e3653 commit 1908fa1
Showing 2 changed files with 54 additions and 513 deletions.
106 changes: 54 additions & 52 deletions content/blog/2021-02-18-dvc-2-0-pre-release.md
@@ -136,21 +136,25 @@ stages:

## Lightweight ML experiments

DVC uses Git as a foundation for ML experiments. This solid foundation makes
each ML experiment reproducible and accessible from Git history. This Git-based
approach works very well for ML projects with mature ML models when only a few
new experiments per day are running. However, in more active development when
dozens or hundreds of experiments need to be run in a single day, Git creates
overhead - each experiment run requires additional Git commands
`git add/commit`, and comparing all experiments is difficult.
DVC uses Git versioning as the basis for ML experiments. This solid foundation
makes each experiment reproducible and accessible from the project's history.
This Git-based approach works very well for ML projects with mature models when
only a few new experiments per day are run.

We introduce lightweight experiments in DVC 2.0! This is the way of
auto-tracking without any overhead from ML engineers.
However, in more active development, when dozens or hundreds of experiments need
to be run in a single day, Git creates overhead: each experiment run requires
additional Git commands (`git add/commit`), and comparing all experiments is
difficult.

⚠️ Note, ML experiment is an experimental feature in the coming release. It
means the commands might change a bit even after the release.
We introduce lightweight experiments in DVC 2.0! This is how you can auto-track
ML experiments without any overhead for ML engineers.

Run an ML experiment with a new hyperparameter from `params.yaml`:
⚠️ Note, our new ML experiment features (`dvc exp`) are experimental in the
coming release. This means that the commands might change a bit in following
minor releases.

`dvc exp run` can run an ML experiment with a new hyperparameter from
`params.yaml`, while `dvc exp diff` shows the differences in metrics and params:

```dvc
$ dvc exp run --set-param featurize.max_features=3000
@@ -183,10 +187,10 @@ Reproduced experiment(s): exp-80655
Experiment results have been applied to your workspace.
```

In the examples above, hyperparamters were changed automaticaly by option
`--set-param`. User can make this changes manualy by modifying the file. The
same way _any code or data files can be changed_ and `dvc exp run` will capture
the changes.
In the examples above, hyperparameters were changed with the `--set-param`
option, but you can make these changes by modifying the params file instead. In
fact, _any code or data files can be changed_ and `dvc exp run` will capture the
variations.
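
To make the idea of a dotted parameter override more concrete, here is a minimal Python sketch of what an assignment such as `featurize.max_features=3000` amounts to on a nested params structure. It is a conceptual illustration only, not DVC's implementation: the `set_param` helper, the starting value `1500` and the `ngrams` key are assumptions made up for this example.

```python
from copy import deepcopy


def set_param(params: dict, assignment: str) -> dict:
    """Return a copy of `params` with one 'a.b.c=value' override applied.

    Hypothetical helper for illustration; not part of DVC.
    """
    path, _, raw = assignment.partition("=")
    keys = path.split(".")

    updated = deepcopy(params)  # leave the caller's dict untouched
    node = updated
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # walk (or create) nested sections

    # Naive value parsing: try int, then float, otherwise keep the string.
    for cast in (int, float):
        try:
            value = cast(raw)
            break
        except ValueError:
            continue
    else:
        value = raw

    node[keys[-1]] = value
    return updated


# Assumed contents of a params file, loaded into a dict.
params = {"featurize": {"max_features": 1500, "ngrams": 2}}
new_params = set_param(params, "featurize.max_features=3000")
print(new_params)
```

Editing the params file by hand has the same effect as the override: what matters is the resulting parameter values, not how they were changed.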

See all the runs:

@@ -205,16 +209,16 @@ $ dvc exp show --no-pager --no-timestamp \
└───────────────┴─────────┴────────────────────────┴──────────────────┘
```

Under the hood DVC uses Git to store the experiments meta-information.
Straight-forward implementation on top of Git should include branches and
auto-commits in the branches. This approach over-pollutes the branch namespace
very quickly. To avoid this issue, we introduced Git custom references `exps`
the same way as GitHub uses Git custom references `pulls` to track pull
requests. This is an interesting technical topic that deserves a separate blog
post. Below you can see how it works.
Under the hood, DVC uses Git to store the experiments' meta-information. A
straightforward implementation would create visible branches and auto-commit in
them, but that approach would pollute the branch namespace very quickly. To
avoid this issue, we introduced custom Git references `exps`, the same way
GitHub uses custom references `pulls` to track pull requests (this is an
interesting technical topic that deserves a separate blog post). Below you can
see how it works.
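
As a rough illustration of why a separate reference namespace keeps the branch list clean, the Python sketch below filters a list of reference names the way a branch listing does: only names under `refs/heads/` count as branches. The `refs/exps/` prefix and the flat layout of the experiment references are simplifying assumptions for this example; the real reference layout may differ.

```python
BRANCH_PREFIX = "refs/heads/"   # where Git keeps local branches
EXPS_PREFIX = "refs/exps/"      # assumed prefix for experiment references


def visible_branches(refs):
    """Short names of the references that a branch listing would show."""
    return [r[len(BRANCH_PREFIX):] for r in refs if r.startswith(BRANCH_PREFIX)]


def experiment_refs(refs, prefix=EXPS_PREFIX):
    """References stored in the custom experiments namespace."""
    return [r for r in refs if r.startswith(prefix)]


# Hypothetical repository state: one real branch, two experiments.
refs = [
    "refs/heads/master",
    "refs/exps/exp-80655",
    "refs/exps/exp-e15bc",
]

print(visible_branches(refs))   # only the real branch
print(experiment_refs(refs))    # experiments live outside refs/heads/
```

Because the experiments never appear under `refs/heads/`, running hundreds of them does not add a single entry to the branch list.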

No artificial branches, only custome references `exps` (do not worry if you
don't understand this part - it is an implementation detail):
No artificial branches, only custom references `exps` (do not worry if you don't
understand this part - it is an implementation detail):

```dvc
$ git branch
@@ -288,11 +292,11 @@ Adding stage 'train' in 'dvc.yaml'
```

Note, we use `dvc stage add` command instead of `dvc run`. Starting from DVC 2.0
we extracting all stage specific functionality under `dvc stage` unbrella.
`dvc run` is still working but it wll be depricated in the following DVC version
we are extracting all stage-specific functionality under the `dvc stage` umbrella.
`dvc run` still works, but it will be deprecated in a following DVC version
(most likely in 3.0).

Start the training process and interrupt it after 5 epoches:
Start the training process and interrupt it after 5 epochs:

```dvc
$ dvc exp run
@@ -313,7 +317,7 @@ $ dvc exp show --no-pager --no-timestamp
┃ Experiment ┃ step ┃ loss ┃ accuracy ┃ val_loss ┃ … ┃ epochs ┃ … ┃
┑━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━╇━━━━━━━━╇━━━┩
│ workspace │ 4 │ 2.0702 │ 0.30388 │ 2.025 │ … │ 5 │ … │
│ master │ - │ 5 │ 2.1e-07 │ logs │ … │ 0.124 │ … │
│ master │ - │ - │ - │ - │ … │ 5 │ … │
│ │ ╓ exp-e15bc │ 4 │ 2.0702 │ 0.30388 │ 2.025 │ … │ 5 │ … │
│ │ ╟ 5ea8327 │ 4 │ 2.0702 │ 0.30388 │ 2.025 │ … │ 5 │ … │
│ │ ╟ bc0cf02 │ 3 │ 2.1338 │ 0.23988 │ 2.0883 │ … │ 5 │ … │
@@ -343,14 +347,13 @@ $ dvc exp show --no-pager --no-timestamp
┃ Experiment ┃ step ┃ loss ┃ accuracy ┃ val_loss ┃ … ┃ epochs ┃ … ┃
┑━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━╇━━━━━━━━╇━━━┩
│ workspace │ 9 │ 1.7845 │ 0.58125 │ 1.7381 │ … │ 5 │ … │
│ master │ - │ 5 │ 2.1e-07 │ logs │ … │ 0.124 │ … │
│ master │ - │ - │ - │ - │ … │ 5 │ … │
│ │ ╓ exp-e15bc │ 9 │ 1.7845 │ 0.58125 │ 1.7381 │ … │ 5 │ … │
│ │ ╟ 205a8d3 │ 9 │ 1.7845 │ 0.58125 │ 1.7381 │ … │ 5 │ … │
│ │ ╟ dd23d96 │ 8 │ 1.8369 │ 0.54173 │ 1.7919 │ … │ 5 │ … │
│ │ ╟ 5bb3a1f │ 7 │ 1.8929 │ 0.49108 │ 1.8474 │ … │ 5 │ … │
│ │ ╟ 6dc5610 │ 6 │ 1.951 │ 0.43433 │ 1.9046 │ … │ 5 │ … │
│ │ ╟ a79cf29 │ 5 │ 2.0088 │ 0.36837 │ 1.9637 │ … │ 5 │ … │
│ │ ╟ bf276cf │ 4 │ 2.0702 │ 0.30388 │ 2.025 │ … │ 5 │ … │
│ │ ╟ 5ea8327 │ 4 │ 2.0702 │ 0.30388 │ 2.025 │ … │ 5 │ … │
│ │ ╟ bc0cf02 │ 3 │ 2.1338 │ 0.23988 │ 2.0883 │ … │ 5 │ … │
│ │ ╟ f8cf03f │ 2 │ 2.1989 │ 0.17932 │ 2.1542 │ … │ 5 │ … │
@@ -359,7 +362,7 @@ $ dvc exp show --no-pager --no-timestamp
└───────────────┴──────┴────────┴──────────┴──────────┴───┴────────┴───┘
```

Afrer modifing code, data or params the same process can be resumed. DVC
After modifying code, data, or params, the same process can be resumed. DVC
recognizes the change and shows it (see experiment `b363267`):

```dvc
@@ -375,28 +378,27 @@ $ dvc exp show --no-pager --no-timestamp
┃ Experiment ┃ step ┃ loss ┃ accuracy ┃ val_loss ┃ … ┃ epochs ┃ … ┃
┑━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━╇━━━━━━━━╇━━━┩
│ workspace │ 13 │ 1.5841 │ 0.69262 │ 1.5381 │ … │ 15 │ … │
│ master │ - │ 5 │ 2.1e-07 │ logs │ … │ 0.124 │ … │
│ master │ - │ - │ - │ - │ … │ 5 │ … │
│ │ ╓ exp-7ff06 │ 13 │ 1.5841 │ 0.69262 │ 1.5381 │ … │ 15 │ … │
│ │ ╟ 6c62fec │ 12 │ 1.6325 │ 0.67248 │ 1.5857 │ … │ 15 │ … │
│ │ ╟ 4baca3c │ 11 │ 1.6817 │ 0.64855 │ 1.6349 │ … │ 15 │ … │
│ │ ╟ b363267 (2b06de7) │ 10 │ 1.7323 │ 0.61925 │ 1.6857 │ … │ 15 │ … │
│ │ ╓ 2b06de7 │ - │ - │ │ │ │ │ │
│ │ ╟ 205a8d3 │ - │ - │ │ │ │ │ │
│ │ ╟ dd23d96 │ - │ - │ │ │ │ │ │
│ │ ╟ 5bb3a1f │ - │ - │ │ │ │ │ │
│ │ ╟ 6dc5610 │ - │ - │ │ │ │ │ │
│ │ ╟ a79cf29 │ - │ - │ │ │ │ │ │
│ │ ╟ bf276cf │ - │ - │ │ │ │ │ │
│ │ ╟ 5ea8327 │ - │ - │ │ │ │ │ │
│ │ ╟ bc0cf02 │ - │ - │ │ │ │ │ │
│ │ ╟ f8cf03f │ - │ - │ │ │ │ │ │
│ │ ╟ 7575a44 │ - │ - │ │ │ │ │ │
│ ├─╨ a72c526 │ - │ - │ │ │ │ │ │
│ │ ╓ 2b06de7 │ 9 │ 1.7845 │ 0.58125 │ 1.7381 │ … │ 5 │ … │
│ │ ╟ 205a8d3 │ 9 │ 1.7845 │ 0.58125 │ 1.7381 │ … │ 5 │ … │
│ │ ╟ dd23d96 │ 8 │ 1.8369 │ 0.54173 │ 1.7919 │ … │ 5 │ … │
│ │ ╟ 5bb3a1f │ 7 │ 1.8929 │ 0.49108 │ 1.8474 │ … │ 5 │ … │
│ │ ╟ 6dc5610 │ 6 │ 1.951 │ 0.43433 │ 1.9046 │ … │ 5 │ … │
│ │ ╟ a79cf29 │ 5 │ 2.0088 │ 0.36837 │ 1.9637 │ … │ 5 │ … │
│ │ ╟ 5ea8327 │ 4 │ 2.0702 │ 0.30388 │ 2.025 │ … │ 5 │ … │
│ │ ╟ bc0cf02 │ 3 │ 2.1338 │ 0.23988 │ 2.0883 │ … │ 5 │ … │
│ │ ╟ f8cf03f │ 2 │ 2.1989 │ 0.17932 │ 2.1542 │ … │ 5 │ … │
│ │ ╟ 7575a44 │ 1 │ 2.2694 │ 0.12833 │ 2.223 │ … │ 5 │ … │
│ ├─╨ a72c526 │ 0 │ 2.3416 │ 0.0959 │ 2.2955 │ … │ 5 │ … │
└───────────────────────┴──────┴────────┴──────────┴──────────┴───┴────────┴───┘
```

Sometimes you might need training the model from scratch. Reset option removes
the checkpoint file before the traning: `dvc exp run --reset`
Sometimes you might need to train the model from scratch. The reset option
removes the checkpoint file before training: `dvc exp run --reset`.
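
The resume and reset behaviour can be pictured with a small Python sketch. It is only an assumption-laden toy, not DVC's checkpoint machinery: the `train` function, the JSON checkpoint file and its `step` field are invented for this example, and the "training" itself is a placeholder comment.

```python
import json
import os
import tempfile


def train(checkpoint_path, epochs, reset=False):
    """Run `epochs` more steps, continuing from the checkpoint if one exists.

    Returns the index of the last completed step. Hypothetical helper.
    """
    if reset and os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # start from scratch

    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["step"] + 1  # continue after the saved step

    last = start - 1
    for step in range(start, start + epochs):
        # ... one epoch of real training would happen here ...
        with open(checkpoint_path, "w") as f:
            json.dump({"step": step}, f)  # save a checkpoint after each epoch
        last = step
    return last


workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "model.ckpt.json")

first = train(ckpt, epochs=5)               # steps 0..4
resumed = train(ckpt, epochs=5)             # steps 5..9, picks up the checkpoint
fresh = train(ckpt, epochs=5, reset=True)   # checkpoint removed, steps 0..4 again
print(first, resumed, fresh)  # prints "4 9 4"
```

The step numbers mirror the tables above: the first run ends at step 4, the resumed run at step 9, and a reset run starts counting from zero again.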

## Metrics logging

@@ -408,7 +410,7 @@ for metrics collecting and experiment tracking such as sacred, mlflow, weight
and biases, neptune.ai or other.

With DVC 2.0 we are releasing a new open-source library
[DVC-Live](https://github.com/iterative/dvclive) that provide functionality for
[DVC-Live](https://github.com/iterative/dvclive) that provides functionality for
tracking model metrics and organizing metrics in simple text files in a way that
DVC can visualize the metrics with navigation in Git history. So, DVC can show
you a metrics difference between the current model and a model in `master` or any
@@ -463,7 +465,7 @@ timestamp step accuracy
```

In addition to the continuous metrics files you will see the summary metrics
file and html file with the same file prefix. The summary file conteins the
file and html file with the same file prefix. The summary file contains the
result of the latest epoch:

```dvc
@@ -477,7 +479,7 @@ $ cat logs.json | python -m json.tool
}
```

The html file contains all the visuals for continious metrics as well as the
The html file contains all the visuals for continuous metrics as well as the
summary metrics in a single page:

![](/uploads/images/2021-02-18/dvclive-html.png)
@@ -490,8 +492,8 @@ each. So, you can monitor model performance in realtime.
A DVC repository is NOT required to use the live metrics functionality described
above. It works independently from DVC.

DVC repository become usefule when the metrics and plots are commited in your
Git repository and you need navigation around the metrics.
A DVC repository becomes useful when the metrics and plots are committed in your
Git repository and you need to navigate around the metrics.
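
Conceptually, a metrics difference is a per-metric comparison of two summary files. The Python sketch below shows that idea under stated assumptions: the `metrics_diff` helper is hypothetical, and the two dictionaries stand in for a committed and a workspace summary file with made-up values.

```python
def metrics_diff(old, new):
    """Compare two flat metric dictionaries, metric by metric.

    Hypothetical helper for illustration; not part of DVC or DVC-Live.
    """
    rows = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        # A change is only defined when the metric exists on both sides.
        change = after - before if None not in (before, after) else None
        rows[name] = {"old": before, "new": after, "change": change}
    return rows


# Made-up summaries: the last commit vs. the current workspace.
committed = {"step": 4, "accuracy": 0.30}
workspace = {"step": 9, "accuracy": 0.58}

diff = metrics_diff(committed, workspace)
for name, row in diff.items():
    print(name, row["old"], row["new"], row["change"])
```

Having the summaries committed to Git is what makes the "old" side of such a comparison available for any commit or branch.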

Metrics difference between workspace and the last Git commit:
