live: initial docs draft #2227

Merged
merged 31 commits into from
Mar 9, 2021
36d7286
live: initial docs draft
pared Feb 22, 2021
cbe791d
live: initial usage section
pared Feb 24, 2021
d13ddf0
dvclive docs: add usage with DVC section
pared Mar 2, 2021
edd4138
Update content/docs/dvclive/index.md
pared Mar 3, 2021
dde7364
fixup
pared Mar 3, 2021
2285569
Update content/docs/dvclive/index.md
pared Mar 3, 2021
4e9148c
Update content/docs/dvclive/index.md
pared Mar 3, 2021
8478818
Update content/docs/dvclive/index.md
pared Mar 3, 2021
0f8b215
review refactor
pared Mar 3, 2021
f17fdf9
prettier complains fix
pared Mar 4, 2021
890d9a1
fixes
pared Mar 4, 2021
280123c
Update content/docs/dvclive/dvclive-with-dvc.md
jorgeorpinel Mar 9, 2021
a534665
Update content/docs/dvclive/dvclive-with-dvc.md
jorgeorpinel Mar 9, 2021
3b37309
Update content/docs/sidebar.json
jorgeorpinel Mar 9, 2021
54d78e0
Update content/docs/sidebar.json
jorgeorpinel Mar 9, 2021
cf7fadb
Update content/docs/dvclive/dvclive-with-dvc.md
jorgeorpinel Mar 9, 2021
c7c0dc1
dvclive: capitalize Dvclive
jorgeorpinel Mar 9, 2021
507d991
Restyled by prettier
restyled-commits Mar 9, 2021
f5e4b3e
Merge pull request #2278 from iterative/restyled/live_docs
jorgeorpinel Mar 9, 2021
a3da8ce
Update content/docs/dvclive/index.md
jorgeorpinel Mar 9, 2021
cd8a729
Update content/docs/dvclive/index.md
jorgeorpinel Mar 9, 2021
8b3dcf4
dvclive: usage copy edits
jorgeorpinel Mar 9, 2021
b4f87c1
Restyled by prettier
restyled-commits Mar 9, 2021
6a46050
Merge pull request #2279 from iterative/restyled/live_docs
jorgeorpinel Mar 9, 2021
7c06934
Update content/docs/dvclive/usage.md
jorgeorpinel Mar 9, 2021
05ae874
Update content/docs/dvclive/usage.md
jorgeorpinel Mar 9, 2021
402ef5b
dvclive: DVC copy edits
jorgeorpinel Mar 9, 2021
b0f5481
Restyled by prettier
restyled-commits Mar 9, 2021
1bc36e8
Merge pull request #2280 from iterative/restyled/live_docs
jorgeorpinel Mar 9, 2021
837ba4a
Update content/docs/dvclive/usage.md
pared Mar 9, 2021
1c60f27
fixup
pared Mar 9, 2021
125 changes: 125 additions & 0 deletions content/docs/dvclive/dvclive-with-dvc.md
@@ -0,0 +1,125 @@
# Dvclive with DVC

Even though Dvclive does not require DVC, they can integrate in several useful
ways.

> In this section we will modify the [basic usage example](/doc/dvclive/usage)
> to see how DVC can cooperate with the Dvclive module.

```python
# train.py

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils import np_utils

def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    classes = 10
    y_train = np_utils.to_categorical(y_train, classes)
    y_test = np_utils.to_categorical(y_test, classes)
    return (x_train, y_train), (x_test, y_test)

def get_model():
    model = Sequential()
    model.add(Dense(512, input_dim=784))
    model.add(Activation('relu'))

    model.add(Dense(10, input_dim=512))

    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  metrics=['accuracy'], optimizer='sgd')
    return model


from keras.callbacks import Callback
import dvclive
# Review comments on lines +44 to +46:
# Contributor: @pared can these be on the top of the file for readability?
# Contributor: Also, shouldn't classes (class MetricsCallback) be defined
#   before functions?
# Contributor Author: It should, I thought that including it just above the
#   "execution" part makes it easier for the user to copy-paste.
# Contributor: Hmm I missed this. I'll send a PR with what I meant and request
#   your review @pared 🙂

class MetricsCallback(Callback):
    def on_epoch_end(self, epoch: int, logs: dict = None):
        logs = logs or {}
        for metric, value in logs.items():
            dvclive.log(metric, value)
        dvclive.next_step()

(x_train, y_train), (x_test, y_test) = load_data()
model = get_model()

# dvclive.init("training_metrics") # Implicit with DVC
model.fit(x_train,
          y_train,
          validation_data=(x_test, y_test),
          batch_size=128,
          epochs=3,
          callbacks=[MetricsCallback()])
```

Note that when using Dvclive in a DVC project, there is no need for manual
initialization of Dvclive (no `dvclive.init()` call).

Let's use `dvc stage add` to create a stage to wrap this code (don't forget to
`dvc init` first):

```dvc
$ dvc stage add -n train --live training_metrics \
                -d train.py python train.py
```

`dvc.yaml` will contain a new `train` stage with the Dvclive
[configuration](/doc/dvclive/usage#initial-configuration) (in the `live` field):

```yaml
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
    live:
      training_metrics:
        summary: true
        html: true
```

The value passed to `--live` (`training_metrics`) became the directory `path`
for Dvclive to write logs in. Other supported command options for DVC
integration:

- `--live-no-summary` - passes `summary=False` to Dvclive.
- `--live-no-html` - passes `html=False` to Dvclive.

Comment on lines +93 to +98 (@jorgeorpinel, Contributor, Mar 9, 2021)

Contributor: Done in 4042082.

> Note that these are convenience CLI options. You can still call
> `dvclive.init()` manually, which will override the `dvc stage add` flags. Just
> be careful to match the `--live` value (CLI) and the `path` argument (code).

Run the training with `dvc repro`:

```bash
$ dvc repro train
```

After that's finished, you should see the following content in the project:

```bash
$ ls
dvc.lock training_metrics training_metrics.json
dvc.yaml training_metrics.html train.py
```

If you open `training_metrics.html` in a browser, you'll see a plot for metrics
logged during the model training!

![](/img/dvclive_report.png)

> Dvclive is capable of creating _checkpoint_ signal files used by
> [experiments](/doc/user-guide/experiment-management). See this example
> [repository](https://github.com/iterative/dvc-checkpoints-mnist) to see how.
18 changes: 18 additions & 0 deletions content/docs/dvclive/index.md
@@ -0,0 +1,18 @@
# dvclive

[`dvclive`](/doc/dvclive) is an open-source Python library for monitoring the
progress of metrics during training of machine learning models.

Dvclive integrates seamlessly with [DVC](https://dvc.org/), and the logs it
produces can be fed into `dvc plots`. However, `dvc` is not needed to work with
`dvclive` logs: since they're saved as easily parsable TSV files, you can
use your preferred visualization method.

We have created Dvclive with two principles in mind:

- **No dependencies.** While you can install optional integrations for various
frameworks, the basic `dvclive` installation doesn't have requirements besides
[Python](https://www.python.org/).
- **DVC integration.** `dvc` recognizes when it's being used along with
`dvclive`. This enables useful features automatically, like producing model
training summaries, among others.
144 changes: 144 additions & 0 deletions content/docs/dvclive/usage.md
@@ -0,0 +1,144 @@
# Usage Guide

We will use sample [MNIST classification](http://yann.lecun.com/exdb/mnist/)
training code in order to see how one can introduce Dvclive into the workflow.

> Note that [keras](https://keras.io/about/#installation-amp-compatibility) is
> required throughout these examples.

```python
# Review comments:
# Contributor: I should update the example repo I have with this example.
#   Keras really makes for a clean, minimal example.
# Contributor Author: Yeah, but on the other hand we need to prepare a
#   Callback, while in torch everything goes nicely into the training loop.

# train.py

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils import np_utils

def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    classes = 10
    y_train = np_utils.to_categorical(y_train, classes)
    y_test = np_utils.to_categorical(y_test, classes)
    return (x_train, y_train), (x_test, y_test)

def get_model():
    model = Sequential()
    model.add(Dense(512, input_dim=784))
    model.add(Activation('relu'))

    model.add(Dense(10, input_dim=512))

    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  metrics=['accuracy'], optimizer='sgd')
    return model


(x_train, y_train), (x_test, y_test) = load_data()
model = get_model()

model.fit(x_train,
          y_train,
          validation_data=(x_test, y_test),
          batch_size=128,
          epochs=3)
```

> You may want to run the code manually to verify that the model gets trained.

In this example we are training the `model` for 3 epochs. Let's use `dvclive` to
log the `accuracy`, `loss`, `val_accuracy` and `val_loss` after
each epoch, so that we can observe how the training progresses.

In order to do that, we will provide a
[`Callback`](https://keras.io/api/callbacks/) for the `fit` method call:

```python
from keras.callbacks import Callback
import dvclive

class MetricsCallback(Callback):
    def on_epoch_end(self, epoch: int, logs: dict = None):
        logs = logs or {}
        for metric, value in logs.items():
            dvclive.log(metric, value)
        dvclive.next_step()
```

At the end of each epoch, this callback will iterate over the gathered metrics
(`logs`) and use the `dvclive.log()` function to record their respective values.
After that we call `dvclive.next_step()` to signal Dvclive that we are done
logging for the current iteration.

And in order to make that work, we need to plug it in with this change:

```diff
+ dvclive.init("training_metrics")
  model.fit(x_train,
            y_train,
            validation_data=(x_test, y_test),
            batch_size=128,
-           epochs=3)
+           epochs=3,
+           callbacks=[MetricsCallback()])
```

We call `dvclive.init()` first, which tells Dvclive to write metrics under the
given directory path (in this case `./training_metrics`).
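
To make the `init()` / `log()` / `next_step()` flow more concrete, here is a minimal sketch of a file-based logger with the same shape. `MiniLive` is a hypothetical stand-in written for illustration, not Dvclive's implementation; it only assumes one TSV file per metric with `timestamp`, `step` and metric columns, and millisecond timestamps.

```python
import os
import time


class MiniLive:
    """Hypothetical stand-in mimicking the init/log/next_step flow."""

    def __init__(self, path, step=0):
        # Plays the role of init(): remember the directory and starting step.
        self.path = path
        self.step = step
        os.makedirs(path, exist_ok=True)

    def log(self, metric, value):
        # Append one row to <path>/<metric>.tsv, writing a header first
        # if the file does not exist yet.
        file_path = os.path.join(self.path, metric + ".tsv")
        is_new = not os.path.exists(file_path)
        with open(file_path, "a") as f:
            if is_new:
                f.write("timestamp\tstep\t" + metric + "\n")
            timestamp = int(time.time() * 1000)  # milliseconds since epoch
            f.write(f"{timestamp}\t{self.step}\t{value}\n")

    def next_step(self):
        # Everything logged after this call belongs to the next step.
        self.step += 1
```

Calling `log()` once per metric and then `next_step()` once per epoch, as `MetricsCallback` does, would leave one row per epoch in each metric file.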

After running `train.py`, the `training_metrics` directory should be created:

```bash
$ ls
training_metrics training_metrics.json train.py
```

The `*.tsv` files inside have names corresponding to the metrics logged during
training. Note that a `training_metrics.json` file has been created as well.
It contains information about the latest training step. You can prevent its
creation by passing `summary=False` to `dvclive.init()` (see all the
[options](#initial-configuration)).

```bash
$ ls training_metrics
accuracy.tsv loss.tsv val_accuracy.tsv val_loss.tsv
```

Each file contains the metric values logged in each epoch. For example:

```bash
$ cat training_metrics/accuracy.tsv
timestamp step accuracy
1614129197192 0 0.7612833380699158
1614129198031 1 0.8736833333969116
1614129198848 2 0.8907166719436646
```
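
Since the logs are plain TSV, they can be read without Dvclive or DVC. The sketch below uses only the Python standard library; `read_metric` is a hypothetical helper name, and it assumes tab-separated columns, a last column named after the metric, and millisecond timestamps.

```python
import csv
from datetime import datetime, timezone


def read_metric(tsv_path):
    """Return a list of (logged_at, step, value) tuples from one metric log."""
    rows = []
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # Assumption: the last column is named after the metric itself.
        metric = reader.fieldnames[-1]
        for row in reader:
            # Timestamps are assumed to be milliseconds since the epoch.
            logged_at = datetime.fromtimestamp(
                int(row["timestamp"]) / 1000, tz=timezone.utc
            )
            rows.append((logged_at, int(row["step"]), float(row[metric])))
    return rows
```

The returned tuples can then be handed to whichever plotting or reporting tool you prefer, for example `read_metric("training_metrics/accuracy.tsv")`.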

## Initial configuration

These are the arguments accepted by `dvclive.init()`:

- `path` (**required**) - directory where `dvclive` will write TSV log files

- `step` (`0` by default) - the `step` values in log files will start
incrementing from this value.

- `resume` (`False`) - if set to `True`, Dvclive will try to read the previous
`step` from the `path` dir and start from that point (unless a `step` is
passed explicitly). Subsequent `next_step()` calls will increment the step.

- `summary` (`True`) - upon each `next_step()` call, Dvclive will dump a JSON
file containing all metrics gathered in the last step. This file uses the
following naming: `<path>.json` (`path` being the logging directory passed to
`init()`).

- `html` (`True`) - works only when Dvclive is used alongside DVC. If true, upon
  each `next_step()` call, DVC will prepare a summary of the training currently
  running, with all metrics logged in `path`.
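
The interplay between `step` and `resume` described above can be sketched as follows. This is an illustration under stated assumptions, not Dvclive's actual logic: `last_logged_step` and `starting_step` are hypothetical helpers, an explicit step is modeled as a non-`None` argument, and resuming is assumed to continue at the step after the last one found in the TSV logs.

```python
import csv
import glob
import os


def last_logged_step(path):
    """Return the highest step found in any *.tsv log under path, or None."""
    last = None
    for tsv_path in glob.glob(os.path.join(path, "*.tsv")):
        with open(tsv_path, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                step = int(row["step"])
                if last is None or step > last:
                    last = step
    return last


def starting_step(path, step=None, resume=False):
    # An explicitly passed step always wins over resume.
    if step is not None:
        return step
    if resume:
        last = last_logged_step(path)
        if last is not None:
            # Assumption: continue right after the last logged step.
            return last + 1
    # Nothing to resume from: start at the default step.
    return 0
```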
12 changes: 12 additions & 0 deletions content/docs/sidebar.json
@@ -477,5 +477,17 @@
"slug": "cml-with-npm"
}
]
},
{
"label": "Dvclive",
"slug": "dvclive",
"source": "dvclive/index.md",
"children": [
"usage",
{
"label": "Dvclive with DVC",
"slug": "dvclive-with-dvc"
}
]
}
]
Binary file added static/img/dvclive_report.png