live: initial docs draft #2227

Merged
merged 31 commits into from
Mar 9, 2021
Changes from 5 commits
Commits
36d7286
live: initial docs draft
pared Feb 22, 2021
cbe791d
live: initial usage section
pared Feb 24, 2021
d13ddf0
dvclive docs: add usage with DVC section
pared Mar 2, 2021
edd4138
Update content/docs/dvclive/index.md
pared Mar 3, 2021
dde7364
fixup
pared Mar 3, 2021
2285569
Update content/docs/dvclive/index.md
pared Mar 3, 2021
4e9148c
Update content/docs/dvclive/index.md
pared Mar 3, 2021
8478818
Update content/docs/dvclive/index.md
pared Mar 3, 2021
0f8b215
review refactor
pared Mar 3, 2021
f17fdf9
prettier complains fix
pared Mar 4, 2021
890d9a1
fixes
pared Mar 4, 2021
280123c
Update content/docs/dvclive/dvclive-with-dvc.md
jorgeorpinel Mar 9, 2021
a534665
Update content/docs/dvclive/dvclive-with-dvc.md
jorgeorpinel Mar 9, 2021
3b37309
Update content/docs/sidebar.json
jorgeorpinel Mar 9, 2021
54d78e0
Update content/docs/sidebar.json
jorgeorpinel Mar 9, 2021
cf7fadb
Update content/docs/dvclive/dvclive-with-dvc.md
jorgeorpinel Mar 9, 2021
c7c0dc1
dvclive: capitalize Dvclive
jorgeorpinel Mar 9, 2021
507d991
Restyled by prettier
restyled-commits Mar 9, 2021
f5e4b3e
Merge pull request #2278 from iterative/restyled/live_docs
jorgeorpinel Mar 9, 2021
a3da8ce
Update content/docs/dvclive/index.md
jorgeorpinel Mar 9, 2021
cd8a729
Update content/docs/dvclive/index.md
jorgeorpinel Mar 9, 2021
8b3dcf4
dvclive: usage copy edits
jorgeorpinel Mar 9, 2021
b4f87c1
Restyled by prettier
restyled-commits Mar 9, 2021
6a46050
Merge pull request #2279 from iterative/restyled/live_docs
jorgeorpinel Mar 9, 2021
7c06934
Update content/docs/dvclive/usage.md
jorgeorpinel Mar 9, 2021
05ae874
Update content/docs/dvclive/usage.md
jorgeorpinel Mar 9, 2021
402ef5b
dvclive: DVC copy edits
jorgeorpinel Mar 9, 2021
b0f5481
Restyled by prettier
restyled-commits Mar 9, 2021
1bc36e8
Merge pull request #2280 from iterative/restyled/live_docs
jorgeorpinel Mar 9, 2021
837ba4a
Update content/docs/dvclive/usage.md
pared Mar 9, 2021
1c60f27
fixup
pared Mar 9, 2021
19 changes: 19 additions & 0 deletions content/docs/dvclive/index.md
@@ -0,0 +1,19 @@
# dvclive

[dvclive](https://cml.dev) is an open-source Python library for monitoring the
progress of metrics during training of machine learning projects.

Contributor

> during training of machine learning models

Is this too specific? What's the broader use case?

Contributor

I don't think so? Are you asking whether there are non-ML use cases (not likely), or is there something else you think might be too specific?

Contributor

@jorgeorpinel Mar 3, 2021

> Are you asking whether there are non-ML use cases (not likely)

Non model-training use cases. E.g. feature extraction or any other stage

Contributor

Good question. I think it's safe to say that dvclive is a narrower product built specifically for model training.


dvclive integrates seamlessly with DVC, and the logs it produces can be fed to
the `dvc plots` command. However, you do not need DVC to visualize dvclive
logs: they are saved in an easily parsable TSV format, so feel free to apply
custom visualization methods.

We have created dvclive with two principles in mind:

- **No dependencies.** While you can install optional integrations for various
  frameworks, the basic dvclive installation does not need anything besides the
  standard Python libraries.
- **Integration with DVC.** DVC can recognize when it is being used in tandem
  with dvclive and can provide useful features, such as producing a training
  summary during training.
125 changes: 125 additions & 0 deletions content/docs/dvclive/usage-with-dvc.md
@@ -0,0 +1,125 @@
# dvclive with DVC

Even though dvclive does not require DVC to function properly, it includes
several integrations with DVC that users might find valuable. In this section we
will modify the [example from the previous section](/doc/dvclive/usage) to see
how DVC can cooperate with dvclive.

Let's take the code prepared in the previous example and make it work with DVC.
The content of the training file, `train.py`:

Contributor

I mentioned that we might want to link to https://github.com/iterative/dvc-checkpoints-mnist, but alternatively you could incorporate checkpoints in this example. That might be too much to dive right into, but I think most use cases for dvclive with dvc will include checkpoints.

Contributor Author

Then maybe we need one more section? We integrated the project with DVC and dvclive, lets make some experiments?

Contributor

I think the changes to introduce checkpoints would be to write the model out in the callback, and to add the model output to dvc with stage add --checkpoint model .... I don't see checkpoints as necessarily being tied to experiments. It adds versioning of the model output that aligns with the metrics dvclive is tracking.

However, as you mentioned below, let's think about how we can introduce multiple examples to cover all of this. I think the example you have is a great starting point.

```python
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical


def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Flatten the 28x28 images and scale pixel values to [0, 1].
    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    classes = 10
    # One-hot encode the labels.
    y_train = to_categorical(y_train, classes)
    y_test = to_categorical(y_test, classes)
    return (x_train, y_train), (x_test, y_test)


def get_model():
    model = Sequential()
    model.add(Dense(512, input_dim=784))
    model.add(Activation('relu'))

    model.add(Dense(10, input_dim=512))

    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  metrics=['accuracy'], optimizer='sgd')
    return model


from keras.callbacks import Callback
import dvclive


class MetricsCallback(Callback):
    def on_epoch_end(self, epoch: int, logs: dict = None):
        logs = logs or {}
        # Log every metric Keras reports for this epoch, then close the step.
        for metric, value in logs.items():
            dvclive.log(metric, value)
        dvclive.next_step()


(x_train, y_train), (x_test, y_test) = load_data()
model = get_model()

dvclive.init("training_metrics")
model.fit(x_train,
          y_train,
          validation_data=(x_test, y_test),
          batch_size=128,
          epochs=3,
          callbacks=[MetricsCallback()])
```

DVC provides extensive integration with dvclive. When dvclive is used in a
project managed by DVC, there is no need to initialize dvclive manually inside
the code.

So in our code we can remove the following line:

```python
dvclive.init("training_metrics")
```

Now, let's use DVC to run the project:

```dvc
$ dvc run -n train --live training_metrics -d train.py python train.py
```

Contributor

Would it be better to use stage add && repro/exp run here? @jorgeorpinel?

Contributor Author

Good point, I will introduce that.

The DVC integration passes `training_metrics` as the `path` argument for
`dvclive.init`. Other supported arguments for the DVC integration:

- `--live-no-summary` - passes `summary=False` to `dvclive`.
- `--live-no-html` - passes `html=False` to `dvclive`.

> Note that these `dvc run` params are only convenience methods. If you decide
> to invoke `dvclive.init` manually, the manual call's config will override the
> provided `run` args. In that case, the `path` arg for `dvclive.init` must
> match the `--live` argument.

After the training is done, you should see the following content:

```bash
$ ls

dvc.lock training_metrics training_metrics.json
dvc.yaml training_metrics.html train.py
```

`training_metrics.json` and `training_metrics.html` are there because we did not
provide `--live-no-summary` or `--live-no-html`. If you open
`training_metrics.html` in your browser, you will see plots for the metrics
logged during the training.

![](/img/dvclive_report.png)

In `dvc.yaml` a new stage is defined, containing information about the
`dvclive` outputs:

```bash
$ cat dvc.yaml

stages:
  train:
    cmd: python train.py
    deps:
    - train.py
    live:
      training_metrics:
        summary: true
        html: true
```
Contributor

Using stage add above also gives you a reason to move this section up and separate it from the run/repro part and explanation of the dvclive outputs.

160 changes: 160 additions & 0 deletions content/docs/dvclive/usage.md
@@ -0,0 +1,160 @@
# Usage

We will use sample [MNIST classification](http://yann.lecun.com/exdb/mnist/)
training code to see how one can introduce `dvclive` into the workflow. To run
the example, [Keras](https://keras.io/about/#installation-amp-compatibility) is
required.

The training code (`train.py` file):

Contributor

I should update the example repo I have with this example. Keras really makes for a clean, minimal example.

Contributor Author

Yeah, but on the other hand we need to prepare Callback, while in torch everything nicely goes into the training loop

```python
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical


def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Flatten the 28x28 images and scale pixel values to [0, 1].
    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    classes = 10
    # One-hot encode the labels.
    y_train = to_categorical(y_train, classes)
    y_test = to_categorical(y_test, classes)
    return (x_train, y_train), (x_test, y_test)


def get_model():
    model = Sequential()
    model.add(Dense(512, input_dim=784))
    model.add(Activation('relu'))

    model.add(Dense(10, input_dim=512))

    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  metrics=['accuracy'], optimizer='sgd')
    return model


(x_train, y_train), (x_test, y_test) = load_data()
model = get_model()

model.fit(x_train,
          y_train,
          validation_data=(x_test, y_test),
          batch_size=128,
          epochs=3)
```

Run the code to verify the training is executing.

In this example we are training the `model` for 3 epochs. Let's use `dvclive` to
log the `accuracy`, `loss`, `val_accuracy` and `val_loss` after each epoch, so
that we can observe how our training progresses.

To do that, we need to provide a proper
[`Callback`](https://keras.io/api/callbacks/) for the `fit` method:
Contributor

This is nice, but would it be easier for people getting started to see a manual training loop first rather than a callback? This doesn't need to be a blocker, but maybe something to consider as an enhancement.

Contributor Author

I think that if we link your torch example, it will be fine.
As i read manual loop tutorial for keras, it seems to me that we will occlude the main point of the tutorial with step by step execution (pushing data through model, calling optimizer, updating gradients). While in this case we have everything hidden under fit call. Callbacks are not that obvious concept, though I think its easier to understand it than reading through whole training loop.

Still, that makes me think that maybe we need not one example, but more? MNIST classifier targeted for DL frameworks users, and something different for, for example, classic ML practitioners? Maybe something with Iris dataset?

Contributor

Yeah, we will need multiple examples. I'm still not sure myself how dvclive would work for "classic" ML or whether that's a useful scenario to consider. For now, let's keep what you have, and if you think there's a natural way to link to the dvc-checkpoints-mnist example, we could have that as an additional use case. Otherwise, we can find a way to add that or a similar example in the future.


```python
from keras.callbacks import Callback
import dvclive


class MetricsCallback(Callback):
    def on_epoch_end(self, epoch: int, logs: dict = None):
        logs = logs or {}
        for metric, value in logs.items():
            dvclive.log(metric, value)
        dvclive.next_step()
```

We created a callback that, at the end of each epoch, iterates over the gathered
metrics (`logs`) and uses the `dvclive.log` function to log each value. After
logging the metrics, we call the `dvclive.next_step` function to signal to
`dvclive` that we are done logging metrics for the current epoch.
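
To make the call order concrete without Keras, here is a minimal sketch of the same pattern: log every metric for a step, then advance the step once. `StandInLogger` and `run_fake_training` are hypothetical names, the metric values are made up, and the three-column TSV layout is an assumption of this sketch. It is not dvclive's implementation.

```python
import tempfile
import time
from pathlib import Path


class StandInLogger:
    """Hypothetical stand-in that mimics the log / next_step call pattern.

    This is NOT dvclive's implementation; it only illustrates the call order.
    """

    def __init__(self, path):
        self.path = Path(path)
        self.path.mkdir(parents=True, exist_ok=True)
        self.step = 0

    def log(self, name, value):
        # Append one row per call to <path>/<name>.tsv; write a header first
        # if the file does not exist yet.
        target = self.path / f"{name}.tsv"
        is_new = not target.exists()
        with target.open("a") as f:
            if is_new:
                f.write(f"timestamp\tstep\t{name}\n")
            f.write(f"{int(time.time() * 1000)}\t{self.step}\t{value}\n")

    def next_step(self):
        # Signal that all metrics for the current step have been logged.
        self.step += 1


def run_fake_training(logger, epochs=3):
    # Stand-in for a training loop: made-up metric values, no real model.
    for epoch in range(epochs):
        logs = {"loss": 1.0 / (epoch + 1), "accuracy": 1.0 - 0.5 / (epoch + 1)}
        for metric, value in logs.items():
            logger.log(metric, value)
        logger.next_step()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logger = StandInLogger(Path(tmp) / "training_metrics")
        run_fake_training(logger)
        print((logger.path / "loss.tsv").read_text())
```

Calling `next_step` once per epoch, after all `log` calls, is what keeps the rows of different metric files aligned on the same `step` value.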

To make it work with the training code, we need one more change. Replace:

```python
model.fit(x_train,
          y_train,
          validation_data=(x_test, y_test),
          batch_size=128,
          epochs=3)
```

with:

```python
dvclive.init("training_metrics")
model.fit(x_train,
          y_train,
          validation_data=(x_test, y_test),
          batch_size=128,
          epochs=3,
          callbacks=[MetricsCallback()])
```

We call `dvclive.init` to tell `dvclive` to write metrics under the
`training_metrics` directory. We also pass the newly created callback to the
`fit` method through its `callbacks` argument.

Rerun the code.

After running the code, you can see that the `training_metrics` directory has
been created.

```bash
$ ls
training_metrics training_metrics.json train.py
```

The `training_metrics` directory contains `*.tsv` files named after the metrics
logged during training:

```bash
$ tree training_metrics

training_metrics
├── accuracy.tsv
├── loss.tsv
├── val_accuracy.tsv
└── val_loss.tsv
```

Each file contains the metric values logged at every training step:

```bash
$ cat training_metrics/accuracy.tsv

timestamp step accuracy
1614129197192 0 0.7612833380699158
1614129198031 1 0.8736833333969116
1614129198848 2 0.8907166719436646
```
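
Because each metric file is plain TSV, it can be read without DVC. The sketch below is a minimal illustration, assuming the files are tab-separated with a header row like the listing above. `read_metric` and `latest_values` are hypothetical helper names, not part of dvclive.

```python
import csv
from pathlib import Path


def read_metric(tsv_path):
    """Return a list of (step, value) pairs from one metric file."""
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # Assumption: the metric column is the one that is neither
        # `timestamp` nor `step`.
        metric = [c for c in reader.fieldnames if c not in ("timestamp", "step")][0]
        return [(int(row["step"]), float(row[metric])) for row in reader]


def latest_values(metrics_dir):
    """Map each metric name to its most recent (step, value) pair."""
    result = {}
    for tsv_path in sorted(Path(metrics_dir).glob("*.tsv")):
        rows = read_metric(tsv_path)
        if rows:
            result[tsv_path.stem] = rows[-1]
    return result


if __name__ == "__main__":
    # Prints an empty dict if the directory does not exist yet.
    print(latest_values("training_metrics"))
```

The `(step, value)` pairs returned by `read_metric` can be handed to any plotting library, which is one way to apply a custom visualization.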

### Configuring dvclive

Besides the `training_metrics` directory, `training_metrics.json` has been
created. This file contains information about the latest training step. You can
prevent its creation by passing `summary=False` to `dvclive.init`.

Args supported by `dvclive.init`:

- `path` - directory where `dvclive` will write its outputs.
- `resume` (`False` by default) - if set to `True`, `dvclive` will try to read
  the latest `step` from the `{path}` directory. Subsequent `next_step` calls
  will increment from the value found.
- `step` (`0` by default) - if set, the `step` values in the log files will
  start incrementing from the given value. If provided alongside `resume`,
  `dvclive` will not try to find the latest `step` in `{path}` and will start
  from `step` instead.
- `summary` (`True` by default) - upon each `next_step` call, `dvclive` will
  dump a JSON file containing all metrics gathered in the last step. The JSON
  file is named `{path}.json`.
- `html` (`True` by default) - works only when `dvclive` is used alongside DVC.
  If `True`, upon each `next_step` call DVC will prepare a summary of the
  currently running training with all metrics logged in `{path}`.
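
The summary file can be inspected from Python, for example to check the latest step while training is still running. This is a minimal sketch that assumes only that `{path}.json` holds a JSON document; its exact keys are not specified here, so the code does not rely on any. `read_summary` is a hypothetical helper, not a dvclive function.

```python
import json
from pathlib import Path


def read_summary(path):
    """Load `{path}.json` and return its content, or None if it is absent."""
    summary_file = Path(f"{path}.json")
    if not summary_file.exists():
        # For example, the summary was disabled or training has not started.
        return None
    with summary_file.open() as f:
        return json.load(f)


if __name__ == "__main__":
    summary = read_summary("training_metrics")
    if summary is None:
        print("no summary file found")
    else:
        print(json.dumps(summary, indent=2))
```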
Binary file added static/img/dvclive_report.png