-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
live: initial docs draft #2227
live: initial docs draft #2227
Conversation
Like @dberenbaum mentioned, we did discuss this in some Slack thread. My take was that "LogML" or "TraceML" were the best options. |
@pared Love that you added the new section for dvc usage! I have an example repo in https://github.com/iterative/dvc-checkpoints-mnist that shows how to integrate dvclive with a checkpoints experiment. Maybe it's worth linking that here as an example of how it can be used in place of Thoughts @pared @pmrowla @jorgeorpinel? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few comments, but this is a great start! I'll reiterate that it's better to have some documentation than none, so don't take any of my comments as blockers.
content/docs/dvclive/index.md
Outdated
training summary during training. When used in DVC project user does not have | ||
to call `dvclive.init` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When used in DVC project user does not have
to calldvclive.init
method.
I'd probably leave this out since it's not that much of an incentive. I would either just leave the first sentence or add in some other benefit, like "When used in a DVC project, it can log metrics and create checkpoints during training that are tracked by DVC."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing, I guess the proper place for that is usage-with-dvc
section
|
||
The training code (`train.py` file): | ||
|
||
```python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should update the example repo I have with this example. Keras really makes for a clean, minimal example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but on the other hand we need to prepare Callback
, while in torch
everything nicely goes into the training loop
content/docs/dvclive/usage.md
Outdated
In order to do that, we will need to provide proper | ||
[`Callback`](https://keras.io/api/callbacks/) for `fit` method: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice, but would it be easier for people getting started to see a manual training loop first rather than a callback? This doesn't need to be a blocker, but maybe something to consider as an enhancement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we link your torch
example, it will be fine.
As i read manual loop tutorial for keras, it seems to me that we will occlude the main point of the tutorial with step by step execution (pushing data through model, calling optimizer, updating gradients). While in this case we have everything hidden under fit
call. Callbacks
are not that obvious concept, though I think its easier to understand it than reading through whole training loop.
Still, that makes me think that maybe we need not one example, but more? MNIST classifier targeted for DL frameworks users, and something different for, for example, classic ML practitioners? Maybe something with Iris dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we will need multiple examples. I'm still not sure myself how dvclive would work for "classic" ML or whether that's a useful scenario to consider. For now, let's keep what you have, and if you think there's a natural way to link to the dvc-checkpoints-mnist example, we could have that as an additional use case. Otherwise, we can find a way to add that or a similar example in the future.
Let's use the code prepared in previous example and try to make it work with | ||
dvc. Training file `train.py` content: | ||
|
||
```python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned that we might want to link to https://github.com/iterative/dvc-checkpoints-mnist, but alternatively you could incorporate checkpoints in this example. That might be too much to dive right into, but I think most use cases for dvclive with dvc will include checkpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe we need one more section? We integrated the project with DVC and dvclive, lets make some experiments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes to introduce checkpoints would be to write the model out in the callback, and to add the model output to dvc with stage add --checkpoint model ...
. I don't see checkpoints as necessarily being tied to experiments. It adds versioning of the model output that aligns with the metrics dvclive is tracking.
However, as you mentioned below, let's think about how we can introduce multiple examples to cover all of this. I think the example you have is a great starting point.
Now, lets use dvc to run the project: | ||
|
||
```dvc | ||
$ dvc run -n train --live training_metrics -d train.py python train.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use stage add && repro/exp run
here? @jorgeorpinel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I will introduce that.
In `dvc.yaml` there is new stage defined, containing information about the | ||
`dvclive` outputs: | ||
|
||
```bash | ||
$ cat dvc.yaml | ||
|
||
stages: | ||
train: | ||
cmd: python train.py | ||
deps: | ||
- train.py | ||
live: | ||
training_metrics: | ||
summary: true | ||
html: true | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using stage add
above also gives you a reason to move this section up and separate it from the run/repro
part and explanation of the dvclive outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the changes around stage add
are implemented, I think this should be ready to merge.
Let's use the code prepared in previous example and try to make it work with | ||
dvc. Training file `train.py` content: | ||
|
||
```python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes to introduce checkpoints would be to write the model out in the callback, and to add the model output to dvc with stage add --checkpoint model ...
. I don't see checkpoints as necessarily being tied to experiments. It adds versioning of the model output that aligns with the metrics dvclive is tracking.
However, as you mentioned below, let's think about how we can introduce multiple examples to cover all of this. I think the example you have is a great starting point.
content/docs/dvclive/usage.md
Outdated
In order to do that, we will need to provide proper | ||
[`Callback`](https://keras.io/api/callbacks/) for `fit` method: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we will need multiple examples. I'm still not sure myself how dvclive would work for "classic" ML or whether that's a useful scenario to consider. For now, let's keep what you have, and if you think there's a natural way to link to the dvc-checkpoints-mnist example, we could have that as an additional use case. Otherwise, we can find a way to add that or a similar example in the future.
content/docs/dvclive/index.md
Outdated
[dvclive](https://cml.dev) is an open-source python library for monitoring | ||
[dvclive](https://cml.dev) is an open-source python library for monitoring the | ||
progress of metrics during training of machine learning projects. |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
during training of machine learning models
Is this too specific? What's the broader use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so? Are you asking whether there are non-ML use cases (not likely), or is there something else you think might be too specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking whether there are non-ML use cases (not likely)
Non model-training use cases. E.g. feature extraction or any other stage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think it's safe to say that dvclive is a narrower product built specifically for model training.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the index @pared ☝️. That plus changing sidebar.json + Dave's review should be good enough as a first iteration to merge I think.
Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dvclive + DVC copy edits (will commit):
Restyle live: initial docs draft
I have a few minor questions/notes starting at #2227 (review) (cc @pared please scroll through all the resolved comments) but otherwise LGTM — feel free to merge. |
The value passed to `--live` (`training_metrics`) became the directory `path` | ||
for Dvclive to write logs in. Other supported command options for DVC | ||
integration: | ||
|
||
- `--live-no-summary` - passes `summary=False` to Dvclive. | ||
- `--live-no-html` - passes `html=False` to Dvclive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Oh yeah also we need to add these to https://dvc.org/doc/command-reference/stage/add#options ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 4042082.
Co-authored-by: Jorge Orpinel <[email protected]>
@jorgeorpinel Also I did a issue to fix up missing entries for stages and stage related commands I am not authorized to merge |
❗ Please read the guidelines in the Contributing to the Documentation list if you make any substantial changes to the documentation or JS engine.
🐛 Please make sure to mention
Fix #issue
(if applicable) in the description of the PR. This causes GitHub to close it automatically when the PR is merged.Please choose to allow us to edit your branch when creating the PR.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Matches iterative/dvc/pull/4902