Plots #1186
Merged
Commits (9; this view shows changes from 3 of them):
- 312535d plot: index (dmpetrov)
- 065db7c plot: show & diff (dmpetrov)
- 2bf6a04 Add plot to sidebar (dmpetrov)
- 4335227 First feedback fixes (dmpetrov)
- facc7a8 Update content/docs/command-reference/plot/show.md (jorgeorpinel)
- 70339ab Plot: 2nd round of review (dmpetrov)
- 4b35a28 new plot options & custom templates (dmpetrov)
- 283833a Update content/docs/command-reference/plot/show.md (jorgeorpinel)
- f388cd6 Update content/docs/command-reference/plot/index.md (jorgeorpinel)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
# plot diff | ||
|
||
Show the difference in | ||
[continuous metrics](/doc/command-reference/plot#continuous-metrics) by plotting | ||
different versions of the metrics from the <abbr>DVC repository</abbr> or | ||
workspace on a single [plot](/doc/command-reference/plot). | ||
|
||
## Synopsis | ||
|
||
```usage | ||
usage: dvc plot diff [-h] [-q | -v] [-t [TEMPLATE]] [-d [DATAFILE]] | ||
[-r RESULT] [--no-html] [-f FIELDS] [-o] | ||
[--no-csv-header] | ||
[revisions [revisions ...]] | ||
|
||
positional arguments: | ||
revisions Git revisions to plot from | ||
``` | ||
|
||
## Description | ||
|
||
This command visualizes the difference between continuous metrics among | ||
experiments in the repository history. It requires that Git is being used to | ||
version the metrics files. | ||
|
||
The metrics file needs to be specified through the `--datafile` option. Also, a | ||
plot can be customized with [Vega](https://vega.github.io/) templates through | ||
the `--template` option. To learn more about the file formats and templates, | ||
please see `dvc plot`. | ||
|
||
When run without any revisions specified, this command compares the metrics | ||
currently present in the workspace (uncommitted changes) with the latest | ||
committed version. When a single revision is specified, it shows the difference | ||
between that revision and the version in the workspace. | ||
|
||
In contrast to many commands such as `git diff`, `dvc metrics diff`, and | ||
`dvc params diff`, the plot difference shows all the revisions in a single | ||
output and is not limited to two versions. A user can specify as many revisions | ||
as needed. | ||
|
||
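As a rough illustration of how several revisions can share one chart, the following Python sketch labels each row of a metrics file with the revision it came from and concatenates the rows, so that a single plot could draw one line per revision. This is not DVC's implementation. The helper names (`tag_rows`, `combine_revisions`), the `rev` and `index` fields, and the sample file contents are assumptions made for the example.

```python
import csv
import io


def tag_rows(csv_text, revision):
    """Parse CSV text and label every row with the revision it came from."""
    rows = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        tagged = dict(row)
        tagged["rev"] = revision  # hypothetical field used to tell lines apart
        tagged["index"] = index   # row position, usable as a default x axis
        rows.append(tagged)
    return rows


def combine_revisions(versions):
    """Concatenate the rows of all revisions into one list for a single plot."""
    combined = []
    for revision, csv_text in versions.items():
        combined.extend(tag_rows(csv_text, revision))
    return combined


# Hypothetical contents of logs.csv at three revisions (illustration only).
versions = {
    "HEAD": "epoch,loss\n0,0.20\n1,0.08\n",
    "11c0bf1": "epoch,loss\n0,0.25\n1,0.12\n",
    "workspace": "epoch,loss\n0,0.19\n1,0.07\n",
}

combined = combine_revisions(versions)
print(len(combined), sorted({row["rev"] for row in combined}))
```

Because every row carries its revision label, adding a fourth or fifth revision only appends more rows; nothing in the data layout is tied to a pairwise comparison.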
Metrics files can be files committed to Git as well as data files under DVC | ||
control. For data files, the file revision corresponds to the Git revision of | ||
the [DVC-files](/doc/user-guide/dvc-file-format) that have this file as an | ||
output. | ||
|
||
## Options | ||
|
||
- `-t [TEMPLATE], --template [TEMPLATE]` - File to be injected with data. | ||
|
||
- `-d [DATAFILE], --datafile [DATAFILE]` - Data to be visualized. | ||
|
||
- `-r RESULT, --result RESULT` - Name of the generated file. | ||
|
||
- `--no-html` - Do not wrap the Vega plot JSON with HTML. | ||
|
||
- `-f FIELDS, --fields FIELDS` - Choose which fields or JSONPath to put into | ||
the plot. | ||
|
||
- `--no-csv-header` - Provided CSV or TSV datafile does not have a header. | ||
|
||
- `-o, --stdout` - Print plot content to stdout. | ||
|
||
- `-h`, `--help` - prints the usage/help message and exits. | ||
|
||
- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no | ||
problems arise, otherwise 1. | ||
|
||
- `-v`, `--verbose` - displays detailed tracing information. | ||
|
||
## Examples | ||
|
||
The difference between the uncommitted version of the file and the last | ||
committed one: | ||
|
||
```dvc | ||
$ dvc plot diff -d logs.csv | ||
file:///Users/dmitry/src/plot/logs.html | ||
``` | ||
|
||
![](/img/plot_diff_workspace.svg) | ||
|
||
The difference between two specified commits: | ||
|
||
```dvc | ||
$ dvc plot diff -d logs.csv HEAD 11c0bf1 | ||
file:///Users/dmitry/src/plot/logs.html | ||
``` | ||
|
||
![](/img/plot_diff.svg) | ||
|
||
The predefined confusion matrix template shows how the difference in continuous | ||
metrics can be faceted into separate plots: | ||
|
||
```csv | ||
actual,predicted | ||
cat,cat | ||
cat,cat | ||
cat,cat | ||
cat,dog | ||
cat,dinosaur | ||
cat,dinosaur | ||
cat,bird | ||
turtle,dog | ||
turtle,cat | ||
... | ||
``` | ||
|
||
```dvc | ||
$ dvc plot diff -d classes.csv -t confusion_matrix | ||
file:///Users/dmitry/src/test/plot_old/classes.html | ||
``` | ||
|
||
![](/img/plot_diff_confusion.svg) |
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,188 @@ | ||||||||||||||||
# plot | ||||||||||||||||
|
||||||||||||||||
Contains commands to visualize user data stored in structured files such as | ||||||||||||||||
JSON, CSV, or TSV: [show](/doc/command-reference/plot/show) and | ||||||||||||||||
[diff](/doc/command-reference/plot/diff). | ||||||||||||||||
|
||||||||||||||||
## Synopsis | ||||||||||||||||
|
||||||||||||||||
```usage | ||||||||||||||||
usage: dvc plot [-h] [-q | -v] {show,diff} ... | ||||||||||||||||
positional arguments: | ||||||||||||||||
{show,diff} Use `dvc plot CMD --help` to display command-specific help. | ||||||||||||||||
show Plot data from a file | ||||||||||||||||
diff Plot changes between commits in the DVC repository, | ||||||||||||||||
or between the last commit and the workspace. | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
## Description | ||||||||||||||||
|
||||||||||||||||
DVC provides a set of commands to visualize | ||||||||||||||||
[continuous metrics](/doc/command-reference/plot#continuous-metrics) of machine | ||||||||||||||||
learning experiments in <abbr>DVC projects</abbr>. These metrics are usually | ||||||||||||||||
plotted as AUC curves, loss functions, confusion matrices, and other types of | ||||||||||||||||
plots. | ||||||||||||||||
|
||||||||||||||||
Continuous metrics should be saved in files, which are usually created by users | ||||||||||||||||
or generated by the user's modeling or data processing code. The plot commands | ||||||||||||||||
can work with files committed to the repository history, data files controlled | ||||||||||||||||
by DVC, or files from the workspace. | ||||||||||||||||
|
||||||||||||||||
### Continuous metrics | ||||||||||||||||
|
||||||||||||||||
DVC has two concepts of metrics for representing the results of machine | ||||||||||||||||
learning training or data processing: | ||||||||||||||||
|
||||||||||||||||
1. `dvc metrics` to represent scalar numbers such as AUC, true positive rate, | ||||||||||||||||
and others. | ||||||||||||||||
2. `dvc plot` to visualize continuous metrics such as the AUC curve, loss | ||||||||||||||||
function, confusion matrices, and others. | ||||||||||||||||
|
||||||||||||||||
Scalar metrics should be stored in hierarchical files such as JSON or YAML. The | ||||||||||||||||
`dvc metrics diff` command can show the difference between the metrics of | ||||||||||||||||
different experiments as float numbers. For example, the `AUC` metric below is | ||||||||||||||||
`0.801807`, an increase of `0.037826` from the previous value: | ||||||||||||||||
|
||||||||||||||||
```dvc | ||||||||||||||||
$ dvc metrics diff | ||||||||||||||||
Path Metric Value Change | ||||||||||||||||
summary.json AUC 0.801807 0.037826 | ||||||||||||||||
``` | ||||||||||||||||
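The arithmetic behind such a scalar difference can be sketched in a few lines of Python. This is only an illustration, not how `dvc metrics diff` is implemented. The `diff_metrics` helper is hypothetical, and the previous value `0.763981` is an assumption chosen so that the change matches the output above.

```python
def diff_metrics(old, new):
    """Return {name: (new_value, change)} for metrics present in both dicts."""
    return {
        name: (new[name], new[name] - old[name])
        for name in new
        if name in old
    }


# The previous value is hypothetical; the current value is taken from the
# `dvc metrics diff` output shown above.
old = {"AUC": 0.763981}
new = {"AUC": 0.801807}

value, change = diff_metrics(old, new)["AUC"]
print(f"AUC  {value:.6f}  {change:.6f}")
```

A single number per metric is enough here, which is exactly what distinguishes scalar metrics from metrics that need a whole series of values.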
|
||||||||||||||||
In contrast to scalar metrics, continuous metrics represent a plot and should | ||||||||||||||||
be stored as an array in a JSON file or as a column in a CSV or TSV file. The | ||||||||||||||||
command `dvc plot diff` generates a plot with two versions of the metrics: | ||||||||||||||||
|
||||||||||||||||
```dvc | ||||||||||||||||
$ dvc plot diff -d logs.csv | ||||||||||||||||
file:///Users/dmitry/src/plot/logs.html | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
![](/img/plot_auc.svg) | ||||||||||||||||
|
||||||||||||||||
### File formats | ||||||||||||||||
|
||||||||||||||||
Supported file formats for continuous metrics are JSON, CSV, and TSV. DVC | ||||||||||||||||
expects to see an array (or multiple arrays) of _float numbers_ in the file. | ||||||||||||||||
|
||||||||||||||||
In tabular file formats such as CSV and TSV, the array is a column. The plot | ||||||||||||||||
command can generate visuals for a specified column or a set of columns, such as | ||||||||||||||||
the `AUC` column: | ||||||||||||||||
|
||||||||||||||||
``` | ||||||||||||||||
epoch, AUC, loss | ||||||||||||||||
34, 0.91935, 0.0317345 | ||||||||||||||||
35, 0.91913, 0.0317829 | ||||||||||||||||
36, 0.92256, 0.0304632 | ||||||||||||||||
37, 0.92302, 0.0299015 | ||||||||||||||||
``` | ||||||||||||||||
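A file like this is typically produced by training code. The following Python sketch, using only the standard library, writes such a CSV file and then reads one column back as an array of floats. The helper names (`write_metrics_csv`, `read_column`) are assumptions for illustration, and the rows are copied from the sample above instead of coming from a real training loop.

```python
import csv
import os
import tempfile

# Values copied from the sample above; a real script would collect them
# once per epoch during training.
history = [
    {"epoch": 34, "AUC": 0.91935, "loss": 0.0317345},
    {"epoch": 35, "AUC": 0.91913, "loss": 0.0317829},
    {"epoch": 36, "AUC": 0.92256, "loss": 0.0304632},
    {"epoch": 37, "AUC": 0.92302, "loss": 0.0299015},
]


def write_metrics_csv(path, rows):
    """Write one row per epoch, with a header naming each column."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["epoch", "AUC", "loss"])
        writer.writeheader()
        writer.writerows(rows)


def read_column(path, column):
    """Return a single column of the CSV file as a list of floats."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "logs.csv")
    write_metrics_csv(path, history)
    auc = read_column(path, "AUC")

print(auc)
```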
|
||||||||||||||||
In hierarchical file formats such as JSON, an array of JSON objects is | ||||||||||||||||
expected. The plot command can generate visuals for a specified field name or a | ||||||||||||||||
set of fields from the objects in the array, such as the `val_loss` field in the | ||||||||||||||||
`train` array in this example: | ||||||||||||||||
|
||||||||||||||||
``` | ||||||||||||||||
{ | ||||||||||||||||
"train": [ | ||||||||||||||||
{"val_accuracy": 0.9665, "val_loss": 0.10757}, | ||||||||||||||||
{"val_accuracy": 0.9764, "val_loss": 0.07324}, | ||||||||||||||||
{"val_accuracy": 0.8770, "val_loss": 0.08136}, | ||||||||||||||||
{"val_accuracy": 0.8740, "val_loss": 0.09026}, | ||||||||||||||||
{"val_accuracy": 0.8795, "val_loss": 0.07640}, | ||||||||||||||||
{"val_accuracy": 0.8803, "val_loss": 0.07608}, | ||||||||||||||||
{"val_accuracy": 0.8987, "val_loss": 0.08455} | ||||||||||||||||
] | ||||||||||||||||
} | ||||||||||||||||
``` | ||||||||||||||||
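To make the "field of an array of objects" idea concrete, here is a small Python sketch that pulls one field out of the `train` array. The `extract_field` helper is hypothetical and does not reflect how DVC resolves fields internally. The data is the first three records of the example above.

```python
import json

# First three records of the example above.
document = json.loads("""
{
  "train": [
    {"val_accuracy": 0.9665, "val_loss": 0.10757},
    {"val_accuracy": 0.9764, "val_loss": 0.07324},
    {"val_accuracy": 0.8770, "val_loss": 0.08136}
  ]
}
""")


def extract_field(data, array_name, field):
    """Collect one field from every object of the named array."""
    return [record[field] for record in data[array_name]]


val_loss = extract_field(document, "train", "val_loss")
print(val_loss)
```

The result is a flat list of floats, the same shape as a single column of a CSV or TSV file.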
|
||||||||||||||||
### Plot templates | ||||||||||||||||
|
||||||||||||||||
DVC generates plots as HTML files that a user can open in a web browser. The | ||||||||||||||||
HTML files contain plots as [Vega-Lite](https://vega.github.io/) objects. The | ||||||||||||||||
files can also be converted to traditional image formats such as PNG, JPEG, or | ||||||||||||||||
SVG using external tools. | ||||||||||||||||
|
||||||||||||||||
Vega is a declarative, programming-language-agnostic format for defining plots | ||||||||||||||||
as a JSON specification. DVC lets users change the specification and generate | ||||||||||||||||
plots in the format that best fits their needs. At the same time, DVC does not | ||||||||||||||||
depend on the user's visualization code or on any programming language or | ||||||||||||||||
environment, which allows DVC to stay programming-language agnostic. | ||||||||||||||||
|
||||||||||||||||
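To give a feel for what a declarative JSON plot specification looks like, the Python sketch below builds a minimal Vega-Lite line chart with inline data and serializes it. It is an assumption-laden illustration: the `build_line_spec` helper and the sample rows are made up for the example, and the templates shipped with DVC may be structured differently.

```python
import json


def build_line_spec(rows, x_field, y_field):
    """Return a minimal Vega-Lite line chart specification with inline data."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "line",
        "encoding": {
            "x": {"field": x_field, "type": "quantitative"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Hypothetical data points standing in for a metrics file.
rows = [
    {"epoch": 0, "loss": 0.20},
    {"epoch": 1, "loss": 0.08},
    {"epoch": 2, "loss": 0.05},
]

spec = build_line_spec(rows, "epoch", "loss")
print(json.dumps(spec, indent=2))
```

Since the specification is plain JSON, it can be written or edited from any language or by hand, which is the property the paragraph above relies on.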
Plot templates are stored in the `.dvc/plot/` directory as JSON files. Users can | ||||||||||||||||
define their own templates or modify the existing ones. Please see more details | ||||||||||||||||
in `dvc plot show` and `dvc plot diff`. | ||||||||||||||||
|
||||||||||||||||
## Options | ||||||||||||||||
|
||||||||||||||||
- `-h`, `--help` - prints the usage/help message and exits. | ||||||||||||||||
|
||||||||||||||||
- `-q`, `--quiet` - do not write anything to standard output. | ||||||||||||||||
|
||||||||||||||||
- `-v`, `--verbose` - displays detailed tracing information. | ||||||||||||||||
|
||||||||||||||||
## Examples | ||||||||||||||||
|
||||||||||||||||
Tabular file `logs.csv` visualization: | ||||||||||||||||
|
||||||||||||||||
``` | ||||||||||||||||
epoch,accuracy,loss,val_accuracy,val_loss | ||||||||||||||||
0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257 | ||||||||||||||||
1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942 | ||||||||||||||||
2,0.98375,0.05241111190887168,0.9788,0.06665669009438716 | ||||||||||||||||
3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989 | ||||||||||||||||
4,0.99111664,0.027362171787042946,0.978,0.07385754839298315 | ||||||||||||||||
5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166 | ||||||||||||||||
6,0.9945,0.017702101902437668,0.9803,0.07830339228538505 | ||||||||||||||||
7,0.9954,0.01396906608727198,0.9802,0.07247738889862157 | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
```dvc | ||||||||||||||||
$ dvc plot show logs.csv | ||||||||||||||||
file:///Users/dmitry/src/plot/logs.html | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
![](/img/plot_show.svg) | ||||||||||||||||
|
||||||||||||||||
Difference between the current file and the previously committed one: | ||||||||||||||||
|
||||||||||||||||
```dvc | ||||||||||||||||
$ dvc plot diff -d logs.csv HEAD^ | ||||||||||||||||
file:///Users/dmitry/src/plot/logs.html | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
![](/img/plot_diff.svg) | ||||||||||||||||
|
||||||||||||||||
Visualize a specific field: | ||||||||||||||||
|
||||||||||||||||
```dvc | ||||||||||||||||
$ dvc plot show --field loss logs.csv | ||||||||||||||||
file:///Users/dmitry/src/plot/logs.html | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
![](/img/plot_show_field.svg) | ||||||||||||||||
|
||||||||||||||||
The confusion matrix template is predefined in DVC (file | ||||||||||||||||
`.dvc/plot/confusion_matrix.json`): | ||||||||||||||||
|
||||||||||||||||
```csv | ||||||||||||||||
actual,predicted | ||||||||||||||||
cat,cat | ||||||||||||||||
cat,cat | ||||||||||||||||
cat,cat | ||||||||||||||||
cat,dog | ||||||||||||||||
cat,dinosaur | ||||||||||||||||
cat,dinosaur | ||||||||||||||||
cat,bird | ||||||||||||||||
turtle,dog | ||||||||||||||||
turtle,cat | ||||||||||||||||
... | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
```dvc | ||||||||||||||||
$ dvc plot show classes.csv --template confusion_matrix | ||||||||||||||||
file:///Users/dmitry/src/plot/classes.html | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
![](/img/plot_show_confusion.svg) |
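For readers unfamiliar with the input format, a confusion matrix is essentially a count of each (actual, predicted) pair. The Python sketch below computes those counts for the rows listed in the example. It is only an illustration of the aggregation, not the template's implementation. The `confusion_counts` helper is hypothetical, and the rows elided by `...` in the example are not included.

```python
import csv
import io
from collections import Counter

# Only the rows shown in the example above; the elided rows are left out.
CSV_TEXT = """actual,predicted
cat,cat
cat,cat
cat,cat
cat,dog
cat,dinosaur
cat,dinosaur
cat,bird
turtle,dog
turtle,cat
"""


def confusion_counts(csv_text):
    """Count how often each (actual, predicted) pair occurs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["actual"], row["predicted"]) for row in reader)


counts = confusion_counts(CSV_TEXT)
for (actual, predicted), n in sorted(counts.items()):
    print(f"{actual:>8} -> {predicted:<10} {n}")
```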
I wonder whether we want to name them `continuous`. This word applies to functions. What about, for example, a confusion matrix? Data for that type of plot is not continuous.

Agreed with @pared. Probably even a plain, explicit "non-scalar metric" would be better?

Yeah, it will be changed. All the terminology around "continuous" will be removed in the next iteration.