From de12fcdade01227bacdd504f69d46fd3becaf615 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pawe=C5=82=20Redzy=C5=84ski?=
Date: Tue, 21 Jun 2022 12:15:18 +0200
Subject: [PATCH] cmd-ref: plots: flexible plots docs

Related: iterative/dvc#7477
Related: #2956
---
 content/docs/command-reference/plots/diff.md  |   4 +-
 content/docs/command-reference/plots/index.md | 258 +++++++++++++++---
 .../docs/command-reference/plots/modify.md    |   5 +-
 content/docs/command-reference/plots/show.md  |  28 +-
 content/docs/sidebar.json                     |   4 +
 static/img/plots_diff.svg                     |   2 +-
 static/img/plots_diff_two_revs.svg            |   1 +
 static/img/plots_show.svg                     |   2 +-
 static/img/plots_show_confusion.svg           |   2 +-
 .../img/plots_show_confusion_normalized.svg   |   1 +
 static/img/plots_show_field.svg               |   2 +-
 static/img/plots_show_no_smooth.svg           |   2 +-
 static/img/plots_show_smooth.svg              |   2 +-
 .../img/plots_show_spec_conf_train_test.svg   |   1 +
 static/img/plots_show_spec_default.svg        |   1 +
 .../img/plots_show_spec_multiple_columns.svg  |   1 +
 static/img/plots_show_spec_simple_custom.svg  |   1 +
 17 files changed, 247 insertions(+), 70 deletions(-)
 create mode 100644 static/img/plots_diff_two_revs.svg
 create mode 100644 static/img/plots_show_confusion_normalized.svg
 create mode 100644 static/img/plots_show_spec_conf_train_test.svg
 create mode 100644 static/img/plots_show_spec_default.svg
 create mode 100644 static/img/plots_show_spec_multiple_columns.svg
 create mode 100644 static/img/plots_show_spec_simple_custom.svg

diff --git a/content/docs/command-reference/plots/diff.md b/content/docs/command-reference/plots/diff.md
index eb91dc2f925..b89f0c4691c 100644
--- a/content/docs/command-reference/plots/diff.md
+++ b/content/docs/command-reference/plots/diff.md
@@ -122,11 +122,11 @@ file:///Users/usr/src/dvc_plots/index.html
 Compare two specific versions (commit hashes, tags, or branches):
 
 ```cli
-$ dvc plots diff HEAD 0135527 --targets logs.csv
+$ dvc plots diff HEAD^ 0135527 --targets logs.csv
 file:///Users/usr/src/dvc_plots/index.html
 ```
 
-![](/img/plots_diff.svg)
+![](/img/plots_diff_two_revs.svg)
 
 ## Example: Confusion matrix
 
diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md
index 1431587bc41..bb7f2113f80 100644
--- a/content/docs/command-reference/plots/index.md
+++ b/content/docs/command-reference/plots/index.md
@@ -1,6 +1,6 @@
 # plots
 
-A set of commands to visualize and compare _plot metrics_:
+A set of commands to visualize and compare _plot data_:
 [show](/doc/command-reference/plots/show),
 [diff](/doc/command-reference/plots/diff), and
 [modify](/doc/command-reference/plots/modify).
@@ -12,46 +12,37 @@ usage: dvc plots [-h] [-q | -v] {show,diff,modify} ...
 
 positional arguments:
   COMMAND
-    show      Generate plot from a metrics file.
-    diff      Plot differences in metrics between commits.
-    modify    Modify display properties of data-series plots (has no effect on image-type plots).
+    show      Generate plots from target files or plots definitions from `dvc.yaml` file.
+    diff      Show multiple versions of plot data by plotting them in a single image.
+    modify    Modify display properties of data-series plot outputs (has no effect on image-type plots).
 ```
 
-## Types of metrics
-
-DVC has two concepts for metrics, that represent different results of machine
-learning training or data processing:
-
-1. `dvc metrics` represent **scalar numbers** such as AUC, _true positive rate_,
-   etc.
-2. `dvc plots` can be used to visualize **data series** such as AUC curves, loss
-   functions, confusion matrices, etc.
-
 ## Description
 
-DVC provides a set of commands to visualize certain metrics of machine learning
-experiments as plots. Usual plot examples are AUC curves, loss functions,
-confusion matrices, among others.
+DVC provides a set of commands to visualize data produced by machine learning
+experiments. Usual plot examples are AUC curves, loss functions, confusion
+matrices, among others.
 
-This type of metrics files are created by users, or generated by user data
-processing code, and can be defined in `dvc.yaml` (`plots` field) for tracking
-(optional).
+This type of data is created by users, or generated by user data processing
+code.
 
-DVC can work with two types of plots files:
+## Types of plots
+
+`dvc plots` is able to visualize two types of data:
 
 1. Data series files, which can be JSON, YAML, CSV or TSV.
 2. Image files in JPEG, GIF, or PNG format.
 
-DVC generates plots as static HTML webpages that can be open with a web browser.
-They can also be saved as SVG or PNG image files from the browser.
+DVC generates visualizations as static HTML webpages that can be opened with a
+web browser. They can also be saved as SVG or PNG image files from the browser.
 
 Data-series plots utilize [Vega-Lite](https://vega.github.io/vega-lite/) for
-rendering (declarative JSON grammar for defining graphics). Image-type plots are
-rendered using `<img>` tags directly.
+rendering (declarative JSON grammar for defining graphics). Images are rendered
+using `<img>` tags directly.
 
 ## Supported file formats
 
-Image-type plots are included in HTML as-is, without additional processing.
+Images are included in HTML as-is, without additional processing.
 
 > We recommend to track these source image files with DVC instead of Git, to
 > prevent the repository from bloating.
@@ -96,7 +87,44 @@ names in the `train` array below:
 }
 ```
 
-## Plot templates (data series only)
+## Defining a plot
+
+In order to create visualizations, users need to provide the data and
+(optionally) configuration that will help customize the plot. DVC provides two
+ways to configure visualizations. Users can mark specific stage
+outputs as plots or define plot configuration inside `dvc.yaml`
+under the `plots` key.
+
+## Plots definitions
+
+Plots defined in `dvc.yaml` are especially useful when users want to compare
+data from different data sources residing on the same version of the project.
+For example, comparing training versus test results on the current branch.
+
+### Syntax
+
+In order to define the plot users need to provide data and configuration for the
+plot. The plots should be defined in `dvc.yaml` file under `plots` key. Refer to
+the [examples](/doc/command-reference/plots#example-simple-plot-definition) for
+more syntax insight.
+
+```yaml
+# dvc.yaml
+stages: ...
+
+plots: ...
+```
+
+## Plot outputs
+
+When using `dvc run` or `dvc stage add`, instead of using
+`--outs/--outs-no-cache` particular outputs can be marked with
+`--plots/--plots-no-cache`. This will tell DVC that they are intended for
+visualizations. This special type of output might come in handy if users want to
+visually compare experiment results with other experiment versions. For
+example, comparing a new experiment with the baseline version of the project.
+
+## Plot templates (data-series only)
 
 Users have the ability to change the way data-series plots are displayed by
 modifying the [Vega-Lite specification](https://vega.github.io/vega-lite/), thus
@@ -165,7 +193,7 @@ header (first row) are equivalent to field names.
 
 - `` (optional) - field name to display as the X axis label
 
-## HTML templates
+## Custom HTML templates
 
 It's possible to supply an HTML file to `dvc plot show` and `dvc plot diff` by
 using the the `--html-template` option. This allows you to customize the
@@ -189,18 +217,14 @@ this feature to render DVC plots without an Internet connection, below.
 
 ## Example: Tabular data
 
-We'll use tabular metrics file `logs.csv` for this example:
+We'll use tabular data file `logs.csv` for this example:
 
 ```
-epoch,accuracy,loss,val_accuracy,val_loss
-0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257
-1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942
-2,0.98375,0.05241111190887168,0.9788,0.06665669009438716
-3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989
-4,0.99111664,0.027362171787042946,0.978,0.07385754839298315
-5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166
-6,0.9945,0.017702101902437668,0.9803,0.07830339228538505
-7,0.9954,0.01396906608727198,0.9802,0.07247738889862157
+epoch,loss,accuracy
+1,0.19,0.81
+2,0.11,0.89
+3,0.07,0.93
+4,0.04,0.96
 ```
 
 Let's plot the last column (default behavior):
@@ -222,10 +246,10 @@ file:///Users/usr/src/dvc_plots/index.html
 
 ![](/img/plots_diff.svg)
 
-Visualize a specific field:
+Visualize a specific field (`loss`) as y. Use `epoch` as x:
 
 ```dvc
-$ dvc plots show -y loss logs.csv
+$ dvc plots show logs.csv -y loss -x epoch
 file:///Users/usr/src/dvc_plots/index.html
 ```
 
@@ -234,7 +258,7 @@ file:///Users/usr/src/dvc_plots/index.html
 ## Example: Smooth plot
 
 In some cases we would like to smooth our plot. In this example we will use a
-plot with 1000 data points:
+noisy plot with 100 data points:
 
 ```dvc
 $ dvc plots show data.csv
@@ -280,7 +304,157 @@ file:///Users/usr/src/dvc_plots/index.html
 ![](/img/plots_show_confusion.svg)
 
 > A confusion matrix [template](/doc/command-reference/plots#plot-templates) is
-> predefined in DVC (found in `.dvc/plots/confusion.json`).
+> predefined in DVC.
+
+We can use the `confusion_normalized` template to normalize the results:
+
+```dvc
+$ dvc plots show classes.csv --template confusion_normalized -x actual -y predicted
+file:///Users/usr/src/dvc_plots/index.html
+```
+
+![](/img/plots_show_confusion_normalized.svg)
+
+## Example: simple plot definition
+
+Let's get back to the `logs.csv` data:
+
+```
+# logs.csv
+epoch,loss,accuracy
+1,0.19,0.81
+2,0.11,0.89
+3,0.07,0.93
+4,0.04,0.96
+```
+
+The minimal plot definition we can put in `dvc.yaml` is simply the data source
+path relative to the `dvc.yaml` file:
+
+```yaml
+# dvc.yaml
+stages:
+  train:
+    cmd: echo "Training the model..."
+
+plots:
+  logs.csv:
+```
+
+```dvc
+$ dvc plots show
+file:///Users/usr/src/dvc_plots/index.html
+```
+
+![](/img/plots_show_spec_default.svg)
+
+We can customize it:
+
+```yaml
+# dvc.yaml
+stages:
+  train:
+    cmd: echo "Training the model..."
+
+plots:
+  logs.csv:
+    x: epoch
+    y: accuracy
+    title: Displaying accuracy
+    x_label: This is epoch
+    y_label: This is accuracy
+```
+
+```dvc
+$ dvc plots show
+file:///Users/usr/src/dvc_plots/index.html
+```
+
+![](/img/plots_show_spec_simple_custom.svg)
+
+## Example: multiple data-series plot definition
+
+Data in `training_data.csv`:
+
+```csv
+epoch,train_loss,test_loss
+1,0.33,0.4
+2,0.3,0.28
+3,0.2,0.25
+4,0.1,0.23
+```
+
+```yaml
+# dvc.yaml
+stages:
+  train:
+    cmd: echo "Training the model..."
+
+plots:
+  test_vs_train_loss:
+    x: epoch
+    y:
+      training_data.csv: [test_loss, train_loss]
+    title: Compare loss training versus test
+```
+
+![](/img/plots_show_spec_multiple_columns.svg)
+
+## Example: sourcing data from different files
+
+Let's prepare a comparison of confusion matrix data between the test set and
+the training set:
+
+```csv
+# train_classes.csv
+actual_class,predicted_class
+dog,dog
+dog,dog
+dog,dog
+dog,bird
+cat,cat
+cat,cat
+cat,cat
+cat,dog
+bird,bird
+bird,bird
+bird,bird
+bird,dog
+```
+
+```csv
+# test_classes.csv
+actual_class,predicted_class
+dog,dog
+dog,dog
+dog,cat
+bird,bird
+bird,bird
+bird,cat
+cat,cat
+cat,cat
+cat,bird
+```
+
+```yaml
+# dvc.yaml
+stages:
+  train:
+    cmd: echo "Training the model..."
+
+plots:
+  test_vs_train_confusion:
+    x: actual_class
+    y:
+      train_classes.csv: predicted_class
+      test_classes.csv: predicted_class
+    title: Compare test vs train confusion matrix
+    template: confusion
+    x_label: Actual class
+    y_label: Predicted class
+```
+
+![](/img/plots_show_spec_conf_train_test.svg)
 
 ## Example: Offline HTML Template
 
diff --git a/content/docs/command-reference/plots/modify.md b/content/docs/command-reference/plots/modify.md
index bbfe226431a..8cc1ba0a36c 100644
--- a/content/docs/command-reference/plots/modify.md
+++ b/content/docs/command-reference/plots/modify.md
@@ -1,10 +1,11 @@
 # plots modify
 
-Modify display properties of [plot metrics](/doc/command-reference/plots) files.
+Modify display properties of
+[plot outputs](/doc/command-reference/plots#plot-outputs) files.
 
 > ⚠️ Note that this command can modify only data-series plots. It has no effect
 > on image-type plots. See
-> [Types of metrics](/doc/command-reference/plots#types-of-metrics).
+> [Types of plots](/doc/command-reference/plots#types-of-plots).
 
 ## Synopsis
 
diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md
index 16f0b6445a3..0c1ca59675e 100644
--- a/content/docs/command-reference/plots/show.md
+++ b/content/docs/command-reference/plots/show.md
@@ -143,15 +143,11 @@ file:///Users/usr/src/dvc_plots/index.html
 We'll use tabular metrics file `logs.csv` for these examples:
 
 ```
-epoch,accuracy,loss,val_accuracy,val_loss
-0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257
-1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942
-2,0.98375,0.05241111190887168,0.9788,0.06665669009438716
-3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989
-4,0.99111664,0.027362171787042946,0.978,0.07385754839298315
-5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166
-6,0.9945,0.017702101902437668,0.9803,0.07830339228538505
-7,0.9954,0.01396906608727198,0.9802,0.07247738889862157
+epoch,loss,accuracy
+1,0.19,0.81
+2,0.11,0.89
+3,0.07,0.93
+4,0.04,0.96
 ```
 
@@ -161,15 +157,11 @@ epoch,accuracy,loss,val_accuracy,val_loss
 Here's a corresponding `train.tsv` metrics file:
 
 ```
-epoch	accuracy	loss	val_accuracy	val_loss
-0	0.9418667	0.19958884770199656	0.9679	0.10217399864746257
-1	0.9763333	0.07896138601688048	0.9768	0.07310650711813942
-2	0.98375	0.05241111190887168	0.9788	0.06665669009438716
-3	0.988016	0.03681169906261687	0.9781	0.06697812260198989
-4	0.991116	0.027362171787042946	0.978	0.07385754839298315
-5	0.9932333	0.02069501801203781	0.9771	0.08009233058886166
-6	0.9945	0.017702101902437668	0.9803	0.07830339228538505
-7	0.9954	0.01396906608727198	0.9802	0.07247738889862157
+epoch	loss	accuracy
+1	0.19	0.81
+2	0.11	0.89
+3	0.07	0.93
+4	0.04	0.96
 ```
 
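Reviewer note (not part of the patch): the `logs.csv` and `train.tsv` samples in the two hunks above carry the same rows in two of the supported tabular formats. A minimal sketch of deriving one from the other with Python's standard `csv` module follows; the helper name `csv_to_tsv` and the `LOGS_CSV` constant are hypothetical and exist only for this illustration.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Re-encode comma-separated text as tab-separated text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # lineterminator="\n" overrides the csv module's default "\r\n" endings.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Same rows as the new logs.csv sample in the patch.
LOGS_CSV = (
    "epoch,loss,accuracy\n"
    "1,0.19,0.81\n"
    "2,0.11,0.89\n"
    "3,0.07,0.93\n"
    "4,0.04,0.96\n"
)

print(csv_to_tsv(LOGS_CSV), end="")
```

Going through `csv.reader` and `csv.writer` rather than a plain string replace keeps quoted fields that contain commas intact.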
diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index b86020176c8..a93201785b1 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -388,6 +388,10 @@
           {
             "label": "plots modify",
             "slug": "modify"
+          },
+          {
+            "label": "plots templates",
+            "slug": "templates"
           }
         ]
       },
diff --git a/static/img/plots_diff.svg b/static/img/plots_diff.svg
index 229ff281298..9e0b88e0025 100644
--- a/static/img/plots_diff.svg
+++ b/static/img/plots_diff.svg
@@ -1 +1 @@
-[SVG markup lost in extraction. Plot: x = index, y = val_loss; revisions 0135527, HEAD]
\ No newline at end of file
+[SVG markup lost in extraction. Plot: x = step, y = accuracy; revisions HEAD^, workspace; title logs.csv]
\ No newline at end of file
diff --git a/static/img/plots_diff_two_revs.svg b/static/img/plots_diff_two_revs.svg
new file mode 100644
index 00000000000..2d8ff1f58c8
--- /dev/null
+++ b/static/img/plots_diff_two_revs.svg
@@ -0,0 +1 @@
+[SVG markup lost in extraction. Plot: x = step, y = accuracy; revisions 0135527, HEAD^; title logs.csv]
\ No newline at end of file
diff --git a/static/img/plots_show.svg b/static/img/plots_show.svg
index 2e49efef9cb..533b9486a50 100644
--- a/static/img/plots_show.svg
+++ b/static/img/plots_show.svg
@@ -1 +1 @@
-[SVG markup lost in extraction. Plot: x = index, y = val_loss; revision workspace]
\ No newline at end of file
+[SVG markup lost in extraction. Plot: x = step, y = accuracy; revision workspace; title logs.csv]
\ No newline at end of file
diff --git a/static/img/plots_show_confusion.svg b/static/img/plots_show_confusion.svg
index 7e843281cf3..d72f5762eae 100644
--- a/static/img/plots_show_confusion.svg
+++ b/static/img/plots_show_confusion.svg
@@ -1 +1 @@
-[SVG markup lost in extraction. Confusion matrix: axes actual and predicted; classes bird, cat, dinosaur, dog, turtle; revision workspace]
\ No newline at end of file
+[SVG markup lost in extraction. Confusion matrix: axes predicted and actual; classes bird, cat, dinosaur, dog, turtle; revision workspace; title classes.csv]
\ No newline at end of file
diff --git a/static/img/plots_show_confusion_normalized.svg b/static/img/plots_show_confusion_normalized.svg
new file mode 100644
index 00000000000..199a93e852e
--- /dev/null
+++ b/static/img/plots_show_confusion_normalized.svg
@@ -0,0 +1 @@
+[SVG markup lost in extraction. Normalized confusion matrix (color scale 0.0 to 1.0): axes predicted and actual; classes bird, cat, dinosaur, dog, turtle; revision workspace; title classes.csv]
\ No newline at end of file
diff --git a/static/img/plots_show_field.svg b/static/img/plots_show_field.svg
index ddaabf710b0..79f17e9f2d0 100644
--- a/static/img/plots_show_field.svg
+++ b/static/img/plots_show_field.svg
@@ -1 +1 @@
-[SVG markup lost in extraction. Plot: x = index, y = loss; revision workspace]
\ No newline at end of file
+[SVG markup lost in extraction. Plot: x = epoch, y = loss; revision workspace; title logs.csv]
\ No newline at end of file
diff --git a/static/img/plots_show_no_smooth.svg b/static/img/plots_show_no_smooth.svg
index bae27f1407d..c23eb82a5d0 100644
--- a/static/img/plots_show_no_smooth.svg
+++ b/static/img/plots_show_no_smooth.svg
@@ -1 +1 @@
-[SVG markup lost in extraction. Plot: x = index (0 to 1,000), y = y; revision workspace]
\ No newline at end of file
+[SVG markup lost in extraction. Plot: x = step (0 to 100), y = loss; revision workspace; title data.csv]
\ No newline at end of file
diff --git a/static/img/plots_show_smooth.svg b/static/img/plots_show_smooth.svg
index 934fc97de0f..4240286f669 100644
--- a/static/img/plots_show_smooth.svg
+++ b/static/img/plots_show_smooth.svg
@@ -1 +1 @@
-[SVG markup lost in extraction. Plot: x = index (0 to 1,000), y = y; revision workspace]
\ No newline at end of file
+[SVG markup lost in extraction. Plot: x = step (0 to 100), y = loss; revision workspace; title data.csv]
\ No newline at end of file
diff --git a/static/img/plots_show_spec_conf_train_test.svg b/static/img/plots_show_spec_conf_train_test.svg
new file mode 100644
index 00000000000..9f2bf7e3a1e
--- /dev/null
+++ b/static/img/plots_show_spec_conf_train_test.svg
@@ -0,0 +1 @@
+[SVG markup lost in extraction. Two confusion matrices (test_classes.csv, train_classes.csv): axes Predicted class and Actual class; classes bird, cat, dog; title Compare test vs train confusion matrix]
\ No newline at end of file
diff --git a/static/img/plots_show_spec_default.svg b/static/img/plots_show_spec_default.svg
new file mode 100644
index 00000000000..5b78e1b365b
--- /dev/null
+++ b/static/img/plots_show_spec_default.svg
@@ -0,0 +1 @@
+[SVG markup lost in extraction. Plot: x = step, y = accuracy; revision workspace; title dvc.yaml::logs.csv]
\ No newline at end of file
diff --git a/static/img/plots_show_spec_multiple_columns.svg b/static/img/plots_show_spec_multiple_columns.svg
new file mode 100644
index 00000000000..24c338ad5d4
--- /dev/null
+++ b/static/img/plots_show_spec_multiple_columns.svg
@@ -0,0 +1 @@
+[SVG markup lost in extraction. Plot: x = epoch, y = y; series test_loss, train_loss; title Compare loss training versus test]
\ No newline at end of file
diff --git a/static/img/plots_show_spec_simple_custom.svg b/static/img/plots_show_spec_simple_custom.svg
new file mode 100644
index 00000000000..ed69a15a719
--- /dev/null
+++ b/static/img/plots_show_spec_simple_custom.svg
@@ -0,0 +1 @@
+[SVG markup lost in extraction. Plot: x label This is epoch, y label This is accuracy; revision workspace; title Displaying accuracy]
\ No newline at end of file
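
Reviewer note (not part of the patch): the "sourcing data from different files" example feeds `actual_class`/`predicted_class` pairs to the `confusion` template. As a rough illustration of the per-cell counts such a matrix is built from, here is a minimal Python sketch over the `test_classes.csv` rows from the patch. It is not DVC's implementation; the helper name `confusion_counts` and the `TEST_CLASSES_CSV` constant are hypothetical.

```python
import csv
import io
from collections import Counter

# Rows copied from the test_classes.csv sample in the patch.
TEST_CLASSES_CSV = """actual_class,predicted_class
dog,dog
dog,dog
dog,cat
bird,bird
bird,bird
bird,cat
cat,cat
cat,cat
cat,bird
"""


def confusion_counts(csv_text, actual="actual_class", predicted="predicted_class"):
    """Count how many rows fall into each (actual, predicted) cell."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter((row[actual], row[predicted]) for row in rows)


counts = confusion_counts(TEST_CLASSES_CSV)
for (actual, predicted), n in sorted(counts.items()):
    print(f"actual={actual} predicted={predicted} count={n}")
```

A `Counter` returns 0 for cells that never occur, which matches how empty cells of a confusion matrix are usually read.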