From 097d0136d97030d43742f82460977659f7032d21 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pawe=C5=82=20Redzy=C5=84ski?= Date: Tue, 21 Jun 2022 12:15:18 +0200 Subject: [PATCH 01/24] cmd-ref: plots: flexible plots docs Related: iterative/dvc#7477 Related: #2956 --- content/docs/command-reference/plots/diff.md | 8 +- content/docs/command-reference/plots/index.md | 384 +++++++++++++++--- .../docs/command-reference/plots/modify.md | 13 +- content/docs/command-reference/plots/show.md | 48 +-- .../project-structure/dvcyaml-files.md | 28 ++ static/img/plots_diff.svg | 2 +- static/img/plots_diff_two_revs.svg | 1 + static/img/plots_show.svg | 2 +- static/img/plots_show_confusion.svg | 2 +- .../img/plots_show_confusion_normalized.svg | 1 + static/img/plots_show_field.svg | 2 +- static/img/plots_show_no_smooth.svg | 2 +- static/img/plots_show_smooth.svg | 2 +- .../img/plots_show_spec_conf_train_test.svg | 1 + static/img/plots_show_spec_default.svg | 1 + .../img/plots_show_spec_multiple_columns.svg | 1 + static/img/plots_show_spec_simple_custom.svg | 1 + 17 files changed, 408 insertions(+), 91 deletions(-) create mode 100644 static/img/plots_diff_two_revs.svg create mode 100644 static/img/plots_show_confusion_normalized.svg create mode 100644 static/img/plots_show_spec_conf_train_test.svg create mode 100644 static/img/plots_show_spec_default.svg create mode 100644 static/img/plots_show_spec_multiple_columns.svg create mode 100644 static/img/plots_show_spec_simple_custom.svg diff --git a/content/docs/command-reference/plots/diff.md b/content/docs/command-reference/plots/diff.md index 856168e214..a1b4845d8f 100644 --- a/content/docs/command-reference/plots/diff.md +++ b/content/docs/command-reference/plots/diff.md @@ -1,7 +1,7 @@ # plots diff -Show multiple versions of [plot metrics](/doc/command-reference/plots) by -overlaying them in a single image. This allows to compare them easily. 
+Show multiple versions of [plots](/doc/command-reference/plots) by overlaying +them in a single image. This allows you to compare them easily. ## Synopsis @@ -123,11 +123,11 @@ file:///Users/usr/src/dvc_plots/index.html Compare two specific versions (commit hashes, tags, or branches): ```cli -$ dvc plots diff HEAD 0135527 --targets logs.csv +$ dvc plots diff HEAD^ 0135527 --targets logs.csv file:///Users/usr/src/dvc_plots/index.html ``` -![](/img/plots_diff.svg) +![](/img/plots_diff_two_revs.svg) ## Example: Confusion matrix diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index aeffe60685..d8e871b7f0 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -1,7 +1,7 @@ # plots -A set of commands to visualize and compare _plot metrics_: -[show](/doc/command-reference/plots/show), +A set of commands to visualize and compare data series or images from ML +projects: [show](/doc/command-reference/plots/show), [diff](/doc/command-reference/plots/diff), [modify](/doc/command-reference/plots/modify) and [templates](/doc/command-reference/plots/templates). @@ -13,31 +13,22 @@ usage: dvc plots [-h] [-q | -v] {show,diff,modify,templates} ... positional arguments: COMMAND - show Generate plot from a metrics file. - diff Plot differences in metrics between commits. - modify Modify display properties of data-series plots (has no effect on image-type plots). - templates Write built-in plots templates to a directory (.dvc/plots by default). + show Generate plots from target files or from `plots` + definitions in `dvc.yaml`. + diff Show multiple versions of a plot by overlaying them + in a single image. + modify Modify display properties of data-series plots + defined in stages (has no effect on image plots). + templates Write built-in plots templates to a directory (.dvc/plots by default).
``` -## Types of metrics - -DVC has two concepts for metrics, that represent different results of machine -learning training or data processing: - -1. `dvc metrics` represent **scalar numbers** such as AUC, _true positive rate_, - etc. -2. `dvc plots` can be used to visualize **data series** such as AUC curves, loss - functions, confusion matrices, etc. - ## Description -DVC provides a set of commands to visualize certain metrics of machine learning -experiments as plots. Usual plot examples are AUC curves, loss functions, -confusion matrices, among others. - -This type of metrics files are created by users, or generated by user data -processing code, and can be defined in `dvc.yaml` (`plots` field) for tracking -(optional). +DVC provides a set of commands to visualize data produced by machine learning +projects. Usual plots include AUC curves, loss functions, or confusion matrices, +for example. Plots are a great alternative to `dvc metrics` when working with +multi-dimensional performance data. They also help you present and compare +[experiments](/doc/command-reference/exp) effectively. DVC can work with two types of plots files: @@ -50,17 +41,17 @@ DVC plots from the [VS Code Extension], which includes a special [Plots Dashboard] that corresponds to the features in the `dvc plots` commands. Data-series plots utilize [Vega-Lite](https://vega.github.io/vega-lite/) for -rendering (declarative JSON grammar for defining graphics). Image-type plots are -rendered using `<img>` tags directly. +rendering (declarative JSON grammar for defining graphics). Images are rendered +using `<img>` tags directly. [vs code extension]: https://marketplace.visualstudio.com/items?itemName=Iterative.dvc [plots dashboard]: https://github.com/iterative/vscode-dvc/blob/main/extension/resources/walkthrough/plots.md -## Supported file formats +### Supported file formats -Image-type plots are included in HTML as-is, without additional processing.
+Images are included in HTML as-is, without additional processing. > We recommend to track these source image files with DVC instead of Git, to > prevent the repository from bloating. @@ -105,7 +96,144 @@ names in the `train` array below: } ``` -## Plot templates (data series only) +## Configuring a plot + +To create visualizations, users need to provide the data and, +optionally, a configuration that customizes the plot. DVC provides two +ways to configure visualizations: users can mark specific stage +outputs as plots, or define plot configurations inside `dvc.yaml` +under the `plots` key. + +### Top-level plot definitions + +Plots can be defined in `dvc.yaml` under the `plots` key. Unlike +[stage plots](#stage-plots), these are especially useful when users want to +compare data from different data sources residing in the same version of the +project, for example, comparing training versus test results on the current branch. + +### Stage plots + +When using `dvc stage add`, instead of using `--outs/--outs-no-cache`, particular +outputs can be marked with `--plots/--plots-no-cache`. This tells DVC that +they are intended for visualization. + +Upon running `dvc plots show/diff`, DVC will collect stage plots alongside the +top-level plot definitions and display them according to their configuration. +Note that if there are stage plots in the project and they are also used in +some top-level definitions, DVC will create separate renderings for the stage +plots and for all definitions using them. + +This special type of output might come in handy if users want to visually +compare experiment results with other experiment versions without having to +write top-level plot definitions in `dvc.yaml`. + +### Syntax (top-level plot definitions only) + +To define a plot, users need to provide data and an optional +configuration for the plot. Plots should be defined in the `dvc.yaml` file under +the `plots` key. + +```yaml +# dvc.yaml +stages: ...
+ +plots: ... +``` + +Every plot has to have its own ID. Configuration, if provided, should be a +dictionary. + +In the simplest case, users can provide a file path as the plot ID and no +configuration at all: + +```yaml +# dvc.yaml +--- +plots: + logs.csv: +``` + +In that case, the default behavior applies: DVC will take data from the +`logs.csv` file and apply the `linear` plot +[template](/doc/command-reference/plots#plot-templates) to the last column +(CSV, TSV files) or field (JSON, YAML) found in it. + +We can customize the plot by adding appropriate fields to the configuration: + +```yaml +# dvc.yaml +--- +plots: + confusion_matrix: + y: + confusion_matrix_data.csv: predicted_class + x: actual_class + template: confusion +``` + +Here we provided `confusion_matrix` as the plot ID. It will be displayed +as the plot title, unless we override it with the `title` field. We also +provided the data source in the `y` axis definition, so data will be sourced from +`confusion_matrix_data.csv`. The `y` axis will use the `predicted_class` field and the +`x` axis the `actual_class` field. Note that DVC assumes that +`actual_class` is also inside `confusion_matrix_data.csv`. + +We can provide multiple columns/fields from the same file: + +```yaml +# dvc.yaml +--- +plots: + multiple_series: + y: + logs.csv: [accuracy, loss] + x: epoch +``` + +In this case, we take the `accuracy` and `loss` fields and display them against +the `epoch` column, all coming from the `logs.csv` file. + +We can source the data from multiple files too: + +```yaml +# dvc.yaml +--- +plots: + multiple_files: + y: + train_logs.csv: accuracy + test_logs.csv: accuracy + x: epoch +``` + +In this case, we plot the `accuracy` field from both `train_logs.csv` and +`test_logs.csv` against `epoch`. Note that both files must have an `epoch` +field. + +### Available configuration fields + +- `x` - field name for the X axis data. An auto-generated + _step_ field is used by default.
It has to be a string. + +- `y` - field name for the Y axis data. + - Top-level plots: It can be a string, list or dictionary. If it's a string or + list, the plot ID is assumed to be the path to the data source, and the + string or list elements are the names of data columns or fields within + the source file. If this field is a dictionary, its keys are assumed to + be paths to data sources. The values have to be either strings or lists, + and are treated as column(s)/field(s) within the respective files. + - Plot outputs: It is the field name for the Y axis data. +- `x_label` - X axis label. The X field name is the default. +- `y_label` - Y axis label. If all provided Y entries have the same field name, + this name is the default; otherwise, the string `y` is used. +- `title` - Plot title. Defaults: + - Top-level plots: `path/to/dvc.yaml::plot_id` + - Plot outputs: Path to the file. + +Refer to the [examples](/doc/command-reference/plots#top-level-plots) for more +details on the syntax. + +## Plot templates (data-series only) DVC uses [Vega-Lite](https://vega.github.io/vega-lite/) JSON specifications to create plots from user data. A set of built-in _plot templates_ are included. @@ -187,7 +315,7 @@ important fields that DVC adds to the plot data: Refer to [`templates`](/doc/command-reference/plots/templates) command for more information on how to prepare your own template from pre-defined ones. -## HTML templates +## Custom HTML templates It's possible to supply an HTML file to `dvc plot show` and `dvc plot diff` by using the the `--html-template` option. This allows you to customize the @@ -209,20 +337,20 @@ this feature to render DVC plots without an Internet connection, below. - `-v`, `--verbose` - displays detailed tracing information.
-## Example: Tabular data +## Examples -We'll use tabular metrics file `logs.csv` for this example: +### Raw data files + +#### Tabular data + +We'll use tabular data file `logs.csv` for this example: ``` -epoch,accuracy,loss,val_accuracy,val_loss -0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257 -1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942 -2,0.98375,0.05241111190887168,0.9788,0.06665669009438716 -3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989 -4,0.99111664,0.027362171787042946,0.978,0.07385754839298315 -5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166 -6,0.9945,0.017702101902437668,0.9803,0.07830339228538505 -7,0.9954,0.01396906608727198,0.9802,0.07247738889862157 +epoch,loss,accuracy +1,0.19,0.81 +2,0.11,0.89 +3,0.07,0.93 +4,0.04,0.96 ``` Let's plot the last column (default behavior): @@ -244,19 +372,19 @@ file:///Users/usr/src/dvc_plots/index.html ![](/img/plots_diff.svg) -Visualize a specific field: +Visualize a specific field (`loss`) as y. Use `epoch` as x: ```dvc -$ dvc plots show -y loss logs.csv +$ dvc plots show logs.csv -y loss -x epoch file:///Users/usr/src/dvc_plots/index.html ``` ![](/img/plots_show_field.svg) -## Example: Smooth plot +#### Smooth plot In some cases we would like to smooth our plot. In this example we will use a -plot with 1000 data points: +noisy plot with 100 data points: ```dvc $ dvc plots show data.csv @@ -274,7 +402,7 @@ file:///Users/usr/src/dvc_plots/index.html ![](/img/plots_show_smooth.svg) -## Example: Confusion matrix +#### Confusion matrix We'll use `classes.csv` for this example: @@ -302,9 +430,171 @@ file:///Users/usr/src/dvc_plots/index.html ![](/img/plots_show_confusion.svg) > A confusion matrix [template](/doc/command-reference/plots#plot-templates) is -> predefined in DVC (found in `.dvc/plots/confusion.json`). +> predefined in DVC. 
+ +We can use `confusion_normalized` template to normalize the results: + +```dvc +$ dvc plots show classes.csv --template confusion_normalized -x actual -y predicted +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_confusion_normalized.svg) + +### Top-level plots + +#### Simple plot definition + +Let's get back to the `logs.csv` data: + +``` +# logs.csv +epoch,loss,accuracy +1,0.19,0.81 +2,0.11,0.89 +3,0.07,0.93 +4,0.04,0.96 +``` + +Minimal plot configuration we can put in `dvc.yaml` is simply data source path +relative to `dvc.yaml` file: + +```yaml +# dvc.yaml +stages: + train: + cmd: echo "Training the model..." + +plots: + logs.csv: +``` + +```dvc +$ dvc plots show +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_spec_default.svg) + +We can customize it: + +```yaml +# dvc.yaml +stages: + train: + cmd: echo "Training the model..." + +plots: + logs.csv: + x: epoch + y: accuracy + title: Displaying accuracy + x_label: This is epoch + y_label: This is accuracy +``` + +```dvc +$ dvc plots show +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_spec_simple_custom.svg) + +#### Multiple data series plot + +Data in `training_data.csv`: + +```csv +epoch,train_loss,test_loss +1,0.33,0.4 +2,0.3,0.28 +3,0.2,0.25 +4,0.1,0.23 +``` + +```yaml +# dvc.yaml +stages: + train: + cmd: echo "Training the model..." 
+ +plots: + test_vs_train_loss: + x: epoch + y: + training_data.csv: [test_loss, train_loss] + title: Compare loss training versus test +``` + +```dvc +$ dvc plots show +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_spec_multiple_columns.svg) + +#### Sourcing data from different files + +Let's compare the confusion matrix data of the test set and the training +set: + +```csv +# train_classes.csv +actual_class,predicted_class +dog,dog +dog,dog +dog,dog +dog,bird +cat,cat +cat,cat +cat,cat +cat,dog +bird,bird +bird,bird +bird,bird +bird,dog +``` + +```csv +# test_classes.csv +actual_class,predicted_class +dog,dog +dog,dog +dog,cat +bird,bird +bird,bird +bird,cat +cat,cat +cat,cat +cat,bird +``` + +```yaml +# dvc.yaml +stages: + train: + cmd: echo "Training the model..." + +plots: + test_vs_train_confusion: + x: actual_class + y: + train_classes.csv: predicted_class + test_classes.csv: predicted_class + title: Compare test vs train confusion matrix + template: confusion + x_label: Actual class + y_label: Predicted class +``` + +```dvc +$ dvc plots show +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_spec_conf_train_test.svg) -## Example: Offline HTML Template +### Offline HTML Template The plots generated by `dvc plots` uses Vega-Lite JavaScript libraries, and by default these load [online resources](https://vega.github.io/vega/usage/#embed). diff --git a/content/docs/command-reference/plots/modify.md b/content/docs/command-reference/plots/modify.md index bbfe226431..4b87215cc0 100644 --- a/content/docs/command-reference/plots/modify.md +++ b/content/docs/command-reference/plots/modify.md @@ -1,10 +1,10 @@ # plots modify -Modify display properties of [plot metrics](/doc/command-reference/plots) files. +Modify display properties of data-series [plots](/doc/command-reference/plots) +defined in stages. > ⚠️ Note that this command can modify only data-series plots. It has no effect -> on image-type plots.
See -> [Types of metrics](/doc/command-reference/plots#types-of-metrics). +> on image-type plots and top-level plot definitions. ## Synopsis @@ -16,7 +16,8 @@ usage: dvc plots modify [-h] [-q | -v] [-t ] [-x ] target positional arguments: - target Metrics file to set properties to + target Plot file to set properties for + (defined at the stage level) ``` ## Description @@ -24,9 +25,9 @@ positional arguments: It might be not convenient for users or automation systems to specify all the _display properties_ (such as `y-label`, `template`, `title`, etc.) each time plots are generated with `dvc plot show` or `dvc plot diff`. This command sets -(or unsets) default display properties for a specific metrics file. +(or unsets) default display properties for a specific plots file. -The path to the metrics file `target` is required. It must be listed in a +The path to the plots file `target` is required. It must be listed in a `dvc.yaml` file (see the `--plots` option of `dvc stage add`). `dvc plots modify` adds the display properties to `dvc.yaml`. diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index 07493d3035..05b7bb3315 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -1,6 +1,6 @@ # plots show -Generate [plot](/doc/command-reference/plots) from a metrics file. +Generate [plot](/doc/command-reference/plots) from a plots file. ## Synopsis @@ -12,15 +12,15 @@ usage: dvc plots show [-h] [-q | -v] [-t ] [-x ] [targets [targets ...]] positional arguments: - targets Metrics files to visualize. + targets Plot files or plot IDs from `dvc.yaml` to visualize. Shows all plots by default. ``` ## Description This command provides a quick way to visualize -[certain metrics](/doc/command-reference/plots#supported-file-formats) such as -loss functions, AUC curves, confusion matrices, etc.
+[certain data](/doc/command-reference/plots#supported-file-formats) such as loss +functions, AUC curves, confusion matrices, etc. All plots defined in `dvc.yaml` are used by default, but specific plots files can be specified as `targets` (note that targets don't necessarily have to be @@ -28,11 +28,11 @@ defined in `dvc.yaml`). The plot style can be customized with [plot templates](/doc/command-reference/plots#plot-templates), using the -`--template` option. To learn more about metrics file formats and templates -please see `dvc plots`. +`--template` option. To learn more about plots file formats and templates please +see `dvc plots`. -> Note that the default behavior of this command can be modified per metrics -> file with `dvc plots modify`. +> Note that the default behavior of this command can be modified per plots file +> with `dvc plots modify`. ## Options @@ -49,11 +49,11 @@ please see `dvc plots`. auto-generated `index` field is used by default. See [Custom templates](/doc/command-reference/plots#custom-templates) for more information on this `index` field. Column names or numbers are expected for - tabular metrics files. + tabular plots files. - `-y ` - field name from which the Y axis data comes from. The last field found in the `targets` is used by default. Column names or numbers are - expected for tabular metrics files. + expected for tabular plots files. - `--x-label ` - X axis label. The X field name is the default. 
@@ -144,15 +144,11 @@ file:///Users/usr/src/dvc_plots/index.html We'll use tabular metrics file `logs.csv` for these examples: ``` -epoch,accuracy,loss,val_accuracy,val_loss -0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257 -1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942 -2,0.98375,0.05241111190887168,0.9788,0.06665669009438716 -3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989 -4,0.99111664,0.027362171787042946,0.978,0.07385754839298315 -5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166 -6,0.9945,0.017702101902437668,0.9803,0.07830339228538505 -7,0.9954,0.01396906608727198,0.9802,0.07247738889862157 +epoch,loss,accuracy +1,0.19,0.81 +2,0.11,0.89 +3,0.07,0.93 +4,0.04,0.96 ```
@@ -162,15 +158,11 @@ epoch,accuracy,loss,val_accuracy,val_loss Here's a corresponding `train.tsv` metrics file: ``` -epoch accuracy loss val_accuracy val_loss -0 0.9418667 0.19958884770199656 0.9679 0.10217399864746257 -1 0.9763333 0.07896138601688048 0.9768 0.07310650711813942 -2 0.98375 0.05241111190887168 0.9788 0.06665669009438716 -3 0.988016 0.03681169906261687 0.9781 0.06697812260198989 -4 0.991116 0.027362171787042946 0.978 0.07385754839298315 -5 0.9932333 0.02069501801203781 0.9771 0.08009233058886166 -6 0.9945 0.017702101902437668 0.9803 0.07830339228538505 -7 0.9954 0.01396906608727198 0.9802 0.07247738889862157 +epoch loss accuracy +1 0.19 0.81 +2 0.11 0.89 +3 0.07 0.93 +4 0.04 0.96 ```
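The defaults for raw tabular plots files can be illustrated with a short Python sketch. When no fields are given, Y comes from the last column, and X comes from an auto-generated _step_ field (labeled `step` in the regenerated images). This is not DVC's implementation. The function name `default_series` is hypothetical, and the step index is assumed to start at 0, which matches the 0 to 3 axis of the regenerated `logs.csv` plot.

```python
import csv
import io
from typing import List, Optional, Tuple


def default_series(
    text: str, delimiter: str = ","
) -> Tuple[str, Optional[str], List[Tuple[int, float]]]:
    """Hypothetical sketch (not DVC's code) of the documented defaults for a
    tabular plots file: Y is the last column, X is an auto-generated step."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fields = reader.fieldnames or []  # reads the header row, if any
    if not fields:
        return "step", None, []
    y_field = fields[-1]  # last column found in the file
    # Number the data rows 0, 1, 2, ... to stand in for the auto-generated step.
    points = [(step, float(row[y_field])) for step, row in enumerate(reader)]
    return "step", y_field, points


if __name__ == "__main__":
    logs_csv = "epoch,loss,accuracy\n1,0.19,0.81\n2,0.11,0.89\n3,0.07,0.93\n4,0.04,0.96\n"
    print(default_series(logs_csv))
    # TSV files (like `train.tsv`) only need a different delimiter.
    train_tsv = "epoch\tloss\taccuracy\n1\t0.19\t0.81\n"
    print(default_series(train_tsv, delimiter="\t"))
```

Passing `-x epoch -y loss` to `dvc plots show` overrides both choices. In this sketch, that would correspond to reading named columns instead of `fields[-1]` and the row index.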
diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index eeaeb26591..c22b916a63 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -464,6 +464,34 @@ validation and auto-completion. ⚠️ Note that using the `checkpoint` field in `dvc.yaml` is not compatible with `dvc repro`. +## Top-level plot definitions + +The list of plots contains one or more user-defined +[plots](/doc/command-reference/plots#standalone-plots). Here's an example that +tells DVC that `auc.json` is viable for visualization: + +```yaml +stages: + build: + cmd: python train.py + deps: + - features.csv + outs: + - model.pt + - auc.json + metrics: + - accuracy.txt: + cache: false +plots: + auc.json: + x: fpr + y: tpr +``` + +Note that we didn't have to specify `auc.json` as a plot in the stage. In fact, +top-level `plots` can use any file in the project. [top-level +`plots`]: /doc/command-reference/plots#top-level-plot-definitions + ## dvc.lock file > ⚠️ Avoid editing these files. DVC will create and update them for you. 
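The `y` rules for top-level plot definitions can be summarized in a small Python sketch, along with the default `path/to/dvc.yaml::plot_id` title. It is an illustration under stated assumptions, not DVC's internal code. The helper names `resolve_y` and `default_title` are hypothetical, and a `None` field stands for "use the last column or field", as in a definition without configuration.

```python
from typing import Dict, List, Optional, Tuple, Union

# A `y` value as written in dvc.yaml: omitted, a field name, a list of field
# names, or a mapping of data-source paths to field name(s).
YSpec = Union[None, str, List[str], Dict[str, Union[str, List[str]]]]


def _as_list(fields: Union[str, List[str]]) -> List[str]:
    """Normalize a single field name or a list of names into a list."""
    return [fields] if isinstance(fields, str) else list(fields)


def resolve_y(plot_id: str, y: YSpec) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: expand `y` into (data source path, field) pairs."""
    if y is None:
        # No `y` given: the plot ID is the data source, default field (None).
        return [(plot_id, None)]
    if isinstance(y, dict):
        # Keys are data-source paths; values name column(s)/field(s) in them.
        return [(path, field) for path, fields in y.items() for field in _as_list(fields)]
    # String or list: the plot ID itself is the path to the data source.
    return [(plot_id, field) for field in _as_list(y)]


def default_title(dvc_yaml_path: str, plot_id: str, config: Optional[dict] = None) -> str:
    """Hypothetical helper: an explicit `title`, else `path/to/dvc.yaml::plot_id`."""
    return (config or {}).get("title") or f"{dvc_yaml_path}::{plot_id}"


if __name__ == "__main__":
    print(resolve_y("auc.json", "tpr"))
    print(resolve_y("multiple_files", {"train_logs.csv": "accuracy", "test_logs.csv": "accuracy"}))
    print(default_title("dvc.yaml", "logs.csv"))
```

Applied to the `auc.json` definition above, `resolve_y("auc.json", "tpr")` yields a single `("auc.json", "tpr")` pair. The dictionary form is what lets a single plot ID combine several files, as in the train/test comparisons.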
diff --git a/static/img/plots_diff.svg b/static/img/plots_diff.svg index 229ff28129..9e0b88e002 100644 --- a/static/img/plots_diff.svg +++ b/static/img/plots_diff.svg @@ -1 +1 @@ -[SVG plot: x axis `index`, y axis `val_loss`; revisions `0135527`, `HEAD`] \ No newline at end of file +[SVG plot of `logs.csv`: x axis `step`, y axis `accuracy`; revisions `HEAD^`, `workspace`] \ No newline at end of file diff --git a/static/img/plots_diff_two_revs.svg b/static/img/plots_diff_two_revs.svg new file mode 100644 index 0000000000..2d8ff1f58c --- /dev/null +++ b/static/img/plots_diff_two_revs.svg @@ -0,0 +1 @@ +[SVG plot of `logs.csv`: x axis `step`, y axis `accuracy`; revisions `0135527`, `HEAD^`] \ No newline at end of file diff --git a/static/img/plots_show.svg b/static/img/plots_show.svg index 2e49efef9c..533b9486a5 100644 --- a/static/img/plots_show.svg +++ b/static/img/plots_show.svg @@ -1 +1 @@ -[SVG plot: x axis `index`, y axis `val_loss`; revision `workspace`] \ No newline at end of file +[SVG plot of `logs.csv`: x axis `step`, y axis `accuracy`; revision `workspace`] \ No newline at end of file diff --git a/static/img/plots_show_confusion.svg b/static/img/plots_show_confusion.svg index 7e843281cf..d72f5762ea 100644 --- a/static/img/plots_show_confusion.svg +++ b/static/img/plots_show_confusion.svg @@ -1 +1 @@ -[SVG confusion matrix: axes `actual` and `predicted`; classes bird, cat, dinosaur, dog, turtle; revision `workspace`] \ No newline at end of file +[SVG confusion matrix of `classes.csv`: axes `predicted` and `actual`; classes bird, cat, dinosaur, dog, turtle; revision `workspace`] \ No newline at end of file diff --git a/static/img/plots_show_confusion_normalized.svg b/static/img/plots_show_confusion_normalized.svg new file mode 100644 index 0000000000..199a93e852 --- /dev/null +++ b/static/img/plots_show_confusion_normalized.svg @@ -0,0 +1 @@
+[SVG normalized confusion matrix of `classes.csv`: axes `predicted` and `actual`; classes bird, cat, dinosaur, dog, turtle; revision `workspace`] \ No newline at end of file diff --git a/static/img/plots_show_field.svg b/static/img/plots_show_field.svg index ddaabf710b..79f17e9f2d 100644 --- a/static/img/plots_show_field.svg +++ b/static/img/plots_show_field.svg @@ -1 +1 @@ -[SVG plot: x axis `index`, y axis `loss`; revision `workspace`] \ No newline at end of file +[SVG plot of `logs.csv`: x axis `epoch`, y axis `loss`; revision `workspace`] \ No newline at end of file diff --git a/static/img/plots_show_no_smooth.svg b/static/img/plots_show_no_smooth.svg index bae27f1407..c23eb82a5d 100644 --- a/static/img/plots_show_no_smooth.svg +++ b/static/img/plots_show_no_smooth.svg @@ -1 +1 @@ -[SVG plot: x axis `index`, y axis `y`; revision `workspace`] \ No newline at end of file +[SVG plot of `data.csv`: x axis `step`, y axis `loss`; revision `workspace`] \ No newline at end of file diff --git a/static/img/plots_show_smooth.svg b/static/img/plots_show_smooth.svg index 934fc97de0..4240286f66 100644 --- a/static/img/plots_show_smooth.svg +++ b/static/img/plots_show_smooth.svg @@ -1 +1 @@ -[SVG plot: x axis `index`, y axis `y`; revision `workspace`] \ No newline at end of file +[SVG plot of `data.csv`: x axis `step`, y axis `loss`; revision `workspace`] \ No newline at end of file diff --git a/static/img/plots_show_spec_conf_train_test.svg b/static/img/plots_show_spec_conf_train_test.svg new file mode 100644 index 0000000000..9f2bf7e3a1 --- /dev/null +++ b/static/img/plots_show_spec_conf_train_test.svg @@ -0,0 +1 @@ +[SVG confusion matrices of `test_classes.csv` and `train_classes.csv`, titled "Compare test vs train confusion matrix": axes `Predicted class` and `Actual class`; classes bird, cat, dog] \ No newline at end of file diff --git a/static/img/plots_show_spec_default.svg b/static/img/plots_show_spec_default.svg new file mode 100644 index
0000000000..5b78e1b365 --- /dev/null +++ b/static/img/plots_show_spec_default.svg @@ -0,0 +1 @@ +[SVG plot titled `dvc.yaml::logs.csv`: x axis `step`, y axis `accuracy`; revision `workspace`] \ No newline at end of file diff --git a/static/img/plots_show_spec_multiple_columns.svg b/static/img/plots_show_spec_multiple_columns.svg new file mode 100644 index 0000000000..24c338ad5d --- /dev/null +++ b/static/img/plots_show_spec_multiple_columns.svg @@ -0,0 +1 @@ +[SVG plot titled "Compare loss training versus test": x axis `epoch`, y axis `y`; series `test_loss`, `train_loss`] \ No newline at end of file diff --git a/static/img/plots_show_spec_simple_custom.svg b/static/img/plots_show_spec_simple_custom.svg new file mode 100644 index 0000000000..ed69a15a71 --- /dev/null +++ b/static/img/plots_show_spec_simple_custom.svg @@ -0,0 +1 @@ +[SVG plot titled "Displaying accuracy": x axis "This is epoch", y axis "This is accuracy"; revision `workspace`] \ No newline at end of file From 2283123a23d636a4ea20ca8c320e77fa311a482c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 20 Jul 2022 19:55:39 -0500 Subject: [PATCH 02/24] Update content/docs/user-guide/project-structure/dvcyaml-files.md --- content/docs/user-guide/project-structure/dvcyaml-files.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index c22b916a63..761dafa3a0 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -489,8 +489,9 @@ plots: ``` Note that we didn't have to specify `auc.json` as a plot in the stage. In fact, -top-level `plots` can use any file in the project. [top-level -`plots`]: /doc/command-reference/plots#top-level-plot-definitions +[top-level `plots`] can use any file in the project.
+ +[top-level `plots`]: /doc/command-reference/plots#top-level-plot-definitions ## dvc.lock file From 38aceb29c1c53477cc817154908786ea059dc30e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 20 Jul 2022 20:33:20 -0500 Subject: [PATCH 03/24] Apply suggestions from code review --- content/docs/command-reference/plots/index.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index d8e871b7f0..00a755198a 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -28,7 +28,7 @@ DVC provides a set of commands to visualize data produced by machine learning projects. Usual plots include AUC curves, loss functions, or confusion matrices, for example. Plots are a great alternative to `dvc metrics` when working with multi-dimensional performance data. They also help you present and compare -[experiments](/doc/command-reference/exp) effectively. +[[experiments]](https://github.com/iterative/dvc.org/pull/3691#pullrequestreview-1024053543) effectively. DVC can work with two types of plots files: @@ -49,6 +49,9 @@ using `<img>` tags directly. [plots dashboard]: https://github.com/iterative/vscode-dvc/blob/main/extension/resources/walkthrough/plots.md +[experiments]: + /doc/user-guide/experiment-management/experiments-overview + ### Supported file formats Images are included in HTML as-is, without additional processing.
From 5bf8c5bfd07ee05694e8655f2e1de85232b207a7 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 20 Jul 2022 20:34:03 -0500 Subject: [PATCH 04/24] Update content/docs/command-reference/plots/index.md --- content/docs/command-reference/plots/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 00a755198a..b718fada4e 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -28,7 +28,7 @@ DVC provides a set of commands to visualize data produced by machine learning projects. Usual plots include AUC curves, loss functions, or confusion matrices, for example. Plots are a great alternative to `dvc metrics` when working with multi-dimensional performance data. They also help you present and compare -[[experiments]](https://github.com/iterative/dvc.org/pull/3691#pullrequestreview-1024053543) effectively. +[experiments] effectively. DVC can work with two types of plots files: From f85a35e967395c211f2ec4b90c22020c76f5fb4f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pawe=C5=82=20Redzy=C5=84ski?= Date: Thu, 21 Jul 2022 16:46:40 +0200 Subject: [PATCH 05/24] plots: examples: move to subcommands --- content/docs/command-reference/plots/index.md | 264 +----------------- content/docs/command-reference/plots/show.md | 213 ++++++++++++++ .../project-structure/dvcyaml-files.md | 4 +- static/img/plots_diff.svg | 1 - static/img/plots_show_json.svg | 2 +- static/img/plots_show_json_field.svg | 2 +- 6 files changed, 220 insertions(+), 266 deletions(-) delete mode 100644 static/img/plots_diff.svg diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index b718fada4e..809871972e 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -49,8 +49,7 @@ using `<img>` tags directly.
[plots dashboard]: https://github.com/iterative/vscode-dvc/blob/main/extension/resources/walkthrough/plots.md -[experiments]: - /doc/user-guide/experiment-management/experiments-overview +[experiments]: /doc/user-guide/experiment-management/experiments-overview ### Supported file formats @@ -257,7 +256,7 @@ DVC has the following built-in plot templates: - `linear` - basic linear plot including cursor interactivity (default) - `simple` - simplest linear template (not interactive); Good base to create - [custom templates]. + [custom template]. - `scatter` - scatter plot - `smooth` - linear plot with LOESS smoothing, see [example](/doc/command-reference/plots#example-smooth-plot) @@ -340,264 +339,7 @@ this feature to render DVC plots without an Internet connection, below. - `-v`, `--verbose` - displays detailed tracing information. -## Examples - -### Raw data files - -#### Tabular data - -We'll use tabular data file `logs.csv` for this example: - -``` -epoch,loss,accuracy -1,0.19,0.81 -2,0.11,0.89 -3,0.07,0.93 -4,0.04,0.96 -``` - -Let's plot the last column (default behavior): - -```dvc -$ dvc plots show logs.csv -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show.svg) - -Difference in this metric between the current project version and the previous -commit: - -```dvc -$ dvc plots diff HEAD^ --targets logs.csv -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_diff.svg) - -Visualize a specific field (`loss`) as y. Use `epoch` as x: - -```dvc -$ dvc plots show logs.csv -y loss -x epoch -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_field.svg) - -#### Smooth plot - -In some cases we would like to smooth our plot. 
In this example we will use a -noisy plot with 100 data points: - -```dvc -$ dvc plots show data.csv -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_no_smooth.svg) - -We can use the `-t` option and `smooth` template to make it less noisy: - -```dvc -$ dvc plots show -t smooth data.csv -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_smooth.svg) - -#### Confusion matrix - -We'll use `classes.csv` for this example: - -``` -actual,predicted -cat,cat -cat,cat -cat,cat -cat,dog -cat,dinosaur -cat,dinosaur -cat,bird -turtle,dog -turtle,cat -... -``` - -Let's visualize it: - -```dvc -$ dvc plots show classes.csv --template confusion -x actual -y predicted -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_confusion.svg) - -> A confusion matrix [template](/doc/command-reference/plots#plot-templates) is -> predefined in DVC. - -We can use `confusion_normalized` template to normalize the results: - -```dvc -$ dvc plots show classes.csv --template confusion_normalized -x actual -y predicted -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_confusion_normalized.svg) - -### Top-level plots - -#### Simple plot definition - -Let's get back to the `logs.csv` data: - -``` -# logs.csv -epoch,loss,accuracy -1,0.19,0.81 -2,0.11,0.89 -3,0.07,0.93 -4,0.04,0.96 -``` - -Minimal plot configuration we can put in `dvc.yaml` is simply data source path -relative to `dvc.yaml` file: - -```yaml -# dvc.yaml -stages: - train: - cmd: echo "Training the model..." - -plots: - logs.csv: -``` - -```dvc -$ dvc plots show -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_spec_default.svg) - -We can customize it: - -```yaml -# dvc.yaml -stages: - train: - cmd: echo "Training the model..." 
- -plots: - logs.csv: - x: epoch - y: accuracy - title: Displaying accuracy - x_label: This is epoch - y_label: This is accuracy -``` - -```dvc -$ dvc plots show -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_spec_simple_custom.svg) - -#### Multiple data series plot - -Data in `training_data.csv`: - -```csv -epoch,train_loss,test_loss -1,0.33,0.4 -2,0.3,0.28 -3,0.2,0.25 -4,0.1,0.23 -``` - -```yaml -# dvc.yaml -stages: - train: - cmd: echo "Training the model..." - -plots: - test_vs_train_loss: - x: epoch - y: - training_data.csv: [test_loss, train_loss] - title: Compare loss training versus test -``` - -```dvc -$ dvc plots show -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_spec_multiple_columns.svg) - -#### Sourcing data from different files - -Lets prepare comparison for confusion matrix data between test set and training -set: - -```csv -# train_classes.csv -actual_class,predicted_class -dog,dog -dog,dog -dog,dog -dog,bird -cat,cat -cat,cat -cat,cat -cat,dog -bird,bird -bird,bird -bird,bird -bird,dog -``` - -```csv -# test_classes.csv -actual_class,predicted_class -dog,dog -dog,dog -dog,cat -bird,bird -bird,bird -bird,cat -cat,cat -cat,cat -cat,bird -``` - -```yaml -# dvc.yaml -stages: - train: - cmd: echo "Training the model..." - -plots: - test_vs_train_confusion: - x: actual_class - y: - train_classes.csv: predicted_class - test_classes.csv: predicted_class - title: Compare test vs train confusion matrix - template: confusion - x_label: Actual class - y_label: Predicted class -``` - -```dvc -$ dvc plots show -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_spec_conf_train_test.svg) - -### Offline HTML Template +## Example: Offline HTML Template The plots generated by `dvc plots` uses Vega-Lite JavaScript libraries, and by default these load [online resources](https://vega.github.io/vega/usage/#embed). 
diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index 05b7bb3315..5d91220664 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -195,6 +195,219 @@ $ dvc plots show --no-header logs.csv -y 2 file:///Users/usr/src/dvc_plots/index.html ``` +## Example: Smooth plot + +In some cases we would like to smooth our plot. In this example we will use a +noisy plot with 100 data points: + +```dvc +$ dvc plots show data.csv +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_no_smooth.svg) + +We can use the `-t` option and `smooth` template to make it less noisy: + +```dvc +$ dvc plots show -t smooth data.csv +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_smooth.svg) + +## Example: Confusion matrix + +We'll use `classes.csv` for this example: + +``` +actual,predicted +cat,cat +cat,cat +cat,cat +cat,dog +cat,dinosaur +cat,dinosaur +cat,bird +turtle,dog +turtle,cat +... +``` + +Let's visualize it: + +```dvc +$ dvc plots show classes.csv --template confusion -x actual -y predicted +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_confusion.svg) + +> A confusion matrix [template](/doc/command-reference/plots#plot-templates) is +> predefined in DVC. + +We can use `confusion_normalized` template to normalize the results: + +```dvc +$ dvc plots show classes.csv --template confusion_normalized -x actual -y predicted +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_confusion_normalized.svg) + +## Example: Top-level plots + +### Simple plot definition + +Let's use the `logs.csv` data: + +``` +# logs.csv +epoch,loss,accuracy +1,0.19,0.81 +2,0.11,0.89 +3,0.07,0.93 +4,0.04,0.96 +``` + +Minimal plot configuration we can put in `dvc.yaml` is simply data source path +relative to `dvc.yaml` file: + +```yaml +# dvc.yaml +stages: + train: + cmd: echo "Training the model..." 
+ +plots: + logs.csv: +``` + +```dvc +$ dvc plots show +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_spec_default.svg) + +We can customize it: + +```yaml +# dvc.yaml +stages: + train: + cmd: echo "Training the model..." + +plots: + logs.csv: + x: epoch + y: accuracy + title: Displaying accuracy + x_label: This is epoch + y_label: This is accuracy +``` + +```dvc +$ dvc plots show +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_spec_simple_custom.svg) + +### Multiple data series plot + +Data in `training_data.csv`: + +```csv +epoch,train_loss,test_loss +1,0.33,0.4 +2,0.3,0.28 +3,0.2,0.25 +4,0.1,0.23 +``` + +```yaml +# dvc.yaml +stages: + train: + cmd: echo "Training the model..." + +plots: + test_vs_train_loss: + x: epoch + y: + training_data.csv: [test_loss, train_loss] + title: Compare loss training versus test +``` + +```dvc +$ dvc plots show +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_spec_multiple_columns.svg) + +### Sourcing data from different files + +Lets prepare comparison for confusion matrix data between test set and training +set: + +```csv +# train_classes.csv +actual_class,predicted_class +dog,dog +dog,dog +dog,dog +dog,bird +cat,cat +cat,cat +cat,cat +cat,dog +bird,bird +bird,bird +bird,bird +bird,dog +``` + +```csv +# test_classes.csv +actual_class,predicted_class +dog,dog +dog,dog +dog,cat +bird,bird +bird,bird +bird,cat +cat,cat +cat,cat +cat,bird +``` + +```yaml +# dvc.yaml +stages: + train: + cmd: echo "Training the model..." 
+ +plots: + test_vs_train_confusion: + x: actual_class + y: + train_classes.csv: predicted_class + test_classes.csv: predicted_class + title: Compare test vs train confusion matrix + template: confusion + x_label: Actual class + y_label: Predicted class +``` + +```dvc +$ dvc plots show +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_spec_conf_train_test.svg) + ## Example: Vega-Lite specification file In many automation scenarios (like diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index 761dafa3a0..8f3f5a682b 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -467,8 +467,8 @@ validation and auto-completion. ## Top-level plot definitions The list of plots contains one or more user-defined -[plots](/doc/command-reference/plots#standalone-plots). Here's an example that -tells DVC that `auc.json` is viable for visualization: +[plots](/doc/command-reference/plots#top-level-plot-definitions). 
Here's an +example that tells DVC that `auc.json` is viable for visualization: ```yaml stages: diff --git a/static/img/plots_diff.svg b/static/img/plots_diff.svg deleted file mode 100644 index 9e0b88e002..0000000000 --- a/static/img/plots_diff.svg +++ /dev/null @@ -1 +0,0 @@ -0.00.51.01.52.02.53.0step0.760.780.800.820.840.860.880.900.920.940.96accuracyHEAD^workspacerevlogs.csv \ No newline at end of file diff --git a/static/img/plots_show_json.svg b/static/img/plots_show_json.svg index 0329ff8057..8e353e33d8 100644 --- a/static/img/plots_show_json.svg +++ b/static/img/plots_show_json.svg @@ -1 +1 @@ -0123456index0.080.090.100.11lossworkspacerev \ No newline at end of file +0123456step0.0700.0750.0800.0850.0900.0950.1000.1050.110lossworkspacerevtrain.json \ No newline at end of file diff --git a/static/img/plots_show_json_field.svg b/static/img/plots_show_json_field.svg index e2743f3d79..335396ef92 100644 --- a/static/img/plots_show_json_field.svg +++ b/static/img/plots_show_json_field.svg @@ -1 +1 @@ -0123456index0.880.900.920.940.960.98accuracyworkspacerev \ No newline at end of file +0123456step0.870.880.890.900.910.920.930.940.950.960.970.98accuracyworkspacerevtrain.json \ No newline at end of file From 716bd8be106c7d954a8bac075362b774a1e34471 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pawe=C5=82=20Redzy=C5=84ski?= Date: Thu, 21 Jul 2022 17:20:48 +0200 Subject: [PATCH 06/24] plots: refactor top-level plots definition --- content/docs/command-reference/plots/index.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 809871972e..622dbf769f 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -108,10 +108,14 @@ under `plots` key. ### Top-level plot definitions -Plots can be defined in `dvc.yaml` under the `plots` key. 
Unlike -[stage plots](#stage-plots), these are especially useful when users want to -compare data from different data sources residing on the same version of the -project. For example, comparing training versus test results on current branch. +Plots can be defined in `dvc.yaml` under the `plots` key. Unlike [stage plots], +these are especially useful when users want to compare data from different data +sources residing on the same version of the project. For example, comparing +training versus test results on current branch. They also allow to visualize +data from files that are not [stage plots]. They are an abstraction separating +visualization from particular stage output. + +[stage plots]: #stage-plots ### Stage plots From 1f3ae00046775c5527361904c307ea9cc8cc7562 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pawe=C5=82=20Redzy=C5=84ski?= Date: Fri, 22 Jul 2022 16:23:10 +0200 Subject: [PATCH 07/24] plots: review refactor --- content/docs/command-reference/metrics/index.md | 10 ---------- content/docs/command-reference/plots/index.md | 7 ++++--- content/docs/command-reference/plots/show.md | 13 +++++++++---- content/docs/command-reference/plots/templates.md | 2 +- .../user-guide/project-structure/dvcyaml-files.md | 2 +- 5 files changed, 15 insertions(+), 19 deletions(-) diff --git a/content/docs/command-reference/metrics/index.md b/content/docs/command-reference/metrics/index.md index 25b9170426..e10c26eb7e 100644 --- a/content/docs/command-reference/metrics/index.md +++ b/content/docs/command-reference/metrics/index.md @@ -15,16 +15,6 @@ positional arguments: diff Show changes in metrics between commits. ``` -## Types of metrics - -DVC has two concepts for metrics, that represent different results of machine -learning training or data processing: - -1. `dvc metrics` represent **scalar numbers** such as AUC, _true positive rate_, - etc. -2. `dvc plots` can be used to visualize **data series** such as AUC curves, loss - functions, confusion matrices, etc. 
- ## Description In order to follow the performance of machine learning experiments, DVC has the diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 622dbf769f..d57da7b248 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -98,7 +98,7 @@ names in the `train` array below: } ``` -## Configuring a plot +## Defining a plot In order to create visualizations, users need to provide the data and (optionally) configuration that will help customize the plot. DVC provides two @@ -236,8 +236,9 @@ field. - Top-level plots: `path/to/dvc.yaml::plot_id` - Plot outputs: Path to the file. -Refer to the [examples](/doc/command-reference/plots#top-level-plots) for more -syntax insight. +Refer to the [`show` command] documentation for examples. + +[`show` command]: /doc/command-reference/plots/show#example-top-level-plots ## Plot templates (data-series only) diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index 5d91220664..4df6c54454 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -1,6 +1,9 @@ # plots show -Generate [plot](/doc/command-reference/plots) from a plots file. +Generate [plot](/doc/command-reference/plots) from a plots file or `plots` +[top-level definition] from `dvc.yaml`. + +[top-level definition]: /doc/command-reference/plots#top-level-plot-definitions ## Synopsis @@ -22,9 +25,11 @@ This command provides a quick way to visualize [certain data](/doc/command-reference/plots#supported-file-formats) such as loss functions, AUC curves, confusion matrices, etc. -All plots defined in `dvc.yaml` are used by default, but specific plots files -can be specified as `targets` (note that targets don't necessarily have to be -defined in `dvc.yaml`). 
+All plots defined in `dvc.yaml` are used by default, but specific plots files or +[top-level plots] id's can be specified as `targets` (note that target files +don't necessarily have to be defined in `dvc.yaml`). + +[top-level plots]: /doc/command-reference/plots#top-level-plot-definitions The plot style can be customized with [plot templates](/doc/command-reference/plots#plot-templates), using the diff --git a/content/docs/command-reference/plots/templates.md b/content/docs/command-reference/plots/templates.md index 65f3731fad..88ae897864 100644 --- a/content/docs/command-reference/plots/templates.md +++ b/content/docs/command-reference/plots/templates.md @@ -33,7 +33,7 @@ Note that templates can only be used with [data-series plots]. [plot templates]: https://dvc.org/doc/command-reference/plots#plot-templates-data-series-only [vega-lite specification]: https://vega.github.io/vega-lite/ -[data-series plots]: /doc/command-reference/plots#types-of-metrics +[data-series plots]: /doc/command-reference/plots#description ## Options diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index 8f3f5a682b..5917ee7419 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -426,7 +426,7 @@ These are the fields that are accepted in each stage: | `cmd` | (Required) One or more commands executed by the stage (may contain either a single value or a list). Commands are executed sequentially until all are finished or until one of them fails (see `dvc repro`). | | `wdir` | Working directory for the stage command to run in (relative to the file's location). Any paths in other fields are also based on this. It defaults to `.` (the file's location). | | `deps` | List of dependency paths of this stage (relative to `wdir`). | -| `outs` | List of output paths of this stage (relative to `wdir`). 
These can contain certain optional [subfields](#output-subfields). | +| `outs` | List of output paths of this stage (relative to `wdir`). These can contain certain optional [subfields](#output-subfields). Any output is viable data source for [top-level plots](/doc/command-reference/plots#top-level-plot-definitions). | | `params` | List of parameter dependency keys (field names) to track from `params.yaml` (in `wdir`). The list may also contain other parameters file names, with a sub-list of the param names to track in them. | | `metrics` | List of [metrics files](/doc/command-reference/metrics), and optionally, whether or not this metrics file is cached (`true` by default). See the `--metrics-no-cache` (`-M`) option of `dvc run`. | | `plots` | List of [plot metrics](/doc/command-reference/plots), and optionally, their default configuration (subfields matching the options of `dvc plots modify`), and whether or not this plots file is cached ( `true` by default). See the `--plots-no-cache` option of `dvc run`. | From 984102fb24b2a1fa1c66d5aa9d2ebcab53131fd1 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 26 Jul 2022 14:56:56 -0500 Subject: [PATCH 08/24] Update content/docs/command-reference/plots/index.md --- content/docs/command-reference/plots/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index d57da7b248..84f0d22ab3 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -261,7 +261,7 @@ DVC has the following built-in plot templates: - `linear` - basic linear plot including cursor interactivity (default) - `simple` - simplest linear template (not interactive); Good base to create - [custom template]. + [custom templates]. 
- `scatter` - scatter plot - `smooth` - linear plot with LOESS smoothing, see [example](/doc/command-reference/plots#example-smooth-plot) From eef0e2f9784a08625bd68f923273faeb06e78ef8 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Tue, 26 Jul 2022 14:02:49 -0600 Subject: [PATCH 09/24] ref: fix a link (2/2) per https://github.com/iterative/dvc.org/pull/3691#pullrequestreview-1051540087 --- content/docs/command-reference/plots/index.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 84f0d22ab3..3b5e41a036 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -48,7 +48,6 @@ using `` tags directly. https://marketplace.visualstudio.com/items?itemName=Iterative.dvc [plots dashboard]: https://github.com/iterative/vscode-dvc/blob/main/extension/resources/walkthrough/plots.md - [experiments]: /doc/user-guide/experiment-management/experiments-overview ### Supported file formats @@ -268,7 +267,7 @@ DVC has the following built-in plot templates: - `confusion` - confusion matrix, see [example](/doc/command-reference/plots#example-confusion-matrix) -[custom template]: https://dvc.org/doc/command-reference/plots/templates +[custom templates]: https://dvc.org/doc/command-reference/plots/templates - `confusion_normalized` - confusion matrix with values normalized to <0, 1> range From 4a25f0c12ad4db3259d4aa12ed65b19198bfee47 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Tue, 26 Jul 2022 14:27:01 -0600 Subject: [PATCH 10/24] ref: remove concept of type of metrics --- content/docs/command-reference/metrics/index.md | 6 +++--- content/docs/command-reference/plots/modify.md | 5 ++++- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/content/docs/command-reference/metrics/index.md b/content/docs/command-reference/metrics/index.md index e10c26eb7e..0ffa120024 100644 --- 
a/content/docs/command-reference/metrics/index.md +++ b/content/docs/command-reference/metrics/index.md @@ -22,9 +22,9 @@ ability to mark a certain stage outputs as metrics. These metrics are project-specific floating-point or integer values e.g. AUC, ROC, false positives, etc. -This type of metrics files are typically generated by user data processing code, -and are tracked using the `-m` (`--metrics`) and `-M` (`--metrics-no-cache`) -options of `dvc stage add`. +Metrics files are typically generated by user data processing code, and are +tracked using the `-m` (`--metrics`) and `-M` (`--metrics-no-cache`) options of +`dvc stage add`. In contrast to `dvc plots`, these metrics should be stored in hierarchical files. Unlike its `dvc plots` counterpart, `dvc metrics diff` can report the diff --git a/content/docs/command-reference/plots/modify.md b/content/docs/command-reference/plots/modify.md index 4b87215cc0..185d13bb55 100644 --- a/content/docs/command-reference/plots/modify.md +++ b/content/docs/command-reference/plots/modify.md @@ -4,7 +4,10 @@ Modify display properties of data-series [plots](/doc/command-reference/plots) defined in stages. > ⚠️ Note that this command can modify only data-series plots. It has no effect -> on image-type plots and top-level plot definitions. +> on image-type plots or any [top-level plot definitions]. 
+ +[top-level plot definitions]: + /doc/command-reference/plots#top-level-plot-definitions ## Synopsis From 88e591405a29c61273686cda96d928878511ca13 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Tue, 26 Jul 2022 14:37:15 -0600 Subject: [PATCH 11/24] ref: term "plots files" (consistency) --- content/docs/command-reference/plots/modify.md | 2 +- content/docs/command-reference/plots/show.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/content/docs/command-reference/plots/modify.md b/content/docs/command-reference/plots/modify.md index 185d13bb55..14006bfa70 100644 --- a/content/docs/command-reference/plots/modify.md +++ b/content/docs/command-reference/plots/modify.md @@ -19,7 +19,7 @@ usage: dvc plots modify [-h] [-q | -v] [-t ] [-x ] target positional arguments: - target Plot file to set properties for + target Plots file to set properties for (defined at the stage level) ``` diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index 4df6c54454..a71dd68af2 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -15,8 +15,8 @@ usage: dvc plots show [-h] [-q | -v] [-t ] [-x ] [targets [targets ...]] positional arguments: - targets Plot files or plot id's from `dvc.yaml` to visualize. - Shows all plots by default. + targets Plots files or plot id's from `dvc.yaml` to + visualize. Shows all plots by default. 
``` ## Description From 725b0b94bd4fcc07878992a8913473efc67f97b9 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Tue, 26 Jul 2022 14:39:53 -0600 Subject: [PATCH 12/24] ref: wrap `plots index` usage block --- content/docs/command-reference/plots/index.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 3b5e41a036..dec2bd1912 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -13,13 +13,14 @@ usage: dvc plots [-h] [-q | -v] {show,diff,modify,templates} ... positional arguments: COMMAND - show Generate plots from target files or from `plots` + show Generate plots from target files or from `plots` definitions in `dvc.yaml`. - diff Show multiple versions of a plot by overlaying them + diff Show multiple versions of a plot by overlaying them in a single image. - modify Modify display properties of data-series plots + modify Modify display properties of data-series plots defined in stages (has no effect on image plots). - templates Write built-in plots templates to a directory (.dvc/plots by default). + templates Write built-in plots templates to a directory + (.dvc/plots by default). 
``` ## Description From 5ab3dbcfebac921e82e1b0a5a3da0c28f6d81086 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Wed, 27 Jul 2022 04:23:41 -0600 Subject: [PATCH 13/24] plots: top-level plots edits --- content/docs/command-reference/plots/index.md | 43 +++++++++---------- .../docs/command-reference/plots/modify.md | 5 +-- content/docs/command-reference/plots/show.md | 8 ++-- .../project-structure/dvcyaml-files.md | 15 ++++--- 4 files changed, 35 insertions(+), 36 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index dec2bd1912..c043963304 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -98,24 +98,12 @@ names in the `train` array below: } ``` -## Defining a plot +## Defining plots In order to create visualizations, users need to provide the data and (optionally) configuration that will help customize the plot. DVC provides two -ways to configure visualizations. Users can mark specific stage -outputs as plot or define plot configuration inside `dvc.yaml` -under `plots` key. - -### Top-level plot definitions - -Plots can be defined in `dvc.yaml` under the `plots` key. Unlike [stage plots], -these are especially useful when users want to compare data from different data -sources residing on the same version of the project. For example, comparing -training versus test results on current branch. They also allow to visualize -data from files that are not [stage plots]. They are an abstraction separating -visualization from particular stage output. - -[stage plots]: #stage-plots +ways to configure visualizations. Users can mark specific stage +outputs as plots or define top-level `plots` in `dvc.yaml`. ### Stage plots @@ -124,16 +112,27 @@ outputs can be marked with `--plots/--plots-no-cache`. This will tell DVC that they are intended for visualizations. 
Upon running `dvc plots show/diff` DVC will collect stage plots alongside the -top-level plot definitions and display them conforming to their configuration. +[top-level plot definitions] and display them conforming to their configuration. Note, that if there are stage plots in the project and they are also used in some top-level definitions, DVC will create separate rendering for the stage plots and all definitions using them. This special type of outputs might come in hand if users want to visually compare experiments results with other experiments versions and not bother with -writing top-level plots definitions into the `dvc.yaml`. +writing top-level plot definitions into the `dvc.yaml`. -### Syntax (top-level plot definitions only) +[top-level plot definitions]: #top-level-plots + +### Top-level plots + +Plots can be defined in `dvc.yaml` under the `plots` key. Unlike [stage plots], +these are especially useful when users want to compare data from different data +sources residing on the same version of the project, for example comparing +training versus test results on current branch. They also allow to visualize +data from files that are not [stage plots]. These are an abstraction separating +visualization from particular stage output. Let's look at their syntax next. + +[stage plots]: #stage-plots In order to define the plot users need to provide data and an optional configuration for the plot. The plots should be defined in `dvc.yaml` file under @@ -146,10 +145,10 @@ stages: ... plots: ... ``` -Every plots has to have its own id. Configuration, if provided, should be a +Every plots has to have its own ID. Configuration, if provided, should be a dictionary. -In simplest use case, user can provide file path as the plot id and don't +In simplest use case, user can provide file path as the plot ID and don't provide configuration at all: ```yaml @@ -177,7 +176,7 @@ plots: template: confusion ``` -In this case we provided `confusion_matrix` as a plot id. 
It will be displayed +In this case we provided `confusion_matrix` as a plot ID. It will be displayed in the plot as a title, unless we override it with `title` field. In this case we provided data source in `y` axis definition. Data will be sourced from `confusion_matrix_data.csv`. As `y` axis we will use `predicted_class` field. On @@ -223,7 +222,7 @@ field. - `y` - field name from which the Y axis data comes from. - Top-level plots: It can be a string, list or dictionary. If its a string or - list, it is assumed that plot id will be the path to the data source. + list, it is assumed that plot ID will be the path to the data source. String, or list elements will be the names of data columns or fields withing the source file. If this field is a dictionary, it is assumed that its keys are paths to data sources. The values have to be either strings or lists, diff --git a/content/docs/command-reference/plots/modify.md b/content/docs/command-reference/plots/modify.md index 14006bfa70..718a6098ea 100644 --- a/content/docs/command-reference/plots/modify.md +++ b/content/docs/command-reference/plots/modify.md @@ -4,10 +4,9 @@ Modify display properties of data-series [plots](/doc/command-reference/plots) defined in stages. > ⚠️ Note that this command can modify only data-series plots. It has no effect -> on image-type plots or any [top-level plot definitions]. +> on image-type plots or any [top-level plot] definitions. -[top-level plot definitions]: - /doc/command-reference/plots#top-level-plot-definitions +[top-level plot]: /doc/command-reference/plots#top-level-plots ## Synopsis diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index a71dd68af2..8e86751b81 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -3,7 +3,7 @@ Generate [plot](/doc/command-reference/plots) from a plots file or `plots` [top-level definition] from `dvc.yaml`. 
-[top-level definition]: /doc/command-reference/plots#top-level-plot-definitions +[top-level definition]: /doc/command-reference/plots#top-level-plots ## Synopsis @@ -26,10 +26,10 @@ This command provides a quick way to visualize functions, AUC curves, confusion matrices, etc. All plots defined in `dvc.yaml` are used by default, but specific plots files or -[top-level plots] id's can be specified as `targets` (note that target files -don't necessarily have to be defined in `dvc.yaml`). +[top-level plot] IDs can be specified as `targets` (note that target files don't +necessarily have to be defined in `dvc.yaml`). -[top-level plots]: /doc/command-reference/plots#top-level-plot-definitions +[top-level plot]: /doc/command-reference/plots#top-level-plots The plot style can be customized with [plot templates](/doc/command-reference/plots#plot-templates), using the diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index 5917ee7419..77aefb46f0 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -426,7 +426,7 @@ These are the fields that are accepted in each stage: | `cmd` | (Required) One or more commands executed by the stage (may contain either a single value or a list). Commands are executed sequentially until all are finished or until one of them fails (see `dvc repro`). | | `wdir` | Working directory for the stage command to run in (relative to the file's location). Any paths in other fields are also based on this. It defaults to `.` (the file's location). | | `deps` | List of dependency paths of this stage (relative to `wdir`). | -| `outs` | List of output paths of this stage (relative to `wdir`). These can contain certain optional [subfields](#output-subfields). Any output is viable data source for [top-level plots](/doc/command-reference/plots#top-level-plot-definitions). 
| +| `outs` | List of stage output paths (relative to `wdir`). These can contain optional [subfields](#output-subfields). Some output may be viable data sources for [top-level plots]. | | `params` | List of parameter dependency keys (field names) to track from `params.yaml` (in `wdir`). The list may also contain other parameters file names, with a sub-list of the param names to track in them. | | `metrics` | List of [metrics files](/doc/command-reference/metrics), and optionally, whether or not this metrics file is cached (`true` by default). See the `--metrics-no-cache` (`-M`) option of `dvc run`. | | `plots` | List of [plot metrics](/doc/command-reference/plots), and optionally, their default configuration (subfields matching the options of `dvc plots modify`), and whether or not this plots file is cached ( `true` by default). See the `--plots-no-cache` option of `dvc run`. | @@ -448,6 +448,8 @@ validation and auto-completion. > See also > [How to Merge Conflicts](/doc/user-guide/how-to/merge-conflicts#dvcyaml). +[top-level plots]: /doc/command-reference/plots#top-level-plots + ### Output subfields > These include a subset of the fields in `.dvc` file @@ -466,9 +468,8 @@ validation and auto-completion. ## Top-level plot definitions -The list of plots contains one or more user-defined -[plots](/doc/command-reference/plots#top-level-plot-definitions). Here's an -example that tells DVC that `auc.json` is viable for visualization: +The list of `plots` contains one or more user-defined `dvc plots`. Here's an +example that makes output `auc.json` viable for visualization: ```yaml stages: @@ -488,10 +489,10 @@ plots: y: tpr ``` -Note that we didn't have to specify `auc.json` as a plot in the stage. In fact, -[top-level `plots`] can use any file in the project. +Note that we didn't have to specify `auc.json` as a [plot in the stage]. In +fact, [top-level plots] can use any file in the project. 
-[top-level `plots`]: /doc/command-reference/plots#top-level-plot-definitions +[plot in the stage]: /doc/command-reference/plots#stage-plots ## dvc.lock file From d73e90921454b6b53e5c49893081ee40cb4ce857 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Wed, 27 Jul 2022 04:38:19 -0600 Subject: [PATCH 14/24] plots: improve motivation for top-level plots per https://github.com/iterative/dvc.org/pull/3691#pullrequestreview-1045840235 --- content/docs/command-reference/plots/index.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index c043963304..976db5c237 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -125,12 +125,12 @@ writing top-level plot definitions into the `dvc.yaml`. ### Top-level plots -Plots can be defined in `dvc.yaml` under the `plots` key. Unlike [stage plots], -these are especially useful when users want to compare data from different data -sources residing on the same version of the project, for example comparing -training versus test results on current branch. They also allow to visualize -data from files that are not [stage plots]. These are an abstraction separating -visualization from particular stage output. Let's look at their syntax next. +Plots can also be defined in a top-level `plots` key in `dvc.yaml`. Unlike +[stage plots], these definitions let you overlay plots from different data +sources, for example training vs. test results (on the current project version). +Conversely, you can create multiple plots from a single source file. You can +also any plot file in the project, regardless of whether it's a stage outputs. +This creates a separation between visualization and outputs. 
[stage plots]: #stage-plots From 921b00c24d8d00e9f26d9c90b1fd6fa218f528f0 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Wed, 27 Jul 2022 04:47:09 -0600 Subject: [PATCH 15/24] ref: edit `plots show` desc --- content/docs/command-reference/plots/show.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index 8e86751b81..8a360b9c66 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -15,7 +15,7 @@ usage: dvc plots show [-h] [-q | -v] [-t ] [-x ] [targets [targets ...]] positional arguments: - targets Plots files or plot id's from `dvc.yaml` to + targets Plots files or plot IDs from `dvc.yaml` to visualize. Shows all plots by default. ``` @@ -29,15 +29,19 @@ All plots defined in `dvc.yaml` are used by default, but specific plots files or [top-level plot] IDs can be specified as `targets` (note that target files don't necessarily have to be defined in `dvc.yaml`). -[top-level plot]: /doc/command-reference/plots#top-level-plots +The plot style can be customized with [plot templates], using the `--template` +option. To learn more about plots file formats and templates, see `dvc plots`. + + -The plot style can be customized with -[plot templates](/doc/command-reference/plots#plot-templates), using the -`--template` option. To learn more about plots file formats and templates please -see `dvc plots`. +The default behavior of this command can be modified per [stage plot] with +`dvc plots modify`. -> Note that the default behavior of this command can be modified per plots file -> with `dvc plots modify`. 
+ + +[plot templates]: /doc/command-reference/plots#plot-templates +[top-level plot]: /doc/command-reference/plots#top-level-plots +[stage plot]: /doc/command-reference/plots#stage-plots ## Options From e295298cb9dbf72da896fdf6d1547e9647365cb1 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Wed, 27 Jul 2022 04:52:18 -0600 Subject: [PATCH 16/24] ref: return plot template examples from `plots show` to index --- content/docs/command-reference/plots/index.md | 63 +++++++++++++++++++ content/docs/command-reference/plots/show.md | 60 ------------------ 2 files changed, 63 insertions(+), 60 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 976db5c237..3d8465d13b 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -392,3 +392,66 @@ $ dvc config plots.html_template plots/mypage.html Note that the path supplied to `dvc config plots.html_template` is relative to `.dvc/` directory. + +## Example: Smooth plot + +In some cases we would like to smooth our plot. In this example we will use a +noisy plot with 100 data points: + +```dvc +$ dvc plots show data.csv +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_no_smooth.svg) + +We can use the `-t` (`--template`) option and `smooth` template to make it less +noisy: + +```dvc +$ dvc plots show -t smooth data.csv +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_smooth.svg) + +## Example: Confusion matrix + +We'll use `classes.csv` for this example: + +``` +actual,predicted +cat,cat +cat,cat +cat,cat +cat,dog +cat,dinosaur +cat,dinosaur +cat,bird +turtle,dog +turtle,cat +... 
+``` + +Let's visualize it: + +```dvc +$ dvc plots show classes.csv --template confusion \ + -x actual -y predicted +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_confusion.svg) + +> A confusion matrix [template](/doc/command-reference/plots#plot-templates) is +> predefined in DVC. + +We can use `confusion_normalized` template to normalize the results: + +```dvc +$ dvc plots show classes.csv -t confusion_normalized + -x actual -y predicted +file:///Users/usr/src/dvc_plots/index.html +``` + +![](/img/plots_show_confusion_normalized.svg) diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index 8a360b9c66..ccc86c3d2f 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -204,66 +204,6 @@ $ dvc plots show --no-header logs.csv -y 2 file:///Users/usr/src/dvc_plots/index.html ``` -## Example: Smooth plot - -In some cases we would like to smooth our plot. In this example we will use a -noisy plot with 100 data points: - -```dvc -$ dvc plots show data.csv -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_no_smooth.svg) - -We can use the `-t` option and `smooth` template to make it less noisy: - -```dvc -$ dvc plots show -t smooth data.csv -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_smooth.svg) - -## Example: Confusion matrix - -We'll use `classes.csv` for this example: - -``` -actual,predicted -cat,cat -cat,cat -cat,cat -cat,dog -cat,dinosaur -cat,dinosaur -cat,bird -turtle,dog -turtle,cat -... -``` - -Let's visualize it: - -```dvc -$ dvc plots show classes.csv --template confusion -x actual -y predicted -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_confusion.svg) - -> A confusion matrix [template](/doc/command-reference/plots#plot-templates) is -> predefined in DVC. 
- -We can use `confusion_normalized` template to normalize the results: - -```dvc -$ dvc plots show classes.csv --template confusion_normalized -x actual -y predicted -file:///Users/usr/src/dvc_plots/index.html -``` - -![](/img/plots_show_confusion_normalized.svg) - ## Example: Top-level plots ### Simple plot definition From 20fb2989bfc897c3dbc77f885b02e156ffe087c5 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Wed, 27 Jul 2022 04:59:34 -0600 Subject: [PATCH 17/24] guide: move top-lv plot mention from stage entry to desc --- .../project-structure/dvcyaml-files.md | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index 77aefb46f0..c991f58268 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -53,7 +53,19 @@ changed to decide whether the stage requires re-execution (see `dvc status`). If it writes files or dirs, they can be defined as outputs (`outs`). DVC will track them going forward (similar to using `dvc add`). -> See the full stage entry [specification](#stage-entries). + + +Output files may be viable data sources for [top-level plots]. + +[top-level plots]: /doc/command-reference/plots#top-level-plots + + + + + +See the full stage entry [specification](#stage-entries). + + ### Parameter dependencies @@ -426,7 +438,7 @@ These are the fields that are accepted in each stage: | `cmd` | (Required) One or more commands executed by the stage (may contain either a single value or a list). Commands are executed sequentially until all are finished or until one of them fails (see `dvc repro`). | | `wdir` | Working directory for the stage command to run in (relative to the file's location). Any paths in other fields are also based on this. It defaults to `.` (the file's location). 
| | `deps` | List of dependency paths of this stage (relative to `wdir`). | -| `outs` | List of stage output paths (relative to `wdir`). These can contain optional [subfields](#output-subfields). Some output may be viable data sources for [top-level plots]. | +| `outs` | List of stage output paths (relative to `wdir`). These can contain optional [subfields](#output-subfields). | | `params` | List of parameter dependency keys (field names) to track from `params.yaml` (in `wdir`). The list may also contain other parameters file names, with a sub-list of the param names to track in them. | | `metrics` | List of [metrics files](/doc/command-reference/metrics), and optionally, whether or not this metrics file is cached (`true` by default). See the `--metrics-no-cache` (`-M`) option of `dvc run`. | | `plots` | List of [plot metrics](/doc/command-reference/plots), and optionally, their default configuration (subfields matching the options of `dvc plots modify`), and whether or not this plots file is cached ( `true` by default). See the `--plots-no-cache` option of `dvc run`. | @@ -448,8 +460,6 @@ validation and auto-completion. > See also > [How to Merge Conflicts](/doc/user-guide/how-to/merge-conflicts#dvcyaml). 
-[top-level plots]: /doc/command-reference/plots#top-level-plots - ### Output subfields > These include a subset of the fields in `.dvc` file From 5a762d64bf53a2d4b6e765613d0d3101bc25eadb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Wed, 27 Jul 2022 05:09:38 -0600 Subject: [PATCH 18/24] ref: clean up new `plots show` examples --- content/docs/command-reference/plots/show.md | 36 ++++++------------- .../project-structure/dvcyaml-files.md | 13 ++++--- 2 files changed, 18 insertions(+), 31 deletions(-) diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index ccc86c3d2f..8563dafd95 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -208,10 +208,9 @@ file:///Users/usr/src/dvc_plots/index.html ### Simple plot definition -Let's use the `logs.csv` data: +Let's work with the following `logs.csv` data: ``` -# logs.csv epoch,loss,accuracy 1,0.19,0.81 2,0.11,0.89 @@ -219,14 +218,12 @@ epoch,loss,accuracy 4,0.04,0.96 ``` -Minimal plot configuration we can put in `dvc.yaml` is simply data source path -relative to `dvc.yaml` file: +The minimal plot configuration we can put in `dvc.yaml` is the data source path: ```yaml -# dvc.yaml stages: train: - cmd: echo "Training the model..." + cmd: ... plots: logs.csv: @@ -239,14 +236,9 @@ file:///Users/usr/src/dvc_plots/index.html ![](/img/plots_show_spec_default.svg) -We can customize it: +We can also customize it: ```yaml -# dvc.yaml -stages: - train: - cmd: echo "Training the model..." - plots: logs.csv: x: epoch @@ -275,12 +267,9 @@ epoch,train_loss,test_loss 4,0.1,0.23 ``` -```yaml -# dvc.yaml -stages: - train: - cmd: echo "Training the model..." 
+Plot definition in `dvc.yaml`: +```yaml plots: test_vs_train_loss: x: epoch @@ -298,11 +287,10 @@ file:///Users/usr/src/dvc_plots/index.html ### Sourcing data from different files -Lets prepare comparison for confusion matrix data between test set and training -set: +Lets prepare a comparison for confusion matrix data between the +`train_classes.csv` and a `test_classes.csv` datasets (below): ```csv -# train_classes.csv actual_class,predicted_class dog,dog dog,dog @@ -319,7 +307,6 @@ bird,dog ``` ```csv -# test_classes.csv actual_class,predicted_class dog,dog dog,dog @@ -332,12 +319,9 @@ cat,cat cat,bird ``` -```yaml -# dvc.yaml -stages: - train: - cmd: echo "Training the model..." +In `dvc.yaml`: +```yaml plots: test_vs_train_confusion: x: actual_class diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index c991f58268..02e818ad61 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -478,8 +478,11 @@ validation and auto-completion. ## Top-level plot definitions -The list of `plots` contains one or more user-defined `dvc plots`. Here's an -example that makes output `auc.json` viable for visualization: +The list of `plots` contains one or more user-defined `dvc plots` (paths +relative to the location of `dvc.yaml`). + +This example that makes output `auc.json` viable for visualization, configuring +keys `fpr` and `tpr` as X and Y axis, respectively: ```yaml stages: @@ -499,10 +502,10 @@ plots: y: tpr ``` -Note that we didn't have to specify `auc.json` as a [plot in the stage]. In -fact, [top-level plots] can use any file in the project. +Note that we didn't have to specify `auc.json` as a [plot output] in the stage. +In fact, [top-level plots] can use any file found in the project. 
-[plot in the stage]: /doc/command-reference/plots#stage-plots +[plot output]: /doc/command-reference/plots#stage-plots ## dvc.lock file From aac654ee94323243edf9194374b0515482657984 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Wed, 27 Jul 2022 05:19:38 -0600 Subject: [PATCH 19/24] ref: more copy edits around plots --- content/docs/command-reference/plots/index.md | 24 ++++++++----------- content/docs/command-reference/plots/show.md | 2 +- 2 files changed, 11 insertions(+), 15 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 3d8465d13b..714c50cca2 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -112,27 +112,23 @@ outputs can be marked with `--plots/--plots-no-cache`. This will tell DVC that they are intended for visualizations. Upon running `dvc plots show/diff` DVC will collect stage plots alongside the -[top-level plot definitions] and display them conforming to their configuration. -Note, that if there are stage plots in the project and they are also used in -some top-level definitions, DVC will create separate rendering for the stage -plots and all definitions using them. +[top-level plots](#top-level-plots) and display them conforming to their +configuration. Note, that if there are stage plots in the project and they are +also used in some top-level definitions, DVC will create separate rendering for +the stage plots and all definitions using them. This special type of outputs might come in hand if users want to visually compare experiments results with other experiments versions and not bother with -writing top-level plot definitions into the `dvc.yaml`. - -[top-level plot definitions]: #top-level-plots +writing top-level plot definitions in `dvc.yaml`. ### Top-level plots Plots can also be defined in a top-level `plots` key in `dvc.yaml`. 
Unlike -[stage plots], these definitions let you overlay plots from different data -sources, for example training vs. test results (on the current project version). -Conversely, you can create multiple plots from a single source file. You can -also any plot file in the project, regardless of whether it's a stage outputs. -This creates a separation between visualization and outputs. - -[stage plots]: #stage-plots +[stage plots](#stage-plots), these definitions let you overlay plots from +different data sources, for example training vs. test results (on the current +project version). Conversely, you can create multiple plots from a single source +file. You can also any plot file in the project, regardless of whether it's a +stage outputs. This creates a separation between visualization and outputs. In order to define the plot users need to provide data and an optional configuration for the plot. The plots should be defined in `dvc.yaml` file under diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md index 8563dafd95..c7656b1965 100644 --- a/content/docs/command-reference/plots/show.md +++ b/content/docs/command-reference/plots/show.md @@ -34,7 +34,7 @@ option. To learn more about plots file formats and templates, see `dvc plots`. -The default behavior of this command can be modified per [stage plot] with +The default behavior of this command can be modified per [stage plot] file with `dvc plots modify`. 
From 764654c66aa211d512e7803d43fad20e1854e9ba Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Wed, 27 Jul 2022 15:12:25 -0400 Subject: [PATCH 20/24] Update content/docs/user-guide/project-structure/dvcyaml-files.md --- content/docs/user-guide/project-structure/dvcyaml-files.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index 02e818ad61..8f1034aa94 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -481,7 +481,7 @@ validation and auto-completion. The list of `plots` contains one or more user-defined `dvc plots` (paths relative to the location of `dvc.yaml`). -This example that makes output `auc.json` viable for visualization, configuring +This example makes output `auc.json` viable for visualization, configuring keys `fpr` and `tpr` as X and Y axis, respectively: ```yaml From 8cc50bc87443aa97286815080c1a2f9690428072 Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Wed, 27 Jul 2022 15:19:30 -0400 Subject: [PATCH 21/24] Update content/docs/command-reference/plots/index.md --- content/docs/command-reference/plots/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 714c50cca2..77bcb04ed7 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -117,7 +117,7 @@ configuration. Note, that if there are stage plots in the project and they are also used in some top-level definitions, DVC will create separate rendering for the stage plots and all definitions using them. 
-This special type of outputs might come in hand if users want to visually +This special type of outputs might come in handy if users want to visually compare experiments results with other experiments versions and not bother with writing top-level plot definitions in `dvc.yaml`. From f0c88a88fc50ccb24507d05572aec1a7d7ae939f Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Wed, 27 Jul 2022 15:21:14 -0400 Subject: [PATCH 22/24] Update content/docs/command-reference/plots/index.md --- content/docs/command-reference/plots/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 77bcb04ed7..b067362e44 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -127,7 +127,7 @@ Plots can also be defined in a top-level `plots` key in `dvc.yaml`. Unlike [stage plots](#stage-plots), these definitions let you overlay plots from different data sources, for example training vs. test results (on the current project version). Conversely, you can create multiple plots from a single source -file. You can also any plot file in the project, regardless of whether it's a +file. You can also use any plot file in the project, regardless of whether it's a stage outputs. This creates a separation between visualization and outputs. 
In order to define the plot users need to provide data and an optional From 839fd63a607fd987f35daa8ae949c159e0c2a38d Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Wed, 27 Jul 2022 15:24:08 -0400 Subject: [PATCH 23/24] Update content/docs/command-reference/plots/index.md --- content/docs/command-reference/plots/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index b067362e44..0a365749e6 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -141,10 +141,10 @@ stages: ... plots: ... ``` -Every plots has to have its own ID. Configuration, if provided, should be a +Every plot has to have its own ID. Configuration, if provided, should be a dictionary. -In simplest use case, user can provide file path as the plot ID and don't +In the simplest use case, a user can provide the file path as the plot ID and not provide configuration at all: ```yaml From 22e13f1da7b2b95c9f24ec832460ffbb31671139 Mon Sep 17 00:00:00 2001 From: "Restyled.io" Date: Wed, 27 Jul 2022 19:25:06 +0000 Subject: [PATCH 24/24] Restyled by prettier --- content/docs/command-reference/plots/index.md | 8 ++++---- .../docs/user-guide/project-structure/dvcyaml-files.md | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 0a365749e6..8e8a100efc 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -127,8 +127,8 @@ Plots can also be defined in a top-level `plots` key in `dvc.yaml`. Unlike [stage plots](#stage-plots), these definitions let you overlay plots from different data sources, for example training vs. test results (on the current project version). Conversely, you can create multiple plots from a single source -file. 
You can also use any plot file in the project, regardless of whether it's a -stage outputs. This creates a separation between visualization and outputs. +file. You can also use any plot file in the project, regardless of whether it's +a stage outputs. This creates a separation between visualization and outputs. In order to define the plot users need to provide data and an optional configuration for the plot. The plots should be defined in `dvc.yaml` file under @@ -144,8 +144,8 @@ plots: ... Every plot has to have its own ID. Configuration, if provided, should be a dictionary. -In the simplest use case, a user can provide the file path as the plot ID and not -provide configuration at all: +In the simplest use case, a user can provide the file path as the plot ID and +not provide configuration at all: ```yaml # dvc.yaml diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index 8f1034aa94..29521e0bc6 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -481,8 +481,8 @@ validation and auto-completion. The list of `plots` contains one or more user-defined `dvc plots` (paths relative to the location of `dvc.yaml`). -This example makes output `auc.json` viable for visualization, configuring -keys `fpr` and `tpr` as X and Y axis, respectively: +This example makes output `auc.json` viable for visualization, configuring keys +`fpr` and `tpr` as X and Y axis, respectively: ```yaml stages: