guide: detailed plots spec for x and y fields #4188

Merged
merged 1 commit into from
Dec 16, 2022
109 changes: 74 additions & 35 deletions content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -47,23 +47,8 @@ are defined at the file level and include all parameters in the file. See
The list of `plots` contains one or more user-defined `dvc plots`
configurations. Every plot must have a unique ID, which may be either a file or
directory path (relative to the location of `dvc.yaml`) or an arbitrary string.
If the ID is an arbitrary string, a data source must be provided in the `y`
field (`x` data source is always optional and cannot be the only data source
provided). Optional configuration fields can be provided as well.

Here's an example plotting ROC and precision-recall curves on the same plot:

```yaml
plots:
  - roc_vs_prc:
      y:
        precision_recall.json: precision
        roc.json: tpr
      x:
        precision_recall.json: recall
        roc.json: fpr
      title: ROC vs Precision-Recall
```
If the ID is an arbitrary string, a file path must be provided in the `y` field
(`x` file path is always optional and cannot be the only path provided).
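
The two ID styles described above can be sketched side by side. This is only a minimal illustration: the file names (`logs.csv`, `train.csv`, `test.csv`), the `train_vs_test` ID, and the `loss` column are hypothetical and do not come from the original guide.

```yaml
plots:
  # ID is a file path: the data comes from that file,
  # so `y` only needs to name a column/field.
  - logs.csv:
      y: loss
  # ID is an arbitrary string: `y` must map file paths
  # to the columns/fields to plot.
  - train_vs_test:
      y:
        train.csv: loss
        test.csv: loss
```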

<admon icon="book">

@@ -76,34 +61,88 @@ Refer to [Visualizing Plots] and `dvc plots show` for more examples.

### Available configuration fields

- `y` - source from which the Y axis data comes from:
- `y` - source for the Y axis data:

- **Top-level plots** (_string, list, dict_):

- Top-level plots: accepts string, list, or dictionary (like
`data_source_path: column/field name`).
If the plot ID is a path, one or more column/field names are expected. For
example:

- Plot outputs: column/field name found in the source plots file.
```yaml
plots:
  - regression_hist.csv:
      y: mean_squared_error
  - classifier_hist.csv:
      y: [acc, loss]
```

- `x` (string) - source from which the X axis data comes from. An auto-generated
_step_ field is used by default.
If the plot ID is an arbitrary string, a dictionary of file paths mapped to
column/field names is expected. For example:

- Top-level plots: multiple `x` values are supported, but only if they match
the number of `y` values and are specified as a dictionary (list is not
supported).
```yaml
plots:
  - train_val_test:
      y:
        train.csv: [train_acc, val_acc]
        test.csv: test_acc
```

- Plot outputs: column/field name found in the source plots file.
- **Plot outputs** (_string_): one column/field name.
Contributor

Here


- `y_label` (string) - Y axis label. If all `y` data sources have the same field
name, that will be the default. Otherwise, it's "y".
- `x` - source for the X axis data. An auto-generated _step_ field is used by
default.

- `x_label` (string) - X axis label. If all `y` data sources have the same field
name, that will be the default. Otherwise, it's "x".
- **Top-level plots** (_string, dict_):

- `title` (string) - header for the plot(s). Defaults:
If the plot ID is a path, one column/field name is expected. For example:

- Top-level plots: `path/to/dvc.yaml::plot_id`
- Plot outputs: `path/to/data.csv`
```yaml
plots:
  - classifier_hist.csv:
      y: [acc, loss]
      x: epoch
```

- `template` (string) - [plot template]. Defaults to `linear`.
If the plot ID is an arbitrary string, `x` may be either one column/field name
or a dictionary of file paths, each mapped to one column/field name (the
number of column/field names must match the number in `y`). For example:

```yaml
plots:
  - train_val_test: # single x
      y:
        train.csv: [train_acc, val_acc]
        test.csv: test_acc
      x: epoch
  - roc_vs_prc: # x dict
      y:
        precision_recall.json: precision
        roc.json: tpr
      x:
        precision_recall.json: recall
        roc.json: fpr
  - confusion: # different x and y paths
      y:
        dir/preds.csv: predicted
      x:
        dir/actual.csv: actual
      template: confusion
```

Comment on lines +106 to +119

@jorgeorpinel, Dec 23, 2022

This example is pretty long. Maybe we can summarize? And/or collapse the entire Plots/Available fields section (with <details>). For now

- **Plot outputs** (_string_): one column/field name.
Contributor

And here


- `y_label` (_string_) - Y axis label. If all `y` data sources have the same
field name, that will be the default. Otherwise, it's "y".

- `x_label` (_string_) - X axis label. If all `y` data sources have the same
field name, that will be the default. Otherwise, it's "x".

- `title` (_string_) - header for the plot(s). Defaults:

- **Top-level plots**: `path/to/dvc.yaml::plot_id`
- **Plot outputs**: `path/to/data.csv`

- `template` (_string_) - [plot template]. Defaults to `linear`.
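
As a hedged illustration of the optional fields listed above, the sketch below overrides the label and title defaults for a plot whose ID is an arbitrary string. The data files and columns reuse the `train_val_test` example; the label and title text are hypothetical, and the comments only restate the defaults described in this section.

```yaml
plots:
  - train_val_test:
      y:
        train.csv: [train_acc, val_acc]
        test.csv: test_acc
      x: epoch
      # The y field names differ across sources, so without this
      # the Y axis label would fall back to "y".
      y_label: accuracy
      x_label: epoch
      # Replaces the default title of the form path/to/dvc.yaml::plot_id.
      title: Accuracy per epoch
      # Spells out the default template explicitly.
      template: linear
```

Setting `template: linear` is redundant here, since it is the stated default; it is included only to show where the field goes.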

[plot template]:
https://dvc.org/doc/user-guide/experiment-management/visualizing-plots#plot-templates-data-series-only