Add targets/stages argument to dvc exp show #1

Open
wants to merge 6 commits into
base: main
217 changes: 217 additions & 0 deletions proposals/exp-show-target.md
@@ -0,0 +1,217 @@
# Add targets/stages argument to dvc exp show

- Enhancement Proposal PR: (leave this empty)
- Contributors: dberenbaum, (add your GitHub handle)

# Summary

The `dvc exp show` command shows a comparison of the results of different
experiments along with the parameters and metrics values for each experiment.
The command's output currently includes:
* All values that are defined in `params.yaml`.
* All values in other referenced parameters files in any DVC stage.
* All values in any referenced metrics files in any DVC stage.

This proposal allows users to see only parameters and metrics relevant to
specified pipeline stages.

See https://github.com/iterative/dvc/issues/5451 for more background.

# Motivation

`dvc exp show` is the main view by which users may compare experiments, and it's
becoming useful for other purposes, like comparing multiple commits (see
Comment on lines +22 to +23
Contributor

it's becoming useful for other purposes, like comparing multiple commits

Maybe we should discuss that too/separately (as a proposal) 🙂 Should this be a top-level command? Perhaps dvc compare (--exp[eriments])

Although, dvc compare is also the name I proposed to unify the different diff subcommands at the top-level (another possibly interesting proposal doc).

Contributor

discuss that too/separately (as a proposal)

Meta: how do we pick between / when do we move from https://github.com/iterative/dvc/discussions to https://github.com/iterative/enhancement-proposals ?

Contributor Author

Meta: how do we pick between / when do we move from https://github.com/iterative/dvc/discussions to https://github.com/iterative/enhancement-proposals ?

Yes, I'm struggling with that already! Let's discuss at retro.


This really belongs in a separate discussion, but it seems like we will eventually want to unify all of the diff commands.

exp diff exists as a convenience wrapper for params + metrics diff, but all of the individual dvc ... diff commands (including dvc diff itself) already support comparing any git commit or reference (including shortened DVC experiment names). It would probably make more sense to have just one diff command where you can specify which outputs (data/metrics/params) you want to include/exclude (and nothing about this would be specific to experiments).


https://discord.com/channels/485586884165107732/563406153334128681/822464119906369606).

It's also likely the busiest DVC output:

```console
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━
┃ Experiment ┃ Created ┃ auc ┃ prepare.split ┃ prepare.seed ┃ featurize.max_features ┃ featurize.ngrams ┃ train.seed ┃ train.n_estimators ┃ topic_modeling.seed ┃ top
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━
│ workspace │ - │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ 20170428 │ 10
│ topic_modeling │ 04:41 PM │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ 20170428 │ 10
│ master │ Feb 17, 2021 │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ -
│ └── exp-56d11 │ 04:40 PM │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ 20170428 │ 10
│ 263732a │ Feb 11, 2021 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ - │ 100 │ - │ -
│ b297969 │ Feb 11, 2021 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ - │ 100 │ - │ -
│ 4edd930 │ Feb 11, 2021 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ - │ 100 │ - │ -
│ 72ed9cd │ Nov 15, 2020 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 50 │ - │ -
│ ├── exp-47dfa │ Feb 17, 2021 │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ -
│ └── exp-44136 │ Feb 17, 2021 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 50 │ - │ -
│ f8e9d93 │ Nov 15, 2020 │ 0.54175 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 50 │ - │ -
```

This table comes from a simplified example and still omits several columns, so
the view can quickly become cluttered. Eliminating irrelevant information is a
priority.

There are several ways in which the included parameters might be irrelevant:
* Parameters files may include values that aren't tracked by DVC at all (for
example, debug level).
Comment on lines +49 to +51
Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

Same for metrics, I think (should they be mentioned?).


I think that in the case of metrics we don't have this problem (see iterative/dvc#5451 (comment)), because metrics are files, whereas params are loaded as whole files when they should be loaded as individual file entries/lines.

Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

So is the earlier bullet about metrics correct?

The command's output currently includes: ...

  • All values in any referenced metrics files in any DVC stage.

Contributor Author

All metrics are outputs from a DVC stage, which means that:

  • Unlike parameters, metrics are always tracked by some DVC stage.
  • Metrics still may be irrelevant to a particular DVC stage, as mentioned in the bullets below.

* The parameters may be used in a different pipeline branch (for example, the
`topic_modeling` stage is in a separate branch of the pipeline from the
`evaluate` stage, which produces the relevant experiment metrics).
* The parameters may be used in a downstream pipeline branch (for example, the
post-processing `combine` stage parameters aren't relevant to experiment
evaluation).

```console
          +-------------------+
          | data/data.xml.dvc |
          +-------------------+
                    *
                    *
                    *
               +---------+
               | prepare |
               +---------+
                    *
                    *
                    *
              +-----------+
              | featurize |
            **+-----------+**
        ****        *        ****
    ****            *            ***
  **                *               ***
+-------+           *                  **
| train |         **                     *
+-------+        *                       *
       **      **                        *
         **  **                          *
           * *                           *
      +----------+              +----------------+
      | evaluate |              | topic_modeling |
      +----------+              +----------------+
               ***               ****
                   *          ***
                    **      **
                    +---------+
                    | combine |
                    +---------+
```

There is no apparent reason for users to want to see untracked parameters in
`dvc exp show`. Unlike `dvc params diff` and `dvc metrics diff`, `dvc exp show`
is not applicable to arbitrary files. Likewise, there is no apparent reason for
users to want to see parameters or metrics from irrelevant stages of their
pipelines. Users could keep untracked parameters in other files if they don't
want them shown in `dvc exp show`, and they could manually exclude parameters
or metrics from irrelevant stages, but DVC already has enough information to
determine that these should not be shown.

# Detailed design

`dvc exp show` could take stages as (positional?) arguments, similar to `dvc exp
run`. The help message could be similar:

```console
positional arguments:
  stages      Stages for which to print experiment info. 'dvc.yaml'
              by default.
```
Comment on lines +111 to +112
Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

I'd go with targets for consistency with existing commands (e.g. repro), and since dvc.yaml files can be targets too.

Suggested change
    stages      Stages for which to print experiment info. 'dvc.yaml'
                by default.
    usage: dvc experiments show [-h] [-q | -v] [-a] [-T] [-A] [-n <num>]
                                ...
                                [targets [<target> ...]]
    targets     Stages to include. 'dvc.yaml' by default.

Contributor Author

What about dvc params/metrics diff --targets, which use targets to mean filenames and not stages?

Contributor

I'm thinking of existing commands that target stages (again, like repro, which predates params and metrics BTW). But yeah, --targets for some reason was designed to specifically accept file names and not param/metric stages; maybe something else to reconsider, but it seems unrelated here 🙂

If no stages are passed, parameters and metrics defined in any stage would be
shown. This differs from current behavior only because untracked parameters
Comment on lines +115 to +116
Contributor

defined in any stage

Any stage in ./dvc.yaml only, or any stage anywhere (like repro -P)?

Contributor Author

Good point. I think any stage anywhere is better, but we need to work on terminology for this.

Contributor

I think the default targets for repro etc is just ./dvc.yaml

Contributor Author

Ah, thanks for pointing that out. In that case, I guess we should have a default of ./dvc.yaml and include a -P, --all-pipelines option. Seems like maybe I need to go through the repro options and determine all that should apply here.

would be excluded.
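
As a rough illustration of this default, the sketch below keeps only the parameters that at least one stage declares. The dictionaries and the `tracked_params` helper are hypothetical stand-ins for illustration, not DVC's actual data structures or API.

```python
# Hypothetical data: the values in params.yaml, flattened to dotted keys.
params_file = {
    "prepare.split": 0.2,
    "prepare.seed": 20170428,
    "train.n_estimators": 10,
    "debug_level": "info",  # present in the file but not declared by any stage
}

# Hypothetical mapping of stage name -> parameter keys that the stage declares.
stage_params = {
    "prepare": ["prepare.split", "prepare.seed"],
    "train": ["train.n_estimators"],
}


def tracked_params(values, declared):
    """Return only the values whose keys are declared by at least one stage."""
    tracked = {key for keys in declared.values() for key in keys}
    return {key: value for key, value in values.items() if key in tracked}


shown = tracked_params(params_file, stage_params)
print(sorted(shown))
```

Under these assumptions, `debug_level` is dropped because no stage references it, while every declared parameter is kept regardless of which stage uses it.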

If stages are passed, parameters and metrics defined anywhere in the pipeline
for those stages would be shown.
Comment on lines +119 to +120
Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

Why not just for those specific stages? Wouldn't mind incorporating the --[up/down]stream options to this proposal TBH. Full makeover 💇 💅

Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

Or --single[-item] as mentioned a bit further down.

Contributor

this is out of scope for now

The proposal format seems a bit long anyway, so why not cover as much as possible? Reuse the motivation

Contributor Author

The proposal format seems a bit long anyway, so why not cover as much as possible? Reuse the motivation

True, I'm used to trying to limit scope for issues, but one advantage of this format is to holistically discuss related issues in one cohesive doc. Thoughts on --single[-item]?

Contributor

Thoughts on --single[-item]?

If the idea here is to remove noise from the very busy exp show table maybe single-item should be the default and have --upstream instead (or some other names). No strong opinion though.

Would also throw in --downstream for consistency in either case.

Contributor Author

Again, my initial reaction is to follow the repro conventions, which would do upstream by default, but also include:

  • -s, --single-item
  • -p, --pipeline
  • --downstream

Contributor

No strong opinion


Using the above examples, `dvc exp show evaluate` would be equivalent to using
the `--exclude-params` and `--exclude-metrics` options to exclude all parameters
and metrics for the `topic_modeling` and `combine` stages:

```console
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ Experiment ┃ Created ┃ auc ┃ prepare.split ┃ prepare.seed ┃ featurize.max_features ┃ featurize.ngrams ┃ train.seed ┃ train.n_estimators ┃ train.random_state ┃ … ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━┩
│ workspace │ - │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ topic_modeling │ Mar 18, 2021 │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ ├── exp-e2784 │ 12:15 PM │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ └── exp-44136 │ 10:29 AM │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ master │ Feb 17, 2021 │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ └── exp-56d11 │ Mar 18, 2021 │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ 263732a │ Feb 11, 2021 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ - │ 100 │ 20170428 │ - │
│ b297969 │ Feb 11, 2021 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ - │ 100 │ 20170428 │ - │
│ 4edd930 │ Feb 11, 2021 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ - │ 100 │ 20170428 │ - │
│ 72ed9cd │ Nov 15, 2020 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 50 │ - │ - │
│ ├── exp-47dfa │ Feb 17, 2021 │ 0.51625 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ └── exp-44136 │ Feb 17, 2021 │ 0.5674 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 50 │ - │ - │
│ f8e9d93 │ Nov 15, 2020 │ 0.54175 │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 50 │ - │ - │
│ 377c988 │ Nov 15, 2020 │ 0.54175 │ 0.2 │ 20170428 │ 500 │ 1 │ 20170428 │ 50 │ - │ - │
│ 27d4e7c │ Nov 15, 2020 │ - │ 0.2 │ 20170428 │ 500 │ 1 │ 20170428 │ 50 │ - │ - │
│ 8bf2091 │ Nov 15, 2020 │ - │ 0.2 │ 20170428 │ 500 │ 1 │ 20170428 │ 50 │ - │ - │
│ ff9e2fa │ Nov 15, 2020 │ - │ 0.2 │ 20170428 │ 500 │ 1 │ 20170428 │ 50 │ - │ - │
│ 11e1705 │ Nov 15, 2020 │ - │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ a82585e │ Nov 15, 2020 │ - │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ c778a78 │ Nov 15, 2020 │ - │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ 551082e │ Nov 15, 2020 │ - │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
│ 15bef96 │ Nov 15, 2020 │ - │ 0.2 │ 20170428 │ 1500 │ 2 │ 20170428 │ 10 │ - │ - │
└────────────────┴──────────────┴─────────┴───────────────┴──────────────┴────────────────────────┴──────────────────┴────────────┴────────────────────┴────────────────────┴───┘
```

This table includes parameters and metrics from the `evaluate` stage and all
stages upstream of it. It's possible to also have a `--single-item` argument to
exclude upstream changes, but this is out of scope for now.
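
A minimal sketch of the upstream selection described above, assuming the dependency edges read off the example DAG and assuming, purely for illustration, that each column name is prefixed with the stage that uses it. The `deps` mapping and the `with_upstream` and `visible_columns` helpers are hypothetical and do not reflect how DVC resolves stages internally.

```python
# Hypothetical pipeline mirroring the example DAG: stage -> stages it depends on.
deps = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["featurize", "train"],
    "topic_modeling": ["featurize"],
    "combine": ["evaluate", "topic_modeling"],
}


def with_upstream(targets, deps):
    """Return the target stages plus every stage upstream of them."""
    selected = set()
    stack = list(targets)
    while stack:
        stage = stack.pop()
        if stage not in selected:
            selected.add(stage)
            stack.extend(deps[stage])
    return selected


def visible_columns(columns, stages):
    """Keep columns whose prefix (text before the first dot) is a selected stage."""
    return [col for col in columns if col.split(".", 1)[0] in stages]


columns = [
    "prepare.split", "prepare.seed", "featurize.max_features",
    "featurize.ngrams", "train.seed", "train.n_estimators",
    "topic_modeling.seed",
]
stages = with_upstream(["evaluate"], deps)
print(visible_columns(columns, stages))
```

With `evaluate` as the target, the traversal reaches `train`, `featurize`, and `prepare`, so the `topic_modeling` column is filtered out. A `--single-item` style option would amount to skipping the traversal and using only the targets themselves.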

# How We Teach This

See [Detailed design](#detailed-design) for a proposed help message.

This is consistent with other commands, and it's only one additional argument to
an existing command, so it likely doesn't require much teaching. It might be
Comment on lines +159 to +164
Contributor

For this section I think something v useful would be to list the specific docs that could be affected by this (and how), and possibly new ones too.

Contributor

BTW lmk if you'd like me to propose a list.

Contributor Author

🙏 That would be great!

necessary to explicitly add that parameters and metrics from upstream stages
will be shown.

# Drawbacks

This proposal may require changes to how DVC collects and tracks parameters and
metrics. Implementation could break existing functionality, and it is
inconsistent with the current behaviors of `dvc params`, `dvc metrics`, and `dvc
exp diff` commands.
Comment on lines +172 to +173
Contributor

This may be an advantage. Otherwise why have all these different options if they do the same thing?


This may be an advantage.

I agree, though we need to remember that we released a major version recently, and this would be kind of a breaking change. We don't know how many users actually want to see the current behavior, or whether changing it would make users lose trust in our development process.


OK, I guess it's addressed down below.


Further, this proposal adds to an already crowded set of `dvc exp show` options,
Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

Another related-discussion/proposal: can we simplify the flags? E.g. merge the --include, --exclude, and --sort ones (may require them to accept 2 CLI args which I'm not sure is possible). E.g. exp show --inc :all auc,rank --sort p.tsv:epochs asc would include all params and just those 2 metrics, while inverse sorting by the epochs param.

Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

Or at least shorten the names. In any case, not sure it's so crowded in terms of functionality, since these 6 flags are all for similar purposes. Similar to dvc plots show which also has a bunch of display-related options.

and there is no single way to show parameters and metrics that's going to work
for every use case. For example, this proposal does nothing to filter out
parameters for upstream stages, which may frequently be irrelevant if only the
stages near the end of the pipeline change during experimentation.

# Alternatives

There are many alternative methods to refine the output of `dvc exp show`:

* Current functionality: `exp show` already has options to include/exclude any
column, so the flexibility to get the same output as suggested here already
exists.
Comment on lines +185 to +187
Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

But DVC still has to read all those stages and values whereas with targets the operation could actually improve performance-wise (perhaps).

* [Wildcards](https://github.com/iterative/dvc/issues/5642): Expand the
include/exclude options to accept wildcard/glob/regex arguments for more
flexibility to exclude an entire file or section of a file like
`--exclude-params=params.json:local_config*`.
* Save configuration options for `dvc exp show` so they can be reused and
modified without typing out all options at the command line each time.
Comment on lines +192 to +193
Contributor

Another great idea. BTW also similar to plots show/diff (plots templates).


Just yesterday we talked about being able to group the plots into "report files".

* [CSV](https://github.com/iterative/dvc/issues/5446): Add a tsv/csv output
format option in `dvc exp show` to enable users to manipulate the table with
other tools.
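
To make the wildcard alternative more concrete, here is a small sketch of glob-style exclusion using Python's standard `fnmatch` module. The column names and the `exclude_columns` helper are hypothetical; the linked issue does not prescribe this matching logic.

```python
from fnmatch import fnmatchcase


def exclude_columns(columns, patterns):
    """Drop every column that matches at least one glob pattern."""
    return [
        col for col in columns
        if not any(fnmatchcase(col, pattern) for pattern in patterns)
    ]


# Hypothetical column names in the `file:key` form used in the wildcard example.
columns = [
    "params.json:local_config.debug",
    "params.json:local_config.cache_dir",
    "params.json:train.seed",
]
remaining = exclude_columns(columns, ["params.json:local_config*"])
print(remaining)
```

In this sketch a single pattern removes the whole `local_config` section while leaving the other keys in the same file visible.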

This proposal does not preclude any of these options. The current output
will continue to be available except for untracked parameters. The other
alternatives may still be implemented.

Each of the alternatives provides flexibility to modify the `dvc exp show`
outputs, which is important but not the goal of this proposal. Instead, the
proposed feature uses the information DVC has to make smart decisions about what
Comment on lines +203 to +204
Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

Sounds like this def. covers the discussion about default behavior then, which I think is good (but may need clarification earlier on in the doc).

to show and saves users from having to make modifications that would always
be desirable.

# Unresolved questions

* What about consistency with `params`, `metrics`, and `exp diff` commands?
Comment on lines +208 to +210
Contributor

@jorgeorpinel jorgeorpinel Mar 24, 2021

Meta: maybe merge this section with Drawbacks?

Should these also be modified or left as is? As mentioned above, `params`
and `metrics` in particular may be used on arbitrary files outside of DVC
projects.
* Should experiments themselves be excluded based on what stages were run? Using
the example above, if a user ran both `dvc exp run evaluate` and `dvc exp run
topic_modeling` experiments, should all of these experiments show when running
`dvc exp show evaluate`?