Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exp show: Allow for wildcard patterns in include and exclude params/m… #5720

Merged
merged 8 commits into from
Apr 5, 2021

Conversation

daavoo
Copy link
Contributor

@daavoo daavoo commented Mar 28, 2021

…etrics

Fixes #5642

Related doc PR: iterative/dvc.org#2334

Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question I have about this is how we should handle matching across dictionary levels.

Currently, we support matching against nested dictionary levels using the . syntax to separate nested keys. So for a params.yaml like in our example-get-started

prepare:
  split: 0.20
  seed: 20170428

featurize:
  max_features: 3000
  ngrams: 2

train:
  seed: 20170428
  n_est: 100
  min_split: 64

You can currently do --include-params prepare to match everything in prepare.*.

in master:

dvc exp show --no-pager --include-params prepare
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Experiment    ┃ Created      ┃     auc ┃ prepare.split ┃ prepare.seed ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ workspace     │ -            │ 0.57756 │ 0.2           │ 20170428     │
│ master        │ Nov 11, 2020 │ 0.57756 │ 0.2           │ 20170428     │
...

I'm not sure what the best way for us to handle globbing would be - should it only match within a single nesting level, or should it match the entire string?

For reference, this PR currently only matches inside one nesting level (and not the entire string), so we get:

dvc exp show --no-pager --include-params \*split
ERROR: failed to show experiments - '*split' does not match any known params
dvc exp show --no-pager --include-params \*.split                                   ⏎
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Experiment    ┃ Created      ┃     auc ┃ prepare.split ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ workspace     │ -            │ 0.57756 │ 0.2           │
│ master        │ Nov 11, 2020 │ 0.57756 │ 0.2           │
...

In this case, *split is only compared against the top level keys (prepare, featurize, train) and does not match anything. Using *.split matches * against the top level keys, and then split against the next level of keys (and matches prepare.split)

@pmrowla
Copy link
Contributor

pmrowla commented Mar 30, 2021

Keeping this as-is for simplicity's sake and requiring separate glob patterns per-nesting level (separated by .) for matching within nested keys is fine with me, but we would have be sure to document it. @dberenbaum @jorgeorpinel thoughts?

One concern here would be that if I have multiple levels of nesting like

a.b.foo
a.b.bar
c.d.foo
c.d.bar
e.f.foo
e.f.bar

and wanted "any key ending with foo", you would have to use *.*.foo, with one * for every level of keys rather than just doing *foo. This would potentially be a bigger issue in the event that I don't have a consistent level of nesting across all of my keys, like

a.foo
b.c.foo
d.e.f.foo

With this structure, there is currently no glob pattern which would match "any key ending with foo".

@pmrowla
Copy link
Contributor

pmrowla commented Mar 30, 2021

So on second thought, maybe it would be best to drop the current method of allowing prepare to match everything nested beneath prepare., and only support matching via glob patterns (against the entire string). So the user would have to explicitly use prepare* or prepare.* to get the current --include-params prepare behavior.

@daavoo
Copy link
Contributor Author

daavoo commented Mar 30, 2021

So on second thought, maybe it would be best to drop the current method of allowing prepare to match everything nested beneath prepare., and only support matching via glob patterns (against the entire string). So the user would have to explicitly use prepare* or prepare.* to get the current --include-params prepare behavior.

I personally consider this the best solution from the perspectives of both user experience (easier to define patterns for complicated nested params) and maintenance (it will simplify the code) . I did not implemented it because, as you have noted, it introduces a backwards incompatible change and felt like that was out of the scope in a first P.R (I have no problem with implementing the full string matching approach if you so choose).

@daavoo
Copy link
Contributor Author

daavoo commented Mar 30, 2021

One question I have about this is how we should handle matching across dictionary levels.

Currently, we support matching against nested dictionary levels using the . syntax to separate nested keys. So for a params.yaml like in our example-get-started

prepare:
  split: 0.20
  seed: 20170428

featurize:
  max_features: 3000
  ngrams: 2

train:
  seed: 20170428
  n_est: 100
  min_split: 64

You can currently do --include-params prepare to match everything in prepare.*.

in master:

dvc exp show --no-pager --include-params prepare
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Experiment    ┃ Created      ┃     auc ┃ prepare.split ┃ prepare.seed ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ workspace     │ -            │ 0.57756 │ 0.2           │ 20170428     │
│ master        │ Nov 11, 2020 │ 0.57756 │ 0.2           │ 20170428     │
...

I'm not sure what the best way for us to handle globbing would be - should it only match within a single nesting level, or should it match the entire string?

For reference, this PR currently only matches inside one nesting level (and not the entire string), so we get:

dvc exp show --no-pager --include-params \*split
ERROR: failed to show experiments - '*split' does not match any known params
dvc exp show --no-pager --include-params \*.split                                   ⏎
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Experiment    ┃ Created      ┃     auc ┃ prepare.split ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ workspace     │ -            │ 0.57756 │ 0.2           │
│ master        │ Nov 11, 2020 │ 0.57756 │ 0.2           │
...

In this case, *split is only compared against the top level keys (prepare, featurize, train) and does not match anything. Using *.split matches * against the top level keys, and then split against the next level of keys (and matches prepare.split)

I will add a test case for nested params as that use case is not currently covered.

@dberenbaum
Copy link
Collaborator

Yeah, I prefer getting rid of the current approach and simplifying to a full string pattern match. The current way has some nice benefits for exact matches to filenames, sections, etc., but it's less intuitive and flexible. I don't think backwards incompatibility is a big deal since it's an experimental feature and I don't think current behavior is even documented.

My other question would be whether fnmatch is preferable to regex? fnmatch is probably simpler for most, but I'm sure some users will claim to want the power of regex (for example, if they have some convention like making certain parameters all caps).

@pmrowla
Copy link
Contributor

pmrowla commented Mar 31, 2021

I think keeping it simple w/fnmatch is best for now. If/when we get enough users asking for regex we can consider it at that point.

@daavoo you can go ahead and make the changes to drop the existing behavior and only do the glob matching against the full param string

@daavoo
Copy link
Contributor Author

daavoo commented Mar 31, 2021

My other question would be whether fnmatch is preferable to regex? fnmatch is probably simpler for most, but I'm sure some users will claim to want the power of regex (for example, if they have some convention like making certain parameters all caps).

I think keeping it simple w/fnmatch is best for now. If/when we get enough users asking for regex we can consider it at that point.

Unless I am missing something, replacing fnmatch with regex would be an easy change to make, when the claim comes.

@daavoo daavoo requested a review from pmrowla March 31, 2021 07:26
Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for the most part, just a couple minor notes.

dvc/command/experiments.py Outdated Show resolved Hide resolved
dvc/command/experiments.py Show resolved Hide resolved
@pmrowla pmrowla requested a review from dberenbaum March 31, 2021 08:44
Co-authored-by: Peter Rowlands (변기호) <[email protected]>
Copy link
Collaborator

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pmrowla
Copy link
Contributor

pmrowla commented Apr 2, 2021

@daavoo can you update the docs PR to reflect the path discussion (the wildcard pattern will not be applied to path: prefix)? Once that's done we can merge this PR

@daavoo
Copy link
Contributor Author

daavoo commented Apr 5, 2021

@daavoo can you update the docs PR to reflect the path discussion (the wildcard pattern will not be applied to path: prefix)? Once that's done we can merge this PR

Done: iterative/dvc.org#2334

@pmrowla pmrowla merged commit c634577 into iterative:master Apr 5, 2021
@daavoo daavoo deleted the exp-show-allow-wildcard-pattern branch April 9, 2021 10:21
@efiop efiop added the feature is a feature label Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature is a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for wildcard patterns in include and exclude params/metrics with dvc exp show
4 participants