exp show: show metrics that include separator #9819
cc @d-miketa
Codecov Report
Patch coverage:
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9819      +/-   ##
==========================================
+ Coverage   90.81%   90.86%   +0.05%
==========================================
  Files         471      471
  Lines       35929    35945      +16
  Branches     5194     5196       +2
==========================================
+ Hits        32630    32663      +33
+ Misses       2715     2697      -18
- Partials      584      585       +1

☔ View full report in Codecov by Sentry.
for path in param_names:
    if sort_name in param_names[path]:
        matches.add((path, sort_name, "params"))
    if sort_by in param_names[path]:
Do we want to support it in --set-param, or are we implying that this will only be supported for exp show / metrics?
(in the current fix we are also supporting params in the exp show)
I think it's worth discussing, but I figured even this limited fix is better than the current situation.
dvc/repo/experiments/show.py
Outdated
- path, _, sort_name = sort_by.rpartition(":")
+ path, _, sort_name = sort_by.partition(":")
Needed to address #9819 (comment), but not sure if there was a reason for using rpartition
(cc @pmrowla)
You can have colons in filenames.
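To make the trade-off between the two calls concrete, here is a small illustration using only Python's built-in string methods. The file and metric names are made up for the example and do not come from the PR.

```python
# Hypothetical --sort-by values; both names are illustrative only.
metric_with_colon = "metrics.json:my:metric"
file_with_colon = "dir:name/metrics.json:loss"

# rpartition splits at the LAST colon: it keeps colons in the path intact,
# but cuts a metric name that itself contains a colon.
rsplit_metric = metric_with_colon.rpartition(":")
rsplit_file = file_with_colon.rpartition(":")

# partition splits at the FIRST colon: the reverse trade-off.
lsplit_metric = metric_with_colon.partition(":")
lsplit_file = file_with_colon.partition(":")

for label, result in [
    ("rpartition, colon in metric", rsplit_metric),
    ("rpartition, colon in file", rsplit_file),
    ("partition, colon in metric", lsplit_metric),
    ("partition, colon in file", lsplit_file),
]:
    print(f"{label}: path={result[0]!r} name={result[2]!r}")
```

Neither call handles both inputs correctly, which is why a single fixed split point cannot resolve the ambiguity on its own.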
Didn't mean to block this.
For the record, would be good to test against Studio
@daavoo Can we add a test in Studio rather than do it manually?
Let's get approval from @pmrowla to make sure it doesn't break some expectation in the existing code.
If it does, let's please add a regression test
This is really a problem with the path:metric_or_param addressing in general, it breaks as soon as you have a colon in either the path or the metric/param name.
IMO we should just explicitly disallow the use of colons in filename + metric/param names for now. If this is really something users want then we need to have actual support for escaping the colon in CLI flags, so you would have to do --sort-by=metrics\:file.json:my\:metric
(this affects exp run --set-param too)
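As a rough sketch of what the escaping approach could look like, the helper below splits on colons that are not preceded by a backslash and then removes the escaping backslashes. The name split_escaped is hypothetical and is not part of DVC; the sketch also assumes the backslashes actually reach the program (the user's shell may require quoting for that) and does not handle an escaped backslash.

```python
import re
from typing import List


def split_escaped(value: str, sep: str = ":") -> List[str]:
    """Split on unescaped separators, then drop the escaping backslashes.

    Hypothetical helper for illustration only; not DVC's implementation.
    """
    # Negative lookbehind: match a separator NOT preceded by a backslash.
    pattern = r"(?<!\\)" + re.escape(sep)
    parts = re.split(pattern, value)
    # Turn each escaped separator ("\:") back into a plain separator (":").
    return [part.replace("\\" + sep, sep) for part in parts]


print(split_escaped(r"metrics\:file.json:my\:metric"))
```

With the value from the comment above, the first element is the file name and the second is the metric name, each with its literal colon restored.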
dvc/repo/experiments/show.py
Outdated
- path, _, sort_name = sort_by.rpartition(":")
+ path, _, sort_name = sort_by.partition(":")
You can have colons in filenames.
@pmrowla @dberenbaum can we instead check, for all the supported suffixes, if
We could, but it's still naive given that DVC allows yaml metrics or params files to have any file extension (we explicitly treat
@pmrowla Updated the logic so it should handle
dvc/repo/experiments/show.py
Outdated
for split_num in range(1, len(parts)):
    path = sep.join(parts[:split_num])
    sort_name = sep.join(parts[split_num:])
    if path in metric_names and sort_name in metric_names[path]:
        matches.add((path, sort_name, "metrics"))
    if path in param_names and sort_name in param_names[path]:
        matches.add((path, sort_name, "params"))
I think this would have to go from range(0, len(parts)), it has to cover the case where there is no path, and only a metric containing a colon
IMO this is also more confusing than requiring escaping, and I think there are still edge cases where the user could have filenames and metric names that overlap and break this kind of behavior
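The point about range(0, len(parts)) can be sketched as follows. This is a standalone approximation, not the code merged in the PR: candidate_splits and find_matches are hypothetical names, and treating an empty path as "search every known file" is an assumption made for the example.

```python
def candidate_splits(sort_by, sep=":"):
    """Return every (path, name) pair obtainable by splitting at one separator."""
    parts = sort_by.split(sep)
    # Starting at 0 yields an empty path, covering the case where the user
    # passed only a metric name that itself contains the separator.
    return [
        (sep.join(parts[:n]), sep.join(parts[n:]))
        for n in range(0, len(parts))
    ]


def find_matches(sort_by, metric_names, param_names):
    """Collect every (path, name, kind) that matches a known metric or param."""
    matches = set()
    for path, name in candidate_splits(sort_by):
        for kind, names in (("metrics", metric_names), ("params", param_names)):
            # Assumption: an empty path means "look in every known file".
            paths = [path] if path else list(names)
            for candidate in paths:
                if name in names.get(candidate, ()):
                    matches.add((candidate, name, kind))
    return matches


known_metrics = {"metrics.json": {"my:metric", "loss"}}
print(candidate_splits("a:b:c"))
print(find_matches("my:metric", known_metrics, {}))
print(find_matches("metrics.json:my:metric", known_metrics, {}))
```

If more than one match comes back, the caller still has to treat the input as ambiguous, which is the overlapping-names edge case raised in the comment.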
> I think this would have to go from range(0, len(parts)), it has to cover the case where there is no path, and only a metric containing a colon

Thanks, fixed.

> IMO this is also more confusing than requiring escaping, and I think there are still edge cases where the user could have filenames and metric names that overlap and break this kind of behavior

My thoughts on this PR:
- It requires nothing of the user
- Edge cases will show the same error about being ambiguous that shows now unless I miss something
- Managing escape characters between how they get read in the user's shell and how we handle them seems more confusing to me (and more for the user to understand)
- Don't see much harm since the implementation already exists
Do you feel it's a blocker and escaping is the only way to go?
Given that the existing behavior is already broken, we can merge this.
But my objection is more that this same problem affects every other command/flag combination that uses path:metric_or_param addressing (like exp run --set-param and probably also stage add --params). I would prefer to only have to fix this once, in a consistent and re-usable way. This PR is just patching over one symptom of a broader underlying problem
Also in the same vein, I'm not sure that we actually document that . is an invalid character in metric or parameter names, but for all intents and purposes we don't support it because we treat . as the dictionary level separator in metric/param addressing (even though it is valid to have string keys that contain . in yaml/json). It would be easier for us to just treat : the same way and disallow it entirely in metric/param names
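A short illustration of why . as the level separator makes dotted keys unusable in practice: once nested dictionaries are flattened into dotted names, a nested key and a literal dotted key become indistinguishable. The flatten function below is a generic sketch written for this example and is not DVC's own implementation.

```python
def flatten(data, prefix=""):
    """Flatten nested dicts into a single level with dot-joined key names."""
    flat = {}
    for key, value in data.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse, using the dotted name built so far as the prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


nested = {"train": {"loss": 0.1}}   # "loss" nested under "train"
dotted = {"train.loss": 0.2}        # one key that contains a literal "."

print(flatten(nested))
print(flatten(dotted))
```

Both inputs end up with the same flattened name, so addressing by dotted name cannot tell them apart. The same collision would apply to : if it were kept as a separator and also allowed inside names.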
Valid points. As mentioned in the issue description, I don't see much harm in doing this unless/until we forbid these characters or have a better solution like escaping, so I'm going to merge but open an issue to discuss whether we should allow it.
Fixes dvc exp show --sort-by when : separator is included in metric name (see https://iterativeai.slack.com/archives/C03JS2V4MQU/p1691415243711839). Not sure if we should explicitly allow this, but I don't see any harm with this fix.