dvc metrics diff #2995

dmpetrov · 2019-12-22T02:24:29Z

Today we can track metrics, but they become much more valuable when you can see differences and improvements over time (across commits or branches). A new dvc metrics diff command is needed.

$ dvc metrics diff HEAD^^
metr.json:
    {
        "top1-error": 0.0385,
        "top5-error": 0.039221
    }

Open question: What should we do about metrics that are not floats or integers? Let's not support them (ignore them). Any other ideas?

Note, we should deal with float formatting carefully - we don't want to see diff values like 0.0001624000000000000001.
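One way to avoid the float noise is to subtract the decimal digits as written rather than the binary floats. This is only a minimal sketch: the helper name format_delta, the places parameter and the new-minus-old sign convention are assumptions for illustration, not something the issue specifies.

```python
from decimal import Decimal


def format_delta(old: float, new: float, places: int = 7) -> str:
    """Return new - old as a fixed-point string without binary float noise."""
    # str() gives the shortest repr of each float, so the subtraction is done
    # on the decimal digits the metrics file most likely contained.
    delta = Decimal(str(new)) - Decimal(str(old))
    # Round to a fixed number of decimal places (1E-7 for places=7).
    quantum = Decimal(1).scaleb(-places)
    # normalize() drops trailing zeros; "f" avoids scientific notation.
    return format(delta.quantize(quantum).normalize(), "f")


# Raw float subtraction may print extra digits; the helper prints a short value.
print(0.0384824 - 0.0385)
print(format_delta(0.0385, 0.0384824))
```

Rounding to a fixed number of places is a design choice; a significant-digits rule would suit metrics with very different scales better.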

Also, an easy-to-parse output option is required:

$ dvc metrics diff HEAD^^ --to-json
[
    {
        "file": "metr.json",
        "changed": {
            "top1-error": {
                "old": 0.0385,
                "new": 0.0384824,
                "diff": 0.0000176
            }
        },
        "removed": {
            "top5-error": {
                "old": 0.039221
            }
        },
        "added": {
            "loss": {
                "new": 0.0384824
            }
        }
    }
]
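The structure above could be produced from two already-parsed metrics files roughly as follows. This is a sketch under assumptions: the function name metrics_diff is hypothetical, the sign convention is new minus old (the example above shows an unsigned value), and non-numeric metrics get "old" and "new" but no "diff".

```python
import json


def _is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def metrics_diff(old: dict, new: dict, places: int = 7) -> dict:
    """Classify metrics as changed, removed or added between two revisions."""
    result = {"changed": {}, "removed": {}, "added": {}}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            result["removed"][name] = {"old": old[name]}
        elif name not in old:
            result["added"][name] = {"new": new[name]}
        elif old[name] != new[name]:
            entry = {"old": old[name], "new": new[name]}
            if _is_number(old[name]) and _is_number(new[name]):
                # Rounded to keep float noise out of the reported value.
                entry["diff"] = round(new[name] - old[name], places)
            result["changed"][name] = entry
    return result


old_metrics = {"top1-error": 0.0385, "top5-error": 0.039221}
new_metrics = {"top1-error": 0.0384824, "loss": 0.0384824}
print(json.dumps(metrics_diff(old_metrics, new_metrics), indent=4))
```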


efiop · 2019-12-22T10:52:07Z

A workaround would be pretty trivial though: dvc get the metrics file from HEAD^^, dvc get it from the current HEAD, and run plain old diff on the two.

dmpetrov · 2019-12-23T00:02:50Z

@efiop yep! Unfortunately, plain diff won't work: the order of metrics might easily change, and you would see a total mess. Also, we need nice-looking diff numbers, not just old and new values.
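The ordering problem can be illustrated with a small sketch: a text diff reports a change when only the key order differs, while comparing the parsed JSON does not. The file contents below are illustrative and reuse the metric names from this issue.

```python
import difflib
import json

# Same metrics, written in a different key order.
old_text = '{"top1-error": 0.0385, "top5-error": 0.039221}'
new_text = '{"top5-error": 0.039221, "top1-error": 0.0385}'

# A line-based diff sees two different lines and reports a change.
text_diff = list(difflib.unified_diff([old_text], [new_text], lineterm=""))
text_differs = len(text_diff) > 0

# Parsed dictionaries compare by key and value, so order does not matter.
same_metrics = json.loads(old_text) == json.loads(new_text)

print("plain diff reports a change:", text_differs)
print("metrics are actually equal:", same_metrics)
```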

But agree, we just need a nice looking shortcut for that.

DavidGOrtega · 2019-12-26T10:15:51Z

Are you considering every stage of the pipeline in the spec?
Something like this:

{
    "train": {
        "train_time": "3d 8h 23m 15s",
        "memory_consume": "8Gb"
    },
    "eval": {
        "inference_time": 0.001,
        "memory_consume": "124Mb",
        "top1-error": 0.0385,
        "top5-error": 0.039221
    }
}
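If nested, per-stage files like this were supported while non-numeric values were ignored (as proposed in the opening comment), one possible preprocessing step is to flatten the file and keep only numeric leaves. This is a sketch only: the helper name numeric_leaves and the dotted "stage.metric" naming are assumptions, not an agreed format.

```python
def numeric_leaves(metrics: dict, prefix: str = "") -> dict:
    """Flatten nested metrics into {"stage.metric": number}, dropping the rest."""
    flat = {}
    for key, value in metrics.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into per-stage sections.
            flat.update(numeric_leaves(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[path] = value
        # Strings such as "3d 8h 23m 15s" or "8Gb" are skipped.
    return flat


example = {
    "train": {"train_time": "3d 8h 23m 15s", "memory_consume": "8Gb"},
    "eval": {
        "inference_time": 0.001,
        "memory_consume": "124Mb",
        "top1-error": 0.0385,
        "top5-error": 0.039221,
    },
}
print(numeric_leaves(example))
```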

pared · 2019-12-30T16:59:25Z

@dmpetrov

> Open question: What should we do about metrics that are not floats or integers? Let's not support them (ignore them). Any other ideas?

For starters, I would not support them, but I think at some point we could add functionality that lets the user define how to calculate the difference between particular metrics by executing some custom-defined method. That could resemble how we are dealing with summon right now.
It sounds rather complicated for now.

Question from me:
Do we want to support diff only for 2 revs ("old" and "new")? Or will we want to support rev ranges at some point (e.g. dvc metrics diff HEAD~10 for observing changes during the last 10 iterations)? If so, the notion of "old", "new" and "diff" might have to change.

dmpetrov · 2020-01-02T16:48:16Z

> That could resemble how we are dealing with summon right now.

👍 I have the same thoughts: we need a "custom" metrics file to support multiple formats (our basic json + csv are not enough). Probably, different types of metrics can be represented as separate summoning-objects (not necessarily metrics inside a summoning object).

Metrics are a bit more complicated than just numbers. There are numbers, numbers with a confidence level, confusion matrices and so on. It is not easy to find a single way of dealing with them. Even for numeric metrics, it is important to know if you are minimizing or maximizing it (is +013 good or bad?) to visualize properly and find the "best one".

> Do we want to support diff only for 2 revs

Only two. The same idea as git diff.

DavidGOrtega · 2020-01-02T19:30:34Z

> Question from me:
> Do we want to support diff only for 2 revs ("old" and "new")? Or will we want to support rev ranges at some point (e.g. dvc metrics diff HEAD~10 for observing changes during the last 10 iterations)? If so, the notion of "old", "new" and "diff" might have to change.

One of the best features would be observability. One of the best features of dvc is having all the metrics together. I would vote for having that somehow.

pared · 2020-01-03T09:39:12Z

@DavidGOrtega @dmpetrov Well, that seems like a conflict between "behave like git" and "this might be useful in an ML use case". I guess we can always think about implementing another new metrics command. (A --range flag for diff could also be an option, but I think it would not make much sense if we are going to stick to the old/new notion for diff.)

dmpetrov · 2020-01-03T16:30:28Z

@DavidGOrtega I'd appreciate it if you could provide a use case (with a command example) where it would be helpful (and not easily replicable with multiple dvc diff).

DavidGOrtega · 2020-01-03T18:34:45Z

@dmpetrov is your question about my earlier question?

> Are you considering every stage of the pipeline in the spec?

I was only asking to get a better picture, with no intention of pushing in that direction.

If your question is why "all together" observability would be one of the best features: it gives a general overview of the experiment. Conceptually, an experiment can have many permutations of data, parameters and even implementation, but in the end what matters is how it performs according to the metrics you want to measure.
Comparing only the last two runs does not tell you whether you have the best performance, even if the latest run improved on the previous one.

So yes, you could run multiple diffs, but for an experiment with many trials that would be difficult to handle.
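As a rough illustration of the "all together" view, the sketch below lists one metric across several trials with the change from the previous trial. The function name metric_history, the revision labels and the third value are made up for illustration; only the first two values come from this issue.

```python
def metric_history(runs, name, places=7):
    """Return (rev, value, delta_from_previous) rows for one metric.

    `runs` is an ordered list of (rev, metrics_dict) pairs; revisions that
    do not contain the metric are skipped.
    """
    rows = []
    previous = None
    for rev, metrics in runs:
        if name not in metrics:
            continue
        value = metrics[name]
        delta = 0.0 if previous is None else round(value - previous, places)
        rows.append((rev, value, delta))
        previous = value
    return rows


# Hypothetical trials; "exp-3" is an invented value for the example.
runs = [
    ("exp-1", {"top1-error": 0.0385}),
    ("exp-2", {"top1-error": 0.0384824}),
    ("exp-3", {"top1-error": 0.039}),
]
for rev, value, delta in metric_history(runs, "top1-error"):
    print(f"{rev}\t{value}\t{delta:+.7f}")
```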

jorgeorpinel · 2020-01-27T05:47:59Z

Hi. Is this closed by #3051? (docs on their way too: iterative/dvc.org/pull/933)

p.s. I know I'm late to this party but it seems more like a dvc metrics delta to me. diff typically shows the base value removed and then the current value added (doesn't compute an arithmetic difference).

jorgeorpinel · 2020-01-27T06:09:01Z

> A workaround would be pretty trivial...

> agree, we just need a nice looking shortcut

Please note that we already provided a short script for this same workaround some time ago in #770 (comment)! @efiop @dmpetrov

> Even for numeric metrics, it is important to know if you are minimizing or maximizing it (is +013 good or bad?)

Agree. For certain numeric scales and ranges, a simple B-A calculation may not be meaningful. For these cases, maybe add --min or --max flags (or --compare=min/max/etc) so it just tells you which version has the best metric (and its value)?
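The behaviour suggested for such flags could look roughly like this. It is only a sketch of the idea: --min, --max and --compare are proposals from this comment, not existing options, and the function name best_version is hypothetical.

```python
def best_version(values: dict, compare: str = "min"):
    """Return (rev, value) of the best metric among the given revisions.

    `values` maps a revision name to the metric value at that revision;
    `compare` says whether lower ("min") or higher ("max") is better.
    """
    if compare not in ("min", "max"):
        raise ValueError("compare must be 'min' or 'max'")
    pick = min if compare == "min" else max
    rev = pick(values, key=values.get)
    return rev, values[rev]


# An error metric is minimized, so the lower value wins.
top1_error = {"HEAD^^": 0.0385, "HEAD": 0.0384824}
print(best_version(top1_error, compare="min"))
```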

efiop · 2020-01-29T14:44:21Z

@jorgeorpinel I was expecting some follow up requirements on this one, but looks like everything mentioned in this ticket is already implemented. So let's close it for now. Thanks for the heads up!

> p.s. I know I'm late to this party but it seems more like a dvc metrics delta to me. diff typically shows the base value removed and then the current value added (doesn't compute an arithmetic difference).

But this diff is a metrics diff; a direct comparison with git diff is not quite correct here. Plus, git diff shows before and after without the difference simply because it works with strings, whereas in dvc we are able to tell whether something is a number. The API part is actually closer to git diff, as it has both old and new, but CLI users don't want to see the old value; they just want to see the change. I'm sure there will be some follow-ups here after we receive some feedback.

dmpetrov · 2020-02-02T23:22:32Z

> it seems more like a dvc metrics delta to me. diff typically shows the base value removed and then the current value added (doesn't compute an arithmetic difference).

@jorgeorpinel Re 'git diff': that is a limitation of Git, which simply cannot quantify the difference. If it could, it would show numbers instead of a line-by-line diff 😀

But really, if a numeric diff does not work for dvc metrics, it means the metrics were not defined properly. We need to introduce stricter requirements for metric files.

dmpetrov added the feature request Requesting a new feature label Dec 22, 2019

efiop added the p2-medium Medium priority, should be done, but less important label Dec 22, 2019

dmpetrov mentioned this issue Dec 23, 2019

dvc metrics diff for summon/catalog artifacts #2997

Closed

efiop added the research label Dec 24, 2019

dmpetrov added the product: VSCode Integration with VSCode extension label Dec 24, 2019

weekly-digest bot mentioned this issue Dec 29, 2019

Weekly Digest (22 December, 2019 - 29 December, 2019) #3013

Closed

efiop self-assigned this Jan 2, 2020

efiop mentioned this issue Jan 4, 2020

metrics: introduce diff #3051

Merged

7 tasks

jorgeorpinel mentioned this issue Jan 27, 2020

diff: clean up output for changed files #2982

Closed

7 tasks

efiop closed this as completed Jan 29, 2020

dmpetrov mentioned this issue Feb 22, 2020

ML experiments and hyperparameters tuning #2799

Closed
