dvc metrics diff #2995

Closed
dmpetrov opened this issue Dec 22, 2019 · 13 comments
Labels
feature request Requesting a new feature p2-medium Medium priority, should be done, but less important product: VSCode Integration with VSCode extension research

Comments

@dmpetrov
Member

Today, we can track metrics but metrics become much more valuable when you can see differences/improvements over time (commits/branches). A new dvc metrics diff command is needed.

$ dvc metrics diff HEAD^^
    metr.json:
        {
            "top1-error": 0.0385,
            "top5-error": 0.039221
        }

Open question: What should we do about metrics that are not floats or integers? Let's not support them (just ignore them). Any other ideas?

Note, we should deal with float formatting carefully - we don't want to see diff values like 0.0001624000000000000001.
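
The float formatting concern can be illustrated with a minimal Python sketch. It assumes metric values are parsed as Python floats and uses the standard `decimal` module to subtract their shortest decimal representations, so no binary rounding noise reaches the output. The helper name `format_diff` and the new-minus-old sign convention are assumptions for illustration, not part of dvc.

```python
from decimal import Decimal


def format_diff(old: float, new: float) -> str:
    """Return new - old as a plain decimal string without float noise.

    Hypothetical helper: converting through str() keeps only the digits
    that were written in the metrics file.
    """
    diff = Decimal(str(new)) - Decimal(str(old))
    # The "f" format forces fixed-point notation instead of an exponent.
    return format(diff, "f")


print(format_diff(0.0385, 0.0384824))  # -0.0000176
```

Rounding to a fixed number of digits would also work, but it requires choosing a precision up front; the decimal route simply inherits the precision of the inputs.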

Also, an easy-to-parse output option is required:

$ dvc metrics diff HEAD^^ --to-json
[
    {
        "file": "metr.json",
        "changed": {
            "top1-error": {
                "old": 0.0385,
                "new": 0.0384824,
                "diff": 0.0000176
            }
        },
        "removed": {
            "top5-error": {
                "old": 0.039221
            }
        },
        "added": {
            "loss": {
                "new": 0.0384824
            }
        }
    }
]
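
As a rough sketch of how the `changed` / `removed` / `added` layout proposed above could be computed for one metrics file, the following Python function compares two flat dictionaries of numeric metrics. The function name `diff_metrics` is hypothetical, and the sign convention (new minus old) is an assumption; the proposal itself does not state which direction `diff` uses.

```python
def diff_metrics(old: dict, new: dict) -> dict:
    """Group metric names into changed/removed/added (hypothetical helper).

    Assumes both inputs are flat dicts that map metric names to numbers.
    """
    result = {"changed": {}, "removed": {}, "added": {}}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            result["removed"][name] = {"old": old[name]}
        elif name not in old:
            result["added"][name] = {"new": new[name]}
        elif old[name] != new[name]:
            result["changed"][name] = {
                "old": old[name],
                "new": new[name],
                # Assumed convention: new minus old. The raw float may
                # carry rounding noise and still needs formatting.
                "diff": new[name] - old[name],
            }
    return result


old = {"top1-error": 0.0385, "top5-error": 0.039221}
new = {"top1-error": 0.0384824, "loss": 0.0384824}
report = diff_metrics(old, new)
```

Metrics whose values are identical in both revisions are left out of the report entirely, which keeps the output focused on what moved.
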
@dmpetrov dmpetrov added the feature request Requesting a new feature label Dec 22, 2019
@efiop efiop added the p2-medium Medium priority, should be done, but less important label Dec 22, 2019
@efiop
Contributor

efiop commented Dec 22, 2019

A workaround would be pretty trivial though: dvc get from HEAD^^, dvc get from the current HEAD, and run plain old diff on the results.

@dmpetrov
Member Author

@efiop yep! Plain diff won't work, unfortunately - the order of metrics might easily change and you would see a total mess. Also, we need nice-looking diff numbers, not just old and new values.

But I agree, we just need a nice-looking shortcut for that.
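
The ordering problem mentioned here can be shown with a small Python sketch: two JSON documents holding the same metrics in a different key order differ as text, yet compare equal once parsed. The sample strings and the helper name `same_metrics` are made up for illustration.

```python
import json

# Same metrics, different key order (illustrative data).
old_text = '{"top1-error": 0.0385, "top5-error": 0.039221}'
new_text = '{"top5-error": 0.039221, "top1-error": 0.0385}'


def same_metrics(a: str, b: str) -> bool:
    """Compare two JSON metric files by content, ignoring key order."""
    return json.loads(a) == json.loads(b)


print(old_text == new_text)              # False: a text diff reports a change
print(same_metrics(old_text, new_text))  # True: the metrics are identical
```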

@efiop efiop added the research label Dec 24, 2019
@dmpetrov dmpetrov added the product: VSCode Integration with VSCode extension label Dec 24, 2019
@DavidGOrtega

DavidGOrtega commented Dec 26, 2019

Are you guys contemplating in the specs every stage of the pipeline?
Something like this:

{
    "train": {
        "train_time": "3d 8h 23m 15s",
        "memory_consume": "8Gb"
    },
    "eval": {
        "inference_time": 0.001,
        "memory_consume": "124Mb",
        "top1-error": 0.0385,
        "top5-error": 0.039221
    }
}
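
One way a per-stage file like this could feed a numeric diff is to flatten it and keep only the numeric leaves, which matches the suggestion in the issue to ignore non-numeric metrics. This is a sketch only: the function name `flatten_numeric` and the dotted-path naming are assumptions, not anything dvc defines.

```python
from numbers import Real


def flatten_numeric(metrics: dict, prefix: str = "") -> dict:
    """Flatten nested metrics to dotted names, keeping numeric leaves only."""
    flat = {}
    for key, value in metrics.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_numeric(value, path))
        # bool is a subclass of int in Python, so exclude it explicitly.
        elif isinstance(value, Real) and not isinstance(value, bool):
            flat[path] = value
    return flat


stages = {
    "train": {"train_time": "3d 8h 23m 15s", "memory_consume": "8Gb"},
    "eval": {
        "inference_time": 0.001,
        "memory_consume": "124Mb",
        "top1-error": 0.0385,
        "top5-error": 0.039221,
    },
}
print(flatten_numeric(stages))
```

With this input the string-valued entries (`train_time`, both `memory_consume` values) are dropped and only the three numeric `eval` metrics remain.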

@pared
Contributor

pared commented Dec 30, 2019

@dmpetrov

Open question: What should we do about metrics that are not floats or integers? Let's not support them (just ignore them). Any other ideas?

For starters, I would not support them, but I think at some point we could add functionality allowing the user to define how to calculate the difference between particular metrics by executing some custom-defined method. That could resemble how we are dealing with summon right now.
It sounds kind of complicated for now.
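
The idea of user-defined difference methods could look roughly like the following Python sketch, in which a mapping from metric name to a callable overrides the default numeric subtraction. Everything here is hypothetical: the names `parse_size_mb`, `size_diff_mb`, `diff_metric`, the unit table, and the registry shape are illustrative assumptions and do not reflect how summon works.

```python
from typing import Any, Callable, Dict

# Assumed unit table for strings such as "124Mb" or "8Gb".
UNITS_IN_MB = {"Mb": 1, "Gb": 1024}


def parse_size_mb(text: str) -> float:
    """Convert a size string like "8Gb" into megabytes."""
    for unit, factor in UNITS_IN_MB.items():
        if text.endswith(unit):
            return float(text[: -len(unit)]) * factor
    raise ValueError(f"unknown size unit in {text!r}")


def size_diff_mb(old: str, new: str) -> float:
    """Custom differ: difference between two size strings, in megabytes."""
    return parse_size_mb(new) - parse_size_mb(old)


def diff_metric(name: str, old: Any, new: Any,
                differs: Dict[str, Callable[[Any, Any], Any]]) -> Any:
    """Use a registered custom differ, else plain numeric subtraction."""
    differ = differs.get(name, lambda a, b: b - a)
    return differ(old, new)


custom = {"memory_consume": size_diff_mb}
print(diff_metric("memory_consume", "124Mb", "8Gb", custom))  # 8068.0
```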

Question from me:
Do we want to support diff only for 2 revs ("old" and "new")? Or will we want to support rev ranges at some point (e.g. dvc metrics diff HEAD~10 for observing changes during the last 10 iterations)? If so, the notion of "old", "new" and "diff" might have to change.

@efiop efiop self-assigned this Jan 2, 2020
@dmpetrov
Member Author

dmpetrov commented Jan 2, 2020

That could resemble how we are dealing with summon right now.

👍 I have the same thoughts - we need a "custom" metrics file to support multiple formats (our basic json + csv are not enough). Probably, different types of metrics can be represented as separate summoning-objects (not necessarily metrics inside a summoning object).

Metrics are a bit more complicated than just numbers. There are numbers, numbers with a confidence level, confusion matrices and so on. It is not easy to find a single way of dealing with them. Even for numeric metrics, it is important to know if you are minimizing or maximizing it (is +013 good or bad?) to visualize properly and find the "best one".
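
To make the minimize/maximize point concrete, here is a minimal Python sketch that interprets a numeric diff given the optimization direction of the metric. The function name `classify_change` and the `"min"` / `"max"` labels are assumptions for illustration; the diff is taken as new minus old.

```python
def classify_change(diff: float, goal: str) -> str:
    """Say whether a diff (new - old) is good for a metric with this goal.

    goal is "min" for metrics such as error or loss and "max" for
    metrics such as accuracy (hypothetical labels).
    """
    if goal not in ("min", "max"):
        raise ValueError(f"goal must be 'min' or 'max', got {goal!r}")
    if diff == 0:
        return "unchanged"
    improved = diff < 0 if goal == "min" else diff > 0
    return "better" if improved else "worse"


print(classify_change(-0.0000176, "min"))  # better: the error went down
print(classify_change(-0.0000176, "max"))  # worse: an accuracy went down
```

The same number therefore reads as an improvement or a regression depending on metadata that a plain metrics file does not carry.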

Do we want to support diff only for 2 revs

Only two. The same idea as git diff.

@DavidGOrtega

Question from me:
Do we want to support diff only for 2 revs ("old" and "new")? Or will we want to support rev ranges at some point (e.g. dvc metrics diff HEAD~10 for observing changes during the last 10 iterations)? If so, the notion of "old", "new" and "diff" might have to change.

One of the best features would be observability. One of the best features of dvc is having all the metrics all together. I would vote for having that somehow.

@pared
Contributor

pared commented Jan 3, 2020

@DavidGOrtega @dmpetrov Well, that seems like a conflict between "behave as git" and "this might be useful in an ML use case". I guess we can always think about implementing another new metrics command (a --range flag for diff could also be an option, but I think it would not make much sense if we are going to stick to the old/new notion for diff).

@dmpetrov
Member Author

dmpetrov commented Jan 3, 2020

@DavidGOrtega I'd appreciate it if you could provide a use case (with a command example) where it would be helpful (and not easily replicable with multiple dvc diff calls).

@DavidGOrtega

DavidGOrtega commented Jan 3, 2020

@dmpetrov is your question regarding my question:

Are you guys contemplating in the specs every stage of the pipeline?

I'm only asking, without any intention of pushing for it; it was just to get a better picture.

If your question is why "all together" observability would be one of the best features, it is because it gives a general overview of the experiment. Conceptually, an experiment can have many permutations of data, parameters and even implementation, but in the end what should matter is how it performs according to the metrics that you want to measure.
Comparing only the last two runs does not tell you whether you have the best performance, even if the latest run improved on the previous one.

So, yes, you could run multiple diffs, but for an experiment with many trials that would be difficult to handle.

@efiop efiop mentioned this issue Jan 4, 2020
7 tasks
@jorgeorpinel
Contributor

jorgeorpinel commented Jan 27, 2020

Hi. Is this closed by #3051? (docs on their way too: iterative/dvc.org/pull/933)

p.s. I know I'm late to this party but it seems more like a dvc metrics delta to me. diff typically shows the base value removed and then the current value added (doesn't compute an arithmetic difference).

@jorgeorpinel
Contributor

jorgeorpinel commented Jan 27, 2020

A workaround would be pretty trivial...
agree, we just need a nice looking shortcut

Please note that we already provided a short script for this same workaround some time ago in #770 (comment)! @efiop @dmpetrov

Even for numeric metrics, it is important to know if you are minimizing or maximizing it (is +013 good or bad?)

Agree. For certain numeric scales and ranges, a simple B-A calculation may not be meaningful. For these cases, maybe add --min or --max flags (or --compare=min/max/etc) so it just tells you which version has the best metric (and its value)?
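
The "which version has the best metric" idea could be sketched in Python as below. The function name `best_revision`, the `mode` argument, and the sample values are all made up for illustration; the `--min` / `--max` flags are only a suggestion in this thread, not an existing dvc option.

```python
from typing import Dict, Tuple


def best_revision(values_by_rev: Dict[str, float],
                  mode: str = "min") -> Tuple[str, float]:
    """Return (revision, value) with the best value of a single metric.

    mode is "min" when lower is better and "max" when higher is better.
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    pick = min if mode == "min" else max
    rev = pick(values_by_rev, key=values_by_rev.get)
    return rev, values_by_rev[rev]


# Illustrative top1-error values for three revisions (not real results).
top1_error = {"HEAD~2": 0.0391, "HEAD~1": 0.0385, "HEAD": 0.0388}
print(best_revision(top1_error, "min"))  # ('HEAD~1', 0.0385)
```

In this made-up history the best revision is neither the oldest nor the newest, which is the situation a two-revision diff cannot reveal on its own.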

@efiop
Contributor

efiop commented Jan 29, 2020

@jorgeorpinel I was expecting some follow-up requirements on this one, but it looks like everything mentioned in this ticket is already implemented. So let's close it for now. Thanks for the heads up!

p.s. I know I'm late to this party but it seems more like a dvc metrics delta to me. diff typically shows the base value removed and then the current value added (doesn't compute an arithmetic difference).

But this diff is a metrics diff, so a direct comparison with git diff is not quite correct here. Plus, git diff shows before and after without the difference simply because it works with strings, whereas we, in dvc, are able to tell whether something is a number. The API part is actually closer to git diff, as it has both old and new, but CLI users don't want to see the old value, they just want to see the change. I'm sure there will be some follow-ups here after we receive some feedback.

@efiop efiop closed this as completed Jan 29, 2020
@dmpetrov
Member Author

dmpetrov commented Feb 2, 2020

it seems more like a dvc metrics delta to me. diff typically shows the base value removed and then the current value added (doesn't compute an arithmetic difference).

@jorgeorpinel Re ‘git diff’ - that is a limitation of Git, which just cannot quantify the difference. If it could, it would show numbers instead of a line-by-line diff 😀

But really, if a numeric diff does not work for dvc metrics, it means the metrics were not defined properly. We need to introduce stricter requirements for metric files.

5 participants