Back-fill metrics #4210

Closed
jonilaserson opened this issue Jul 15, 2020 · 5 comments
Labels
feature request Requesting a new feature p2-medium Medium priority, should be done, but less important research

Comments

@jonilaserson

Say I am maintaining some data using DVC, and at some point I decide I want a metric showing some data statistics (e.g. tracking how many positive samples I have). So I create a pipeline that computes this metric. How do I back-fill it to previous commits? The goal is to plot a graph showing this metric at different stages of the project.

Specifically: if the commit history is A->B->master, I know I can check out data-commit 'A' and run the pipeline, but I won't be able to save the metric output for commit 'A' in the context of commit 'A', right? At most I will be able to commit it in a new commit (A') whose parent is 'A'. It would be better if I could commit it to 'A' directly. Why? Because A' is not an ancestor of 'master', so it's not naturally included in the development of my data.
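
The ancestry point above can be illustrated with a toy commit graph. This is only a sketch: the `parents` dictionary is a hand-written stand-in for the real Git history, and the commit names are the ones used in the question.

```python
# Toy commit graph: each commit maps to the list of its parents.
# This models the history A -> B -> master plus the extra commit A'.
parents = {
    "A": [],
    "B": ["A"],
    "master": ["B"],
    "A'": ["A"],  # new commit holding the back-filled metric for A
}


def ancestors(commit, parents):
    """Return every commit reachable from `commit` by following parent links."""
    seen = set()
    stack = list(parents[commit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("master", parents)))  # ['A', 'B']
print("A'" in ancestors("master", parents))  # False
```

In a real repository the same question can be asked with `git merge-base --is-ancestor <commit> master`, which exits with status 0 only if the commit is an ancestor of master.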

@pmrowla
Contributor

pmrowla commented Jul 16, 2020

More context: https://discuss.dvc.org/t/fill-back-metrics/441/3

@jorgeorpinel
Contributor

jorgeorpinel commented Jul 21, 2020

  • I think for starters metrics diff should accept multiple revisions (like plots diff). Having that, we can solve this with Git:

> How do I back-fill it to previous commits?

Since the metrics file didn't exist in previous commits, one way is to use git cherry-pick to apply the commit that introduces the metrics-generating stage (let's call it commit C) on top of each previous commit of interest, producing A' and B', and to re-run the stage there (e.g. with dvc repro) so the metric reflects that commit's data. Then run dvc metrics diff A' B' C

More detailed explanation in https://discuss.dvc.org/t/fill-back-metrics/441/4
Did you try that @jonilaserson? (But passing more than two revisions is only supported by plots diff at the moment.)

This is limited to a relatively small number of commits, though.
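
A minimal sketch of the cherry-pick workflow described above, written as a planner that only builds the command lines instead of running them. The branch prefix `backfill/`, the stage name `stats`, and the helper `plan_backfill` are hypothetical names chosen for illustration. The plan also assumes the cherry-pick applies cleanly and that the data for each revision is already in the local DVC cache (otherwise a `dvc pull` would be needed before `dvc repro`).

```python
import shlex


def plan_backfill(revisions, stage_commit, stage_name):
    """Build the git/dvc commands that back-fill one branch per revision.

    Nothing is executed here; the result maps each new branch name to the
    ordered list of argv lists that would be run for it.
    """
    plan = {}
    for rev in revisions:
        branch = f"backfill/{rev}"  # hypothetical naming scheme
        plan[branch] = [
            ["git", "checkout", "-b", branch, rev],   # start a branch at the old commit
            ["git", "cherry-pick", stage_commit],     # bring in the metrics stage
            ["dvc", "checkout"],                      # sync data files for this revision
            ["dvc", "repro", stage_name],             # recompute the metric on that data
            ["git", "add", "--all"],
            ["git", "commit", "-m", f"Back-fill metrics for {rev}"],
        ]
    return plan


plan = plan_backfill(["A", "B"], stage_commit="C", stage_name="stats")
for branch, commands in plan.items():
    for argv in commands:
        print(shlex.join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` to actually execute the plan. Once the branches exist, they can be compared with `dvc plots diff backfill/A backfill/B C`, or listed together with `dvc metrics show --all-branches`.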

@jorgeorpinel
Contributor

jorgeorpinel commented Jul 21, 2020

A more advanced solution (not mutually exclusive with the previous one) would be for DVC to back-fill the metrics to previous commits itself, by trying to run the metrics-generating stage (from commit C) on top of each of the previous commits passed to the command. By the way, what would the syntax look like? Should it accept revision ranges (as mentioned in #1691 (comment))?

But it would only support workspace versions where all the dependencies for this metrics-generating stage exist, possibly just skipping the commits where that's not the case.
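
The skipping behaviour proposed here could look roughly like the sketch below. It is not an existing DVC feature: `split_runnable`, the `tree` dictionary, and the file names are all hypothetical, and the `path_exists` callable stands in for a real check such as `git cat-file -e <rev>:<path>`.

```python
def split_runnable(revisions, stage_deps, path_exists):
    """Partition revisions by whether every stage dependency exists there.

    `path_exists(rev, path)` is injected so the logic can be tried without
    a repository; a real version might shell out to git instead.
    """
    runnable, skipped = [], []
    for rev in revisions:
        missing = [dep for dep in stage_deps if not path_exists(rev, dep)]
        if missing:
            skipped.append((rev, missing))  # remember why it was skipped
        else:
            runnable.append(rev)
    return runnable, skipped


# Hypothetical snapshot of which tracked files exist at each revision.
tree = {
    "A": {"data.csv.dvc"},
    "B": {"data.csv.dvc", "labels.csv.dvc"},
}

runnable, skipped = split_runnable(
    ["A", "B"],
    stage_deps=["data.csv.dvc", "labels.csv.dvc"],
    path_exists=lambda rev, path: path in tree[rev],
)
print(runnable)  # ['B']
print(skipped)   # [('A', ['labels.csv.dvc'])]
```

Returning the missing paths alongside each skipped revision would let the command report why a commit was left out of the back-fill.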

@jonilaserson
Author

Thanks, I'll give it a shot.

@efiop
Contributor

efiop commented Dec 8, 2023

Closing as stale.

efiop closed this as not planned on Dec 8, 2023