Support push/pull/metrics/gc, etc. across different commits #1691
@andrethrill Would you like to compare two specific commits, or just see how your metrics change across a range of commits? The latter is probably better suited to a graphical tool such as TensorBoard. Or are you looking for a CLI way of doing that, using different filters (e.g. find the max metric across N commits)?
Hi @efiop! I'm aware of TensorBoard, but that's not quite what I meant. I would like a way to run a few consecutive experiments and see their metrics. Just like
If that were supported, it would of course be great. But for what I have in mind, just seeing the output in the same form as in
@andrethrill Ah, so something like
Exactly, @efiop! And/or some other nice variations of it:
@andrethrill AFAIK
dvc metrics show
across different commits
@efiop Indeed, I was not thinking from the Git perspective. The syntax would have to be different :)
@andrethrill @efiop It seems the ability to Anyhow, I would find the ability to
Since the logic behind all these commands is similar, it probably makes sense to implement this for every command that currently supports the -T and -a options.
What about to The syntax would of course be different from Git's. Something like This approach also solves the case where you have several local commits and the single data file tracked by DVC has been overwritten in each of them. The current implementation of
@andrethrill @brbarkley @nik123
My 2 cents on this.
@shcheklein I agree that, for simplicity, it would be much better to implement EDIT: let me clarify that I am talking here specifically about
I would also like to start another discussion about
The problem with this approach is that a comma is a valid character in a branch name, so this edge case would break the approach currently under consideration. The other way to do that would be I think we cannot expect users to name branches in a way that is convenient for us, do you agree? EDIT: as discussed with @Suor, we don't necessarily need to use a comma as the separator; Git forbids some characters in branch names, such as colons.
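To illustrate the separator point: Git does not allow a colon anywhere in a ref name, while commas are allowed, so a colon-separated list of revisions can be split unambiguously. A minimal Python sketch; parse_revs and the colon-joined argument format are hypothetical illustrations, not an existing DVC option:

```python
from typing import List


def parse_revs(arg: str) -> List[str]:
    """Split a colon-separated list of revisions (hypothetical CLI syntax).

    Colons are forbidden in Git ref names, so a branch such as "feature,v2"
    (commas are allowed) survives the split intact.
    """
    revs = [part.strip() for part in arg.split(":")]
    if any(not rev for rev in revs):
        # e.g. "a::b" or a trailing colon would produce an empty revision
        raise ValueError(f"empty revision in {arg!r}")
    return revs


# parse_revs("feature,v2:master:a1b2c3d") -> ["feature,v2", "master", "a1b2c3d"]
```

This only covers plain branch/tag names and commit SHAs; Git's own rev:path expressions also contain a colon, so they could not be passed through such a list.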
Possible solution: require revisions to be passed after the targets; that would make it possible to parse multiple targets and multiple revisions.
We are using
@Suor I agree, especially since it's short and understandable.
My thinking was: is it possible to derive from the string that is passed to
@shcheklein Looking through the documentation, I think the closest example would be using a refspec: What do you mean by
Yep, either it's a commit, a branch, a tag, or a list of those.
@shcheklein Do we actually need to know what it is? AFAIK
It seems to me that we need to decide which way to go with the implementation of this feature.
I think we should go with the last one, because it's faster to use than the first one, and it does not introduce a strong assumption the way the second approach does (I mean requiring passing
@pared 1 and 2 are tied together: metrics and pull/push/etc. should (if feasible) share the same syntax for working with references, unless we decide to redesign it, of course, but I don't see the point of that just yet. I totally agree with you that most users would probably just want to do something with the last N commits, so we need to give that syntax a bit of thought, which might actually change the approach with --revs. We've discussed that the second approach (the one that requires passing revs after targets) is absolutely terrible, so just forget about it 🙂
@shcheklein I agree with @pared: this is a terrible idea. Git doesn't distinguish between those, so neither should we, especially just to adopt some joining syntax. I would much rather go with
@pared This makes a lot of sense to me from a user perspective, but I would probably go with something like
I'm not sure
@efiop By saying
Any update on this issue? I see it has been marked "important" but also removed from "In progress"... Would love to have this!
It seems to me that what the user wanted to accomplish (dvc metrics show across different commits, i.e. making small parameter changes and checking the metrics for those parameter values) can be done more easily and more cleanly with a directory for each experiment. In general, let's say the user has a table of parameters and their values. He can write a script that, for each set of parameter values, creates a new experiment directory and (re)produces the results. He then records all the results (metrics) in the table, removes all the experiment directories (cleanup), and commits to Git the table containing the parameter values and the corresponding results. This is much cleaner than making a small commit for each parameter value and treating each commit as an experiment. Regarding the other idea of limiting the output of
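The directory-per-experiment workflow described above could look roughly like the following Python sketch. run_experiment is a hypothetical placeholder for the real pipeline (it writes a toy score into metrics.json so the sketch runs on its own); the experiment directories are temporary ones created with tempfile, and the parameter name param1 and the table file name are likewise assumptions. Only the summary table would be committed to Git.

```python
import csv
import json
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def run_experiment(param1: int, workdir: Path) -> None:
    """Hypothetical stand-in for the real pipeline: writes metrics.json into workdir."""
    score = 1.0 / (1 + abs(param1 - 50))  # toy metric, NOT a real training result
    (workdir / "metrics.json").write_text(json.dumps({"score": score}))


def sweep(values: List[int], table_path: Path) -> List[Dict[str, float]]:
    """Run one experiment per parameter value and collect metrics into a CSV table."""
    rows: List[Dict[str, float]] = []
    for value in values:
        workdir = Path(tempfile.mkdtemp(prefix=f"exp_param1_{value}_"))
        try:
            run_experiment(value, workdir)
            metrics = json.loads((workdir / "metrics.json").read_text())
            rows.append({"param1": value, **metrics})
        finally:
            shutil.rmtree(workdir)  # cleanup: the experiment directory is discarded
    if rows:
        with table_path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

For example, sweep([10, 50, 100], Path("results.csv")) would leave a single results.csv with one row per parameter value, which can then be committed as one Git commit instead of one commit per trial.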
@yfarjoun Sorry for such a huge delay. We've introduced the required changes to the internal brancher, and we have also introduced a non-official hidden
Btw, if anyone is willing to take a shot at contributing a patch for this, we will be happy to help 🙂
Thanks for the update. No need to apologize, I just wanted to make sure you know that this is still a desired feature!
To give a new user's perspective on the issue (talking about
... so the actual behaviour of
And yet another confusing, missing option, I believe: pushing multiple commits (iterative/dvc.org#1087)... It may also make sense to have
Is this feature still planned? I ended up with a little workaround for pushing data across various commits. I simply added a Git hook at
Of course, it noticeably increases the time for each commit, but it also solves my problem with data synchronization. I hope it helps someone besides me.
Hi! Resurrecting this discussion 🧟 (per a support question related to deep learning: having to pick a winner from 500K epochs, and it's definitely not the last one): Specifically on metrics diff commands, refer to #4211: But what about accepting standard Git commit ranges? (Both
I don't think this issue is really related to that discussion. An epoch is not the result of a run, so there is no commit or model for each one. In current terms, it might be a data point in some plot or simply an intermediate state, which may or may not be saved, as the user wishes.
I think you're right with respect to that particular user's support case. Still, I think this idea is worth considering for some of our commands:
P.S. add
Now that we have experiment flags like
Currently dvc metrics show can show metric values across different branches (-a) and different tags (-T). Could you consider supporting showing metric values across different commits in the same branch?
The background (simplified example): say I'm currently training a model and changing a certain parameter, param1 (for instance, the number of trees in a forest). The way I would probably like to work is to find a first value for param1, commit the current state, keep changing param1, and keep committing the successive states that I consider worth saving. At some point I would like to look back and identify the setup that gave me the best results.

The way DVC currently works forces me to create a new branch/tag for each trial I want to keep track of, and this seems a bit overwhelming.
Depending on how different the experiments I'm running are and their level of granularity, I could decide how to keep track of them (new commits vs. new branches/tags).
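Until something like this is supported, the request boils down to reading a metric at each of the last N commits and picking the best one. A minimal Python workaround sketch, assuming the metrics file is a plain JSON file committed to Git (not stored in the DVC cache); it uses plain git rev-list and git show rather than any DVC command, and the file name metrics.json, the key score, and the helper names are hypothetical:

```python
import json
import subprocess
from typing import Dict, List, Optional


def last_revs(n: int, ref: str = "HEAD") -> List[str]:
    """Return the SHAs of the last n commits reachable from ref, newest first."""
    result = subprocess.run(
        ["git", "rev-list", f"--max-count={n}", ref],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()


def metric_at(rev: str, path: str, key: str) -> Optional[float]:
    """Read one metric from a Git-tracked JSON file as it was at `rev`.

    `path` is relative to the repository root. Returns None if the file
    did not exist at that commit or does not contain `key`.
    """
    result = subprocess.run(
        ["git", "show", f"{rev}:{path}"], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None
    return json.loads(result.stdout).get(key)


def best_rev(metrics: Dict[str, Optional[float]]) -> Optional[str]:
    """Return the revision with the highest metric, skipping commits without one."""
    scored = {rev: value for rev, value in metrics.items() if value is not None}
    if not scored:
        return None
    return max(scored, key=scored.get)


# Example usage inside a Git repository (file name and key are hypothetical):
#   revs = last_revs(10)
#   metrics = {rev: metric_at(rev, "metrics.json", "score") for rev in revs}
#   print(best_rev(metrics))
```

Keeping the selection logic in best_rev, separate from the Git calls, makes it easy to swap in a different criterion (e.g. the minimum loss) without touching how the commits are read.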
Notes: