status: add --dvc-only & --outs-only flags #5895
What about |
Hey @dberenbaum, I've checked out It also provides more granular support for directories which we have decided to not pursue right now. Do you think the performance of Thanks, |
I think your initial instinct was probably right that It seems like there are two ways in which
Are both of these correct, and is one a bigger issue than the other? |
That is correct. Both points are equally important and interrelated. What I mean by this is that as the return format is not what we need we have to post-process which makes the performance worse. Hope this helps, let me know if you need anything further. As we move forwards I'm going to try and push as much "data transformation" as possible upstream and back into the cli. Would it be helpful if I first ask the question in a ticket like this and then start to make contributions? Or would you rather that the core team focuses on making these changes? Could be a question for @shcheklein as well but LMK what you think. |
Yeah, we might need to step back and think about the approach here because, unlike #5712, it seems like this (and other issues like #5881) no longer really generate value for regular users of dvc (and may actually add confusion by introducing extra flags/commands). Would it be possible and make sense to use Python to directly access the dvc api? It wouldn't take much to get what you're asking for here if it doesn't need to be tied to a CLI output:
>>> import collections, itertools
>>> from dvc.api import Repo
>>> outs = itertools.chain.from_iterable(stage.outs for stage in Repo().stages)
>>> status_list = [out.status() for out in outs if out.status()]
>>> status_list
[{'data/features': 'modified'}, {'model.pkl': 'modified'}, {'data/data.xml': 'deleted'}]
>>> status_dict = dict(collections.ChainMap(*status_list))
>>> status_dict
{'data/data.xml': 'deleted', 'model.pkl': 'modified', 'data/features': 'modified'} |
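The merging step in the snippet above can be tried without a DVC repository. The following is a minimal sketch that reuses the sample values from that output; `merge_statuses` is a hypothetical helper name, not part of DVC. It also shows one property worth knowing before relying on this approach: if the same path appears in more than one dict, `ChainMap` keeps the value from the earliest dict.

```python
import collections

# Sample values copied from the REPL output above.
status_list = [
    {"data/features": "modified"},
    {"model.pkl": "modified"},
    {"data/data.xml": "deleted"},
]


def merge_statuses(items):
    """Merge a list of single-path status dicts into one dict.

    Hypothetical helper. ChainMap searches its maps left to right,
    so the first dict that contains a key supplies its value.
    """
    return dict(collections.ChainMap(*items))


merged = merge_statuses(status_list)

# Same path reported twice: the earlier entry wins.
conflict = merge_statuses([{"model.pkl": "modified"}, {"model.pkl": "deleted"}])
```

If the later entry should win instead, a plain loop with `dict.update` would be the simpler choice.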
No, we can't use the DVC API directly (it's JS to Python). And since there is no DVC API yet, it would make the extension depend on implementation details. Probably not the best thing to have when (unlike Studio) we don't have that much control to fix it quickly.
Fair enough, how about providing additional info in the JSON response so that we can distinguish DVC-tracked and Git-tracked outputs? That should be enough to start, @mattseddon ? Also, Besides, of course, reviewing the whole output one more time before the release - since it's the first time we'll start to depend on it. |
I also don't think the extension would be the right place to put this implementation.
Any improvement is good and I'm happy to take small steps in the right direction. However, there is one other thing that I ran into this week that lends itself to us needing to take a different approach. I mentioned it on a PR but it's more applicable to bring it up here 👇🏻 Should we be concerned that
(attached recording: Screen.Recording.2021-05-03.at.11.55.49.am.mov)
I see these as the current options for how to proceed (long term):
Would be good to chat through what you guys think, definitely something to talk about in planning @shcheklein |
Yes, it looks to me like it is optimized to collect info for a specific target only. @skshetry can confirm. Does that make the other questions irrelevant for now? I'm lost on whether |
It's expected from the DVC perspective. It's a really good question to show those (
it might be enough to try for now, I think. @mattseddon can correct me if I'm wrong here. |
Sorry, @mattseddon, I somehow missed your comment. As @shcheklein said, that's a great point. It's probably not obvious enough that
If the DVC SCM view is about showing which files have changes that have not yet been tracked/cached by DVC, then I think you want |
@mattseddon No rush since I know you are still probably working through the design, but whenever you are confident that you do or don't need this feature, let us know so we can either prioritize or close this. |
Thanks @dberenbaum. I have been working in the experiments space this week and we haven't done any further review on the trees but will let you know (or close) when we get back to it. |
@dberenbaum based on iterative/vscode-dvc#318 (comment) I don't think that this is either a priority or even a want for the vs code extension at the moment. If anything we would probably want to make changes to |
My current understanding is that the status command is focused on pipelines. In the VS Code project we currently only need information on the DVC-tracked files (for use in our custom SCM view and file decoration).
Ideally, we would like to have a way to limit the output to changed outs which are tracked by DVC.
My suggestion for doing this would be to add two flags: --dvc-only and --outs-only.
It would also be good to change the shape of the output (in this instance) to return either a list of dicts, i.e.:
or one giant dict:
Each has its own benefits; happy to discuss and/or work around what you guys think is best.
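To make the two candidate shapes concrete, here is a rough sketch of the post-processing involved. The input shape (stage name mapped to a list of change groups keyed by "changed deps" / "changed outs") is an assumption about what the status JSON looks like, and the stage names, paths, and both function names are hypothetical.

```python
from typing import Dict, List

# Assumed shape of the status JSON; not confirmed anywhere in this issue.
SAMPLE_STATUS = {
    "featurize": [
        {"changed deps": {"data/prepared": "modified"}},
        {"changed outs": {"data/features": "modified"}},
    ],
    "train": [{"changed outs": {"model.pkl": "modified"}}],
    "data/data.xml.dvc": [{"changed outs": {"data/data.xml": "deleted"}}],
}


def changed_outs_as_list(status: dict) -> List[Dict[str, str]]:
    """Return one single-entry dict per changed output (the 'list of dicts' shape)."""
    result = []
    for entries in status.values():
        for entry in entries:
            # Skip anything that is not a change group (e.g. plain strings).
            if not isinstance(entry, dict):
                continue
            for path, state in entry.get("changed outs", {}).items():
                result.append({path: state})
    return result


def changed_outs_as_dict(status: dict) -> Dict[str, str]:
    """Return every changed output in one flat mapping (the 'one giant dict' shape)."""
    merged: Dict[str, str] = {}
    for item in changed_outs_as_list(status):
        merged.update(item)
    return merged
```

The list shape keeps one entry per reported change, while the flat dict gives direct lookup by path at the cost of collapsing any path that is reported more than once.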
Also, I am aware that adding flags and changing the shape of status might not be the right answer and in fact we need a new command. Very much open to that as well. Just wanted to get a discussion started.
Please reach out to me if you need any clarification at all.
Thank you