Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

status: support outputs as targets [qa] #4191

Closed
jorgeorpinel opened this issue Jul 10, 2020 · 18 comments · Fixed by #4433
Closed

status: support outputs as targets [qa] #4191

jorgeorpinel opened this issue Jul 10, 2020 · 18 comments · Fixed by #4433
Labels
feature request Requesting a new feature p2-medium Medium priority, should be done, but less important

Comments

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jul 10, 2020

Bug Report

UPDATE: Jump to #4191 (comment)


Looks like dvc status is recursive by default now (maybe always was) from some testing I just did:

λ dvc status
foo.dvc:
        changed outs:
                modified:           foo
data\raw.dvc:
        changed outs:
                modified:           data\raw

λ dvc status -R data/
data\raw.dvc:
        changed outs:
                modified:           data\raw
  1. So it's really just useful to limit the status to specific directories (not really about recursion).
  2. Why not just support dirs AND FILES as targets of the command, like all other commands that take targets (I think)?

Relater to iterative/dvc.org#1384 (comment)

  1. In fact this is already supported by status, but only in remote mode (-r or -c options), which is kind of confusing (and complicates the docs).

Please provide information about your setup

DVC version: 1.1.2
Python version: 3.7.5
Platform: Windows-10-10.0.18362-SP0
Binary: True
Package: exe
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache: reflink - not supported, hardlink - supported, symlink - not supported
Filesystem type (cache directory): ('NTFS', 'C:\\')
Repo: dvc, git
Filesystem type (workspace): ('NTFS', 'C:\\')
@jorgeorpinel jorgeorpinel added the discussion requires active participation to reach a conclusion label Jul 10, 2020
@efiop
Copy link
Contributor

efiop commented Jul 10, 2020

@jorgeorpinel Non -c status just doesn't support granularity right now.

@efiop
Copy link
Contributor

efiop commented Jul 10, 2020

Closing in favor of #2180

@efiop efiop closed this as completed Jul 10, 2020
@efiop
Copy link
Contributor

efiop commented Jul 10, 2020

To clarify:

Why not just support dirs AND FILES as targets of the command, like all other commands that take targets (I think)?

It does support files as targets, but not files(or subdirs) in tracked directories (this is what -c/push/pull/etc support).

@jorgeorpinel
Copy link
Contributor Author

It does support files as targets

@efiop so is this a bug? 👇

λ ls foo*
foo  foo.dvc
λ dvc status foo
ERROR: failed to obtain data status - 'dvc.yaml' does not exist.

@efiop
Copy link
Contributor

efiop commented Jul 11, 2020

@jorgeorpinel Yes, looks like a bug. Reopening. Thanks!

@efiop efiop reopened this Jul 11, 2020
@efiop efiop changed the title status: what's the use of -R now? why not full granularity? [qa] status: support file granularity Jul 11, 2020
@efiop efiop changed the title status: support file granularity status: support outputs as targets Jul 11, 2020
@efiop efiop added feature request Requesting a new feature p2-medium Medium priority, should be done, but less important and removed discussion requires active participation to reach a conclusion labels Jul 11, 2020
@jorgeorpinel jorgeorpinel added bug Did we break something? and removed feature request Requesting a new feature labels Jul 11, 2020
jorgeorpinel added a commit to iterative/dvc.org that referenced this issue Jul 11, 2020
adding tracked files/dirs and removing -R
rel iterative/dvc#4191
jorgeorpinel added a commit to iterative/dvc.org that referenced this issue Jul 11, 2020
@jorgeorpinel

This comment has been minimized.

@jorgeorpinel

This comment has been minimized.

@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Jul 13, 2020

Another related question: dvc repro also has targets but only accepts stage and .dvc file names, right? (Otherwise, -R would also seem obsolete there.) If so, should it accept file/dir names (and support granularity)? Prob not.

@efiop
Copy link
Contributor

efiop commented Jul 13, 2020

@jorgeorpinel -R is not obsolete, it searches recursively for dvc files. But when you specify target explicitly, it finds 1 dvc file that it belongs too, so it is not the same.

@jorgeorpinel
Copy link
Contributor Author

OK thanks. So dvc repro also supports tracked files and thus granularity already? Will double check and include in #1384 then.

What about run, (un)freeze, remove, unprotect, update, and metrics/plots show? Will ask in the appropriate issue...

@efiop
Copy link
Contributor

efiop commented Jul 13, 2020

@jorgeorpinel tracked files and granularity is not the same thing. When we were talking about push/pull we were talking about being able to pull specific file (or subdir) in a tracked dataset (dvc add data_dir). In case of repro we support addressing stages by outputs, which is a different thing.

@jorgeorpinel
Copy link
Contributor Author

Got it. So I think we need to clarify when file/dir targets support granularity and when they don't, after all. I'll add a note in all the sync-related ones (in #1384) both in the description and in Specific target examples.

@jorgeorpinel jorgeorpinel changed the title status: support outputs as targets status: support outputs as targets [qa] Jul 14, 2020
@karajan1001
Copy link
Contributor

It does support files as targets

@efiop so is this a bug? 👇

λ ls foo*
foo  foo.dvc
λ dvc status foo
ERROR: failed to obtain data status - 'dvc.yaml' does not exist.

@jorgeorpinel , @efiop
Supporting outputs as targets is a feature, not a bug I think?

image

Targets in dvc status are stages just the same as what in dvc repo. But the error message ERROR: failed to obtain data status - 'dvc.yaml' does not exist. is a bug. In a previous version (0.94.1), it used to be
image

So the solution is either to support outputs as targets or to improve the message.

@efiop efiop added feature request Requesting a new feature and removed bug Did we break something? labels Aug 18, 2020
@efiop
Copy link
Contributor

efiop commented Aug 18, 2020

@karajan1001 Adjusted the labels, thanks! 🙂

Since 1.0 we've changed the defaults, hence why it looks for dvc.yaml first. targets are shared by status and status -c, hence the confusion.

@karajan1001
Copy link
Contributor

@efiop
In a local status mode, we use Repo.collect to collect stages

dvc/dvc/repo/status.py

Lines 28 to 33 in 32b5b33

def _local_status(self, targets=None, with_deps=False, recursive=False):
if targets:
stages = cat(
self.collect(t, with_deps=with_deps, recursive=recursive)
for t in targets
)

While in a cloud status mode we are using Repo.collect_granular

dvc/dvc/repo/__init__.py

Lines 378 to 382 in 32b5b33

pairs = cat(
self.collect_granular(
target, recursive=recursive, with_deps=with_deps
)
for target in targets

dvc/dvc/repo/__init__.py

Lines 340 to 352 in 32b5b33

def used_cache(
self,
targets=None,
all_branches=False,
with_deps=False,
all_tags=False,
all_commits=False,
remote=None,
force=False,
jobs=None,
recursive=False,
used_run_cache=None,
):

dvc/dvc/repo/status.py

Lines 79 to 89 in 32b5b33

used = self.used_cache(
targets,
all_branches=all_branches,
all_tags=all_tags,
all_commits=all_commits,
with_deps=with_deps,
force=True,
remote=remote,
jobs=jobs,
recursive=recursive,
)

dvc/dvc/repo/status.py

Lines 40 to 50 in 32b5b33

def _cloud_status(
self,
targets=None,
jobs=None,
remote=None,
all_branches=False,
with_deps=False,
all_tags=False,
recursive=False,
all_commits=False,
):

And if the one stage and one output have the same name, the stage would win. This will prevent the users from selecting those outputs which have the same name with stages. (Before version 1.0, stages name will always have a .dvc suffix which prevents this problem)

image

@karajan1001
Copy link
Contributor

dvc/dvc/repo/__init__.py

Lines 302 to 304 in 32b5b33

# parsing is ambiguous when it does not have a colon
# or if it's not a dvcfile, as it can be a stage name
# in `dvc.yaml` or, an output in a stage.

Wow, had been considered before. stage is preferred to output

@skshetry
Copy link
Member

skshetry commented Aug 19, 2020

@karajan1001, there's a workaround: dvc status ./<filename> for now. 🙂

But, clearly, it's not implemented for status, only for -c.

@skshetry
Copy link
Member

And, we don't recommend to have a stage name same as outputs.

karajan1001 added a commit to karajan1001/dvc that referenced this issue Aug 20, 2020
fix iterative#4191
1. Add a related test which would fail on current version.
pared pushed a commit that referenced this issue Aug 25, 2020
* Outputs as target supporting for `dvc status`

fix #4191
1. Add a related test which would fail on current version.

* Pass the tests

1. add deps to the tests.
2. make change to pass the tests.

* Update tests/func/test_status.py

Co-authored-by: Saugat Pachhai <[email protected]>

* Solve some change request.

Co-authored-by: Saugat Pachhai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requesting a new feature p2-medium Medium priority, should be done, but less important
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants