diff: --hide-missing doesn't seem to work #7620

Closed
kalebo opened this issue Apr 22, 2022 · 1 comment
Labels
  • A: status (Related to the dvc diff/list/status)
  • diff/show (Related to the diff/show feature)
  • regression (Ohh, we broke something :-()

Comments


kalebo commented Apr 22, 2022

Bug Report

Description

The documentation seems to suggest that I can list only DVC files that have been explicitly checked out, ignoring any that don't exist in the local cache. This would be very useful in my workflow, as I primarily do partial checkouts of specific datasets in DVC, e.g., fetching only a specific directory of DVC files.

However, --hide-missing doesn't seem to work. This was confirmed by Gao on the Discord server.

Reproduce

  • dvc add and dvc push several files.
  • git commit and push to a Git remote.
  • Clone the Git repo on a new machine, or on the same machine after clearing the local cache.
  • dvc pull a single specific file.
  • Run dvc diff and note that all the unpulled files are marked as deleted.
  • Run dvc diff --hide-missing and note that the output is identical to that of the previous command.

Expected

The expected behavior is that there is no output with --hide-missing, since the unpulled files are hidden and the one explicitly pulled file has not been modified.
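To make the expectation concrete, here is a minimal sketch of the filtering I have in mind. It is not DVC code: `expected_deleted`, the `deleted` list and the `cached` set are all hypothetical names, and the rule itself (hide a "deleted" entry when its data was never fetched into the local cache) is only my reading of the option.

```python
def expected_deleted(deleted, cached):
    """Return the 'deleted' entries that should survive --hide-missing.

    Hypothetical model, not DVC's implementation:
      deleted: paths that a plain `dvc diff` reports as deleted
      cached:  paths whose data is present in the local cache
    An entry is hidden when its data was never fetched locally.
    """
    return [path for path in deleted if path in cached]


# Three tracked files, only a.csv was pulled; a plain diff lists the other two.
plain_diff_deleted = ["b.csv", "c.csv"]
local_cache = {"a.csv"}

print(expected_deleted(plain_diff_deleted, local_cache))  # → []
```

Under the same model, a file that was pulled and then removed from the workspace would still be reported, because its data is in the cache.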

Environment information

DVC version: 2.9.4 (pip)
---------------------------------
Platform: Python 3.9.2 on Linux-5.10.0-10-amd64-x86_64-with-glibc2.31
Supports:
        webhdfs (fsspec = 2022.1.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.1.0, boto3 = 1.21.7)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/vg0-scratch
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/mapper/vg0-scratch
Repo: dvc, git
@karajan1001 added the bug and diff/show labels Apr 23, 2022
@daavoo added the A: status label Apr 26, 2022
@dberenbaum
Collaborator

This looks like it was introduced in #7353 and is coming from changes to the if statement in:

dvc/dvc/repo/diff.py

Lines 187 to 199 in c81d841

def _filter_missing(repo_fs, paths):
    for path in paths:
        try:
            info = repo_fs.info(path)
            dvc_info = info.get("dvc_info")
            if (
                dvc_info
                and info["type"] == "directory"
                and not dvc_info["meta"].obj
            ):
                yield path
        except FileNotFoundError:
            pass

There are a couple different issues here:

  1. Under the logic introduced in #7353 (dvcfs: detach from pipeline outputs), only directories can be considered missing.
  2. Even directories appear not to be working as expected:
$ git clone git@github.com:iterative/example-get-started.git
$ cd example-get-started
$ dvc diff --hide-missing
Deleted:
    data/data.xml
    data/features/
    data/prepared/
    model.pkl

files summary: 2 deleted
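Point 1 can be illustrated without a DVC install. The sketch below copies the condition from the quoted function and runs it against a stub filesystem. `FakeRepoFS` and `entry` are hypothetical helpers written for this example, and the info dictionaries only mimic the keys the condition reads, so the real objects may well look different. It says nothing about point 2.

```python
from types import SimpleNamespace


class FakeRepoFS:
    """Hypothetical stand-in for the repo filesystem; not a DVC class."""

    def __init__(self, entries):
        self._entries = entries

    def info(self, path):
        try:
            return self._entries[path]
        except KeyError:
            raise FileNotFoundError(path) from None


def filter_missing(repo_fs, paths):
    """Same condition as the quoted _filter_missing."""
    for path in paths:
        try:
            info = repo_fs.info(path)
            dvc_info = info.get("dvc_info")
            if (
                dvc_info
                and info["type"] == "directory"
                and not dvc_info["meta"].obj
            ):
                yield path
        except FileNotFoundError:
            pass


def entry(kind, obj=None):
    """Build an info dict with only the keys the condition reads (assumed shape)."""
    return {"type": kind, "dvc_info": {"meta": SimpleNamespace(obj=obj)}}


# Every tracked entry here lacks a loaded object, i.e. none has been pulled.
fs = FakeRepoFS(
    {
        "data/features": entry("directory"),
        "data/data.xml": entry("file"),
        "model.pkl": entry("file"),
    }
)

paths = ["data/features", "data/data.xml", "model.pkl", "untracked.txt"]
missing = list(filter_missing(fs, paths))
print(missing)  # → ['data/features']
```

The two files are in the same unpulled state as the directory, but the `info["type"] == "directory"` check keeps them out of the result, so they would still be listed as deleted.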

@efiop Any thoughts on this?

@daavoo daavoo added regression Ohh, we broke something :-( and removed bug Did we break something? labels Oct 11, 2022
@mattseddon mattseddon closed this as not planned Won't fix, can't repro, duplicate, stale Mar 4, 2024