Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list: when adding a file to a dataset isdir is always true #6094

Closed
mattseddon opened this issue Jun 2, 2021 · 7 comments
Closed

list: when adding a file to a dataset isdir is always true #6094

mattseddon opened this issue Jun 2, 2021 · 7 comments
Labels
A: status Related to the dvc diff/list/status bug Did we break something? p1-important Important, aka current backlog of things to do product: VSCode Integration with VSCode extension

Comments

@mattseddon
Copy link
Member

Bug Report

Description

When adding to a tracked dataset (folder) isdir from dvc list . path/to/folder --dvc-only --show-json is always true for the addition, even if it is a file.

Reproduce

  1. Create a new file in a tracked folder: echo "some text" > data/MNIST/raw/newtest.txt
  2. Run list against the tracked folder: dvc list . data/MNIST/raw --show-json --dvc-only
  3. Inspect the output:
[
  { isout: false, isdir: true, isexec: false, path: 'newtest.txt' },
  { isout: false, isdir: false, isexec: false, path: 't10k-images-idx3-ubyte' },
  ...
]

Expected

isdir in the output will reflect whether or not the new path represents a directory.

Environment information

Output of dvc doctor:

DVC version: 2.1.0 (pip)
---------------------------------
Platform: Python 3.8.7 on macOS-11.3.1-x86_64-i386-64bit
Supports: http, https, s3
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc (subdir), git

Originally raised here iterative/vscode-dvc#330 (comment).

@pared
Copy link
Contributor

pared commented Jun 2, 2021

I am able to reproduce:

def test_add_new_file(tmp_dir, scm, dvc):
    tmp_dir.dvc_gen({"dir": {"file": "content"}}, commit="initial commit")

    tmp_dir.gen({"dir": {"new_file": "other content"}})

    result = dvc.ls(str(tmp_dir), "dir", dvc_only=True)
    assert result[1]['path'] == 'new_file'
    assert not result[1]['isdir']

@pared pared added bug Did we break something? p2-medium Medium priority, should be done, but less important labels Jun 2, 2021
@shcheklein
Copy link
Member

Some context: this one breaks a regular UI workflow on the VS code side since we rely a lot on this command now.

@dberenbaum dberenbaum added the product: VSCode Integration with VSCode extension label Jun 2, 2021
@pared
Copy link
Contributor

pared commented Jun 3, 2021

@shcheklein then I guess we should bump the importance?

@pared pared added p1-important Important, aka current backlog of things to do and removed p2-medium Medium priority, should be done, but less important labels Jun 3, 2021
@daavoo daavoo added the A: status Related to the dvc diff/list/status label Feb 22, 2022
@dberenbaum
Copy link
Collaborator

@mattseddon Is this still a need for vs code? Is it a blocker?

@mattseddon
Copy link
Member Author

Should not be required once we get the new status output.
Worst case scenario we can read the file system.

@mattseddon mattseddon reopened this Apr 6, 2022
@dberenbaum
Copy link
Collaborator

dberenbaum commented Apr 27, 2022

This looks like it was actually fixed in #7345 and isdir is now correct.

$ git clone [email protected]:iterative/vscode-dvc.git
Cloning into 'vscode-dvc'...
remote: Enumerating objects: 27746, done.
remote: Counting objects: 100% (287/287), done.
remote: Compressing objects: 100% (142/142), done.
remote: Total 27746 (delta 145), reused 268 (delta 138), pack-reused 27459
Receiving objects: 100% (27746/27746), 10.50 MiB | 7.28 MiB/s, done.
Resolving deltas: 100% (20050/20050), done.
$ cd vscode-dvc/demo
$ dvc pull
A       data/MNIST/raw/
A       model.pt
A       plots/
A       logs/
4 files added and 12 files fetched
$ echo "some text" > data/MNIST/raw/newtest.txt
$ dvc list . data/MNIST/raw --show-json
[
  {
    "isout": false,
    "isdir": false,
    "isexec": false,
    "path": "newtest.txt"
  },
  {
    "isout": true,
    "isdir": false,
    "isexec": false,
    "path": "t10k-images-idx3-ubyte"
  },
  {
    "isout": true,
    "isdir": false,
    "isexec": false,
    "path": "t10k-images-idx3-ubyte.gz"
  },
  {
    "isout": true,
    "isdir": false,
    "isexec": false,
    "path": "t10k-labels-idx1-ubyte"
  },
  {
    "isout": true,
    "isdir": false,
    "isexec": false,
    "path": "t10k-labels-idx1-ubyte.gz"
  },
  {
    "isout": true,
    "isdir": false,
    "isexec": false,
    "path": "train-images-idx3-ubyte"
  },
  {
    "isout": true,
    "isdir": false,
    "isexec": false,
    "path": "train-images-idx3-ubyte.gz"
  },
  {
    "isout": true,
    "isdir": false,
    "isexec": false,
    "path": "train-labels-idx1-ubyte"
  },
  {
    "isout": true,
    "isdir": false,
    "isexec": false,
    "path": "train-labels-idx1-ubyte.gz"
  }
]

However, the new file only shows up in dvc list without --dvc-only. According to #5712 (comment), a new file in a tracked folder shouldn't show up at all in dvc list with or without --dvc-only. dvc list should only show the dvc-committed state of tracked folders.

So to borrow from @pared above, the test should probably be more like:

@pytest.mark.parametrize("dvc_only", [True, False])
def test_add_new_file(tmp_dir, scm, dvc, dvc_only):
    tmp_dir.dvc_gen({"dir": {"file": "content"}}, commit="initial commit")

    tmp_dir.gen({"dir": {"new_file": "other content"}})

    result = dvc.ls(str(tmp_dir), "dir", dvc_only=dvc_only)
    assert len(result) == 1
    assert result[0]['path'] == 'file'

I think we can probably close this one and comment on this test case in #5712. WDYT?

@mattseddon
Copy link
Member Author

mattseddon commented Apr 27, 2022

Thanks, @dberenbaum. The extension is no longer reliant on list (we'll use the new status for all of our data tracking needs) so I was only keeping this open for the bug. Happy to go with the behaviour described.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: status Related to the dvc diff/list/status bug Did we break something? p1-important Important, aka current backlog of things to do product: VSCode Integration with VSCode extension
Projects
None yet
Development

No branches or pull requests

5 participants