DVC add fails when there are broken symlinks in the dataset #3717

Closed
greaber opened this issue May 1, 2020 · 1 comment
Labels: bug (Did we break something?), p2-medium (Medium priority, should be done, but less important)

Comments

greaber commented May 1, 2020

DVC version: 0.93.0
Python version: 3.7.4
Platform: Linux-4.15.0-1065-aws-x86_64-with-debian-buster-sid
Binary: False
Package: pip
Supported remotes: http, https, s3
Cache: reflink - supported, hardlink - supported, symlink - supported
Repo: dvc, git

When trying to add TED-LIUM Release 3 (https://www.openslr.org/51/) to a data registry, dvc add failed. The problem was that this dataset contains a few broken symlinks; removing them fixed the problem. Here is the stack trace from dvc add -v:

Traceback (most recent call last):
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/main.py", line 49, in main
    ret = cmd.run()
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/command/add.py", line 20, in run
    fname=self.args.file,
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/repo/__init__.py", line 28, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/repo/add.py", line 86, in add
    stage.commit()
  File "/home/grant/miniconda3/lib/python3.7/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/stage.py", line 155, in rwlocked
    return call()
  File "/home/grant/miniconda3/lib/python3.7/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/stage.py", line 828, in commit
    out.commit()
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/output/base.py", line 244, in commit
    self.cache.save(self.path_info, self.info)
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 527, in save
    self._save(path_info, checksum, save_link)
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 533, in _save
    self._save_dir(path_info, checksum, save_link)
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 484, in _save_dir
    self._save_file(entry_info, entry_checksum, save_link=False)
  File "/home/grant/miniconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 431, in _save_file
    assert checksum
AssertionError
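As a workaround, the broken symlinks can be located and removed before running dvc add. Below is a minimal sketch using only the Python standard library, assuming a POSIX filesystem; `find_broken_symlinks` is a hypothetical helper, not part of DVC.

```python
import os
import tempfile
from pathlib import Path


def find_broken_symlinks(root):
    """Return sorted paths under `root` that are symlinks to missing targets."""
    broken = []
    # followlinks=False (the default): symlinked directories are listed
    # but not descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it and
            # returns False when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp, "data")
        data.mkdir()
        (data / "foo").write_text("foo")
        # A symlink whose target does not exist, like those in the dataset.
        (data / "bar").symlink_to(data / "missing")
        for path in find_broken_symlinks(data):
            print(os.path.relpath(path, data))  # prints "bar"
```

Each reported path can then be inspected and, if it is not needed, removed with `os.unlink` (which deletes the link, not its target) before re-running dvc add.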
efiop added the bug and p2-medium labels on May 3, 2020
@skshetry (Member)

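I get the following error now. Although the message says that the output 'data' does not exist, the real cause (the broken symlink data/bar) is clear from the path in the error.

I think this should keep erroring out, rather than suppressing the error or adding the file anyway, as git does.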

$ dvc add data
Adding...
ERROR: output 'data' does not exist: [Errno 2] No such file or directory: '/Users/saugat/Projects/iterative/dvc/data/bar'
Verbose logging

$ dvc add data -v
2024-03-26 15:50:01,042 DEBUG: v3.48.5.dev11+ge223f51e7, CPython 3.12.2 on macOS-14.4-x86_64-i386-64bit
2024-03-26 15:50:01,043 DEBUG: command: /Users/saugat/Projects/iterative/dvc/.venv/bin/dvc add data -v
Adding...
2024-03-26 15:50:01,625 ERROR: output 'data' does not exist: [Errno 2] No such file or directory: '/Users/saugat/Projects/iterative/dvc/data/bar'
Traceback (most recent call last):
  File "/Users/saugat/Projects/iterative/dvc/dvc/output.py", line 1359, in add
    staging, meta, obj = self._build(
                         ^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/dvc/output.py", line 545, in _build
    return build(*args, callback=pb.as_callback(), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/dvc_data/hashfile/build.py", line 257, in build
    meta, obj = _build_tree(
                ^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/dvc_data/hashfile/build.py", line 145, in _build_tree
    meta, obj = _build_file(
                ^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/dvc_data/hashfile/build.py", line 74, in _build_file
    meta, hash_info = hash_file(path, fs, name, state=state)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/dvc_data/hashfile/hash.py", line 199, in hash_file
    hash_value, meta = _hash_file(path, fs, name, callback=cb, info=info)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/dvc_data/hashfile/hash.py", line 139, in _hash_file
    info = info or fs.info(path)
                   ^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/dvc_objects/fs/base.py", line 592, in info
    return self.fs.info(path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/dvc_objects/fs/local.py", line 39, in info
    return self.fs.info(path)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/fsspec/implementations/local.py", line 90, in info
    out = os.stat(path, follow_symlinks=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/Users/saugat/Projects/iterative/dvc/data/bar'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/saugat/Projects/iterative/dvc/dvc/commands/add.py", line 45, in run
    self.repo.add(
  File "/Users/saugat/Projects/iterative/dvc/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/dvc/repo/scm_context.py", line 143, in run
    return method(repo, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/dvc/repo/add.py", line 227, in add
    _add(stage, source if output_exists else None, no_commit=no_commit)
  File "/Users/saugat/Projects/iterative/dvc/dvc/repo/add.py", line 178, in _add
    stage.add_outs(path, no_commit=no_commit)
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/funcy/decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/dvc/stage/decorators.py", line 44, in rwlocked
    return call()
           ^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/.venv/lib/python3.12/site-packages/funcy/decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saugat/Projects/iterative/dvc/dvc/stage/__init__.py", line 574, in add_outs
    out.add(filter_info, **kwargs)
  File "/Users/saugat/Projects/iterative/dvc/dvc/output.py", line 1369, in add
    raise self.DoesNotExistError(self) from exc
dvc.output.OutputDoesNotExistError: output 'data' does not exist

2024-03-26 15:50:01,635 DEBUG: Analytics is disabled.
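The root cause is visible at the bottom of the first traceback: fsspec's local info() calls os.stat(path, follow_symlinks=True), which follows data/bar to its missing target. The sketch below (standard library only, POSIX assumed; `probe` is a hypothetical helper) shows that os.lstat() succeeds on a broken symlink while os.stat() raises FileNotFoundError.

```python
import os
import tempfile


def probe(path):
    """Report whether lstat() and stat() succeed on `path`."""
    lstat_ok = stat_ok = True
    try:
        os.lstat(path)  # examines the link itself
    except FileNotFoundError:
        lstat_ok = False
    try:
        os.stat(path, follow_symlinks=True)  # follows the link, as in the traceback
    except FileNotFoundError:
        stat_ok = False
    return lstat_ok, stat_ok


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "bar")
        # Create a symlink pointing at a target that does not exist.
        os.symlink(os.path.join(tmp, "missing"), link)
        print(probe(link))  # (True, False)
```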

skshetry closed this as not planned on Mar 26, 2024
skshetry added a commit to skshetry/dvc that referenced this issue Mar 26, 2024
The `dvc add` command incorrectly raises a `DoesNotExistError` when a
broken symlink exists in an output directory and the target name
is the same as the directory's name.

This happens, e.g., if `data` is an output and the command is invoked as
`dvc add data` (i.e. there are no virtual directory operations to perform).

The expected behavior is to raise a `FileNotFoundError`.
`DoesNotExistError` should only be raised if the output itself does not
exist.

Related: iterative#3717
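A minimal sketch of the behavior this commit message describes, using hypothetical stand-ins rather than DVC's actual code: `OutputDoesNotExistError` mirrors the `dvc.output.OutputDoesNotExistError` seen in the traceback, and `build_tree` / `add_output` are illustrative names. The "does not exist" error is raised only when the output path itself is missing; a `FileNotFoundError` caused by a broken entry inside the directory propagates unchanged.

```python
import os
import tempfile


class OutputDoesNotExistError(Exception):
    """Hypothetical stand-in for dvc.output.OutputDoesNotExistError."""


def build_tree(path):
    """Hypothetical stand-in for hashing a directory output.

    Returns {relative_path: size}. Each entry is stat()ed with symlinks
    followed, so a broken symlink raises FileNotFoundError naming it.
    """
    os.stat(path)  # a missing output also raises FileNotFoundError
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, path)] = os.stat(full).st_size
    return sizes


def add_output(path):
    try:
        return build_tree(path)
    except FileNotFoundError as exc:
        # Translate the error only when the output path itself is missing
        # (lexists() does not follow a symlink at `path`).
        if not os.path.lexists(path):
            raise OutputDoesNotExistError(path) from exc
        # Otherwise keep the original error, which names the broken entry.
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data")
        os.mkdir(data)
        os.symlink(os.path.join(data, "missing"), os.path.join(data, "bar"))
        try:
            add_output(data)
        except OutputDoesNotExistError as exc:
            print("does not exist:", exc)
        except FileNotFoundError as exc:
            print("broken entry:", os.path.basename(exc.filename))  # prints "broken entry: bar"
```

Checking the output path after the failure, rather than before building, keeps the common path cheap and only pays for the extra lexists() call when something has already gone wrong.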
skshetry added a commit that referenced this issue Mar 26, 2024
…10373)

BradyJ27 pushed a commit to BradyJ27/dvc that referenced this issue Apr 22, 2024
…terative#10373)
