add: fails when declaring a git submodule as a dependency #5740

Closed
woodshop opened this issue Mar 31, 2021 · 1 comment · Fixed by #5745

@woodshop
Contributor

Bug Report

Description

In DVC v2, dvc run fails when a git submodule is provided as a dependency.

Reproduce

#!/usr/bin/env bash
set -x
set -e
mkdir -p repo_a
mkdir -p repo_b
git init repo_a
git init repo_b
cd repo_b
echo "nothing" > nothing
git add -A
git commit -m "first commit"
cd ../repo_a
echo "nothing" > nothing
git add -A
git commit -m "first commit"
git submodule add ../repo_b modules/repo_b
git submodule update --init modules/repo_b
git commit -a -m "second commit"
dvc init
dvc run -v -n test -d nothing -d modules -o something 'echo "test" > something'

Expected

dvc run exits normally.

Environment information

Ubuntu 18.04, dvc v2.0.14, Python 3.6.

Output of dvc doctor:

$ dvc doctor
DVC version: 2.0.14 (pip)
---------------------------------
Platform: Python 3.6.12 on Linux-4.15.0-1065-aws-x86_64-with-debian-buster-sid
Supports: http, https, s3, ssh

Additional Information (if any):
Output from above:

+ set -e
+ mkdir -p repo_a
+ mkdir -p repo_b
+ git init repo_a
Initialized empty Git repository in /home/asarroff/tmp/dvc-test/repo_a/.git/
+ git init repo_b
Initialized empty Git repository in /home/asarroff/tmp/dvc-test/repo_b/.git/
+ cd repo_b
+ echo nothing
+ git add -A
+ git commit -m 'first commit'
[master (root-commit) 1f99a56] first commit
 1 file changed, 1 insertion(+)
 create mode 100644 nothing
+ cd ../repo_a
+ echo nothing
+ git add -A
+ git commit -m 'first commit'
[master (root-commit) f671379] first commit
 1 file changed, 1 insertion(+)
 create mode 100644 nothing
+ git submodule add ../repo_b modules/repo_b
Cloning into '/home/asarroff/tmp/dvc-test/repo_a/modules/repo_b'...
done.
+ git submodule update --init modules/repo_b
+ git commit -a -m 'second commit'
[master 175fdcb] second commit
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 modules/repo_b
+ dvc init
Initialized DVC repository.

You can now commit the changes to git.

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>
+ dvc run -v -n test -d nothing -d modules -o something 'echo "test" > something'
2021-03-30 21:14:47,194 DEBUG: Check for update is enabled.
2021-03-30 21:14:47,278 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2021-03-30 21:14:47,279 DEBUG: Spawned '['daemon', '-q', 'updater']'
2021-03-30 21:14:50,834 DEBUG: Removing output 'something' of stage: 'test'.
2021-03-30 21:14:50,834 DEBUG: Removing 'something'
2021-03-30 21:14:51,186 DEBUG: state save (14035717747750367173, 1617153265880000000, 8) 3618634fc7650e537697fd7f542002c0
2021-03-30 21:14:51,270 DEBUG: state save (14035717747750367173, 1617153265880000000, 8) 3618634fc7650e537697fd7f542002c0
2021-03-30 21:14:51,516 DEBUG: state save (2807698749563520249, 1617153267732000000, 8) 3618634fc7650e537697fd7f542002c0
2021-03-30 21:14:51,686 ERROR: unexpected error - [Errno 2] No such file or directory: PosixPathInfo: 'modules/repo_b/.git'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/main.py", line 55, in main
    ret = cmd.run()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/command/run.py", line 64, in run
    desc=self.args.desc,
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/repo/scm_context.py", line 14, in run
    return method(repo, *args, **kw)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/repo/run.py", line 33, in run
    stage.run(no_commit=no_commit, run_cache=run_cache)
  File "/home/asarroff/.local/lib/python3.6/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/stage/decorators.py", line 36, in rwlocked
    return call()
  File "/home/asarroff/.local/lib/python3.6/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/stage/__init__.py", line 508, in run
    run_stage(self, dry, force, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/stage/run.py", line 154, in run_stage
    stage.repo.stage_cache.restore(stage, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/stage/cache.py", line 179, in restore
    if not _can_hash(stage):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/stage/cache.py", line 38, in _can_hash
    if not (dep.scheme == "local" and dep.def_path and dep.get_hash()):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/output/base.py", line 196, in get_hash
    self.fs.PARAM_CHECKSUM,
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/objects/stage.py", line 177, in stage
    obj = _get_tree_obj(path_info, fs, name, odb, state, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/objects/stage.py", line 116, in _get_tree_obj
    tree = _build_tree(path_info, fs, name, state, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/objects/stage.py", line 84, in _build_tree
    for fi, hi in _iter_hashes(path_info, fs, name, state, **kwargs):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/objects/stage.py", line 76, in _iter_hashes
    yield from _calculate_hashes(path_info, fs, name, state, **kwargs).items()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/objects/stage.py", line 62, in _calculate_hashes
    return dict(pairs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/progress.py", line 129, in wrapped
    res = fn(*args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/objects/stage.py", line 54, in _hash
    return file_info, get_file_hash(file_info, fs, name, state)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.6/site-packages/dvc/objects/stage.py", line 40, in get_file_hash
    errno.ENOENT, os.strerror(errno.ENOENT), path_info
FileNotFoundError: [Errno 2] No such file or directory: PosixPathInfo: 'modules/repo_b/.git'
------------------------------------------------------------
2021-03-30 21:14:58,081 DEBUG: Version info for developers:
DVC version: 2.0.14 (pip)
---------------------------------
Platform: Python 3.6.12 on Linux-4.15.0-1065-aws-x86_64-with-debian-buster-sid
Supports: http, https, s3, ssh
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: nfs4 on 127.0.0.1:/
Repo: dvc, git
pmrowla added the bug (Did we break something?) label on Mar 31, 2021
@pmrowla
Contributor

pmrowla commented Mar 31, 2021

The issue happens because a submodule contains a flat file named .git, but our default ignore patterns only include a pattern for the directory .git/.
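To confirm this on the reproduction from this issue, one can check what kind of entry .git is inside the submodule checkout. The following is a minimal Python sketch using only the standard library; the helper name describe_git_entry is hypothetical and is not part of DVC or git. It assumes the submodule's .git file follows the usual "gitdir: <path>" layout.

```python
import os


def describe_git_entry(worktree):
    """Report whether <worktree>/.git is a directory, a gitfile, or missing.

    Hypothetical helper for illustration only; not part of DVC or git.
    """
    git_path = os.path.join(worktree, ".git")
    if os.path.isdir(git_path):
        return "directory"
    if os.path.isfile(git_path):
        # In a submodule checkout, .git is typically a small text file
        # whose first line looks like "gitdir: <path to the real git dir>".
        with open(git_path, encoding="utf-8") as fobj:
            first_line = fobj.readline().strip()
        if first_line.startswith("gitdir:"):
            return "gitfile -> " + first_line[len("gitdir:"):].strip()
        return "file"
    return "missing"


if __name__ == "__main__":
    # Paths taken from the reproduction script in this issue.
    for path in ("repo_a", os.path.join("repo_a", "modules", "repo_b")):
        print(path, "=>", describe_git_entry(path))
```

Run from the directory that contains repo_a, this should report a directory for the top-level repository and a gitfile for modules/repo_b, which is the flat file that the directory-only ignore pattern does not cover.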

In fs.local.walk, an actual file is only tested against the file ignore patterns, so walk yields the path to the file .git. However, fs.local.exists always checks against both file and directory patterns (without checking whether the path is a file or a directory), so it reports that the file .git does not exist, since the name technically matches the directory pattern. This mismatch causes the eventual exception: walk yields a file which later appears to be ignored/not exist.

Basically, we should also be including the file pattern .git in our default ignores so that we do not get this behavior mismatch between walk and exists (walk should not be yielding the .git file in submodules).
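The mismatch between walk and exists, and the effect of adding a file pattern, can be illustrated with a toy model. This is only a sketch under stated assumptions: the function names walk_files, exists, and find_mismatches are hypothetical, the matching uses the standard library fnmatch rather than DVC's real ignore machinery, and none of this mirrors DVC's actual code.

```python
import fnmatch


def _matches(name, patterns):
    """Return True if name matches any of the glob patterns."""
    return any(fnmatch.fnmatch(name, pat) for pat in patterns)


def walk_files(file_names, file_patterns):
    """Toy walk: regular files are tested against file patterns only."""
    return [n for n in file_names if not _matches(n, file_patterns)]


def exists(name, file_patterns, dir_patterns):
    """Toy exists: tests against both pattern sets, whatever the entry type."""
    return not (_matches(name, file_patterns) or _matches(name, dir_patterns))


def find_mismatches(file_names, file_patterns, dir_patterns):
    """Files that walk yields but exists then reports as missing."""
    return [
        n
        for n in walk_files(file_names, file_patterns)
        if not exists(n, file_patterns, dir_patterns)
    ]


if __name__ == "__main__":
    # Regular files inside the submodule checkout from the reproduction.
    submodule_files = [".git", "nothing"]
    dir_patterns = [".git"]  # models the default directory pattern ".git/"

    # Before the proposed change: no file pattern for ".git".
    print(find_mismatches(submodule_files, [], dir_patterns))  # -> ['.git']

    # After the proposed change: ".git" is also ignored as a file.
    print(find_mismatches(submodule_files, [".git"], dir_patterns))  # -> []
```

With only the directory pattern, the toy walk yields .git and the toy exists rejects it, which corresponds to the FileNotFoundError in the traceback. Once .git is also a file pattern, walk no longer yields it and the two checks agree.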
