exp run: doesn't work with submodule as dependency #7186
Comments
Wondering if this is related to an earlier resolved issue that I posted about. #5740 |
@pmrowla ping |
This is definitely a bug for the I'm not sure why @woodshop it would help if you could provide the entire error traceback from the |
Apologies for the slow reply!
(tensorflow-stable) asarroff@neu:/fsx/test/main$ dvc exp run -f -v
2022-01-24 16:52:02,910 DEBUG: Adding '/fsx/test/main/.dvc/config.local' to gitignore file.
2022-01-24 16:52:02,921 DEBUG: Adding '/fsx/test/main/.dvc/tmp' to gitignore file.
2022-01-24 16:52:02,922 DEBUG: Adding '/fsx/test/main/.dvc/cache' to gitignore file.
2022-01-24 16:52:03,523 DEBUG: Stashed experiment '3e6d5d6' with baseline '1fab656' for future execution.
2022-01-24 16:52:03,574 DEBUG: Reproducing experiment revs '3e6d5d6'
2022-01-24 16:52:03,711 DEBUG: Init workspace executor in '/fsx/test/main'
2022-01-24 16:52:03,841 DEBUG: Adding '/fsx/test/main/.dvc/config.local' to gitignore file.
2022-01-24 16:52:03,849 DEBUG: Adding '/fsx/test/main/.dvc/tmp' to gitignore file.
2022-01-24 16:52:03,849 DEBUG: Adding '/fsx/test/main/.dvc/cache' to gitignore file.
2022-01-24 16:52:03,853 DEBUG: Running repro in '/fsx/test/main'
2022-01-24 16:52:03,853 DEBUG: Removing '/fsx/test/main/.dvc/tmp/repro.dat'
2022-01-24 16:52:05,455 DEBUG: Removing output 'models/b.txt' of stage: 'cp'.
2022-01-24 16:52:05,455 DEBUG: Removing '/fsx/test/main/models/b.txt'
Running stage 'cp':
> bash run.sh
2022-01-24 16:52:05,551 DEBUG: staged tree 'object md5: 7836100ad7371e5f9125fbeb2b24a8e5.dir'
2022-01-24 16:52:05,552 DEBUG: state save (144115339624539791, 16d176444d6ed86e1a7e908b91b81625, 33) 7836100ad7371e5f9125fbeb2b24a8e5.dir
2022-01-24 16:52:05,557 DEBUG: Adding '/fsx/test/main/models/b.txt' to gitignore file.
2022-01-24 16:52:05,567 DEBUG: state save (144115339624540011, 1643061125000000000, 5) d8e8fca2dc0f896fd7cb4cb0031ba249
2022-01-24 16:52:05,583 DEBUG: state save (144115339624540011, 1643061125000000000, 5) d8e8fca2dc0f896fd7cb4cb0031ba249
2022-01-24 16:52:05,585 DEBUG: Computed stage: 'cp' md5: '44286a707e35ea3bf08062b5fe4b7152'
2022-01-24 16:52:05,595 DEBUG: staged tree 'object md5: 7836100ad7371e5f9125fbeb2b24a8e5.dir'
2022-01-24 16:52:05,596 DEBUG: state save (144115339624539791, 16d176444d6ed86e1a7e908b91b81625, 33) 7836100ad7371e5f9125fbeb2b24a8e5.dir
2022-01-24 16:52:05,607 DEBUG: staged tree 'object md5: 7836100ad7371e5f9125fbeb2b24a8e5.dir'
2022-01-24 16:52:05,607 DEBUG: state save (144115339624539791, 16d176444d6ed86e1a7e908b91b81625, 33) 7836100ad7371e5f9125fbeb2b24a8e5.dir
2022-01-24 16:52:05,622 DEBUG: Preparing to transfer data from '/fsx/test/main/.dvc/cache' to '/fsx/test/main/.dvc/cache'
2022-01-24 16:52:05,627 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
func(from_path, to_path)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/local.py", line 148, in reflink
System.reflink(from_info, to_info)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
System._reflink_linux(source, link_name)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
return _link(link, from_fs, from_path, to_fs, to_path)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
_try_links([link], from_fs, from_file, to_fs, to_file)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-01-24 16:52:05,635 DEBUG: Removing '/fsx/test/main/models/.V7EcLywV6a78oWQAv9fYxa.tmp'
2022-01-24 16:52:05,636 DEBUG: Uploading '/fsx/test/main/.dvc/cache/.FeWKPhhRkQ8TQt3zZ9NyDG.tmp' to '/fsx/test/main/models/.V7EcLywV6a78oWQAv9fYxa.tmp'
2022-01-24 16:52:05,643 DEBUG: Removing '/fsx/test/main/models/.V7EcLywV6a78oWQAv9fYxa.tmp'
2022-01-24 16:52:05,644 DEBUG: Removing '/fsx/test/main/.dvc/cache/.FeWKPhhRkQ8TQt3zZ9NyDG.tmp'
2022-01-24 16:52:05,645 DEBUG: Removing '/fsx/test/main/models/b.txt'
2022-01-24 16:52:05,647 DEBUG: Uploading '/fsx/test/main/.dvc/cache/d8/e8fca2dc0f896fd7cb4cb0031ba249' to '/fsx/test/main/models/b.txt'
2022-01-24 16:52:05,654 DEBUG: state save (144115339624540020, 1643061125000000000, 5) d8e8fca2dc0f896fd7cb4cb0031ba249
2022-01-24 16:52:05,673 DEBUG: state save (144115339624540020, 1643061125000000000, 5) d8e8fca2dc0f896fd7cb4cb0031ba249
2022-01-24 16:52:05,679 DEBUG: stage: 'cp' was reproduced
2022-01-24 16:52:05,706 DEBUG: Staging files: {'dvc.yaml', 'src', 'data/a.txt'}
To track the changes with git, run:
git add dvc.yaml src data/a.txt
To enable auto staging, run:
dvc config core.autostage true
2022-01-24 16:52:06,189 DEBUG: Commit to new experiment branch 'refs/exps/1f/ab656a477d19c52d1d99ce1e151191afb74cd9/exp-1c9fb'
2022-01-24 16:52:06,528 DEBUG: Collected experiment '1e1c03b'.
2022-01-24 16:52:06,563 ERROR: unexpected error - invalid data in index - invalid entry
------------------------------------------------------------
Traceback (most recent call last):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/main.py", line 55, in main
ret = cmd.do_run()
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
return self.run()
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/command/experiments/run.py", line 32, in run
results = self.repo.experiments.run(
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 812, in run
return run(self.repo, *args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 32, in run
return repo.experiments.reproduce_one(
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 433, in reproduce_one
results = self._reproduce_revs(
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 51, in wrapper
return f(exp, *args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 636, in _reproduce_revs
exec_results.update(self._executors_repro(manager, **kwargs))
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 62, in wrapper
ret = f(exp, *args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 667, in _executors_repro
return manager.exec_queue(**kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager.py", line 350, in exec_queue
self.cleanup_executor(exec_name, executor)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager.py", line 257, in cleanup_executor
executor.cleanup()
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 165, in cleanup
self.scm.set_ref(EXEC_APPLY, checkpoint)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 525, in __exit__
raise exc_details[1]
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 510, in __exit__
if cb(*exc_details):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 120, in __exit__
next(self.gen)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 380, in detach_head
self.reset()
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 253, in _backend_func
return func(*args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py", line 484, in reset
self.repo.index.read(False)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/pygit2/repository.py", line 646, in index
check_error(err, io=True)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/pygit2/errors.py", line 65, in check_error
raise GitError(message)
_pygit2.GitError: invalid data in index - invalid entry
------------------------------------------------------------
2022-01-24 16:52:08,665 DEBUG: Adding '/fsx/test/main/.dvc/config.local' to gitignore file.
2022-01-24 16:52:08,673 DEBUG: Adding '/fsx/test/main/.dvc/tmp' to gitignore file.
2022-01-24 16:52:08,673 DEBUG: Adding '/fsx/test/main/.dvc/cache' to gitignore file.
2022-01-24 16:52:08,678 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/main.py", line 55, in main
ret = cmd.do_run()
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
return self.run()
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/command/experiments/run.py", line 32, in run
results = self.repo.experiments.run(
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 812, in run
return run(self.repo, *args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 32, in run
return repo.experiments.reproduce_one(
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 433, in reproduce_one
results = self._reproduce_revs(
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 51, in wrapper
return f(exp, *args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 636, in _reproduce_revs
exec_results.update(self._executors_repro(manager, **kwargs))
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 62, in wrapper
ret = f(exp, *args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 667, in _executors_repro
return manager.exec_queue(**kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager.py", line 350, in exec_queue
self.cleanup_executor(exec_name, executor)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager.py", line 257, in cleanup_executor
executor.cleanup()
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 165, in cleanup
self.scm.set_ref(EXEC_APPLY, checkpoint)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 525, in __exit__
raise exc_details[1]
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 510, in __exit__
if cb(*exc_details):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 120, in __exit__
next(self.gen)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 380, in detach_head
self.reset()
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 253, in _backend_func
return func(*args, **kwargs)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py", line 484, in reset
self.repo.index.read(False)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/pygit2/repository.py", line 646, in index
check_error(err, io=True)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/pygit2/errors.py", line 65, in check_error
raise GitError(message)
_pygit2.GitError: invalid data in index - invalid entry
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
func(from_path, to_path)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/local.py", line 148, in reflink
System.reflink(from_info, to_info)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
System._reflink_linux(source, link_name)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
return _link(link, from_fs, from_path, to_fs, to_path)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
_try_links([link], from_fs, from_file, to_fs, to_file)
File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-01-24 16:52:08,678 DEBUG: Removing '/fsx/test/.jw3rPE3S2u4AbGdwDSuHS7.tmp'
2022-01-24 16:52:08,680 DEBUG: Removing '/fsx/test/.jw3rPE3S2u4AbGdwDSuHS7.tmp'
2022-01-24 16:52:08,680 DEBUG: Removing '/fsx/test/.jw3rPE3S2u4AbGdwDSuHS7.tmp'
2022-01-24 16:52:08,681 DEBUG: Removing '/fsx/test/main/.dvc/cache/.4BUAToT43jGip9MS4pSMZ3.tmp'
2022-01-24 16:52:08,729 DEBUG: Version info for developers:
DVC version: 2.9.3 (pip)
---------------------------------
Platform: Python 3.8.8 on Linux-4.15.0-1065-aws-x86_64-with-glibc2.10
Supports:
webhdfs (fsspec = 2021.10.1),
http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
s3 (s3fs = 2021.8.1, boto3 = 1.17.106)
Cache types: hardlink, symlink
Cache directory: lustre on 172.16.38.30@tcp:/skl3jbmv
Caches: local
Remotes: None
Workspace directory: lustre on 172.16.38.30@tcp:/skl3jbmv
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-01-24 16:52:08,733 DEBUG: Analytics is disabled. |
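The traceback above ends with pygit2 failing to read the Git index ("invalid data in index - invalid entry"). As a diagnostic aid only, here is a minimal sketch that lists the submodule (gitlink) entries recorded in the index by parsing `git ls-files --stage`, where mode `160000` marks a gitlink. The function names are hypothetical, and the thread does not confirm that these entries are what pygit2 rejects.

```python
import subprocess
from typing import List

GITLINK_MODE = "160000"  # mode Git uses for submodule (gitlink) index entries


def parse_gitlinks(ls_files_output: str) -> List[str]:
    """Return the paths of gitlink entries in `git ls-files --stage` output."""
    paths = []
    for line in ls_files_output.splitlines():
        if not line.strip():
            continue
        # Each line has the form "<mode> <object> <stage>\t<path>".
        meta, _, path = line.partition("\t")
        fields = meta.split()
        if fields and fields[0] == GITLINK_MODE:
            paths.append(path)
    return paths


def list_gitlinks(repo_dir: str = ".") -> List[str]:
    """Run `git ls-files --stage` in repo_dir and return the submodule paths."""
    out = subprocess.run(
        ["git", "ls-files", "--stage"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_gitlinks(out)
```

Calling `list_gitlinks("/path/to/repo")` before and after `dvc exp run` would show whether the set of gitlink entries changes. Paths containing unusual characters are quoted by Git in this output, which the sketch does not undo.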
@pmrowla this issue is no longer "awaiting response" but I cannot change the label. |
Just wondering @daavoo if this issue is on any roadmap for resolution. |
@woodshop unfortunately we have not been able to get to this issue yet, and it's not currently planned. There's a few submodule related |
I would actually like to see this working as well for the --temp, --queue case. |
I have had the same issue, but it happens even if What is important is that
$ git status
On branch jn/dvc.
nothing to commit, working tree clean
$ dvc exp run
[...]
ERROR: unexpected error - invalid data in index - invalid entry
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
$ git status
On branch jn/dvc
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .git/COMMIT_EDITMSG
new file: .git/FETCH_HEAD
new file: .git/HEAD
[... lots and lots of files, including all submodule files ...]
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: .git/HEAD
modified: .git/index
modified: .git/logs/HEAD
deleted: .git/refs/exps/exec/EXEC_BASELINE
deleted: .git/refs/exps/exec/EXEC_MERGE
To get rid of this I need to do
DVC should at least detect that it cannot run
Output of
$ dvc doctor
DVC version: 3.25.0 (pip)
-------------------------
Platform: Python 3.11.4 on Linux-6.4.0-2-amd64-x86_64-with-glibc2.37
Subprojects:
dvc_data = 2.18.1
dvc_objects = 1.0.1
dvc_render = 0.6.0
dvc_task = 0.3.0
scmrepo = 1.3.1
Supports:
http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
s3 (s3fs = 2023.9.2, boto3 = 1.28.17) |
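The broken state in the `git status` output above shows up as paths under `.git/` being reported as staged or modified. A minimal sketch of a check for that symptom, assuming the input is the text printed by `git status --porcelain` (two status characters, a space, then the path); the function name is hypothetical and this is not something DVC is known to do:

```python
from typing import List


def git_internal_paths(porcelain_output: str) -> List[str]:
    """Return paths under .git/ that appear in `git status --porcelain` output."""
    hits = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status characters and the separating space
        if path == ".git" or path.startswith(".git/"):
            hits.append(path)
    return hits
```

A non-empty result after `dvc exp run` would indicate the same corruption described in this comment. Renamed entries (`old -> new`) are not handled by this sketch.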
Hey team -- I'm running into this same error message:
Also similarly,
|
Not sure if different or related issue. |
Also unsure if this is related but some strange git things seem to be going on - DVC has added a few files (metric files) outside of my repo:
|
hmmm seems the errant metrics files were created but not properly cleaned up in a tmp directory I was using for some testing. |
Ok so I think the issue here was due to some sort of git issue caused by running DVC Live in a temporary folder via python library with save_exp=True. |
Bug Report
Description
dvc exp run fails and dvc repro runs successfully when a cmd is executed from inside of a submodule and the submodule is included as a dependency.
Reproduce
Set Up:
This works
dvc repro -f
This fails:
dvc exp run -f
This also fails:
dvc exp run -f --temp
Expected
dvc repro and dvc exp run run and succeed similarly.
Environment information
Output of dvc doctor:
Additional Information (if any):
There's an error in the verbose output that indicates
ERROR: unexpected error - invalid data in index - invalid entry
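The comparison in this report (one command works, two fail) can be scripted. Below is a minimal sketch, assuming it is started from the root of the affected repository with `dvc` on the PATH. The helper names are hypothetical, and the `runner` parameter exists only so the logic can be exercised without DVC installed.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# The three invocations listed in the report.
COMMANDS: List[List[str]] = [
    ["dvc", "repro", "-f"],
    ["dvc", "exp", "run", "-f"],
    ["dvc", "exp", "run", "-f", "--temp"],
]


def run_real(cmd: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def compare_exit_codes(
    runner: Callable[[Sequence[str]], int] = run_real,
) -> Dict[str, int]:
    """Run each command in order and map its command line to its exit code."""
    return {" ".join(cmd): runner(cmd) for cmd in COMMANDS}


def behave_similarly(results: Dict[str, int]) -> bool:
    """Return True when every command finished with the same exit code."""
    return len(set(results.values())) <= 1
```

With the behavior described here, `behave_similarly(compare_exit_codes())` would be expected to return `False`. The commands run one after another in the same workspace, so a failed run may leave the repository in a state that affects the next one.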