exp run: doesn't work with submodule as dependency #7186

Open
woodshop opened this issue Dec 22, 2021 · 14 comments
Labels
A: experiments Related to dvc exp bug Did we break something? git-submodule Related to the git-submodule p3-nice-to-have It should be done this or next sprint

Comments

@woodshop
Contributor

Bug Report

Description

dvc exp run fails, while dvc repro succeeds, when a stage's cmd is executed from inside a submodule and that submodule is also listed as a dependency.

Reproduce

Set Up:

mkdir test
cd test

mkdir submodule
cd submodule
git init
echo "cp ../data/a.txt ../models/b.txt" > run.sh
git add run.sh
git commit -m "initial commit"

cd ..
mkdir main
cd main
git init
dvc init
dvc config cache.type hardlink,symlink,copy
git submodule add ../submodule src
mkdir data
echo "test" > data/a.txt
mkdir models
dvc stage add -n cp -w src -d ../src -d ../data/a.txt -o ../models/b.txt bash run.sh
dvc repro
git add .
git commit -m "initial commit"

This works:
dvc repro -f

This fails:
dvc exp run -f

This also fails:
dvc exp run -f --temp

Expected

dvc repro and dvc exp run should both run and succeed.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 2.9.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.11.0-1021-aws-x86_64-with-glibc2.29
Supports:
	hdfs (fsspec = 2021.10.1, pyarrow = 5.0.0),
	webhdfs (fsspec = 2021.10.1),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2021.10.1, boto3 = 1.17.106)
Cache types: hardlink, symlink
Cache directory: lustre on 172.16.38.30@tcp:/skl3jbmv
Caches: local
Remotes: s3
Workspace directory: lustre on 172.16.38.30@tcp:/skl3jbmv
Repo: dvc, git

Additional Information (if any):
The verbose output contains the following error:

ERROR: unexpected error - invalid data in index - invalid entry
@karajan1001 karajan1001 added bug Did we break something? A: experiments Related to dvc exp labels Dec 22, 2021
@woodshop
Contributor Author

I wonder whether this is related to #5740, an earlier, since-resolved issue that I reported.

@daavoo
Contributor

daavoo commented Jan 14, 2022

@pmrowla ping

@pmrowla
Contributor

pmrowla commented Jan 15, 2022

This is definitely a bug for the --temp/--queue use case: we don't handle submodules at all there. We probably need to do a submodule pull/update in the temp workspaces that we create.

I'm not sure why exp run does not work for workspace runs though, as we should not have to do anything special there, since the submodule is already set up properly.

@woodshop it would help if you could provide the entire error traceback from the -v command output; the error message alone doesn't give us enough information to debug the issue.
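To illustrate the --temp/--queue point above, here is a minimal sketch that uses plain git only (no DVC). A fresh clone stands in for a temporary workspace, and the submodule directory stays empty until it is explicitly initialized. The directory names and the g helper are made up for this example, and protocol.file.allow=always is only set because the submodule URL is a local path; how DVC actually builds its temp workspaces is not shown here.

```shell
#!/usr/bin/env bash
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: git with a throwaway identity, and with local-path
# submodule URLs allowed (newer git versions block them by default).
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# A tiny repository that will act as the submodule.
mkdir submodule
(
    cd submodule
    g init -q
    echo "echo hello" > run.sh
    g add run.sh
    g commit -q -m "initial commit"
)

# A main repository that tracks it at src/.
mkdir main
(
    cd main
    g init -q
    g submodule --quiet add ../submodule src
    g commit -q -m "add submodule"
)

# A plain clone stands in for a temporary workspace: src/ exists but is empty.
g clone -q main temp_workspace
before=$(ls -A temp_workspace/src | wc -l | tr -d ' ')

# Only after an explicit init/update are the submodule files checked out.
g -C temp_workspace submodule --quiet update --init
after=$(ls -A temp_workspace/src | wc -l | tr -d ' ')

echo "entries in src before update: $before, after update: $after"
```

If a stage's cmd lives in the submodule, as run.sh does in the report, it would not be available in such a workspace until the update step has run.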

@pmrowla pmrowla added the awaiting response we are waiting for your reply, please respond! :) label Jan 15, 2022
@woodshop
Contributor Author

woodshop commented Jan 24, 2022

Apologies for the slow reply!

(tensorflow-stable) asarroff@neu:/fsx/test/main$ dvc exp run -f -v
2022-01-24 16:52:02,910 DEBUG: Adding '/fsx/test/main/.dvc/config.local' to gitignore file.
2022-01-24 16:52:02,921 DEBUG: Adding '/fsx/test/main/.dvc/tmp' to gitignore file.
2022-01-24 16:52:02,922 DEBUG: Adding '/fsx/test/main/.dvc/cache' to gitignore file.
2022-01-24 16:52:03,523 DEBUG: Stashed experiment '3e6d5d6' with baseline '1fab656' for future execution.
2022-01-24 16:52:03,574 DEBUG: Reproducing experiment revs '3e6d5d6'
2022-01-24 16:52:03,711 DEBUG: Init workspace executor in '/fsx/test/main'
2022-01-24 16:52:03,841 DEBUG: Adding '/fsx/test/main/.dvc/config.local' to gitignore file.
2022-01-24 16:52:03,849 DEBUG: Adding '/fsx/test/main/.dvc/tmp' to gitignore file.
2022-01-24 16:52:03,849 DEBUG: Adding '/fsx/test/main/.dvc/cache' to gitignore file.
2022-01-24 16:52:03,853 DEBUG: Running repro in '/fsx/test/main'
2022-01-24 16:52:03,853 DEBUG: Removing '/fsx/test/main/.dvc/tmp/repro.dat'
2022-01-24 16:52:05,455 DEBUG: Removing output 'models/b.txt' of stage: 'cp'.
2022-01-24 16:52:05,455 DEBUG: Removing '/fsx/test/main/models/b.txt'
Running stage 'cp':
> bash run.sh
2022-01-24 16:52:05,551 DEBUG: staged tree 'object md5: 7836100ad7371e5f9125fbeb2b24a8e5.dir'
2022-01-24 16:52:05,552 DEBUG: state save (144115339624539791, 16d176444d6ed86e1a7e908b91b81625, 33) 7836100ad7371e5f9125fbeb2b24a8e5.dir
2022-01-24 16:52:05,557 DEBUG: Adding '/fsx/test/main/models/b.txt' to gitignore file.
2022-01-24 16:52:05,567 DEBUG: state save (144115339624540011, 1643061125000000000, 5) d8e8fca2dc0f896fd7cb4cb0031ba249
2022-01-24 16:52:05,583 DEBUG: state save (144115339624540011, 1643061125000000000, 5) d8e8fca2dc0f896fd7cb4cb0031ba249
2022-01-24 16:52:05,585 DEBUG: Computed stage: 'cp' md5: '44286a707e35ea3bf08062b5fe4b7152'
2022-01-24 16:52:05,595 DEBUG: staged tree 'object md5: 7836100ad7371e5f9125fbeb2b24a8e5.dir'
2022-01-24 16:52:05,596 DEBUG: state save (144115339624539791, 16d176444d6ed86e1a7e908b91b81625, 33) 7836100ad7371e5f9125fbeb2b24a8e5.dir
2022-01-24 16:52:05,607 DEBUG: staged tree 'object md5: 7836100ad7371e5f9125fbeb2b24a8e5.dir'
2022-01-24 16:52:05,607 DEBUG: state save (144115339624539791, 16d176444d6ed86e1a7e908b91b81625, 33) 7836100ad7371e5f9125fbeb2b24a8e5.dir
2022-01-24 16:52:05,622 DEBUG: Preparing to transfer data from '/fsx/test/main/.dvc/cache' to '/fsx/test/main/.dvc/cache'
2022-01-24 16:52:05,627 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
    func(from_path, to_path)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/local.py", line 148, in reflink
    System.reflink(from_info, to_info)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
    System._reflink_linux(source, link_name)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
    fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
    raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-01-24 16:52:05,635 DEBUG: Removing '/fsx/test/main/models/.V7EcLywV6a78oWQAv9fYxa.tmp'
2022-01-24 16:52:05,636 DEBUG: Uploading '/fsx/test/main/.dvc/cache/.FeWKPhhRkQ8TQt3zZ9NyDG.tmp' to '/fsx/test/main/models/.V7EcLywV6a78oWQAv9fYxa.tmp'
2022-01-24 16:52:05,643 DEBUG: Removing '/fsx/test/main/models/.V7EcLywV6a78oWQAv9fYxa.tmp'
2022-01-24 16:52:05,644 DEBUG: Removing '/fsx/test/main/.dvc/cache/.FeWKPhhRkQ8TQt3zZ9NyDG.tmp'
2022-01-24 16:52:05,645 DEBUG: Removing '/fsx/test/main/models/b.txt'
2022-01-24 16:52:05,647 DEBUG: Uploading '/fsx/test/main/.dvc/cache/d8/e8fca2dc0f896fd7cb4cb0031ba249' to '/fsx/test/main/models/b.txt'
2022-01-24 16:52:05,654 DEBUG: state save (144115339624540020, 1643061125000000000, 5) d8e8fca2dc0f896fd7cb4cb0031ba249
2022-01-24 16:52:05,673 DEBUG: state save (144115339624540020, 1643061125000000000, 5) d8e8fca2dc0f896fd7cb4cb0031ba249
2022-01-24 16:52:05,679 DEBUG: stage: 'cp' was reproduced
2022-01-24 16:52:05,706 DEBUG: Staging files: {'dvc.yaml', 'src', 'data/a.txt'}

To track the changes with git, run:

    git add dvc.yaml src data/a.txt

To enable auto staging, run:

	dvc config core.autostage true
2022-01-24 16:52:06,189 DEBUG: Commit to new experiment branch 'refs/exps/1f/ab656a477d19c52d1d99ce1e151191afb74cd9/exp-1c9fb'
2022-01-24 16:52:06,528 DEBUG: Collected experiment '1e1c03b'.
2022-01-24 16:52:06,563 ERROR: unexpected error - invalid data in index - invalid entry
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/command/experiments/run.py", line 32, in run
    results = self.repo.experiments.run(
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 812, in run
    return run(self.repo, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 32, in run
    return repo.experiments.reproduce_one(
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 433, in reproduce_one
    results = self._reproduce_revs(
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 51, in wrapper
    return f(exp, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 636, in _reproduce_revs
    exec_results.update(self._executors_repro(manager, **kwargs))
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 62, in wrapper
    ret = f(exp, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 667, in _executors_repro
    return manager.exec_queue(**kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager.py", line 350, in exec_queue
    self.cleanup_executor(exec_name, executor)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager.py", line 257, in cleanup_executor
    executor.cleanup()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 165, in cleanup
    self.scm.set_ref(EXEC_APPLY, checkpoint)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 525, in __exit__
    raise exc_details[1]
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 510, in __exit__
    if cb(*exc_details):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 380, in detach_head
    self.reset()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 253, in _backend_func
    return func(*args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py", line 484, in reset
    self.repo.index.read(False)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/pygit2/repository.py", line 646, in index
    check_error(err, io=True)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/pygit2/errors.py", line 65, in check_error
    raise GitError(message)
_pygit2.GitError: invalid data in index - invalid entry
------------------------------------------------------------
2022-01-24 16:52:08,665 DEBUG: Adding '/fsx/test/main/.dvc/config.local' to gitignore file.
2022-01-24 16:52:08,673 DEBUG: Adding '/fsx/test/main/.dvc/tmp' to gitignore file.
2022-01-24 16:52:08,673 DEBUG: Adding '/fsx/test/main/.dvc/cache' to gitignore file.
2022-01-24 16:52:08,678 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/command/experiments/run.py", line 32, in run
    results = self.repo.experiments.run(
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 812, in run
    return run(self.repo, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 32, in run
    return repo.experiments.reproduce_one(
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 433, in reproduce_one
    results = self._reproduce_revs(
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 51, in wrapper
    return f(exp, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 636, in _reproduce_revs
    exec_results.update(self._executors_repro(manager, **kwargs))
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 62, in wrapper
    ret = f(exp, *args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 667, in _executors_repro
    return manager.exec_queue(**kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager.py", line 350, in exec_queue
    self.cleanup_executor(exec_name, executor)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager.py", line 257, in cleanup_executor
    executor.cleanup()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 165, in cleanup
    self.scm.set_ref(EXEC_APPLY, checkpoint)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 525, in __exit__
    raise exc_details[1]
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 510, in __exit__
    if cb(*exc_details):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 380, in detach_head
    self.reset()
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 253, in _backend_func
    return func(*args, **kwargs)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py", line 484, in reset
    self.repo.index.read(False)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/pygit2/repository.py", line 646, in index
    check_error(err, io=True)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/pygit2/errors.py", line 65, in check_error
    raise GitError(message)
_pygit2.GitError: invalid data in index - invalid entry

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
    func(from_path, to_path)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/local.py", line 148, in reflink
    System.reflink(from_info, to_info)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
    System._reflink_linux(source, link_name)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
    fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
    raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/home/asarroff/miniconda3/envs/tensorflow-stable/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-01-24 16:52:08,678 DEBUG: Removing '/fsx/test/.jw3rPE3S2u4AbGdwDSuHS7.tmp'
2022-01-24 16:52:08,680 DEBUG: Removing '/fsx/test/.jw3rPE3S2u4AbGdwDSuHS7.tmp'
2022-01-24 16:52:08,680 DEBUG: Removing '/fsx/test/.jw3rPE3S2u4AbGdwDSuHS7.tmp'
2022-01-24 16:52:08,681 DEBUG: Removing '/fsx/test/main/.dvc/cache/.4BUAToT43jGip9MS4pSMZ3.tmp'
2022-01-24 16:52:08,729 DEBUG: Version info for developers:
DVC version: 2.9.3 (pip)
---------------------------------
Platform: Python 3.8.8 on Linux-4.15.0-1065-aws-x86_64-with-glibc2.10
Supports:
	webhdfs (fsspec = 2021.10.1),
	http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
	https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
	s3 (s3fs = 2021.8.1, boto3 = 1.17.106)
Cache types: hardlink, symlink
Cache directory: lustre on 172.16.38.30@tcp:/skl3jbmv
Caches: local
Remotes: None
Workspace directory: lustre on 172.16.38.30@tcp:/skl3jbmv
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-01-24 16:52:08,733 DEBUG: Analytics is disabled.

@woodshop
Contributor Author

woodshop commented Feb 2, 2022

@pmrowla this issue is no longer "awaiting response" but I cannot change the label.

@dtrifiro dtrifiro removed the awaiting response we are waiting for your reply, please respond! :) label Feb 2, 2022
@pmrowla pmrowla added the git-submodule Related to the git-submodule label Feb 18, 2022
@woodshop
Contributor Author

woodshop commented Aug 1, 2022

Just wondering @daavoo if this issue is on any roadmap for resolution.

@pmrowla
Contributor

pmrowla commented Aug 2, 2022

@woodshop unfortunately we have not been able to get to this issue yet, and it's not currently planned. There are a few submodule-related dvc exp issues open, but given that it's not a very common setup (at least based on the user reports we've received so far), we have had to prioritize other work over addressing the submodule problems.

@rick-van-veen

I would actually like to see this working as well for the --temp and --queue cases.

@jnareb

jnareb commented Oct 11, 2023

I have had the same issue, but it happens even if dvc exp run is run from the top directory of the project and all submodule changes are committed (git status reports a clean tree).

More importantly, dvc exp run not only fails with a cryptic error message, it also makes a mess of the repository:

$ git status
On branch jn/dvc

nothing to commit, working tree clean
$ dvc exp run
[...]
ERROR: unexpected error - invalid data in index - invalid entry

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
$ git status
On branch jn/dvc

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   .git/COMMIT_EDITMSG
        new file:   .git/FETCH_HEAD
        new file:   .git/HEAD
[... lots and lots of files, including all submodule files ...]

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   .git/HEAD
        modified:   .git/index
        modified:   .git/logs/HEAD
        deleted:    .git/refs/exps/exec/EXEC_BASELINE
        deleted:    .git/refs/exps/exec/EXEC_MERGE

To get rid of this I need to run git reset --hard HEAD in the repository (where I get lots of "error: invalid path" warnings) and in the submodule.
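Before resetting, it can help to see which staged paths point inside a .git directory, like the .git/HEAD entries in the status output above. The following is only a sketch: git_internal_paths is a made-up helper name, and it just filters the path list that git ls-files prints.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads paths on stdin and prints those that point
# into a .git directory (for example ".git/HEAD" or "src/.git/index").
# Names such as ".gitignore" or ".github/..." are not matched.
git_internal_paths() {
    grep -E '(^|/)\.git(/|$)' || true
}

# List the suspicious entries in the current index.
# Outside a git repository this prints nothing.
git ls-files 2>/dev/null | git_internal_paths
```

An empty result means the index holds no such entries. Anything it prints is a candidate for the reset described above.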

DVC should at least detect that it cannot run dvc exp run, instead of messing up the state of the Git repository.
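Until something like that exists in DVC itself, a user-side guard along these lines might serve as a stopgap. This is a sketch under assumptions, not DVC behavior: has_submodules and guarded_exp_run are made-up names, and the check simply looks for gitlink entries (mode 160000) in the index, which is how git records a submodule.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the index of the given repository
# (default: current directory) contains a gitlink entry (mode 160000).
has_submodules() {
    local entries
    entries=$(git -C "${1:-.}" ls-files --stage 2>/dev/null | awk '$1 == "160000"')
    [ -n "$entries" ]
}

# Hypothetical wrapper: refuse to start an experiment when submodules are
# present, instead of letting the run fail halfway through.
guarded_exp_run() {
    if has_submodules .; then
        echo "submodules detected; not running 'dvc exp run' (consider 'dvc repro')" >&2
        return 1
    fi
    dvc exp run "$@"
}
```

With these definitions sourced, guarded_exp_run -f would stand in for dvc exp run -f. In a repository like the one in the original report it stops before DVC touches any Git state.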

Output of dvc doctor:

$ dvc doctor
DVC version: 3.25.0 (pip)
-------------------------
Platform: Python 3.11.4 on Linux-6.4.0-2-amd64-x86_64-with-glibc2.37
Subprojects:
        dvc_data = 2.18.1
        dvc_objects = 1.0.1
        dvc_render = 0.6.0
        dvc_task = 0.3.0
        scmrepo = 1.3.1
Supports:
        http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.9.2, boto3 = 1.28.17)

@dflatow

dflatow commented Jan 16, 2024

Hey team -- I'm running into this same error message:

ERROR: unexpected error - invalid data in index - invalid entry

Similarly, dvc repro -R seems to work but dvc exp run -R src/pipelines/ does not.

$ dvc doctor
DVC version: 3.33.4 (pip)
-------------------------
Platform: Python 3.10.13 on macOS-12.6-arm64-arm-64bit
Subprojects:
        dvc_data = 2.24.0
        dvc_objects = 2.0.1
        dvc_render = 1.0.0
        dvc_task = 0.3.0
        scmrepo = 1.6.0
Supports:
        http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.12.2, boto3 = 1.33.13)
Config:
        Global: /Users/dflatow/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/ceb4be7a51a3752fe6157394a646b490

@dflatow

dflatow commented Jan 16, 2024

Not sure if this is a different or a related issue.

@dflatow

dflatow commented Jan 16, 2024

Also unsure if this is related but some strange git things seem to be going on - DVC has added a few files (metric files) outside of my repo:

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   ../../../../var/folders/2h/2_3k3c317ts84_86kzzcpqjh0000gn/T/tmprlj4wq1v/noise/evaluate/metrics/metrics.json
        new file:   ../../../../var/folders/2h/2_3k3c317ts84_86kzzcpqjh0000gn/T/tmptg4lvlwb/noise/evaluate/metrics/metrics.json
        new file:   ../../../../var/folders/2h/2_3k3c317ts84_86kzzcpqjh0000gn/T/tmpvjg0amiu/noise/evaluate/metrics/metrics.json
        new file:   ../../../../var/folders/2h/2_3k3c317ts84_86kzzcpqjh0000gn/T/tmpvjyepa6l/noise/evaluate/metrics/metrics.json

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    ../../../../var/folders/2h/2_3k3c317ts84_86kzzcpqjh0000gn/T/tmprlj4wq1v/noise/evaluate/metrics/metrics.json
        deleted:    ../../../../var/folders/2h/2_3k3c317ts84_86kzzcpqjh0000gn/T/tmptg4lvlwb/noise/evaluate/metrics/metrics.json
        deleted:    ../../../../var/folders/2h/2_3k3c317ts84_86kzzcpqjh0000gn/T/tmpvjg0amiu/noise/evaluate/metrics/metrics.json
        deleted:    ../../../../var/folders/2h/2_3k3c317ts84_86kzzcpqjh0000gn/T/tmpvjyepa6l/noise/evaluate/metrics/metrics.json

@dflatow

dflatow commented Jan 16, 2024

hmmm seems the errant metrics files were created but not properly cleaned up in a tmp directory I was using for some testing.

@dflatow

dflatow commented Jan 16, 2024

Ok so I think the issue here was due to some sort of git issue caused by running DVC Live in a temporary folder via python library with save_exp=True.

9 participants