DVC pull broken after upgrading pipeline from v1 to v2 #5656

Closed
theoturner opened this issue Mar 19, 2021 · 5 comments

theoturner commented Mar 19, 2021

Bug Report

Issue name

pull: looks in nonexistent location

Description

We use dvc pull in a GitHub Action that runs on Azure Linux (labelled Ubuntu in GitHub Actions). Until today that worked fine. We recently upgraded our pipeline from v1.11 to v2.0.6. On macOS, dvc push and dvc pull work as expected. On Azure Linux, dvc pull returns a 404 for every file it tries to pull:

ERROR: failed to download 's3://my-dvc/fa/f8906fa071e94f623456243e5aed63' to '.dvc/cache/fa/f8906fa071e94f623456243e5aed63' - An error occurred (404) when calling the HeadObject operation: Not Found

This is repeated for all files. In S3, the location DVC is looking in, my-dvc/fa/f8906fa071e94f623456243e5aed63, does not exist. However, locally on macOS, .dvc/cache/fa/f8906fa071e94f623456243e5aed63 does exist.
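
For reference, the S3 path and the cache path in the error share the same layout: a two-character directory followed by the remaining 30 characters. A minimal sketch of that mapping, assuming the underlying hash is the 32-character MD5 `faf8906fa071e94f623456243e5aed63` and that every plain file object follows this layout. The helper names `remote_key` and `remote_url` are hypothetical and are not part of DVC's API; directory manifests with a `.dir` suffix are not handled here.

```python
def remote_key(md5: str) -> str:
    """Map a 32-character MD5 hex digest to its 'xx/rest' object key."""
    if len(md5) != 32:
        raise ValueError(f"expected a 32-character MD5 hex digest, got {md5!r}")
    # The first two characters become the directory, the rest the file name.
    return f"{md5[:2]}/{md5[2:]}"


def remote_url(bucket: str, md5: str) -> str:
    """Build the s3:// URL where the object would be looked up."""
    return f"s3://{bucket}/{remote_key(md5)}"


# Hash taken from the error message above; bucket name as reported.
print(remote_url("my-dvc", "faf8906fa071e94f623456243e5aed63"))
```

Comparing this URL against what is actually stored in the bucket is one way to confirm whether the object was ever pushed.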

Reproduce

  1. On macOS: dvc add file && dvc push file
  2. On Ubuntu: dvc pull file

Note that I can delete the file on macOS and then run dvc pull successfully.

Expected

dvc pull on Linux looks in the same storage location as on macOS.

Environment information

macOS

Platform: Python 3.8.5 on macOS-10.16-x86_64-i386-64bit
Supports: hdfs, http, https, s3
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git

Linux

Platform: Python 3.8.8 on Linux-5.4.0-1040-azure-x86_64-with-glibc2.2.5
Supports: hdfs, http, https, s3
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/root
Repo: dvc, git

The missing cache is expected; it is created when running dvc pull.

What I've tried

I have made sure to delete all files, add them all back, and then push and pull exclusively with v2.

I have also tried downgrading the entire pipeline to v1, deleting all files, adding them all back, and then pushing and pulling exclusively with v1. The issue persists, so now we can't even roll back!

I have tried resetting the index with rm -rf .dvc/tmp/index and rerunning the pushes.

I believe this issue is related: #4343

@isidentical
Contributor

Can you add the -v option to your dvc pull command on Azure Linux and share the full output?

isidentical added the "awaiting response" label Mar 19, 2021
@theoturner
Author

Run dvc pull -v src/server/assets

2021-03-22 09:07:03,216 DEBUG: Checking if stage 'src/server/assets' is in 'dvc.yaml'
2021-03-22 09:07:03,777 DEBUG: Preparing to download data from 's3://my-dvc/'
2021-03-22 09:07:03,778 DEBUG: Preparing to collect status from s3://my-dvc/
2021-03-22 09:07:03,778 DEBUG: Collecting information from local cache...
2021-03-22 09:07:03,778 DEBUG: Collecting information from remote cache...
2021-03-22 09:07:03,779 DEBUG: Matched '0' indexed hashes
2021-03-22 09:07:03,779 DEBUG: Querying 1 hashes via object_exists
2021-03-22 09:07:04,419 DEBUG: Downloading 's3://my-dvc/fa/c55d40e760c0a6fc67e5285b5ea857.dir' to '.dvc/cache/fa/c55d40e760c0a6fc67e5285b5ea857.dir'
2021-03-22 09:07:04,722 DEBUG: state save (135357, 1616404024715563520, 289) fac55d40e760c0a6fc67e5285b5ea857.dir
2021-03-22 09:07:04,729 DEBUG: Preparing to download data from 's3://my-dvc/'
2021-03-22 09:07:04,729 DEBUG: Preparing to collect status from s3://my-dvc/
2021-03-22 09:07:04,729 DEBUG: Collecting information from local cache...
2021-03-22 09:07:04,730 DEBUG: Collecting information from remote cache...
2021-03-22 09:07:04,730 DEBUG: Querying 1 hashes via object_exists
2021-03-22 09:07:05,226 DEBUG: Indexing new .dir 'fac55d40e760c0a6fc67e5285b5ea857.dir' with '4' nested files
2021-03-22 09:07:05,327 DEBUG: Downloading 's3://my-dvc/19/4577a7e20bdcc7afbb718f502c134c' to '.dvc/cache/19/4577a7e20bdcc7afbb718f502c134c'
2021-03-22 09:07:05,601 DEBUG: Downloading 's3://my-dvc/fa/f8906fa071e94f623456243e5aed63' to '.dvc/cache/fa/f8906fa071e94f623456243e5aed63'
2021-03-22 09:07:05,607 DEBUG: Downloading 's3://my-dvc/02/08587e2ac67d3a16a987246d4e9463' to '.dvc/cache/02/08587e2ac67d3a16a987246d4e9463'
2021-03-22 09:07:05,610 DEBUG: Downloading 's3://my-dvc/9e/c6e194dde6429e28a8c43698ad5f02' to '.dvc/cache/9e/c6e194dde6429e28a8c43698ad5f02'
2021-03-22 09:07:05,696 ERROR: failed to download 's3://my-dvc/fa/f8906fa071e94f623456243e5aed63' to '.dvc/cache/fa/f8906fa071e94f623456243e5aed63' - An error occurred (404) when calling the HeadObject operation: Not Found
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/remote/base.py", line 35, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/fs/base.py", line 280, in download
    return self._download_file(from_info, to_info, name, no_progress_bar,)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/fs/base.py", line 334, in _download_file
    self._download(  # noqa, pylint: disable=no-member
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/fs/s3.py", line 456, in _download
    total=obj.content_length,
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/boto3/resources/factory.py", line 339, in property_loader
    self.load()
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
------------------------------------------------------------
2021-03-22 09:07:05,710 ERROR: failed to download 's3://my-dvc/9e/c6e194dde6429e28a8c43698ad5f02' to '.dvc/cache/9e/c6e194dde6429e28a8c43698ad5f02' - An error occurred (404) when calling the HeadObject operation: Not Found
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/remote/base.py", line 35, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/fs/base.py", line 280, in download
    return self._download_file(from_info, to_info, name, no_progress_bar,)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/fs/base.py", line 334, in _download_file
    self._download(  # noqa, pylint: disable=no-member
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/fs/s3.py", line 456, in _download
    total=obj.content_length,
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/boto3/resources/factory.py", line 339, in property_loader
    self.load()
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
------------------------------------------------------------
Everything is up to date.
2021-03-22 09:07:05,932 ERROR: failed to pull data from the cloud - 2 files failed to download
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/command/data_sync.py", line 29, in run
    stats = self.repo.pull(
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
2021-03-22 09:07:05,935 DEBUG: Analytics is enabled.
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/site-packages/dvc/repo/fetch.py", line 77, in fetch
    raise DownloadError(failed)
dvc.exceptions.DownloadError: 2 files failed to download
------------------------------------------------------------
2021-03-22 09:07:06,037 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp3sm7xt8u']'
2021-03-22 09:07:06,040 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp3sm7xt8u']'
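
To make the verbose output above easier to read, here is a small sketch that extracts which downloads were started and which ones logged an error. It assumes the line formats seen in this log (`DEBUG: Downloading '...'` and `ERROR: failed to download '...'`). The `summarize` helper is hypothetical, and the sample lines are copied from the output above.

```python
import re

# Lines copied from the verbose output above.
LOG = """\
2021-03-22 09:07:04,419 DEBUG: Downloading 's3://my-dvc/fa/c55d40e760c0a6fc67e5285b5ea857.dir' to '.dvc/cache/fa/c55d40e760c0a6fc67e5285b5ea857.dir'
2021-03-22 09:07:05,327 DEBUG: Downloading 's3://my-dvc/19/4577a7e20bdcc7afbb718f502c134c' to '.dvc/cache/19/4577a7e20bdcc7afbb718f502c134c'
2021-03-22 09:07:05,601 DEBUG: Downloading 's3://my-dvc/fa/f8906fa071e94f623456243e5aed63' to '.dvc/cache/fa/f8906fa071e94f623456243e5aed63'
2021-03-22 09:07:05,607 DEBUG: Downloading 's3://my-dvc/02/08587e2ac67d3a16a987246d4e9463' to '.dvc/cache/02/08587e2ac67d3a16a987246d4e9463'
2021-03-22 09:07:05,610 DEBUG: Downloading 's3://my-dvc/9e/c6e194dde6429e28a8c43698ad5f02' to '.dvc/cache/9e/c6e194dde6429e28a8c43698ad5f02'
2021-03-22 09:07:05,696 ERROR: failed to download 's3://my-dvc/fa/f8906fa071e94f623456243e5aed63' to '.dvc/cache/fa/f8906fa071e94f623456243e5aed63' - An error occurred (404) when calling the HeadObject operation: Not Found
2021-03-22 09:07:05,710 ERROR: failed to download 's3://my-dvc/9e/c6e194dde6429e28a8c43698ad5f02' to '.dvc/cache/9e/c6e194dde6429e28a8c43698ad5f02' - An error occurred (404) when calling the HeadObject operation: Not Found
"""

DOWNLOAD_RE = re.compile(r"DEBUG: Downloading '(s3://[^']+)' to ")
FAILED_RE = re.compile(r"ERROR: failed to download '(s3://[^']+)' to ")


def summarize(log):
    """Return (attempted, failed) lists of remote URLs, in log order."""
    attempted, failed = [], []
    for line in log.splitlines():
        started = DOWNLOAD_RE.search(line)
        if started:
            attempted.append(started.group(1))
            continue
        errored = FAILED_RE.search(line)
        if errored:
            failed.append(errored.group(1))
    return attempted, failed


attempted, failed = summarize(LOG)
# Downloads that were started and never reported an error.
no_error = [url for url in attempted if url not in failed]

print(f"{len(attempted)} downloads started, {len(failed)} failed")
for url in failed:
    print("failed:", url)
```

In this log, the `.dir` manifest and two of the four nested files logged no error, while the other two nested files returned 404.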

@theoturner
Author

theoturner commented Mar 22, 2021

Note that the pull is not actually working on macOS either; it was using the local cache. On a fresh repo, dvc pull results in the same issue:

ERROR: failed to download 's3://my-dvc/fa/f8906fa071e94f623456243e5aed63' to '.dvc/cache/fa/f8906fa071e94f623456243e5aed63' - An error occurred (404) when calling the HeadObject operation: Not Found
ERROR: failed to download 's3://my-dvc/9e/c6e194dde6429e28a8c43698ad5f02' to '.dvc/cache/9e/c6e194dde6429e28a8c43698ad5f02' - An error occurred (404) when calling the HeadObject operation: Not Found
Everything is up to date.
ERROR: failed to pull data from the cloud - 2 files failed to download

I have a feeling our remote is screwed up. We are going to rip the bandaid off and create a new remote, rerun everything and add. Will update here.
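
One way to check a remote for this kind of gap is to compare the entries of a `.dir` manifest against the keys present in the bucket. The sketch below assumes the manifest is a JSON list of objects with `md5` and `relpath` fields, which is an assumption about DVC's manifest layout in these versions. The `relpath` values are invented for illustration, `missing_objects` is a hypothetical helper, and `existing_keys` stands in for a real bucket listing. The hashes are the four nested files from the log above.

```python
import json

# Hypothetical manifest: hashes come from the log, relpaths are made up.
MANIFEST = json.dumps([
    {"md5": "194577a7e20bdcc7afbb718f502c134c", "relpath": "a.bin"},
    {"md5": "faf8906fa071e94f623456243e5aed63", "relpath": "b.bin"},
    {"md5": "0208587e2ac67d3a16a987246d4e9463", "relpath": "c.bin"},
    {"md5": "9ec6e194dde6429e28a8c43698ad5f02", "relpath": "d.bin"},
])

# Assumed to be present because their downloads logged no error.
existing_keys = {
    "19/4577a7e20bdcc7afbb718f502c134c",
    "02/08587e2ac67d3a16a987246d4e9463",
}


def missing_objects(dir_manifest, keys_in_bucket):
    """Return (relpath, key) pairs listed in the manifest but absent from the bucket."""
    missing = []
    for entry in json.loads(dir_manifest):
        md5 = entry["md5"]
        # Two-character directory plus the rest of the hash, as seen in the log paths.
        key = f"{md5[:2]}/{md5[2:]}"
        if key not in keys_in_bucket:
            missing.append((entry["relpath"], key))
    return missing


for relpath, key in missing_objects(MANIFEST, existing_keys):
    print(f"{relpath}: s3://my-dvc/{key} not found")
```

If the missing objects still exist in a local cache somewhere, pushing from that machine may repopulate them without rebuilding the whole remote.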

@theoturner
Author

Setting up a completely new remote has fixed the issue. I expect there are compatibility issues between v1 and v2 remotes.

It's a bit painful to re-upload everything and lose pipeline history, but for the time being this fix works.

dberenbaum removed the "awaiting response" label Apr 19, 2021
@dberenbaum
Collaborator

Closing as stale
