get/import: could not perform a HEAD request #2600

Closed
jorgeorpinel opened this issue Oct 12, 2019 · 24 comments · Fixed by #2646
Labels: bug (Did we break something?), p0-critical (Critical issue. Needs to be fixed ASAP.), research

Comments

@jorgeorpinel
Contributor

jorgeorpinel commented Oct 12, 2019

DVC version: 0.62.1
Python version: 3.7.3
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False
Cache: reflink - True, hardlink - True, symlink - True
Filesystem type (cache directory): ('apfs', '/dev/disk1s1')
Filesystem type (workspace): ('apfs', '/dev/disk1s1')

I'm trying to import a directory versioned in our own dataset registry project into an empty, non-Git DVC project, but getting this cryptic error:

$ dvc import --rev 0547f58 \                               
           [email protected]:iterative/dataset-registry.git \
           use-cases/data
Importing 'use-cases/data ([email protected]:iterative/dataset-registry.git)' -> 'data'
ERROR: failed to import 'use-cases/data' from '[email protected]:iterative/dataset-registry.git'. - unable to find DVC-file with output '../../../../private/var/folders/_c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmphs83czecdvc-repo/use-cases/data'

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

The directory in question has file name b6923e1e4ad16ea1a7e2b328842d56a2.dir (See use-cases/cats-dogs.dvc of that version). The default remote is configured (https://github.com/iterative/dataset-registry/blob/master/.dvc/config) to https://remote.dvc.org/dataset-registry (which is an HTTP redirect to the s3://dvc-public/remote/dataset-registry bucket). The file seems to be in the remote.

Am I just doing something wrong here (hopefully), or is dvc import broken?

jorgeorpinel added the bug and question labels Oct 12, 2019
@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 12, 2019

p.s. I've also tried without --rev and get the same error (different output path).

@shcheklein
Member

@jorgeorpinel should it be use-cases/cats-dogs?

@jorgeorpinel
Contributor Author

🤦‍♂ Oops. I forgot I changed the directory name from data (original name in the ZIP files used in the Versioning tutorial). But I still can't get it with the correct path:

$ dvc import --rev 0547f58 \
           [email protected]:iterative/dataset-registry.git \
           use-cases/cats-dogs
Importing 'use-cases/cats-dogs ([email protected]:iterative/dataset-registry.git)' -> 'cats-dogs'
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:                                                             
name: ../../../../private/var/folders/_c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmpfnwm64lqdvc-repo/use-cases/cats-dogs, md5: b6923e1e4ad16ea1a7e2b328842d56a2.dir
Missing cache for directory '../../../../private/var/folders/_c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmpfnwm64lqdvc-repo/use-cases/cats-dogs'. Cache for files inside will be lost. Would you like to continue? Use '-f' to force. [y/n] y
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:                                                             
name: ../../../../private/var/folders/_c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmpfnwm64lqdvc-repo/use-cases/cats-dogs, md5: b6923e1e4ad16ea1a7e2b328842d56a2.dir
WARNING: Cache 'b6923e1e4ad16ea1a7e2b328842d56a2.dir' not found. File 'cats-dogs' won't be created.                                                           
ERROR: failed to import 'use-cases/cats-dogs' from '[email protected]:iterative/dataset-registry.git'. - output 'cats-dogs' does not exist

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

There's also the additional issue of really long and cryptic messages, as well as a prompt I don't understand and just answered y to. (Both issues are already reported in #2599.)

What does the "output 'cats-dogs' does not exist" error mean?

@jorgeorpinel
Contributor Author

Hmmmm... Apparently I didn't push that version of the cats-dogs dir in the dataset registry project. It's not in the remote. I'll have to fix that first... I guess this issue is invalid then, fortunately! The messaging here is still pretty confusing though, should I open another issue about this?

@shcheklein
Member

@jorgeorpinel yes, please open a new UI issue!

@jorgeorpinel
Contributor Author

Done! #2602

jorgeorpinel changed the title from "import: broken?" to "import: could not perform a HEAD request?" Oct 13, 2019
@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 13, 2019

The directory in question has file name b6923e1e4ad16ea1a7e2b328842d56a2.dir (See use-cases/cats-dogs.dvc of that version).

So, I pushed the data to the remote now and checked that it actually exists on S3:

$ aws s3 ls s3://dvc-public/remote/dataset-registry/b6/
2019-10-05 01:51:13       6388 2f5c18d1af468fd41c979873a8404b
2019-10-05 01:51:41      22202 4ced1e881cc37c0e0673bafe6e789c
2019-10-12 19:10:03     161184 923e1e4ad16ea1a7e2b328842d56a2.dir  <-- Bingo
2019-10-05 01:50:56      17450 efd10ab38ff17fa593e3b102d088ac

However, when I try to import it (into the same empty non-Git DVC project), the progress bar runs for a while, reaches around 90%, then suddenly disappears and I get:

$ dvc import --rev 0547f58 \
           [email protected]:iterative/dataset-registry.git \
           use-cases/cats-dogs
Importing 'use-cases/cats-dogs ([email protected]:iterative/dataset-registry.git)' -> 'cats-dogs'
ERROR: failed to import 'use-cases/cats-dogs' from '[email protected]:iterative/dataset-registry.git'. - could not perform a HEAD request                         

And nothing is downloaded. I've tried several times. My Internet connection is fine:

SpeedTest result: https://www.speedtest.net/result/8671087724

Is there a single file missing or something? How do I find it? I've tried dvc push from the source project again and it states Everything is up to date. (Investigated in the following comment: #2600 (comment).)

@jorgeorpinel jorgeorpinel reopened this Oct 13, 2019
@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 13, 2019

p.s. Here's the last part of the -v output of the same command: https://pastebin.com/9tPWivJr (Includes the full Python exception traceback.)

That one run failed at file adb29c1de1624c53c808f1a15bd332ba, but it's there:

$ aws s3 ls s3://dvc-public/remote/dataset-registry/ad/b29c1de1624c53c808f1a15bd332ba
2019-10-05 01:51:44      22427 b29c1de1624c53c808f1a15bd332ba
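
The listings in this thread suggest how a checksum maps to a key on the remote: the first two hex characters become a directory and the rest becomes the file name. Below is a minimal sketch of that mapping, assuming the layout holds in general. The function name remote_cache_path is hypothetical and not part of DVC.

```python
def remote_cache_path(remote_prefix, checksum):
    """Build a remote key as <prefix>/<first 2 chars>/<remaining chars>.

    Assumption: this layout is inferred from the `aws s3 ls` output in this
    thread; it is not taken from DVC's source code.
    """
    if len(checksum) < 3:
        raise ValueError("checksum too short to split into directory and file")
    return "{}/{}/{}".format(remote_prefix.rstrip("/"), checksum[:2], checksum[2:])


prefix = "s3://dvc-public/remote/dataset-registry"
# The file that one verbose run failed on:
print(remote_cache_path(prefix, "adb29c1de1624c53c808f1a15bd332ba"))
# The directory entry keeps its ".dir" suffix:
print(remote_cache_path(prefix, "b6923e1e4ad16ea1a7e2b328842d56a2.dir"))
```

The printed paths can be passed to aws s3 ls to check whether a given cache entry is present on the remote.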

shcheklein added the p0-critical label Oct 13, 2019
@shcheklein
Member

@iterative/engineering p0 since it's a blocker and a potential bug.

@pared pared self-assigned this Oct 14, 2019
@efiop
Contributor

efiop commented Oct 14, 2019

Can reproduce on my mac, but not on linux

ERROR: failed to import 'use-cases/cats-dogs' from '[email protected]:iterative/dataset-registry.git'. - could not perform a HEAD request
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 301, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x116945310>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='remote.dvc.org', port=443): Max retries exceeded with url: /dataset-registry/61/5bb7cebf1779b530f33b100d1f14b5 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x116945310>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/http.py", line 87, in _request
    return requests.request(method, url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='remote.dvc.org', port=443): Max retries exceeded with url: /dataset-registry/61/5bb7cebf1779b530f33b100d1f14b5 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x116945310>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dvc/command/imp.py", line 20, in run
    rev=self.args.rev,
  File "/usr/local/lib/python3.7/site-packages/dvc/repo/imp.py", line 6, in imp
    return self.imp_url(path, out=out, erepo=erepo, locked=True)
  File "/usr/local/lib/python3.7/site-packages/dvc/repo/__init__.py", line 33, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/dvc/repo/imp_url.py", line 25, in imp_url
    stage.run()
  File "/usr/local/lib/python3.7/site-packages/dvc/stage.py", line 861, in run
    self.deps[0].download(self.outs[0])
  File "/usr/local/lib/python3.7/site-packages/dvc/dependency/repo.py", line 77, in download
    out = self.fetch()
  File "/usr/local/lib/python3.7/site-packages/dvc/dependency/repo.py", line 72, in fetch
    repo.cloud.pull(out.get_used_cache())
  File "/usr/local/lib/python3.7/site-packages/dvc/data_cloud.py", line 81, in pull
    show_checksums=show_checksums,
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/local/__init__.py", line 412, in pull
    download=True,
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/local/__init__.py", line 376, in _process
    download=download,
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/local/__init__.py", line 301, in status
    md5s, jobs=jobs, name=str(remote.path_info)
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/base.py", line 738, in cache_exists
    ret = list(itertools.compress(checksums, in_remote))
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/base.py", line 731, in exists_with_progress
    ret = self.exists(path_info)
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/http.py", line 50, in exists
    return bool(self._request("HEAD", path_info.url))
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/http.py", line 89, in _request
    raise DvcException("could not perform a {} request".format(method))
dvc.exceptions.DvcException: could not perform a HEAD request
------------------------------------------------------------

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

@pared
Contributor

pared commented Oct 14, 2019

Reproduction steps for Linux:
script

#!/bin/bash

rm -rf repo
mkdir repo

cd repo

dvc init --no-scm
dvc import --rev 0547f58 \
           [email protected]:iterative/dataset-registry.git \
           use-cases/cats-dogs

The maximum number of connections here needs to be changed to some large value. For me, 10k worked.

@pared
Contributor

pared commented Oct 14, 2019

It seems like we are hitting some limit here.

@pared
Contributor

pared commented Oct 14, 2019

Related: #2473

@pared
Contributor

pared commented Oct 14, 2019

It seems that the problem is that, with every request sent, we are reserving a socket "through" the requests API, which takes up an open file descriptor slot. In this particular case, in the method RemoteLOCAL.cache_exists, we try to do a lot of HEAD calls in parallel, which leads to exceeding the open file descriptor limit.

example:
ulimit -n 16

and run:

from concurrent.futures import ThreadPoolExecutor

from requests import head


def run_session(i):
    # Each call opens its own connection, so each one takes a file descriptor.
    try:
        head("https://www.google.com")
    except Exception as e:
        print(e)


# Use more workers than the descriptor limit set by `ulimit -n 16` above.
with ThreadPoolExecutor(max_workers=24) as executor:
    executor.map(run_session, range(24))

print("finished")
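
To see how much headroom a process has before hitting the limit described above, the descriptor limits can be read from Python. This is a minimal sketch for Unix-like systems only, since the resource module is not available on Windows. The helper names fd_limits and open_sockets are made up for illustration.

```python
import resource
import socket


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_sockets(count):
    """Create `count` unconnected TCP sockets; each one holds its own descriptor."""
    return [socket.socket(socket.AF_INET, socket.SOCK_STREAM) for _ in range(count)]


soft, hard = fd_limits()
print("soft limit:", soft, "hard limit:", hard)

socks = open_sockets(5)
try:
    # Five sockets means five distinct descriptors counted against the soft limit.
    print("descriptors held by the sockets:", sorted(s.fileno() for s in socks))
finally:
    for s in socks:
        s.close()
```

The soft limit is the value that ulimit -n reports and changes, so after running ulimit -n 16 in the same shell, the first line printed should show 16 as the soft limit.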

@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 14, 2019

Related: #2473 ('Errno 24 - Too many open files' on dvc push)

I didn't have any problem pushing this whole directory (1800 images) from the source project though. I'm guessing probably dvc pull will also work fine, let me check...

jorgeorpinel removed the question label Oct 14, 2019
@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 14, 2019

dvc pull also works just fine (from the source project, after deleting the pushed directory). What makes import different?

@jorgeorpinel
Contributor Author

p.s. I also just tried dvc get and the same problem occurs. What makes these different from fetch/pull?

jorgeorpinel changed the title from "import: could not perform a HEAD request?" to "get/import: could not perform a HEAD request" Oct 14, 2019
@pared
Contributor

pared commented Oct 15, 2019

Little summary so far:

  • The problem is too many file descriptors open upon remote/base.cache_exists
  • Need to check why fetch/pull does not have the same problems as import/get

Possible way of handling the problem:
The problem might be triggered because a requests.sessions.Session object is created on each requests.request call. Maybe we could solve that by creating our own Session object, mounting proper HTTPAdapters and reusing this session, instead of calling requests.request each time.
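
The idea of a shared session could look roughly like the sketch below. It is an illustration under assumptions, not the patch that was merged: the helper names make_session and exists and the pool size of 4 are hypothetical, and only the requests calls themselves (Session, HTTPAdapter, mount, head) are real library API.

```python
import requests
from requests.adapters import HTTPAdapter

# Hypothetical pool size chosen for illustration; not a value taken from DVC.
POOL_SIZE = 4


def make_session(pool_size=POOL_SIZE):
    """Build one Session whose adapter keeps a bounded pool of connections."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    # Mount the same adapter for both schemes so all requests share its pools.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def exists(session, url):
    """Return True if a HEAD request to `url` gets a non-error status (< 400)."""
    # session.head() does not follow redirects by default, so ask for it here.
    response = session.head(url, allow_redirects=True)
    return response.ok
```

With one session created up front and passed to every exists call, repeated HEAD requests to the same host can reuse pooled connections rather than opening a new socket each time.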

@efiop
Contributor

efiop commented Oct 16, 2019

Can reproduce this same bug on windows too :(

@efiop
Contributor

efiop commented Oct 16, 2019

For the record, this only breaks with binary installs. pip works fine. If you are experiencing this, try uninstalling the binary package and installing from pip or conda.

EDIT: wrong issue, it was meant for #2589

efiop added a commit that referenced this issue Oct 16, 2019
As a part of the research for #2600
@efiop efiop self-assigned this Oct 21, 2019
@efiop
Contributor

efiop commented Oct 21, 2019

https://requests.kennethreitz.org/en/master/user/advanced/ says that a session uses a connection pool by default. Changing to a session instead of calling requests.request directly made everything work for me, and I no longer see fluctuations in fd numbers. Will send a patch ASAP. Kudos @pared 🎉

efiop added a commit to efiop/dvc that referenced this issue Oct 21, 2019
This way we are able to properly utilize automatic connection pools and
not create new fds for each request, which overflows ulimit for max fds
very quickly on mac and windows. Kudos @pared for investigating 🎉

Fixes iterative#2600

Signed-off-by: Ruslan Kuprieiev <[email protected]>
efiop added a commit that referenced this issue Oct 21, 2019
This way we are able to properly utilize automatic connection pools and
not create new fds for each request, which overflows ulimit for max fds
very quickly on mac and windows. Kudos @pared for investigating 🎉

Fixes #2600

Signed-off-by: Ruslan Kuprieiev <[email protected]>
@efiop efiop assigned pared and efiop and unassigned efiop Oct 21, 2019
@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 22, 2019

I can confirm it's fixed for me as well in DVC 0.63.4. Thanks!!!
