Failed to upload file to Azure due to connection time out #3250

Closed
ogigoc opened this issue Jan 28, 2020 · 7 comments
Labels
awaiting response · enhancement · p2-medium

Comments

ogigoc commented Jan 28, 2020

DVC version: 0.82.3
Method of installation: pip
Platform: Ubuntu 18.04

I'm having an issue uploading a file to Azure. The file is about 80 MB. The Azure dependencies are:

azure-common==1.1.24
azure-storage-blob==2.1.0
azure-storage-common==2.1.0

I think the problem might be related to my internet connection. The connection is stable but really slow: upload is about 150 kB/s. The error always seems to occur after about two minutes.
Here is the whole output using --verbose:

(env) ➜  ml git:(develop) ✗ dvc push -r alremote --verbose
DEBUG: PRAGMA user_version;                                             
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Preparing to upload data to 'azure://al-dvcrepo'
DEBUG: Preparing to collect status from azure://al-dvcrepo
DEBUG: Collecting information from local cache...
DEBUG: Path ../.dvc/cache/c4/58c4627e59dd0582f16d6e3ab91cc6 inode 19794972                                                                                                       
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?                                                                                                               
DEBUG: fetched: [('1580173618884848128', '81842644', 'c458c4627e59dd0582f16d6e3ab91cc6', '1580248181034217472')]                                                                 
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?                                                                                                                            
DEBUG: cache '../.dvc/cache/c4/58c4627e59dd0582f16d6e3ab91cc6' expected 'c458c4627e59dd0582f16d6e3ab91cc6' actual 'c458c4627e59dd0582f16d6e3ab91cc6'                             
DEBUG: Collecting information from remote cache...                                                                                                                               
DEBUG: URL azure://al-dvcrepo                                                                                                                                                    
DEBUG: Connection string DefaultEndpointsProtocol=https;AccountName=account_name;AccountKey=account_key;EndpointSuffix=core.windows.net
DEBUG: Container name al-dvcrepo                                                                                                                                                 
DEBUG: Uploading '../.dvc/cache/c4/58c4627e59dd0582f16d6e3ab91cc6' to 'azure://al-dvcrepo/c4/58c4627e59dd0582f16d6e3ab91cc6'                                                     
data/raw/all_labeled_documents.csv                                                                                                               16.0M [02:01<??:??,     128kB/s]Client-Request-ID=ca43b320-4218-11ea-a37e-6cf049b79d0b Retry policy did not allow for a retry: , HTTP status code=Unknown, Exception=('Connection aborted.', timeout('The write operation timed out',)).
ERROR: failed to upload '../.dvc/cache/c4/58c4627e59dd0582f16d6e3ab91cc6' to 'azure://al-dvcrepo/c4/58c4627e59dd0582f16d6e3ab91cc6' - ('Connection aborted.', timeout('The write operation timed out',))
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/usr/lib/python3.6/http/client.py", line 996, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/usr/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
socket.timeout: The write operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/usr/lib/python3.6/http/client.py", line 996, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/usr/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', timeout('The write operation timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 333, in _perform_request
    response = self._httpclient.perform_request(request)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/common/_http/httpclient.py", line 92, in perform_request
    proxies=self.proxies)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/dvc/remote/base.py", line 580, in upload
    no_progress_bar=no_progress_bar,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/dvc/remote/azure.py", line 118, in _upload
    progress_callback=pbar.update_to,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/blob/blockblobservice.py", line 491, in create_blob_from_path
    standard_blob_tier=standard_blob_tier, cpk=cpk)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/blob/blockblobservice.py", line 666, in create_blob_from_stream
    cpk=cpk,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/blob/_upload_chunking.py", line 88, in _upload_blob_chunks
    raise f.exception()
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/blob/_upload_chunking.py", line 213, in process_chunk
    return self._upload_chunk_with_progress(chunk_offset, chunk_bytes)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/blob/_upload_chunking.py", line 227, in _upload_chunk_with_progress
    range_id = self._upload_chunk(chunk_offset, chunk_data)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/blob/_upload_chunking.py", line 272, in _upload_chunk
    cpk=self.cpk,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/blob/blockblobservice.py", line 1301, in _put_block
    self._perform_request(request)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 446, in _perform_request
    raise ex
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 377, in _perform_request
    raise _wrap_exception(ex, AzureException)
azure.common.AzureException: ('Connection aborted.', timeout('The write operation timed out',))
------------------------------------------------------------

DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(2,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
ERROR: failed to push data to the cloud - 1 files failed to upload
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/dvc/command/data_sync.py", line 49, in run
    recursive=self.args.recursive,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/dvc/repo/__init__.py", line 31, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/dvc/repo/push.py", line 25, in push
    return self.cloud.push(used, jobs, remote=remote)
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/dvc/data_cloud.py", line 81, in push
    show_checksums=show_checksums,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/dvc/remote/local.py", line 385, in push
    download=False,
  File "/home/ognjen/Projects/altlegal/document-classifier/ml/env/lib/python3.6/site-packages/dvc/remote/local.py", line 375, in _process
    raise UploadError(fails)
dvc.exceptions.UploadError: 1 files failed to upload
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
efiop added the enhancement and p2-medium labels Jan 28, 2020

efiop commented Jan 29, 2020

Hi @ogigoc!

Thanks for reporting the issue! Indeed, it looks like the default timeout is too small for your particular case. We had something similar with GS, where we solved it with dynamic chunk-size logic, but from what I can tell right now, for Azure we might be better off increasing the timeout in https://github.com/iterative/dvc/blob/0.82.4/dvc/remote/azure.py#L114 (if I've read the Azure source code correctly, it is 100 seconds by default). If you feel comfortable modifying and installing DVC, consider adjusting that line and seeing if it helps. If it does, we would be glad to receive a PR 🙂 If it doesn't, we'll need to dig a bit deeper.

@efiop
Copy link
Contributor

efiop commented Jan 29, 2020

Btw, @ogigoc does az cp ... work fine for you with big-ish files?
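
Assuming az cp refers to Azure's standalone AzCopy tool, a comparable direct upload (purely hypothetical here, with a placeholder account name and SAS token) would look something like:

azcopy copy "data/raw/all_labeled_documents.csv" "https://<account>.blob.core.windows.net/al-dvcrepo/all_labeled_documents.csv?<sas-token>"

If that also fails at ~150 kB/s, the timeout would point at the network/Azure side rather than at DVC's upload settings.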


ogigoc commented Feb 3, 2020

Hi @efiop,

Thanks for the fast reply! I tried setting the timeout, but it just gets ignored no matter the value. However, setting max_connections=1 solves the issue for me, so I'll just use it like that. I assume this is specific to my setup and you don't want it in a pull request?

I did not try az cp; I only use Azure through DVC.


efiop commented Feb 3, 2020

@ogigoc Could you please elaborate on where you've set max_connections? So far it still sounds like we should consider adjusting the timeout or chunk sizes. The reason I asked for az cp results is that it would help us understand whether dvc behaves oddly compared to az, and whether we could borrow some logic from az, as we did with gsutil for GS a few months back 🙂


ogigoc commented Feb 4, 2020

@efiop I've set max_connections=1 in the create_blob_from_path call at https://github.com/iterative/dvc/blob/0.82.4/dvc/remote/azure.py#L114. Unfortunately, I don't have an Azure account, so I cannot try az cp. But I'm pretty sure my issue is reproducible just by limiting your upload speed to about 100 kB/s and running dvc push; a colleague of mine was able to reproduce it in a completely different environment.
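
For context, a sketch of where that change lands, paraphrasing the _upload method around the linked line of dvc/remote/azure.py in 0.82.x (Tqdm is DVC's progress-bar wrapper from dvc.progress; only the max_connections line is the workaround, the rest mirrors upstream):

def _upload(self, from_file, to_info, name=None, no_progress_bar=False, **_kwargs):
    with Tqdm(desc=name, disable=no_progress_bar, bytes=True) as pbar:
        self.blob_service.create_blob_from_path(
            to_info.bucket,      # container, e.g. al-dvcrepo
            to_info.path,        # blob name, e.g. c4/58c4...
            from_file,           # local cache file to upload
            progress_callback=pbar.update_to,
            max_connections=1,   # workaround: upload blocks sequentially
        )

A plausible mechanism: azure-storage-blob 2.x uploads large blobs in roughly 4 MiB blocks, two at a time by default, so on a ~100 kB/s uplink the two parallel blocks share the bandwidth and each write can outlast the socket timeout; serializing them halves the per-block flush time.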


efiop commented Feb 4, 2020

@ogigoc But if you are able to dvc push/pull, then az cp should also work, or am I missing something? I don't think you need any special account on top of what you already have.

Thanks for clarifying about max_connections! It would really help to try az cp, as it would let us quickly see whether az already has a working recipe that we could port to dvc.

efiop added the awaiting response label Feb 4, 2020

efiop commented Feb 26, 2020

Closing as stale. Please feel free to reopen.

efiop closed this as completed Feb 26, 2020