Issue with cert verification #3275

Closed
guerinclement opened this issue Feb 3, 2020 · 12 comments
Labels
awaiting response we are waiting for your reply, please respond! :) bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP. research

Comments

@guerinclement

Just tried out the Get Started DVC Tutorial and got stuck quite quickly...
I ran the following command:
$ dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
and got this error:
ERROR: failed to get 'get-started/data.xml' from 'https://github.com/iterative/dataset-registry' - could not perform a HEAD request
ERROR: unexpected error - HTTPSConnectionPool(host='analytics.dvc.org', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(185090057, '[X509] PEM lib (_ssl.c:3732)'),))

I am behind my company proxy that rewrites SSL certificates on the fly.
I installed DVC through pip in a conda env.
I added my company ROOT_CA to every certifi/pem file that I could find in my virtualenv, as this usually solves this kind of issue with requests-powered Internet interactions.
It did not fix anything here though...
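One thing worth ruling out when a CA is appended to a bundle by hand: if the bundle does not end with a newline, the new certificate's BEGIN line gets glued onto the previous END line, which can make the file unparseable. Whether that happened here is not known from the thread. The sketch below, with the hypothetical helper name append_ca, shows an append that keeps the PEM blocks on separate lines.

```python
from pathlib import Path


def append_ca(bundle_path, ca_path):
    """Append the PEM text in ca_path to bundle_path (hypothetical helper).

    Ensures the existing bundle ends with a newline first, so that
    "-----END CERTIFICATE-----" and "-----BEGIN CERTIFICATE-----"
    never end up on the same line.
    """
    bundle = Path(bundle_path)
    existing = bundle.read_text(encoding="utf-8")
    extra = Path(ca_path).read_text(encoding="utf-8").strip()
    separator = "" if (not existing or existing.endswith("\n")) else "\n"
    bundle.write_text(existing + separator + extra + "\n", encoding="utf-8")
```

It is safest to work on a copy of cacert.pem, since certifi's file is overwritten whenever the package is upgraded.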

The command below works fine:
$ git clone https://github.com/iterative/dataset-registry

Here is the output of dvc version:
DVC version: 0.82.5
Python version: 3.6.10
Platform: Linux-3.10.0-957.5.1.el7.x86_64-x86_64-with-centos-7.6.1810-Core
Binary: False
Package: pip

Any help would be much appreciated!

Thanks!

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Feb 3, 2020
@efiop efiop added bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP. labels Feb 3, 2020
@triage-new-issues triage-new-issues bot removed triage Needs to be triaged labels Feb 3, 2020
@ghost

ghost commented Feb 4, 2020

@guerinclement , a workaround would be to disable analytics: https://dvc.org/doc/user-guide/analytics#opting-out

@efiop
Contributor

efiop commented Feb 4, 2020

@guerinclement Btw, could you please post the log of dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml -v? Notice the -v I've added to enable verbose output.

@guerinclement
Author

guerinclement commented Feb 4, 2020

@efiop here you go:

(dvc) [78176d@slhdg002 tuto-dvc]$ dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data2.xml -v
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: cache 'data/.Gnsb9idYN7RvZ9PHVeskMR/a3/04afb96060aad90176268345e10355' expected 'a304afb96060aad90176268345e10355' actual 'None'
DEBUG: Preparing to download data from 'https://remote.dvc.org/dataset-registry'
DEBUG: Preparing to collect status from https://remote.dvc.org/dataset-registry
DEBUG: Collecting information from local cache...
DEBUG: cache 'data/.Gnsb9idYN7RvZ9PHVeskMR/a3/04afb96060aad90176268345e10355' expected 'a304afb96060aad90176268345e10355' actual 'None'
DEBUG: Collecting information from remote cache...
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
DEBUG: Removing '/tmp/tmpzaub36vodvc-erepo'
DEBUG: Removing '/home/78176d/workspace/tuto-dvc/data/.Gnsb9idYN7RvZ9PHVeskMR'
ERROR: failed to get 'get-started/data.xml' from 'https://github.com/iterative/dataset-registry' - could not perform a HEAD request
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 336, in ssl_wrap_socket
    context.load_verify_locations(ca_certs, ca_cert_dir)
ssl.SSLError: [X509] PEM lib (_ssl.c:3732)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 662, in urlopen
    self._prepare_proxy(conn)
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 948, in _prepare_proxy
    conn.connect()
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/connection.py", line 360, in connect
    ssl_context=context,
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 338, in ssl_wrap_socket
    raise SSLError(e)
urllib3.exceptions.SSLError: [X509] PEM lib (_ssl.c:3732)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 760, in urlopen
    **response_kw
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 760, in urlopen
    **response_kw
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 760, in urlopen
    **response_kw
  [Previous line repeated 2 more times]
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='remote.dvc.org', port=443): Max retries exceeded with url: /dataset-registry/a3/04afb96060aad90176268345e10355 (Caused by SSLError(SSLError(185090057, '[X509] PEM lib (_ssl.c:3732)'),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/remote/http.py", line 104, in _request
    res = self._session.request(method, url, **kwargs)
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='remote.dvc.org', port=443): Max retries exceeded with url: /dataset-registry/a3/04afb96060aad90176268345e10355 (Caused by SSLError(SSLError(185090057, '[X509] PEM lib (_ssl.c:3732)'),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/command/get.py", line 41, in _get_file_from_repo
    rev=self.args.rev,
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/repo/get.py", line 55, in get
    repo.pull_to(path, PathInfo(out))
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/external_repo.py", line 75, in pull_to
    self._pull_cached(out, to_info)
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/external_repo.py", line 90, in _pull_cached
    self.cloud.pull(out.get_used_cache())
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/data_cloud.py", line 97, in pull
    cache, jobs=jobs, remote=remote, show_checksums=show_checksums
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/remote/local.py", line 394, in pull
    download=True,
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/remote/local.py", line 358, in _process
    download=download,
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/remote/local.py", line 279, in status
    md5s, jobs=jobs, name=str(remote.path_info)
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/remote/base.py", line 849, in cache_exists
    ret = list(itertools.compress(checksums, in_remote))
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/remote/base.py", line 842, in exists_with_progress
    ret = self.exists(path_info)
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/remote/http.py", line 53, in exists
    return bool(self._request("HEAD", path_info.url))
  File "/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/dvc/remote/http.py", line 121, in _request
    raise DvcException("could not perform a {} request".format(method))
dvc.exceptions.DvcException: could not perform a HEAD request
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

@ghost

ghost commented Feb 5, 2020

@guerinclement , can you try doing a wget or curl request to remote.dvc.org? It looks like you have problems with our domain, and I want to know whether this is specific to dvc or whether you also have trouble with other tools.

Could you try one of the following?

$ wget https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355

$ curl -L https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355 > file.xml

@ghost

ghost commented Feb 5, 2020

Also, @guerinclement , could you explain a bit more about what you usually do to solve this problem? (Where do you put the ROOT_CA?)

@ghost ghost added the awaiting response we are waiting for your reply, please respond! :) label Feb 5, 2020
@guerinclement
Author

@MrOutis both the wget and curl commands work fine.
I usually append my company root ca to the end of the certifi cacert.pem file.

In the current case, I created a new conda env, named dvc, and this file is located here:
/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/certifi/cacert.pem

@ghost ghost removed the awaiting response we are waiting for your reply, please respond! :) label Feb 5, 2020
@ghost

ghost commented Feb 5, 2020

@guerinclement , I will take a look, thanks for the info!

Just checking, what version of certifi do you have installed? It should look something like this: https://gist.github.com/mroutis/feb56d618ade10f7d03bc7dd40ce9d29

What is the output of running python -m certifi?

Also, by any chance, do you have a REQUESTS_CA_BUNDLE environment variable set?

Could you try running the curl command that I posted above with the following environment variable set:

$ CURL_CA_BUNDLE=/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/certifi/cacert.pem curl -L https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355 > file.xml

@ghost ghost added the awaiting response we are waiting for your reply, please respond! :) label Feb 6, 2020
@guerinclement
Author

@MrOutis
Here is the output of:
(dvc) [78176d@slhdg002 ~]$ python -m certifi
/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/certifi/cacert.pem

Here is the installed version of certifi:
(dvc) [78176d@slhdg002 ~]$ conda list | grep certifi
ca-certificates 2019.11.27 0
certifi 2019.11.28 py36_0

No env variable named REQUESTS_CA_BUNDLE is set.
The curl command works fine with or without the CURL_CA_BUNDLE variable set.
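For context on why these variables matter: as far as I know, when verification is left at its default, requests looks at REQUESTS_CA_BUNDLE first, then CURL_CA_BUNDLE, and only then falls back to the certifi bundle. The sketch below mimics that lookup order with the standard library only. The function name effective_ca_bundle and its parameters are made up for illustration and are not part of requests.

```python
import os


def effective_ca_bundle(default_bundle, environ=os.environ):
    """Return the CA bundle path that would win under the assumed lookup order.

    default_bundle stands in for the certifi path printed by
    `python -m certifi`; environ is injectable so the order can be tested.
    """
    return (
        environ.get("REQUESTS_CA_BUNDLE")
        or environ.get("CURL_CA_BUNDLE")
        or default_bundle
    )
```

With neither variable set, as reported above, the result is simply the certifi path, which matches the ca_certs value seen in the traceback.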

@guerinclement
Author

@MrOutis
BTW, printing the value of ca_certs at line 336 (see the stack trace above) of the file:

/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/urllib3/util/ssl_.py

gives:

/home/78176d/.conda/envs/dvc/lib/python3.6/site-packages/certifi/cacert.pem

which does contain my company ROOT_CA.

@ghost

ghost commented Feb 12, 2020

@guerinclement , thanks for the info!

I'm quite lost here. I thought that a cert was missing from your cacert.pem, so I tried to replicate the error by removing the Baltimore CyberTrust Root cert from cacert.pem, but I got a different result.

Mine: SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])"))

Yours: SSLError(185090057, '[X509] PEM lib (_ssl.c:3732)')

So I tried to look up the context where the error was raised: https://github.com/python/cpython/blob/3.6/Modules/_ssl.c#L3716-L3737

And it looks like SSL_CTX_load_verify_locations is the culprit. I proceeded to read the man page but couldn't understand what was going on:

For SSL_CTX_load_verify_locations the following return values can occur:

0: The operation failed because CAfile and CApath are NULL or the processing at one of the locations specified failed. Check the error stack to find out the reason.

1: The operation succeeded.

I thought that there was a problem with verifying cacert.pem, but when I tried with a corrupted cert, the error was different: OpenSSL.SSL.Error: [('PEM routines', 'PEM_read_bio_ex', 'bad base64 decode'), ('x509 certificate routines', 'X509_load_cert_crl_file', 'PEM lib')]
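The failing frame in the traceback is context.load_verify_locations(ca_certs, ca_cert_dir), so one way to take dvc, requests and the proxy out of the picture is to make the same call directly from Python's ssl module. This is a minimal sketch: check_ca_file is a made-up helper name, and the exact error text depends on the Python and OpenSSL versions in use.

```python
import ssl


def check_ca_file(cafile):
    """Return None if OpenSSL can load cafile, otherwise the error text."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        context.load_verify_locations(cafile=cafile)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return str(exc)
    return None
```

Running it against the cacert.pem path reported earlier in the thread, once with the appended company CA and once with a pristine copy, would show whether the bundle itself is what OpenSSL rejects.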

At this point, I don't know how to proceed 😥

@guerinclement , could you try using the system certificates $(curl-config --ca) instead of the ones provided with certifi?

REQUESTS_CA_BUNDLE=$(curl-config --ca) dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data2.xml -v

(cc: @efiop , any suggestions? 😞)

@ghost

ghost commented Feb 12, 2020

@guerinclement , I didn't ask before, but did you try the dvc get without modifying certifi/cacert.pem?

It looks like X509 is complaining about a particular certificate. It would also help if you could run openssl x509 -in [your-company-cert] -text and check whether you get an error or a message like: unable to load certificate
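If the openssl binary is not at hand, a similar per-certificate check can be sketched in Python: split the bundle into its PEM blocks and ask OpenSSL to load each one on its own. The name find_bad_certificates is hypothetical. Note the limitation: only the blocks themselves are checked, so damage between blocks (for example an END line glued to the next BEGIN line) would not be reported.

```python
import re
import ssl

# Matches one PEM certificate block, including its BEGIN/END lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def find_bad_certificates(bundle_text):
    """Return (index, error) pairs for PEM blocks that fail to load."""
    problems = []
    for index, block in enumerate(PEM_BLOCK.findall(bundle_text)):
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        try:
            context.load_verify_locations(cadata=block)
        except (ssl.SSLError, ValueError, TypeError) as exc:
            problems.append((index, str(exc)))
    return problems
```

The index counts blocks from the top of the file, so a company CA appended at the end would show up as the last index if it is the one being rejected.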

@efiop
Contributor

efiop commented Feb 26, 2020

Closing as stale. Please feel free to reopen.
