Perceval behind a firewall #388

Closed
pixelpshr opened this issue May 30, 2018 · 27 comments

Comments

@pixelpshr

I'm not sure if this is a problem with Perceval or my brainspace. (New Python user.)
My corporate firewall routes http and https traffic through the same proxy, http://proxy.my.com:80. (It also inserts its own CA auth chain into SSL traffic, but I don't think I've gotten far enough to worry about that problem yet.)
Can anyone help me figure this out? Are firewall proxies supported by Perceval?

Running the Perceval example produces this output:

$ perceval git https://github.com/grimoirelab/perceval.git
[2018-05-30 14:24:28,450] - Sir Perceval is on his quest.
Traceback (most recent call last):
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backend.py", line 380, in run
    for item in items:
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backend.py", line 484, in fetch
    raise e
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backend.py", line 478, in fetch
    for item in items:
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backend.py", line 127, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 149, in fetch_items
    latest_items)
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 262, in __fetch_from_repo
    repo = self.__create_git_repository()
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 312, in __create_git_repository
    repo = GitRepository.clone(self.uri, self.gitpath)
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 810, in clone
    cls._exec(cmd, env=env)
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 1275, in _exec
    raise RepositoryError(cause=cause)
perceval.errors.RepositoryError: git command - Cloning into bare repository '/home/tdecarlo/.perceval/repositories/https://github.com/grimoirelab/perceval.git-git'...
fatal: unable to access 'https://github.com/grimoirelab/perceval.git/': Failed to connect to github.com port 443: Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tdecarlo/venvs/grimoirelab/bin/perceval", line 194, in <module>
    main()
  File "/home/tdecarlo/venvs/grimoirelab/bin/perceval", line 112, in main
    cmd.run()
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backend.py", line 387, in run
    raise RuntimeError(str(e))
RuntimeError: git command - Cloning into bare repository '/home/tdecarlo/.perceval/repositories/https://github.com/grimoirelab/perceval.git-git'...
fatal: unable to access 'https://github.com/grimoirelab/perceval.git/': Failed to connect to github.com port 443: Connection timed out

Similarly, using the example of perceval in a python script that looks like this:

#! /usr/bin/env python3

from perceval.backends.core.git import Git

http_proxy = 'http://proxy.my.com:80'
https_proxy = 'http://proxy.my.com:80'

# url for the git repo to analyze
repo_url = 'https://github.com/grimoirelab/perceval.git'
# directory for letting Perceval clone the git repo
repo_dir = '/tmp/perceval.git'

# create a Git object, pointing to repo_url, using repo_dir for cloning
repo = Git(uri=repo_url, gitpath=repo_dir)
# fetch all commits as an iterator, and iterate over it printing each hash
for commit in repo.fetch():
    print(commit['data']['commit'])

produces very similar output:

$ python3 perceval_git.py
Traceback (most recent call last):
  File "perceval_git.py", line 16, in <module>
    for commit in repo.fetch():
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backend.py", line 127, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 149, in fetch_items
    latest_items)
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 262, in __fetch_from_repo
    repo = self.__create_git_repository()
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 312, in __create_git_repository
    repo = GitRepository.clone(self.uri, self.gitpath)
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 810, in clone
    cls._exec(cmd, env=env)
  File "/home/tdecarlo/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/core/git.py", line 1275, in _exec
    raise RepositoryError(cause=cause)
perceval.errors.RepositoryError: git command - Cloning into bare repository '/tmp/perceval.git'...
fatal: unable to access 'https://github.com/grimoirelab/perceval.git/': Failed to connect to github.com port 443: Connection timed out
@sduenas
Member

sduenas commented May 31, 2018

This is a little weird. Do you have problems cloning git repositories using the git command? Under the hood, Perceval calls that command.

@pixelpshr
Author

No, no problems with git at all. In fact, I git cloned the perceval repo to try to figure out the problem, but it is still beyond me.

@valeriocos
Member

I don't know if this can help, but you could try to:

@jgbarah
Contributor

jgbarah commented May 31, 2018

Just a suggestion to try to figure out where the problem is: could you make sure you can clone the git repo *with the same link Perceval is using*? That is, just run:

$ git clone https://github.com/grimoirelab/perceval.git/
$ cd perceval
$ git fetch

Just to make sure that we don't have a problem on that side (there are several ways of cloning a git repo, so this helps to rule out scenarios).

That said, the connection problem could be related to your firewall using its own CA (I assume it is not only its own CA, but a complete re-encryption that presents a substitute certificate to the client): git may be unable to establish an SSL connection because the certificate check is failing.

BTW, I assume the firewall is doing all of this transparently, right?

All this said, I suggest that you use the advice in w3c/epubcheck#771 . In short:

The right way to go is to use https, which should be enabled by your firewall. However, you need to declare the corporate proxy to git. Depending on whether you need to authenticate or not, you should add this information to the global git configuration:
git config --global http.proxy http://proxyuser:<password>@proxy.server.com:8080
or
git config --global http.proxy http://proxy.server.com:8080

Then the command
git clone https://github.com/WSchindler/epubcheck.git
should work

You should run git config under the same account that you later use to run Perceval.
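
As a rough sketch of the advice above, the same git setting could be applied from a Python script. This assumes Python 3 with git on the PATH; the helper names and the proxy URL are placeholders, not part of Perceval.

```python
import subprocess


def git_proxy_command(proxy_url, scope="--global"):
    """Build the argv for `git config <scope> http.proxy <proxy_url>`."""
    return ["git", "config", scope, "http.proxy", proxy_url]


def set_git_proxy(proxy_url):
    """Apply the setting; raises CalledProcessError if git reports a failure."""
    subprocess.run(git_proxy_command(proxy_url), check=True)


if __name__ == "__main__":
    # Placeholder URL: replace it with your corporate proxy.
    # Only the command is printed here; call set_git_proxy() to apply it.
    print(" ".join(git_proxy_command("http://proxy.server.com:8080")))
```

Because `--global` writes to the configuration of the current user, the script has to run under the same account that later runs Perceval.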

@pixelpshr
Author

pixelpshr commented Jun 1, 2018

The "git clone..." line does work.

Hmmm... the git config might have fixed that part. At least, the python script (above) doesn't crash now.
Now to figure out how that mordred thingy is supposed to work. I'm trying to turn the Dockerfile-full into a bash script (since the Docker container can't reach through the firewalls, either).

@pixelpshr pixelpshr reopened this Jun 1, 2018
@pixelpshr
Author

pixelpshr commented Jun 1, 2018

Ok, I know nothing at all about pipermail, but it seems to have the same proxy problems as perceval had with git. Should I open a different thread about that, or is the answer simple enough that it can be tacked onto this?

@jgbarah
Contributor

jgbarah commented Jun 4, 2018

I've been researching this a bit, and it seems that using environment variables could be a solution, at least in some cases. I've done some testing and it seems to work with the 'grimoirelab/full' Docker image, although I'm not sure if my proxy setup is comparable to yours. Would you mind trying to run the container as:

docker run -p 127.0.0.1:5601:5601 -v $(pwd)/mordred-credentials-jgb.cfg:/override.cfg -e http_proxy=http://163.117.69.195:3128/ -e https_proxy=http://163.117.69.195:3128/ -e no_proxy="localhost,127.0.0.1" -t grimoirelab/full

(but instead of http://163.117.69.195:3128/ use your proxy URL for both http_proxy and https_proxy).

If this works, it should work for git, github, mail archives, and very likely for other data sources (most Linux-based tools honor these environment variables).
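
To illustrate how a tool picks these variables up, here is a minimal sketch using only the Python standard library. The proxy URL is a placeholder, and this shows the standard library's handling, not Perceval's own code (the requests package follows the same variables by default).

```python
import os
import urllib.request

# Placeholder proxy settings: substitute your own proxy URL.
os.environ["http_proxy"] = "http://proxy.example.com:3128/"
os.environ["https_proxy"] = "http://proxy.example.com:3128/"
os.environ["no_proxy"] = "localhost,127.0.0.1"

# Proxy settings as read from the environment, keyed by URL scheme.
proxies = urllib.request.getproxies_environment()
print(proxies.get("http"))
print(proxies.get("https"))

# Hosts listed in no_proxy are contacted directly, everything else via the proxy.
bypass_local = bool(urllib.request.proxy_bypass_environment("localhost"))
bypass_github = bool(urllib.request.proxy_bypass_environment("github.com"))
print(bypass_local, bypass_github)
```

Passing the variables with `-e` on `docker run` has the same effect for processes started inside the container.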

Could you please try this, and let us know how it worked?

@pixelpshr
Author

pixelpshr commented Jun 5, 2018

It looks like even with the explicit proxy definition I still get both proxy errors in pipermail and ssl certificate errors due to the corporate firewall examination and re-signing of ssl packets.

Starting container: 30ea9331ade2
Starting Elasticsearch
[ ok ] Starting Elasticsearch Server:.
Waiting for Elasticsearch to start...
tcp        0      0 0.0.0.0:9200            0.0.0.0:*               LISTEN      -
Elasticsearch started
Starting MariaDB
[ ok ] Starting MariaDB database server: mysqld . . . . ..
Waiting for MariaDB to start...
tcp6       0      0 :::3306                 :::*                    LISTEN      -
MariaDB started
Starting Kibiter
Waiting for Kibiter to start...
.................Kibiter started
Starting Mordred to build a GrimoireLab dashboard
This will usually take a while...
Kibiter/Kibana: version found is 6.1.0-3
Kibiter/Kibana: configured!
Dashboard panels, visualizations: uploading...
Dashboard panels, visualizations: uploaded!
Dashboard menu: uploading for 6 ...
Dashboard menu: uploaded!
Collection for pipermail: starting...
Collection for github: starting...
Collection for git: starting...
2018-06-05 13:06:29,217 - grimoire_elk.elk - ERROR - Error feeding ocean from pipermail (https://lists.linuxfoundation.org/pipermail/oss-health-metrics/): HTTPSConnectionPool(host='lists.linuxfoundation.org', port=443): Max retries exceeded with url: /pipermail/oss-health-metrics/ (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)'),))
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
    self._prepare_proxy(conn)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 816, in _prepare_proxy
    conn.connect()
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.5/ssl.py", line 385, in wrap_socket
    _context=self)
  File "/usr/lib/python3.5/ssl.py", line 760, in __init__
    self.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 996, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 641, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)

@valeriocos
Member

valeriocos commented Jun 5, 2018

To avoid CERTIFICATE_VERIFY_FAILED, you can use the --no-verify parameter (recently added to the Perceval codebase), which skips the certificate check.

Hope this helps

@pixelpshr
Author

pixelpshr commented Jun 5, 2018

How would I get that parameter into the docker container? Or, for that matter, where would it be added to the dashboard set up when running outside of docker?

@valeriocos
Member

Since you are using the grimoirelab/full Docker image, I guess you won't have the latest version of Perceval (right, @jgbarah?).
Once the new image is ready, you just need to modify the setup.cfg. In the section called [pipermail], add the parameter no-verify = true
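
For reference, the fragment described above would look roughly like the string in this sketch, which parses it with Python's standard configparser. The real setup.cfg contains other sections and keys that are omitted here, and SirMordred's own loader may read the file differently.

```python
import configparser

# Only the relevant section is shown; other keys in [pipermail] are omitted.
FRAGMENT = """
[pipermail]
no-verify = true
"""

cfg = configparser.ConfigParser()
cfg.read_string(FRAGMENT)

# getboolean() accepts true/false, yes/no, on/off and 1/0.
no_verify = cfg.getboolean("pipermail", "no-verify")
print(no_verify)
```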

@jgbarah
Contributor

jgbarah commented Jun 5, 2018

I didn't produce container images for 18.05-03 because of a bug in Mordred that prevents a working configuration with grimoirelab/secured. But, @pixelpshr, I could produce a container image for you if you can give it a spin, so we can check whether it works with that kind of firewall. However, 18.05-04 will very likely be out in two days, and I'm planning to produce container images right afterwards.

@jgbarah
Contributor

jgbarah commented Jun 5, 2018

It looks like even with the explicit proxy definition I still get both proxy errors in pipermail and ssl certificate errors due to the corporate firewall examination and re-signing of ssl packets.

Do you have access to the firewall's signing CA certificate? I'm thinking that another (safer) option would be to install that certificate in the container image... If you do, let me know, and I can have a look at that. I'm a bit limited in testing, as I'm not sure how to set up a testing environment like that (but I can try).

@pixelpshr
Author

I don't think there is any need to rush the process. I can either grab a container for testing, or wait until the next release. I'll leave it up to you.

@pixelpshr
Author

pixelpshr commented Jun 5, 2018

Do you have access to the firewall signing CA certificate? I'm thinking that another (safest) option would be to install that certificate in the container image...

I do have the additional *.pem file that would need to be added to the container to work in my environment. (But, I'm sure that I cannot distribute it.) I have been trying to figure out how to build some sort of wrapper around the grimoirelab/full container that would allow me to insert the certs, but having no luck.
I have been able to add the certs to the docker environment itself, so it can access the dockerhub, but I haven't figured out how to provide the components within the container with those certs.

@jgbarah
Contributor

jgbarah commented Jun 5, 2018

I don't think there is any need to rush the process. I can either grab a container for testing, or wait until the next release. I'll leave it up to you.

In the end, I had to produce a container image for testing some other stuff, so I uploaded that image to DockerHub as grimoirelab/full:testing. It is built from the head of the master branch for all GrimoireLab modules, so it should include the patch you need for testing no-verify = true in the config file, as @valeriocos suggests above.

If you have some time, please give it a try, and let us know if it worked.

@jgbarah
Contributor

jgbarah commented Jun 5, 2018

I do have the additional *.pem file that would need to be added to the container to work in my environment.

What you usually need are the public certificates. I'm not familiar with certificate formats, but on my Debian box, root CA certs come in .crt format. You can see an example of how to install them in Debian in https://www.brightbox.com/blog/2014/03/04/add-cacert-ubuntu-debian/ (that's for CACert, but the process is the same). In the end, what is needed is to add the certificate to /usr/local/share/ca-certificates within the container, and then run update-ca-certificates (also within the container, of course), see https://manpages.debian.org/stretch/ca-certificates/update-ca-certificates.8.en.html .

If you can get your public cert in that format, I could either detail the process, or produce a script, so that you can test the container image with it.

@pixelpshr
Author

@jgbarah, I can get the certs in *.crt format, too. It sounds like you know the magical incantation required to get the certs into the container! I think that's what I'm missing.
I will also grab that testing container to see how it works.

@pixelpshr
Author

Using grimoirelab/full:testing with the addition of the no-verify=true configuration does let perceval/pipermail get past the firewall. Yay!
However, now I'm noticing the same problem in the git connection, as that is failing now. I think I fixed this when running outside of the docker container by executing "git config --global https.proxy ..." But, I don't know how to run that inside of the container.

2018-06-06 18:07:04,612 - grimoire_elk.elk - ERROR - Error feeding ocean from git (https://github.com/chaoss/grimoirelab): git command - Cloning into bare repository '/home/grimoirelab/.perceval/repositories/https://github.com/chaoss/grimoirelab-git'...
fatal: unable to access 'https://github.com/chaoss/grimoirelab/': Failed to connect to github.com port 443: Connection timed out

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/grimoire_elk/elk.py", line 208, in feed_backend
    ocean_backend.feed()
  File "/usr/local/lib/python3.5/dist-packages/grimoire_elk/raw/elastic.py", line 204, in feed
    self.feed_items(items)
  File "/usr/local/lib/python3.5/dist-packages/grimoire_elk/raw/elastic.py", line 213, in feed_items
    for item in items:
  File "/usr/local/lib/python3.5/dist-packages/perceval/backend.py", line 127, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/git.py", line 149, in fetch_items
    latest_items)
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/git.py", line 262, in __fetch_from_repo
    repo = self.__create_git_repository()
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/git.py", line 312, in __create_git_repository
    repo = GitRepository.clone(self.uri, self.gitpath)
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/git.py", line 810, in clone
    cls._exec(cmd, env=env)
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/git.py", line 1275, in _exec
    raise RepositoryError(cause=cause)
perceval.errors.RepositoryError: git command - Cloning into bare repository '/home/grimoirelab/.perceval/repositories/https://github.com/chaoss/grimoirelab-git'...
fatal: unable to access 'https://github.com/chaoss/grimoirelab/': Failed to connect to github.com port 443: Connection timed out

@pixelpshr
Author

Oh! I think I solved that git problem with this:

$ docker exec -it <container id> git config --global http.proxy http://proxy.mycompany.com:80

@jgbarah
Contributor

jgbarah commented Jun 6, 2018

Oh! I think I solved that git problem with this:

Great! I'm opening an issue for Perceval, because it should use this trick when the no-verify flag is passed to it, so that you don't need to do that.

@jgbarah
Contributor

jgbarah commented Jun 6, 2018

Ooops, I read too fast. I'm not sure I fully understand what you did. Let's recap, please:

  • For pipermail, you used 'no-verify=true' in the [pipermail] section of the SirMordred configuration file, right? Didn't you need to specify somehow where the proxy is?

  • For git, you didn't use no-verify=true in the [git] section, right? The only thing you had to do is run git config in the container as

docker exec -it <container id> git config --global http.proxy http://proxy.mycompany.com:80

And then run the container with no specific option?

All of this with the grimoirelab/full:testing container image, right?

@pixelpshr
Author

pixelpshr commented Jun 7, 2018

Correct on all counts.
I'm still not really satisfied with bypassing the SSL checks in pipermail, though. I think this StackOverflow thread provides most of my answer.

docker run -v /host/path/to/certs:/container/path/to/certs -d IMAGE_ID "update-ca-certificates"

The only remaining question is, where do I put the certs in the container? (What is the "/container/path/to/certs"?)

@sduenas
Member

sduenas commented Jun 7, 2018

Maybe you can try with this or this.

Perceval uses the requests package to talk HTTP. requests can look for certificates in a given file or directory when the environment variable REQUESTS_CA_BUNDLE is set. More info here

It also uses certifi to handle certificates, so this post might also be useful.
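
A small sketch of that lookup, using only the standard library. The helper name is hypothetical, and the order shown (REQUESTS_CA_BUNDLE first, then CURL_CA_BUNDLE, else the bundled certifi certificates) mirrors how requests is understood to behave rather than reproducing its code.

```python
import os


def ca_bundle_from_env(environ=None):
    """Return the CA bundle path taken from the environment, or None.

    None means no override is set, in which case requests would fall
    back to the certificates shipped with certifi.
    """
    env = os.environ if environ is None else environ
    return env.get("REQUESTS_CA_BUNDLE") or env.get("CURL_CA_BUNDLE") or None


if __name__ == "__main__":
    # Shows which bundle, if any, the current environment selects.
    print(ca_bundle_from_env())
```

Running this inside the container is a quick way to check whether the variable actually reached the process.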

@jgbarah
Contributor

jgbarah commented Jun 7, 2018

For git, I'm pretty sure the certificates used are the system ones, which for our Debian-based container image should be in /usr/local/share/ca-certificates (see for example https://www.brightbox.com/blog/2014/03/04/add-cacert-ubuntu-debian/ ).

But I don't know if requests uses those certificates. According to this entry in StackOverflow, it does use the Debian certificates if requests' own certificates are deleted, or if the following environment variable is set: REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt.

So, the whole process would be:

  • Install the certificate for your proxy in the path /usr/local/share/ca-certificates, using the -v trick you showed above.
  • Run the container with the above environment variable, using -e REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
  • Run update-ca-certificates in the container
  • Then run mordred

If you can try this procedure, I can modify the script that the container runs as default to conditionally do this.
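
The first three steps above could be sketched as the following docker commands, built here as argument lists in Python so that nothing is executed. The container name, image and host certificate directory in the usage part are placeholders. The last step (starting Mordred only after the bundle is refreshed) is left out, since it depends on the container's start script.

```python
import shlex

# Paths taken from the discussion above (Debian-based image).
CA_DIR = "/usr/local/share/ca-certificates"
CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def container_setup_commands(container, image, host_cert_dir):
    """Return the docker commands for the procedure, as argv lists."""
    run = [
        "docker", "run", "--name", container,
        "-v", host_cert_dir + ":" + CA_DIR,       # mount the proxy CA certs
        "-e", "REQUESTS_CA_BUNDLE=" + CA_BUNDLE,  # point requests at the system bundle
        "-d", "-t", image,
    ]
    # Rebuild the system bundle so that it includes the mounted certificates.
    refresh = ["docker", "exec", container, "update-ca-certificates"]
    return [run, refresh]


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    for cmd in container_setup_commands("myGrimoire", "grimoirelab/full", "/host/path/to/certs"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Depending on the user the container runs as, update-ca-certificates may need elevated privileges inside the container.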

@pixelpshr
Author

Ok, that bit with the REQUESTS_CA_BUNDLE seems to have done the final trick! I've got a script now that runs the standard grimoirelab/full container, without the no-verify=true, and pipermail is happy.

Here is my script (notice that I'm copying the cert files into the container, rather than mounting the directory):

#!/bin/sh -v

HTTP_PROXY="http://proxy.mycompany.com:80/"
HTTPS_PROXY="http://proxy.mycompany.com:80/"
NO_PROXY="localhost,*.local,127.0.0.1"

IMAGE="grimoirelab/full"

# first, remove any old version of this container
docker rm myGrimoire

# start the main container
#   Notice the -d option to run detached from the console.
docker run --name myGrimoire \
    -p 5601:5601 \
    --env http_proxy="$HTTP_PROXY" \
    --env https_proxy="$HTTPS_PROXY" \
    --env no_proxy="$NO_PROXY" \
    --env HTTP_PROXY="$HTTP_PROXY" \
    --env HTTPS_PROXY="$HTTPS_PROXY" \
    --env NO_PROXY="$NO_PROXY" \
    --env REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
    -v "$(pwd)/credentials.cfg:/override.cfg" \
    -v "$(pwd)/logs:/logs" \
    -dt "$IMAGE"

# add corporate certs to the (Ubuntu-based) container
docker cp /usr/local/share/ca-certificates/BA-Root.crt myGrimoire:usr/local/share/ca-certificates
docker cp /usr/local/share/ca-certificates/BA-NPE-CA1.crt myGrimoire:usr/local/share/ca-certificates
docker cp /usr/local/share/ca-certificates/BA-NPE-CA3.crt myGrimoire:usr/local/share/ca-certificates
docker exec -it myGrimoire sudo update-ca-certificates

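# add corporate http proxy to the container for git
# (git reads http.proxy for https:// URLs as well; https.proxy is not a
# documented git setting, so the second line is probably not needed)
docker exec -it myGrimoire git config --global http.proxy "$HTTP_PROXY"
docker exec -it myGrimoire git config --global https.proxy "$HTTPS_PROXY"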

@jgbarah
Contributor

jgbarah commented Jun 9, 2018

Thanks a lot! I will include these tricks in the container and/or in the documentation. That you could finally get your stuff working is great news for me!
