100% processor usage when GET has to wait 60s for response #1910

Closed
e-manuel opened this issue Feb 11, 2014 · 39 comments

@e-manuel commented Feb 11, 2014

When a GET request has to wait 60s for the remote service to respond, processor usage rises to 100%; version 1 of "requests" handled this case better.
The GET is configured with "cert" data and "timeout=120" over an SSL connection.

@Lukasa (Member) commented Feb 11, 2014

Do you have a publicly-accessible URL that I can test against?

@e-manuel (Author):

More details: the GET is fired by a sub-process (multiprocessing) and calls an API exposed by a GSM operator; through this API I receive SMS and MMS messages. Other tasks fired by sub-processes use below 1% CPU (server with 2 Xeons).
I can expose a PHP script with a 60s delay on my server, over valid SSL, if that is what you mean?
Unfortunately I cannot give you access to the API exposed by the GSM operator :(

@Lukasa (Member) commented Feb 11, 2014

Hmm. In that case, can you show me your code? Feel free to remove identifying information (like URLs, header values, etc.).

@e-manuel (Author):

Here you are: a direct call to requests.get from my library
(which is launched by multiprocessing.Process(target=launch_function,
args=('arg1', 'arg2'))):

p_par = {
    'login': 'xxx',
    'password': 'xxx',
    'serviceId': '0000',
    'timeout': 60000,  # wait 60s for response
    'deleteContent': 'true',
    'manualconfirm': 'true'
}

p_uri_part = 'getsms.aspx'

def __request_api2(self, p_par, p_uri_part=None):
    try:
        # GSM
        if self.__channel == '0':
            v_url = '{0}/{1}'.format(self.__API_0_URL, p_uri_part)
            v_crt = '{0}smsapi.crt'.format(self.__PATH_RES_SEC)
            v_key = '{0}smsapi.key'.format(self.__PATH_RES_SEC)
            try:
                v_rsp = requests.get(
                    v_url, params=p_par,
                    cert=(v_crt, v_key),
                    timeout=120,
                    verify='{0}ca_root_certificates.pem'.format(self.__PATH_RES_SEC)
                )
                v_res = v_rsp.text
            except Exception as e:
                self.__add_error('Error when invoking API: {0}'.format(str(e)))
                v_res = 0
    finally:
        return v_res

ca_root_certificates.pem is the CA bundle grabbed from Mozilla.

Best regards,
Artur

@Lukasa (Member) commented Feb 11, 2014

GitHub's email formatting made that pretty difficult to read, so I copy-pasted it into a code block here:

p_par = {
   'login': 'xxx',
   'password': 'xxx',
   'serviceId': '0000',
   'timeout': 60000, # wait 60s for response
   'deleteContent': 'true',
   'manualconfirm': 'true'
}

p_uri_part = 'getsms.aspx'

def __request_api2(self, p_par, p_uri_part=None):
   try:
      # GSM
      if self.__channel == '0':
         v_url = '{0}/{1}'.format(self.__API_0_URL, p_uri_part)
         v_crt = '{0}smsapi.crt'.format(self.__PATH_RES_SEC)
         v_key = '{0}smsapi.key'.format(self.__PATH_RES_SEC)
         try:
            v_rsp = requests.get(
               v_url, params=p_par
              ,cert=(v_crt, v_key)
              ,timeout=120
              ,verify='{0}ca_root_certificates.pem'.format(self.__PATH_RES_SEC)
            )
            v_res = v_rsp.text
         except Exception as e:
            self.__add_error('Error when invoking API: {0}'.format(str(e)))
            v_res = 0
   finally:
      return v_res

@Lukasa (Member) commented Feb 11, 2014

Hmm, there's nothing obviously wrong with that. When I get a moment I'll try to reproduce something similar on my machine.

@t-8ch (Contributor) commented Feb 11, 2014

I am on my phone, so I'll be brief. I suspect this is pyOpenSSL interacting with the timeout, specifically the makefile function in urllib3.contrib.pyopenssl busy-looping when no data is being received.

@Lukasa (Member) commented Feb 11, 2014

Wait, does it do that? That's pretty awful.

@t-8ch (Contributor) commented Feb 11, 2014

This is a guess (I think I stumbled upon this previously); I'll check it later. Maybe gevent is also an issue: gevent and pyOpenSSL don't really work together.

@t-8ch (Contributor) commented Feb 11, 2014

Hm, I can't reproduce this at the moment.

@e-manuel Could you give more information about your environment:

  • Python version
  • Operating System
  • Using gevent? (or the like)
  • Using pyOpenSSL?
  • Does it work without multiprocessing?

@e-manuel (Author):

  • Python 2.7.6
  • FreeBSD 9.0
  • gevent 1.0 is installed, but it is not used in this case
  • py27-openssl-0.13 is installed
  • my test: a GET fired from the Python prompt behaves the same way (100% CPU after a few seconds) as one fired as a sub-process via the multiprocessing library

@t-8ch (Contributor) commented Feb 11, 2014

Do you also have pyasn1 and ndg-httpsclient installed? (If they and pyOpenSSL are installed, they trigger a different code path.)
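
For reference, requests only switches to the pyOpenSSL transport when all three of those packages import cleanly; a rough check of which code path you are on (a sketch mirroring the import guard requests uses, not an official API) is:

# Rough check for which TLS code path requests 2.x will take.
# requests injects the pyOpenSSL transport at import time only when
# pyOpenSSL, pyasn1 and ndg-httpsclient are all importable.
try:
    import OpenSSL.SSL        # pyOpenSSL
    import pyasn1
    import ndg.httpsclient
    print('pyOpenSSL code path is active')
except ImportError:
    print('falling back to the stdlib ssl module')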

@e-manuel (Author):

Yes, I have py27-asn1-0.1.4_1,1, py27-ndg_httpsclient-0.3.2 and py-openssl 0.13, but per the FreeBSD ports tree:

  • py27-asn1 is required by py27-ndg_httpsclient
  • py27-ndg_httpsclient is required by py-urllib3 (mentioned by @t-8ch) and requires py-openssl and py-asn1 to run

Of course I can de-install any of them, but I would like to keep the full functionality of requests (SSL, certificates etc.). What should I do? "To be or not to be, that is the question..." ;)

@t-8ch (Contributor) commented Feb 11, 2014

So I managed to reproduce it:

Terminal 1:

# ncat from the nmap project; should work with any SSL-capable netcat.
# We complete the SSL handshake but never send an HTTP response line!
$ ncat --ssl -l 0.0.0.0 4000

Code:

requests.get('https://127.0.0.1:4000', verify=False, timeout=120)

It seems this only happens when using pyOpenSSL, timeouts and FreeBSD.
I can't reproduce it any other way, permuting those conditions or on Linux.

@e-manuel Pick your poison :-)
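
If ncat is not handy, a minimal Python stand-in for the silent server behaves the same way (a sketch, assuming a self-signed pair in server.crt / server.key; both paths are hypothetical): it completes the TLS handshake and then never sends a response:

# Accept one TLS connection, finish the handshake, then never send
# an HTTP response line, mirroring the ncat repro above.
import socket
import ssl
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('0.0.0.0', 4000))
srv.listen(1)

conn, addr = srv.accept()
tls = ssl.wrap_socket(conn, server_side=True,
                      certfile='server.crt', keyfile='server.key')
print('handshake complete with %s; staying silent' % (addr,))
time.sleep(3600)  # hold the connection open without ever responding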

@t-8ch (Contributor) commented Feb 11, 2014

Manually waiting for the socket to become readable seems to fix it:

diff --git a/requests/packages/urllib3/contrib/pyopenssl.py b/requests/packages/urllib3/contrib/pyopenssl.py
index d9bda15..67eeee4 100644
--- a/requests/packages/urllib3/contrib/pyopenssl.py
+++ b/requests/packages/urllib3/contrib/pyopenssl.py
@@ -156,6 +156,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(rbufsize)
                 except OpenSSL.SSL.WantReadError:
+                    select.select([self._sock], [], [])
                     continue
                 if not data:
                     break
@@ -183,6 +184,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(left)
                 except OpenSSL.SSL.WantReadError:
+                    select.select([self._sock], [], [])
                     continue
                 if not data:
                     break
@@ -234,6 +236,7 @@ class fileobject(_fileobject):
                                 break
                             buffers.append(data)
                     except OpenSSL.SSL.WantReadError:
+                        select.select([self._sock], [], [])
                         continue
                     break
                 return "".join(buffers)
@@ -244,6 +247,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(self._rbufsize)
                 except OpenSSL.SSL.WantReadError:
+                    select.select([self._sock], [], [])
                     continue
                 if not data:
                     break
@@ -271,7 +275,8 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(self._rbufsize)
                 except OpenSSL.SSL.WantReadError:
-                        continue
+                    select.select([self._sock], [], [])
+                    continue
                 if not data:
                     break
                 left = size - buf_len

(Insert lengthy rant about platform differences, especially concerning openssl here)
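
To spell out the failure mode the patch addresses: pyOpenSSL raises WantReadError whenever no decrypted data is ready, so a bare continue retries in a hot loop, while select parks the process in the kernel until the descriptor is actually readable. A schematic contrast (not the actual urllib3 code; sock stands in for the pyOpenSSL connection):

import select

import OpenSSL.SSL

def recv_busy(sock, n=8192):
    """The old pattern: retry immediately on WantReadError."""
    while True:
        try:
            return sock.recv(n)
        except OpenSSL.SSL.WantReadError:
            continue  # hot loop: burns a full core while waiting for data

def recv_parked(sock, n=8192):
    """The patched pattern: block in select() until the fd is readable."""
    while True:
        try:
            return sock.recv(n)
        except OpenSSL.SSL.WantReadError:
            select.select([sock], [], [])  # wakes only when data arrives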

@sigmavirus24 (Contributor):

> (Insert lengthy rant about platform differences, especially concerning openssl here)

Yeah, platforms stink. Let's all develop on Windows. =P

@Lukasa (Member) commented Feb 12, 2014

I ranted about this to my housemate last night, by the by. I don't understand why PyOpenSSL busy-waits on this socket instead of calling select like a good person. @t-8ch, are you making the changes in the vendored urllib3, or are you going to submit your patch upstream?

@e-manuel (Author):

I would like to ask what comes next: will there be a new release of requests? I would not like to go back to pycurl :(

@sigmavirus24 (Contributor):

@e-manuel all of the changes are in PyOpenSSL. Since you're using the packages from your distro, you'll need to bother the maintainer of that package when the fix gets released.

@Lukasa your question is especially relevant, since FreeBSD seems to strip out the vendored packages, so the patch should be sent upstream if possible.

@Lukasa (Member) commented Feb 13, 2014

Yeah, I think we want an answer on what PyOpenSSL is going to do.

@e-manuel In the meantime, you can patch your copy so this stops happening.

@e-manuel (Author):

@t-8ch thank you very much, your patch works great: now when the GET starts it uses below 2% of CPU, and after a few seconds even less :)

@Lukasa (Member) commented Feb 13, 2014

Hurrah! Let's leave this open until we get an idea of what's happening upstream.

@t-8ch saves the day again! I should buy him a boat.

@t-8ch (Contributor) commented Feb 13, 2014

But I fear you lose the timeouts (although I'm not sure timeouts ever worked with pyOpenSSL).

@e-manuel (Author):

Timeouts now work in this manner:

  • when the GET's target does not exist (is unreachable), the timeout works like Swiss quartz;
  • when the target exists but does not respond, the GET waits forever.

@Lukasa (Member) commented Feb 14, 2014

Hmm. Can we pass the timeout to the select call?

@pasha-r commented Feb 14, 2014

Yep.
select.select([self._sock], [], [], self._sock.gettimeout())

@Lukasa (Member) commented Feb 14, 2014

@pasha-r That's a very neat solution; it didn't even occur to me. That, plus checking the return value from select, should add timeouts back.
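
Concretely, checking the return value amounts to something like this sketch: select returns three empty lists when it times out, so an empty read list maps naturally onto socket.timeout:

import select
import socket

def wait_readable(sock):
    """Block until sock is readable, or raise once its timeout elapses."""
    rlist, _, _ = select.select([sock], [], [], sock.gettimeout())
    if not rlist:  # empty read list means select hit the timeout
        raise socket.timeout('read timed out')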

@t-8ch (Contributor) commented Feb 18, 2014

I think we should indeed go with calling select ourselves. PyOpenSSL does not provide the timeout functionality, and we simply set the timeout on the non-SSL socket.

This should work:

diff --git a/requests/packages/urllib3/contrib/pyopenssl.py b/requests/packages/urllib3/contrib/pyopenssl.py
index d9bda15..da44a29 100644
--- a/requests/packages/urllib3/contrib/pyopenssl.py
+++ b/requests/packages/urllib3/contrib/pyopenssl.py
@@ -43,7 +43,7 @@ from ndg.httpsclient.subj_alt_name import SubjectAltName as BaseSubjectAltName
 import OpenSSL.SSL
 from pyasn1.codec.der import decoder as der_decoder
 from pyasn1.type import univ, constraint
-from socket import _fileobject
+from socket import _fileobject, timeout
 import ssl
 import select
 from cStringIO import StringIO
@@ -139,6 +139,13 @@ def get_subj_alt_name(peer_cert):

 class fileobject(_fileobject):

+    def _wait_for_sock(self):
+        rd, wd, ed = select.select([self._sock], [], [],
+                                   self._sock.gettimeout())
+        if not rd:
+            raise timeout()
+
+
     def read(self, size=-1):
         # Use max, disallow tiny reads in a loop as they are very inefficient.
         # We never leave read() with any leftover data from a new recv() call
@@ -156,6 +163,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(rbufsize)
                 except OpenSSL.SSL.WantReadError:
+                    self._wait_for_sock()
                     continue
                 if not data:
                     break
@@ -183,6 +191,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(left)
                 except OpenSSL.SSL.WantReadError:
+                    self._wait_for_sock()
                     continue
                 if not data:
                     break
@@ -234,6 +243,7 @@ class fileobject(_fileobject):
                                 break
                             buffers.append(data)
                     except OpenSSL.SSL.WantReadError:
+                        self._wait_for_sock()
                         continue
                     break
                 return "".join(buffers)
@@ -244,6 +254,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(self._rbufsize)
                 except OpenSSL.SSL.WantReadError:
+                    self._wait_for_sock()
                     continue
                 if not data:
                     break
@@ -271,7 +282,8 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(self._rbufsize)
                 except OpenSSL.SSL.WantReadError:
-                        continue
+                    self._wait_for_sock()
+                    continue
                 if not data:
                     break
                 left = size - buf_len

The code seems suboptimal, but I'd like to try to keep this close to the version from the stdlib (where 99% of the fileobject is taken from).

PS: It seems I was totally confused when I was unable to reproduce this on Linux; it is just the same there.
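
A quick way to sanity-check the patch, assuming the silent TLS server from the earlier repro is still listening on 127.0.0.1:4000: the call should now abort once the timeout elapses, with the CPU staying idle, instead of spinning at 100%:

import requests

try:
    requests.get('https://127.0.0.1:4000', verify=False, timeout=5)
except requests.exceptions.RequestException as exc:
    # With the patch applied this fires after ~5s with near-zero CPU use.
    print('request aborted: %r' % (exc,))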

@Lukasa (Member) commented Feb 18, 2014

Suits me. =)

@e-manuel (Author):

@t-8ch you are great: now the timeout works fine in every case, and my CPUs can "sleep" calmly ;)

@Lukasa (Member) commented Mar 23, 2014

This will be fixed in the next release of requests, so I'm closing this. Thanks for your help all!

@Lukasa closed this as completed Mar 23, 2014
@tgalery commented Mar 26, 2014

Hi there, sorry to jump into this discussion, but I'm switching a legacy URL-resolver mechanism from urllib2 to requests 2.2.1. Although requests works much better, the server's CPU usage has started to scale erratically, whereas in the past it was fairly constant, under 10%. We resolve http and https URLs using a custom header and verify=True. Do you know if this is a known issue, and what might be causing it?

@Lukasa (Member) commented Mar 26, 2014

It's possible that this is your bug, if you're accessing HTTPS URLs and have pyOpenSSL installed. If both of those things are true, try applying the above patch to your installed requests package and check whether it resolves your problem.

@tgalery commented Mar 26, 2014

Nope, I don't use pyOpenSSL, and it also happens when I don't set the verify option. I wonder whether there's any overhead associated with the get method, as opposed to initialising a Session object and having different threads call send. If none of this matters substantially, are you aware of any other high-CPU issues related to requests?

@Lukasa (Member) commented Mar 26, 2014

There's definitely overhead when using get: we create a Session object behind the scenes for each call.

You should consider having a single Session object, but note that it's not really thread-safe. If using a single Session, you should either serialize access to it, or have it run in its own thread and send messages in and out on queues.

Otherwise, we're not aware of any high-CPU issues, and it'd be interesting to get diagnostics if you still hit one.
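
A sketch of that second option (the names here are illustrative, not part of requests): one thread owns the Session, and other threads talk to it through a pair of queues:

import threading
from Queue import Queue  # 'queue' on Python 3

import requests

jobs = Queue()
results = Queue()

def session_worker():
    """Sole owner of the Session; callers submit URLs via the jobs queue."""
    session = requests.Session()
    while True:
        url = jobs.get()
        if url is None:  # sentinel value shuts the worker down
            break
        results.put(session.get(url))

worker = threading.Thread(target=session_worker)
worker.start()

jobs.put('http://example.com/')
print(results.get().status_code)

jobs.put(None)
worker.join()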

@dav009 commented Mar 28, 2014

I have a similar issue: basically there are CPU peaks after switching from urllib2 to requests for GET and POST requests. I'm seeing a lot of FUTEX_WAIT after stracing my process.

I'm a bit confused about the default behaviour of the connection pool:

  • Is there a global Session shared by default among all requests made?
  • Do requests to a single host share the same connection in the pool?
  • Is that shared connection in the pool blocking?
  • Could the pool cache size and pool size be a factor?

Thank you in advance

@Lukasa (Member) commented Mar 28, 2014

  • No. As I said above, we create a Session object behind the scenes if you aren't using one yourself.
  • Not always. Connection pools belong to Sessions, so if you aren't using Sessions you aren't using connection pools. We then keep a pool of connections to each host; whether requests to that host re-use a connection depends on whether the connection is still open.
  • Requests does not perform non-blocking IO.
  • Could they affect what? (For how those pool sizes are set, see the sketch below.)
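
On the pool-size question specifically, those knobs live on HTTPAdapter and take effect when the adapter is mounted on a Session; a sketch with illustrative numbers:

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# pool_connections: how many per-host pools are cached;
# pool_maxsize: connections kept alive per pool;
# pool_block=False: open (and discard) extra connections rather than wait.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20, pool_block=False)
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get('http://example.com/')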

@dav009 commented Mar 28, 2014

Basically, during the CPU peaks I'm getting lots of futex(..., FUTEX_WAIT, 1, NULL).
I thought that could be related to a lock on some variables inside requests.

@Lukasa (Member) commented Mar 28, 2014

There is some locking around the connection pool and the cookies dictionary, so it's certainly possible.
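
If that lock contention turns out to be the problem, one common workaround (a sketch, not a patch from this thread) is a Session per thread via threading.local, so threads never contend on shared pool or cookie locks:

import threading

import requests

_local = threading.local()

def get_session():
    """Return this thread's private Session, creating it on first use."""
    if not hasattr(_local, 'session'):
        _local.session = requests.Session()
    return _local.session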

zhuhaow referenced this issue in zhuhaow/goagent Sep 27, 2014
Should use sock instead of its file descriptor.
See more at https://github.com/kennethreitz/requests/issues/1910.
@github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021