100% processor usage when GET has to wait 60s for response #1910

Closed
e-manuel opened this issue Feb 11, 2014 · 39 comments

@e-manuel commented Feb 11, 2014

When a GET request has to wait 60s for the remote service to respond, processor usage rises to 100%; version 1 of "requests" handled this case better.
The GET is configured with "cert" data and "timeout=120" over an SSL connection.

@Lukasa (Member) commented Feb 11, 2014

Do you have a publicly-accessible URL that I can test against?

@e-manuel (Author):

More details: the GET is fired by a sub-process (multiprocessing) and calls an API exposed by a GSM operator; through this API I receive SMS and MMS messages. Other tasks fired by sub-processes use below 1% CPU (server with 2 Xeons).
I can expose a PHP script with a 60s delay on my server, over valid SSL, if that is what you mean?
Unfortunately I cannot give you access to the API exposed by the GSM operator :(

@Lukasa (Member) commented Feb 11, 2014

Hmm. In that case, can you show me your code? Feel free to remove identifying information (like URLs, header values, etc.).

@e-manuel (Author):

Here you are: a direct call to requests.get from my library
(which is launched by multiprocessing.Process(target=launch_function,
args=('arg1', 'arg2'))):

p_par = {
    'login': 'xxx',
    'password': 'xxx',
    'serviceId': '0000',
    'timeout': 60000,  # wait 60s for response
    'deleteContent': 'true',
    'manualconfirm': 'true'
}

p_uri_part = 'getsms.aspx'

def __request_api2(self, p_par, p_uri_part=None):
    try:
        # GSM
        if self.__channel == '0':
            v_url = '{0}/{1}'.format(self.__API_0_URL, p_uri_part)
            v_crt = '{0}smsapi.crt'.format(self.__PATH_RES_SEC)
            v_key = '{0}smsapi.key'.format(self.__PATH_RES_SEC)
            try:
                v_rsp = requests.get(
                    v_url, params=p_par,
                    cert=(v_crt, v_key),
                    timeout=120,
                    verify='{0}ca_root_certificates.pem'.format(self.__PATH_RES_SEC)
                )
                v_res = v_rsp.text
            except Exception as e:
                self.__add_error('Error when invoking API: {0}'.format(str(e)))
                v_res = 0
    finally:
        return v_res

ca_root_certificates.pem is the CA bundle grabbed from Mozilla.

Best regards,
Artur

@Lukasa (Member) commented Feb 11, 2014

GitHub's email formatting made that pretty difficult to read, so I copy-pasted it into a code block here:

p_par = {
   'login': 'xxx',
   'password': 'xxx',
   'serviceId': '0000',
   'timeout': 60000, # wait 60s for response
   'deleteContent': 'true',
   'manualconfirm': 'true'
}

p_uri_part = 'getsms.aspx'

def __request_api2(self, p_par, p_uri_part=None):
   try:
      # GSM
      if self.__channel == '0':
         v_url = '{0}/{1}'.format(self.__API_0_URL, p_uri_part)
         v_crt = '{0}smsapi.crt'.format(self.__PATH_RES_SEC)
         v_key = '{0}smsapi.key'.format(self.__PATH_RES_SEC)
         try:
            v_rsp = requests.get(
               v_url, params=p_par
              ,cert=(v_crt, v_key)
              ,timeout=120
              ,verify='{0}ca_root_certificates.pem'.format(self.__PATH_RES_SEC)
            )
            v_res = v_rsp.text
         except Exception as e:
            self.__add_error('Error when invoking API: {0}'.format(str(e)))
            v_res = 0
   finally:
      return v_res

@Lukasa (Member) commented Feb 11, 2014

Hmm, there's nothing obviously wrong with that. When I get a moment I'll try to reproduce something similar on my machine.

@t-8ch (Contributor) commented Feb 11, 2014

I am on my phone, so I'll be brief. I suspect this is pyOpenSSL interacting with the timeout, specifically the makefile function in urllib3.contrib.pyopenssl busy-looping when no data is being received.

@Lukasa (Member) commented Feb 11, 2014

Wait, does it do that? That's pretty awful.

@t-8ch (Contributor) commented Feb 11, 2014

This is a guess (I think I stumbled upon this previously); I'll check it later. Maybe gevent is also an issue: gevent and pyOpenSSL don't really work together.

@t-8ch (Contributor) commented Feb 11, 2014

Hm, I can't reproduce this at the moment.

@e-manuel Could you give more information about your environment:

  • Python version
  • Operating System
  • Using gevent? (or the like)
  • Using pyOpenSSL?
  • Does it work without multiprocessing?

@e-manuel (Author):

  • Python 2.7.6
  • FreeBSD 9.0
  • gevent 1.0 is installed, but it is not used in this case
  • py27-openssl-0.13 is installed
  • my test: a GET fired from the Python prompt behaves the same way (100% CPU after a few seconds) as one fired as a sub-process via the multiprocessing library

@t-8ch (Contributor) commented Feb 11, 2014

Do you also have pyasn1 and ndg-httpsclient installed? (If they and pyOpenSSL are installed, they trigger a different code path.)
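
For reference, requests only switches to the pyOpenSSL transport when all three of those packages import cleanly; a rough check of which code path you are on (a sketch mirroring the import guard requests uses, not an official API) is:

# Rough check for which TLS code path requests 2.x will take.
# requests injects the pyOpenSSL transport at import time only when
# pyOpenSSL, pyasn1 and ndg-httpsclient are all importable.
try:
    import OpenSSL.SSL        # pyOpenSSL
    import pyasn1
    import ndg.httpsclient
    print('pyOpenSSL code path is active')
except ImportError:
    print('falling back to the stdlib ssl module')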

@e-manuel (Author):

Yes, I have py27-asn1-0.1.4_1,1, py27-ndg_httpsclient-0.3.2 and py-openssl 0.13, but per the FreeBSD ports tree:

  • py27-asn1 is required by py27-ndg_httpsclient
  • py27-ndg_httpsclient is required by py-urllib3 (mentioned by @t-8ch) and requires py-openssl and py-asn1 to run

Of course I can de-install any of them, but I would like to keep the full functionality of requests (SSL, certificates etc.). What should I do? "To be or not to be, that is the question..." ;)

@t-8ch (Contributor) commented Feb 11, 2014

So I managed to reproduce it:

Terminal 1:

# ncat from the nmap project; should work with any SSL-capable netcat.
# We complete the SSL handshake but never send an HTTP response line!
$ ncat --ssl -l 0.0.0.0 4000

Code:

requests.get('https://127.0.0.1:4000', verify=False, timeout=120)

It seems this only happens when using pyOpenSSL, timeouts and FreeBSD.
I can't reproduce it any other way, permuting those conditions or on Linux.

@e-manuel Pick your poison :-)
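
If ncat is not handy, a minimal Python stand-in for the silent server behaves the same way (a sketch, assuming a self-signed pair in server.crt / server.key; both paths are hypothetical): it completes the TLS handshake and then never sends a response:

# Accept one TLS connection, finish the handshake, then never send
# an HTTP response line, mirroring the ncat repro above.
import socket
import ssl
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('0.0.0.0', 4000))
srv.listen(1)

conn, addr = srv.accept()
tls = ssl.wrap_socket(conn, server_side=True,
                      certfile='server.crt', keyfile='server.key')
print('handshake complete with %s; staying silent' % (addr,))
time.sleep(3600)  # hold the connection open without ever responding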

@t-8ch (Contributor) commented Feb 11, 2014

Manually waiting for the socket to become readable seems to fix it:

diff --git a/requests/packages/urllib3/contrib/pyopenssl.py b/requests/packages/urllib3/contrib/pyopenssl.py
index d9bda15..67eeee4 100644
--- a/requests/packages/urllib3/contrib/pyopenssl.py
+++ b/requests/packages/urllib3/contrib/pyopenssl.py
@@ -156,6 +156,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(rbufsize)
                 except OpenSSL.SSL.WantReadError:
+                    select.select([self._sock], [], [])
                     continue
                 if not data:
                     break
@@ -183,6 +184,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(left)
                 except OpenSSL.SSL.WantReadError:
+                    select.select([self._sock], [], [])
                     continue
                 if not data:
                     break
@@ -234,6 +236,7 @@ class fileobject(_fileobject):
                                 break
                             buffers.append(data)
                     except OpenSSL.SSL.WantReadError:
+                        select.select([self._sock], [], [])
                         continue
                     break
                 return "".join(buffers)
@@ -244,6 +247,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(self._rbufsize)
                 except OpenSSL.SSL.WantReadError:
+                    select.select([self._sock], [], [])
                     continue
                 if not data:
                     break
@@ -271,7 +275,8 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(self._rbufsize)
                 except OpenSSL.SSL.WantReadError:
-                        continue
+                    select.select([self._sock], [], [])
+                    continue
                 if not data:
                     break
                 left = size - buf_len

(Insert lengthy rant about platform differences, especially concerning openssl here)
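
To spell out the failure mode the patch addresses: pyOpenSSL raises WantReadError whenever no decrypted data is ready, so a bare continue retries in a hot loop, while select parks the process in the kernel until the descriptor is actually readable. A schematic contrast (not the actual urllib3 code; sock stands in for the pyOpenSSL connection):

import select

import OpenSSL.SSL

def recv_busy(sock, n=8192):
    """The old pattern: retry immediately on WantReadError."""
    while True:
        try:
            return sock.recv(n)
        except OpenSSL.SSL.WantReadError:
            continue  # hot loop: burns a full core while waiting for data

def recv_parked(sock, n=8192):
    """The patched pattern: block in select() until the fd is readable."""
    while True:
        try:
            return sock.recv(n)
        except OpenSSL.SSL.WantReadError:
            select.select([sock], [], [])  # wakes only when data arrives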

@sigmavirus24 (Contributor):

> (Insert lengthy rant about platform differences, especially concerning openssl here)

Yeah, platforms stink. Let's all develop on Windows. =P

@Lukasa (Member) commented Feb 12, 2014

I ranted about this to my housemate last night, by the by. I don't understand why PyOpenSSL busy-waits on this socket instead of calling select like a good person. @t-8ch, are you making the changes in the vendored urllib3, or are you going to submit your patch upstream?

@e-manuel (Author):

I would like to ask what comes next: will there be a new release of requests? I would not like to go back to pycurl :(

@sigmavirus24 (Contributor):

@e-manuel all of the changes are in PyOpenSSL. Since you're using the packages from your distro, you'll need to bother the maintainer of that package when the fix gets released.

@Lukasa your question is especially relevant, since FreeBSD seems to strip out the vendored packages, so the patch should be sent upstream if possible.

@Lukasa (Member) commented Feb 13, 2014

Yeah, I think we want an answer on what PyOpenSSL is going to do.

@e-manuel In the meantime, you can patch your copy so this stops happening.

@e-manuel (Author):

@t-8ch thank you very much, your patch works great: now when the GET starts it uses below 2% of CPU, and after a few seconds even less :)

@Lukasa (Member) commented Feb 13, 2014

Hurrah! Let's leave this open until we get an idea of what's happening upstream.

@t-8ch saves the day again! I should buy him a boat.

@t-8ch (Contributor) commented Feb 13, 2014

But I fear you lose the timeouts (although I'm not sure timeouts ever worked with pyOpenSSL).

@e-manuel (Author):

Timeouts now work in this manner:

  • when the GET's target does not exist (is unreachable), the timeout works like Swiss quartz;
  • when the target exists but does not respond, the GET waits forever.

@Lukasa (Member) commented Feb 14, 2014

Hmm. Can we pass the timeout to the select call?

@pasha-r commented Feb 14, 2014

Yep.
select.select([self._sock], [], [], self._sock.gettimeout())

@Lukasa (Member) commented Feb 14, 2014

@pasha-r That's a very neat solution; it didn't even occur to me. That, plus checking the return value from select, should add timeouts back.
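
Concretely, checking the return value amounts to something like this sketch: select returns three empty lists when it times out, so an empty read list maps naturally onto socket.timeout:

import select
import socket

def wait_readable(sock):
    """Block until sock is readable, or raise once its timeout elapses."""
    rlist, _, _ = select.select([sock], [], [], sock.gettimeout())
    if not rlist:  # empty read list means select hit the timeout
        raise socket.timeout('read timed out')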

@t-8ch (Contributor) commented Feb 18, 2014

I think we should indeed go with calling select ourselves. PyOpenSSL does not provide the timeout functionality, and we simply set the timeout on the non-SSL socket.

This should work:

diff --git a/requests/packages/urllib3/contrib/pyopenssl.py b/requests/packages/urllib3/contrib/pyopenssl.py
index d9bda15..da44a29 100644
--- a/requests/packages/urllib3/contrib/pyopenssl.py
+++ b/requests/packages/urllib3/contrib/pyopenssl.py
@@ -43,7 +43,7 @@ from ndg.httpsclient.subj_alt_name import SubjectAltName as BaseSubjectAltName
 import OpenSSL.SSL
 from pyasn1.codec.der import decoder as der_decoder
 from pyasn1.type import univ, constraint
-from socket import _fileobject
+from socket import _fileobject, timeout
 import ssl
 import select
 from cStringIO import StringIO
@@ -139,6 +139,13 @@ def get_subj_alt_name(peer_cert):

 class fileobject(_fileobject):

+    def _wait_for_sock(self):
+        rd, wd, ed = select.select([self._sock], [], [],
+                                   self._sock.gettimeout())
+        if not rd:
+            raise timeout()
+
+
     def read(self, size=-1):
         # Use max, disallow tiny reads in a loop as they are very inefficient.
         # We never leave read() with any leftover data from a new recv() call
@@ -156,6 +163,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(rbufsize)
                 except OpenSSL.SSL.WantReadError:
+                    self._wait_for_sock()
                     continue
                 if not data:
                     break
@@ -183,6 +191,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(left)
                 except OpenSSL.SSL.WantReadError:
+                    self._wait_for_sock()
                     continue
                 if not data:
                     break
@@ -234,6 +243,7 @@ class fileobject(_fileobject):
                                 break
                             buffers.append(data)
                     except OpenSSL.SSL.WantReadError:
+                        self._wait_for_sock()
                         continue
                     break
                 return "".join(buffers)
@@ -244,6 +254,7 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(self._rbufsize)
                 except OpenSSL.SSL.WantReadError:
+                    self._wait_for_sock()
                     continue
                 if not data:
                     break
@@ -271,7 +282,8 @@ class fileobject(_fileobject):
                 try:
                     data = self._sock.recv(self._rbufsize)
                 except OpenSSL.SSL.WantReadError:
-                        continue
+                    self._wait_for_sock()
+                    continue
                 if not data:
                     break
                 left = size - buf_len

The code seems suboptimal, but I'd like to try to keep this close to the version from the stdlib (where 99% of the fileobject is taken from).

PS: It seems I was totally confused when I was unable to reproduce this on Linux; it is just the same there.
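
A quick way to sanity-check the patch, assuming the silent TLS server from the earlier repro is still listening on 127.0.0.1:4000: the call should now abort once the timeout elapses, with the CPU staying idle, instead of spinning at 100%:

import requests

try:
    requests.get('https://127.0.0.1:4000', verify=False, timeout=5)
except requests.exceptions.RequestException as exc:
    # With the patch applied this fires after ~5s with near-zero CPU use.
    print('request aborted: %r' % (exc,))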

@Lukasa (Member) commented Feb 18, 2014

Suits me. =)

@e-manuel (Author):

@t-8ch you are great: now the timeout works fine in every case, and my CPUs can "sleep" calmly ;)

@Lukasa (Member) commented Mar 23, 2014

This will be fixed in the next release of requests, so I'm closing this. Thanks for your help all!

@Lukasa closed this as completed Mar 23, 2014
@tgalery commented Mar 26, 2014

Hi there, sorry to jump into this discussion, but I'm switching a legacy URL-resolver mechanism from urllib2 to requests 2.2.1. Although requests works much better, the server's CPU usage has started to scale erratically, whereas in the past it was fairly constant, under 10%. We resolve http and https URLs using a custom header and verify=True. Do you know if this is a known issue, and what might be causing it?

@Lukasa (Member) commented Mar 26, 2014

It's possible that this is your bug, if you're accessing HTTPS URLs and have pyOpenSSL installed. If both of those things are true, try applying the above patch to your installed requests package and check whether it resolves your problem.

@tgalery commented Mar 26, 2014

Nope, I don't use pyOpenSSL, and it also happens when I don't set the verify option. I wonder whether there's any overhead associated with the get method, as opposed to initialising a Session object and having different threads call send. If none of this matters substantially, are you aware of any other high-CPU issues related to requests?

@Lukasa (Member) commented Mar 26, 2014

There's definitely overhead when using get: we create a Session object behind the scenes for each call.

You should consider having a single Session object, but note that it's not really thread-safe. If using a single Session, you should either serialize access to it, or have it run in its own thread and send messages in and out on queues.

Otherwise, we're not aware of any high-CPU issues, and it'd be interesting to get diagnostics if you still hit one.
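
A sketch of that second option (the names here are illustrative, not part of requests): one thread owns the Session, and other threads talk to it through a pair of queues:

import threading
from Queue import Queue  # 'queue' on Python 3

import requests

jobs = Queue()
results = Queue()

def session_worker():
    """Sole owner of the Session; callers submit URLs via the jobs queue."""
    session = requests.Session()
    while True:
        url = jobs.get()
        if url is None:  # sentinel value shuts the worker down
            break
        results.put(session.get(url))

worker = threading.Thread(target=session_worker)
worker.start()

jobs.put('http://example.com/')
print(results.get().status_code)

jobs.put(None)
worker.join()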

@dav009 commented Mar 28, 2014

I have a similar issue: basically there are CPU peaks after switching from urllib2 to requests for GET and POST requests. I'm seeing a lot of FUTEX_WAIT after stracing my process.

I'm a bit confused about the default behaviour of the connection pool:

  • Is there a global Session shared by default among all requests made?
  • Do requests to a single host share the same connection in the pool?
  • Is that shared connection in the pool blocking?
  • Could the pool cache size and pool size be a factor?

Thank you in advance

@Lukasa (Member) commented Mar 28, 2014

  • No. As I said above, we create a Session object behind the scenes if you aren't using one yourself.
  • Not always. Connection pools belong to Sessions, so if you aren't using Sessions you aren't using connection pools. We then keep a pool of connections to each host; whether requests to that host re-use a connection depends on whether the connection is still open.
  • Requests does not perform non-blocking IO.
  • Could they affect what? (For how those pool sizes are set, see the sketch below.)
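
On the pool-size question specifically, those knobs live on HTTPAdapter and take effect when the adapter is mounted on a Session; a sketch with illustrative numbers:

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# pool_connections: how many per-host pools are cached;
# pool_maxsize: connections kept alive per pool;
# pool_block=False: open (and discard) extra connections rather than wait.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20, pool_block=False)
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get('http://example.com/')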

@dav009 commented Mar 28, 2014

Basically, during the CPU peaks I'm getting lots of futex(..., FUTEX_WAIT, 1, NULL).
I thought that could be related to a lock on some variables inside requests.

@Lukasa (Member) commented Mar 28, 2014

There is some locking around the connection pool and the cookies dictionary, so it's certainly possible.
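
If that lock contention turns out to be the problem, one common workaround (a sketch, not a patch from this thread) is a Session per thread via threading.local, so threads never contend on shared pool or cookie locks:

import threading

import requests

_local = threading.local()

def get_session():
    """Return this thread's private Session, creating it on first use."""
    if not hasattr(_local, 'session'):
        _local.session = requests.Session()
    return _local.session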

zhuhaow referenced this issue in zhuhaow/goagent Sep 27, 2014
Should use sock instead of its file descriptor.
See more at https://github.com/kennethreitz/requests/issues/1910.
@github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021