This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

pycurl instead of requests #1725

Closed
wants to merge 8 commits into from

Conversation

dalf
Contributor

@dalf dalf commented Oct 17, 2019

Implement #503

The requests API has been partially implemented: the requests package can be replaced by searx.httpclient.
Notable exceptions:

  • stream=True parameter.
  • PreparedRequest object.

Speed:

  • Seems as fast as aiohttp when HTTP/2 is used.
  • Faster than requests, especially on slow servers.

Uses more memory than aiohttp and a little more than requests.
Compatible with Python 2 and 3.

It is possible to make asynchronous HTTP Requests:

  • either with a concurrent.futures.Future ( searx.httpclient.Session )
  • or asyncio.Future ( searx.httpclient.AsyncioSession )
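
To illustrate the two kinds of futures, here is a minimal sketch that uses only the standard library. It is not the code from this PR: `SketchSession`, `SketchAsyncioSession` and `fake_transport` are hypothetical names, and the real `searx.httpclient` signatures may differ. A single worker thread resolves a `concurrent.futures.Future`, and `asyncio.wrap_future` turns it into an awaitable `asyncio.Future`.

```python
import asyncio
import concurrent.futures
import queue
import threading


class SketchSession:
    """Hypothetical stand-in for searx.httpclient.Session: one worker thread per session."""

    def __init__(self, transport):
        # `transport` is any callable(method, url) -> response. In the PR this role
        # is played by curl_multi, which is not reproduced here.
        self._transport = transport
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel queued by close()
                break
            future, method, url = job
            if not future.set_running_or_notify_cancel():
                continue  # the caller cancelled before the request started
            try:
                future.set_result(self._transport(method, url))
            except Exception as exc:
                future.set_exception(exc)

    def request(self, method, url):
        # Returns immediately; the worker thread fills in the result later.
        future = concurrent.futures.Future()
        self._jobs.put((future, method, url))
        return future

    def get(self, url):
        return self.request("GET", url)

    def close(self):
        self._jobs.put(None)
        self._thread.join()


class SketchAsyncioSession(SketchSession):
    """Same worker thread, but request() returns an awaitable asyncio.Future."""

    def request(self, method, url):
        # Must be called from inside a running event loop.
        return asyncio.wrap_future(super().request(method, url))


def fake_transport(method, url):
    # Placeholder response so the sketch runs without network access.
    return {"method": method, "url": url, "status_code": 200}
```

With the thread-based variant, `SketchSession(fake_transport).get(url).result(timeout=5)` blocks until the worker has answered. With the asyncio variant, `await session.get(url)` inside a coroutine yields to the event loop instead.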

Implementation details:

  • one thread is created per Session/AsyncioSession
  • curl_multi is used to execute requests
  • HTTP/2 is used when possible, but it is disabled on Ubuntu 18.04 (segfault, see the comment in searx/httpclient/misc.py)

Debug information example:

DEBUG:searx.httpclient:"GET https://www.wikidata.org/w/index.php?search=searx&ns0=1 HTTP/1.1" 200 15025
  total: 0.656s, pycurl: 0.647s, namelookup: 0.108s, connect: 0.174s, appconnect: 0.295s, pretransfer: 0.296s, starttransfer: 0.595s, redirect: 0.000s
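
The timing fields are easier to read as per-phase durations. The sketch below assumes that `namelookup` through `starttransfer` correspond to libcurl's `CURLINFO_*_TIME` values, which are cumulative from the start of the transfer. That mapping is an assumption about this PR, and `parse_timings` and `phase_durations` are hypothetical helpers, not part of `searx.httpclient`.

```python
import re

# Timing line copied from the debug example above.
DEBUG_TIMINGS = (
    "total: 0.656s, pycurl: 0.647s, namelookup: 0.108s, connect: 0.174s, "
    "appconnect: 0.295s, pretransfer: 0.296s, starttransfer: 0.595s, redirect: 0.000s"
)

# Order in which libcurl reaches each milestone during a transfer.
MILESTONES = ["namelookup", "connect", "appconnect", "pretransfer", "starttransfer"]


def parse_timings(line):
    """Return {name: seconds} for every 'name: 1.234s' pair in the line."""
    return {name: float(value) for name, value in re.findall(r"(\w+): ([\d.]+)s", line)}


def phase_durations(timings):
    """Convert cumulative milestones into per-phase durations, in seconds."""
    durations = {}
    previous = 0.0
    for name in MILESTONES:
        durations[name] = round(timings[name] - previous, 3)
        previous = timings[name]
    return durations


if __name__ == "__main__":
    for name, seconds in phase_durations(parse_timings(DEBUG_TIMINGS)).items():
        print(f"{name}: {seconds:.3f}s")
```

Read this way, the example request spends about 0.108s on DNS resolution, 0.066s on the TCP connect, 0.121s on the TLS handshake, and 0.299s waiting for the first byte from the server.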

Notes:

  • a lot of code has been borrowed from requests, and some from human_curl (see comments in the code).
  • tests are lacking.
  • memory leaks have not been checked.
  • this most probably should be a separate package.
  • curl comes in many different flavors, so your mileage may vary.
  • it has helped to create a full quick & dirty asyncio prototype: see PR "Proof of concept: Quart" #1724
  • this PR is a bookmark (rather than code to be merged into the master branch).

The requests API has been partially implemented: the requests package can be replaced by searx.httpclient.
Notable exceptions:
* stream=True parameter.
* PreparedRequest object.
* cchardet is used instead of chardet.

As fast as aiohttp, and faster than requests, especially on slow servers.
Uses more memory than aiohttp and a little more than requests.
Compatible with Python 2 and 3.
The futures package must be installed when using Python 2.

Requests can run asynchronously:
* either with a concurrent.futures.Future ( searx.httpclient.Session )
* or asyncio.Future ( searx.httpclient.AsyncioSession )

Notes:
* a lot of code has been borrowed from requests, and some from human_curl (see comments in the code).
* tests are lacking.
* memory leaks have not been checked.

Implementation details:
* one thread is created per Session/AsyncioSession
* curl_multi is used to execute requests

Use HTTP/2 when possible (disabled on Ubuntu 18.04 because of a segfault).

Most probably should be a separate package (and just switch to aiohttp).
* curl options : more curl version checking
* fix Python2 (unicode vs str)
* no exception on TimeoutError
* AsyncioSession is available for Python 3.5
@dalf dalf mentioned this pull request Jun 20, 2020
@dalf dalf added the core label Sep 19, 2020
@dalf dalf mentioned this pull request Oct 3, 2020
Closed
@dalf dalf closed this Dec 3, 2020