Replace all urllib / urllib2 imports and uses with `requests` #2852

hornc · 2020-01-13T23:43:34Z

urllib2 is old, and a bit clunky -- I've found recently it can't do some of things I needed it to do, and it also causes problem for moving to Python3, and generally seems to make the code more complex.

Instead of re-using it, and adding new features that use the older mechanisms, can we consider refactoring to use requests instead?

My PR where I stripped out urllib2 and StringIO etc to enable HEAD requests: #2838

a similar PR that went with using requests to avoid old bugs: #2779

By using requests (rather than adding imports from six) we should be getting a better tool for making requests, and reducing code complexity, and the number of imports.

Relates to #2308 since a common use of simplejson is to parse urllib2 responses. requests has a builtin json() method so an extra import is not required.

Describe the problem that you'd like solved

Proposal & Constraints

Additional context

Stakeholders

The text was updated successfully, but these errors were encountered:

tfmorris · 2020-01-14T00:05:31Z

I'm strongly supportive of switching to requests since it has things like built-in (or easy to add) support for things like retries, timeouts, caching, etc.

Having said that, there's so much critical work pending that I'm not sure we can afford to being doing "nice to have" work on existing code that works. Refactoring modules as they get worked on seems fine though.

hornc · 2020-01-14T00:21:59Z

Refactoring modules as they get worked on seems fine though.

Agree.
Refactoring is why I wanted to publicise this as a common solution. When devs work on a module -- If they find themselves banging heads against urllib2 for any reason, first thought should be to remove it and use requests instead. I just had this happen a couple of times in a row, and wanted to share it as a common time saving way forward, and perhaps we can standardise some of the retry mechanisms etc.

hornc · 2020-01-14T04:16:13Z

possibly unused urllib + urllib2 in openlibrary/plugins/upstream/addbook.py ?

cdrini · 2020-01-14T05:36:53Z

PyCharm identifies them as unused (amongst other things)

tfmorris · 2020-01-21T00:52:26Z

#2894 shows some code smells which need to be avoided when doing any upgrade of this type.

If you still have an import urllib2, it's probably wrong.

cclauss · 2020-07-23T17:30:36Z

% grep urlopen **/*.py

openlibrary/admin/numbers.py: sqlite_contents = urllib.request.urlopen(url).read()
openlibrary/catalog/amazon/add_covers.py: ret = simplejson.load(urlopen(url))
openlibrary/catalog/amazon/other_editions.py: return urllib.request.urlopen(url).read()
openlibrary/catalog/utils/edit.py: prev = unmarshal(json.load(urllib.request.urlopen(url)))
openlibrary/core/lending.py: content = urllib.request.urlopen(url=url, timeout=config_http_request_timeout).read()
openlibrary/core/lending.py: content = urllib.request.urlopen(request, timeout=config_http_request_timeout).read()
openlibrary/core/lending.py: content = urllib.request.urlopen(url=url, timeout=config_http_request_timeout).read()
openlibrary/core/lending.py: response = simplejson.loads(urllib.request.urlopen(url).read())
openlibrary/core/lending.py: return simplejson.loads(urllib.request.urlopen(url).read())
openlibrary/core/lending.py: jsontext = urllib.request.urlopen(config_ia_loan_api_url, payload,
openlibrary/core/models.py: d = simplejson.loads(urllib.request.urlopen(url).read())
openlibrary/coverstore/code.py: text = urlopen("https://archive.org/metadata/" + item).read()
openlibrary/coverstore/code.py: jsontext = urlopen(url).read()
openlibrary/plugins/admin/code.py: response = urllib.request.urlopen(s).read()
openlibrary/plugins/admin/code.py: json = urllib.request.urlopen("http://www.archive.org/download/stats/numUniqueIPsOL.json").read()
openlibrary/plugins/admin/services.py: self.data = BeautifulSoup(urllib.request.urlopen(url).read(), "lxml")
openlibrary/plugins/books/readlinks.py: json_data = urllib.request.urlopen(solr_select).read()
openlibrary/plugins/books/readlinks.py: json_data = urllib.request.urlopen(solr_select).read()
openlibrary/plugins/books/readlinks.py: json_data = urllib.request.urlopen(solr_select).read()
openlibrary/plugins/books/readlinks.py: json_data = urllib.request.urlopen(solr_select).read()
openlibrary/plugins/openlibrary/code.py: return urllib.request.urlopen(url).read()
openlibrary/plugins/upstream/borrow.py: response = simplejson.loads(urllib.request.urlopen(url).read())
openlibrary/plugins/upstream/borrow.py: response = simplejson.loads(urllib.request.urlopen(url).read())
openlibrary/plugins/upstream/covers.py: response = urllib.request.urlopen(upload_url, urllib.parse.urlencode(params))
openlibrary/plugins/upstream/data.py: return urllib.request.urlopen(url).read()
openlibrary/plugins/upstream/models.py: data = urllib.request.urlopen(url).read()
openlibrary/plugins/worksearch/subjects.py: response = json.load(urllib.request.urlopen(solr_url))['response']
openlibrary/plugins/worksearch/subjects.py: response = json.load(urllib.request.urlopen(solr_url))['response']
openlibrary/solr/find_modified_works.py: ret.append(extract_works(json.load(urllib.request.urlopen(url))))
openlibrary/solr/find_modified_works.py: changes = list(json.load(urllib.request.urlopen(url)))
openlibrary/utils/solr.py: data = urllib.request.urlopen(url, timeout=10).read()
openlibrary/utils/solr.py: data = urllib.request.urlopen(request, timeout=10).read()
openlibrary/accounts/model.py: f = urllib.request.urlopen(req)
openlibrary/api.py: return urllib.request.urlopen(req)
openlibrary/catalog/add_book/init.py: res = urllib.request.urlopen(upload_url, urllib.parse.urlencode(params))
openlibrary/catalog/amazon/crawl.py: #page = urlopen('http://amazon.com/dp/' + asin).read()
openlibrary/catalog/amazon/load_merge.py: result = urllib.request.urlopen(ureq).read(100000)
openlibrary/catalog/get_ia.py:def urlopen_keep_trying(url):
openlibrary/catalog/get_ia.py: f = urllib.request.urlopen(url)
openlibrary/catalog/get_ia.py: return '<!--' in urlopen_keep_trying(IA_DOWNLOAD_URL + loc).read()
openlibrary/catalog/get_ia.py: data = urlopen_keep_trying(item_base + marc_xml_filename).read()
openlibrary/catalog/get_ia.py: data = urlopen_keep_trying(item_base + marc_bin_filename).read()
openlibrary/catalog/get_ia.py: tree = etree.parse(urlopen_keep_trying(url))
openlibrary/catalog/get_ia.py: tree = etree.parse(urlopen_keep_trying(url))
openlibrary/catalog/get_ia.py: f = urlopen_keep_trying(ureq)
openlibrary/catalog/get_ia.py: f = urlopen_keep_trying(url)
openlibrary/catalog/get_ia.py: f = urlopen_keep_trying(url)
openlibrary/catalog/importer/db_read.py: data = urlopen(url).read()
openlibrary/catalog/marc/download.py: f = urllib.request.urlopen("http://archive.org/download/bpl_marc/" + file)
openlibrary/catalog/marc/download.py: f = urllib.request.urlopen(ureq)
openlibrary/catalog/marc/marc_subject.py: f = urlopen_keep_trying(url)
openlibrary/catalog/marc/marc_subject.py: f = urlopen_keep_trying(url)
openlibrary/catalog/utils/query.py:def urlopen(url, data=None):
openlibrary/catalog/utils/query.py: return urllib.request.urlopen(req)
openlibrary/catalog/utils/query.py: return json.load(urlopen(url))
openlibrary/catalog/utils/query.py: return urlopen(url).read()
openlibrary/core/fulltext.py: json_data = urllib.request.urlopen(search_select, timeout=30).read()
openlibrary/core/loanstats.py: logger.info("urlopen %s", url)
openlibrary/core/loanstats.py: response = urllib.request.urlopen(url).read()
openlibrary/coverstore/utils.py: r = urlopen(req)
openlibrary/plugins/akismet/akismet.py: h = urllib.request.urlopen(req)
openlibrary/plugins/ol_infobase.py: response = urllib.request.urlopen(url, json).read()
openlibrary/plugins/openlibrary/tests/test_listapi.py: def urlopen(self, path, data=None, method=None, headers={}):
openlibrary/plugins/openlibrary/tests/test_listapi.py: self.urlopen("/account/login", data=urllib.parse.urlencode(data), method="POST")
openlibrary/plugins/openlibrary/tests/test_listapi.py: response = self.urlopen(
openlibrary/plugins/openlibrary/tests/test_listapi.py: data = self.urlopen("/people/" + self.username + "/lists.json").read()
openlibrary/plugins/openlibrary/tests/test_listapi.py: data = self.urlopen(key + ".json").read()
openlibrary/plugins/openlibrary/tests/test_listapi.py: data = self.urlopen(key + "/seeds.json").read()
openlibrary/plugins/openlibrary/tests/test_listapi.py: response = self.urlopen(key + "/seeds.json", json)
openlibrary/plugins/openlibrary/tests/test_ratingsapi.py: def urlopen(self, path, data=None, method=None, headers={}):
openlibrary/plugins/openlibrary/tests/test_ratingsapi.py: self.urlopen("/account/login", data=urllib.parse.urlencode(data), method="POST")
openlibrary/plugins/openlibrary/tests/test_ratingsapi.py: r = self.urlopen(
openlibrary/plugins/search/solr_client.py: ru = urlopen(query_url)
openlibrary/plugins/search/solr_client.py: ru = urlopen(query_url)
openlibrary/plugins/upstream/adapter.py: response = urllib.request.urlopen(req)
openlibrary/plugins/upstream/utils.py: tree = etree.parse(urllib.request.urlopen(url))
openlibrary/plugins/worksearch/code.py: solr_result = urllib.request.urlopen(url, timeout=10)
openlibrary/solr/update_work.py:def urlopen(url, data=None):
openlibrary/solr/update_work.py: return urllib.request.urlopen(req)
openlibrary/solr/update_work.py: result = json.load(urlopen(url))
openlibrary/solr/update_work.py: logger.info("urlopen %s", url)
openlibrary/solr/update_work.py: reply = json.load(urlopen(url))
openlibrary/solr/update_work.py: reply = json.load(urlopen(url))
openlibrary/tests/catalog/test_get_ia.py: monkeypatch.setattr(get_ia, 'urlopen_keep_trying', return_test_marc_xml)
openlibrary/tests/catalog/test_get_ia.py: monkeypatch.setattr(get_ia, 'urlopen_keep_trying', return_test_marc_bin)
openlibrary/tests/catalog/test_get_ia.py: monkeypatch.setattr(get_ia, 'urlopen_keep_trying', return_test_marc_bin)
openlibrary/tests/solr/test_update_work.py: monkeypatch.setattr(update_work, 'urlopen', lambda url: StringIO(solr_response))
openlibrary/utils/httpserver.py: >>> response = urllib.request.urlopen('http://0.0.0.0:8090/hello/world')
openlibrary/utils/httpserver.py: >>> urllib.request.urlopen('http://0.0.0.0:8090/hello/world')
openlibrary/views/showmarc.py: data = urllib.request.urlopen(url).read()
openlibrary/views/showmarc.py: data = urllib.request.urlopen(url).read()
openlibrary/views/showmarc.py: result = urllib.request.urlopen(ureq).read(100000)

imskr · 2020-07-30T17:43:54Z

Thanks, @cclauss for guiding. I'm on it!

cclauss · 2020-08-15T07:50:01Z

Refactor to use requests json Refactor to use requests json #3709 Approved by @cclauss
Requests refactor only query.py Requests refactor only query.py #3666
refactor update_work and test_update_work to use requests refactor update_work and test_update_work to use requests #3696 Approved by @cclauss
Refactor .read() to use requests Refactor .read() to use requests #3710
converted only simple GETs to requests converted only simple GETs to requests #3694
use requests instead of stdlib use requests instead of stdlib #3658
Fix portability and dependency issue Fix portability and dependency issue #3559

cclauss · 2020-08-21T18:28:50Z

This is going to need SEVERAL PRs before it is finally closed.

cclauss · 2021-02-01T14:43:22Z

dherbst · 2021-02-02T13:27:07Z

I don't think this script works any more scripts/z3950_view.py because PyZ3950 in not installed in the requirements any more. I can't find PyZ3950 on pypi.org. I cannot find reference to z3950_view in the codebase. Given this, should we remove z3950_view.py?

cclauss · 2021-02-02T13:48:31Z

Last released in 2004 https://pypi.org/project/PyZ3950 --> http://www.panix.com/~asl2/software/PyZ3950 --> https://github.com/asl2/PyZ3950 (see the PRs) which is far from Python 3 compatible.

dherbst · 2021-02-04T13:13:35Z

I don't think this file openlibrary/catalog/marc/download.py is used any more. Because it tries to use web.run which is no longer in the version of web.py installed. Also it tries to call http://wiki-beta.us.archive.org:9090 which is not able to be accessed.

Should openlibrary/catalog/marc/download.py be deleted?

RayBB · 2024-04-19T00:42:47Z

I went ahead and did a little grepping and such to see that these are all the places urllib is used (excluding infogami).

I'm not sure which of these functions we should still try to migrate away from.
At least some of the urlencodes could be changed to just passing the dict to requests.
@cclauss what do you think?

Filename	Count	16	5	4	3	3	3	2	2	2	2	2	1	1	1	1
./plugins/upstream/utils.py	8	urlencode	quote		parse_qs	urlparse		urlunparse	quote_plus
./plugins/upstream/borrow.py	8	urlencode	quote
./coverstore/utils.py	8	urlencode		urlsplit						unquote			urlunsplit	unquote_plus	parse_qsl
./utils/solr.py	6	urlencode		urlsplit
./plugins/openlibrary/code.py	8	urlencode			parse_qs	urlparse		urlunparse
./plugins/openlibrary/tests/test_listapi.py	8	urlencode					Request				build_opener	HTTPCookieProcessor
./plugins/openlibrary/tests/test_ratingsapi.py	7	urlencode					Request				build_opener	HTTPCookieProcessor
./plugins/upstream/adapter.py	4	urlencode					Request									urlopen
./core/fulltext.py	2	urlencode
./core/ia.py	2	urlencode
./core/models.py	2	urlencode
./core/sponsorships.py	2	urlencode
./coverstore/tests/test_webapp.py	2	urlencode
./plugins/admin/graphs.py	2	urlencode
./plugins/importapi/code.py	2	urlencode
./plugins/worksearch/code.py	3	urlencode
./plugins/upstream/addbook.py	4		quote			urlparse
./core/processors/readableurls.py	3		quote						quote_plus
./catalog/utils/query.py	3		quote
./plugins/books/code.py	3			urlsplit						unquote
./core/helpers.py	2			urlsplit
./plugins/openlibrary/lists.py	1				parse_qs

hornc added Type: Feature Request Issue describes a feature or enhancement we'd like to implement. [managed] Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] State: Backlogged labels Jan 13, 2020

xayhewalo removed the State: Backlogged label Mar 17, 2020

This was referenced Aug 4, 2020

use requests instead of stdlib #3658

Closed

Requests refactor only query.py #3666

Merged

This was referenced Aug 13, 2020

converted only simple GETs to requests #3694

Closed

refactor update_work and test_update_work to use requests #3696

Merged

Refactor to use requests json #3709

Merged

Refactor .read() to use requests #3710

Merged

mekarpeles closed this as completed in #3666 Aug 21, 2020

cclauss reopened this Aug 21, 2020

cdrini closed this as completed in #3696 Sep 8, 2020

cclauss mentioned this issue Dec 14, 2020

Author page fixes: #4215, #4121 #4278

Merged

cclauss removed the Theme: Upgrade to Python 3 label Dec 17, 2020

cclauss mentioned this issue Dec 20, 2020

Use requests in solr_client.py #4317

Closed

dherbst mentioned this issue Jan 19, 2021

Refactor get_ia.py to use requests instead of urllib.urlopen #4436

Merged

dherbst mentioned this issue Feb 1, 2021

Replace urlopen and simplejson with requests. #4515

Merged

dherbst mentioned this issue Feb 2, 2021

Replace urlopen with requests in openlibrary/views/showmarc #4527

Merged

cclauss changed the title ~~Replace all urllib / urllib2 + StringIO + simplejson imports and uses with requests~~ Replace all urllib / urllib2 imports and uses with requests Feb 3, 2021

dherbst mentioned this issue Feb 4, 2021

Refactor db_read.py to replace urlopen with requests. #4539

Merged

This was referenced Feb 13, 2021

solr_client.py: Replace urlopen with requests.get #4593

Closed

worksearch code.py: Replace urlopen with requests.get #4604

Merged

new-solr-updater.py: Replace urlopen with requests.get #4605

Closed

cclauss added Lead: @cclauss Issues overseen by Chris (Python3 & Dev-ops lead 2019-2021) [managed] and removed Lead: @cdrini Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed] labels Mar 9, 2021

cclauss mentioned this issue Mar 12, 2021

Upgrade infogami/infobase/client.py from urllib to requests internetarchive/infogami#145

Closed

cclauss mentioned this issue Mar 26, 2021

Use requests in openlibrary/plugins/upstream/utils.py #4895

Merged

xayhewalo mentioned this issue Jul 29, 2021

Edition.add_bookcover() returning 500 response for valid urls internetarchive/openlibrary-client#246

Open

mekarpeles added the Needs: Help Issues, typically substantial ones, that need a dedicated developer to take them on. [managed] label Sep 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace all urllib / urllib2 imports and uses with `requests` #2852

Replace all urllib / urllib2 imports and uses with `requests` #2852

hornc commented Jan 13, 2020

tfmorris commented Jan 14, 2020

hornc commented Jan 14, 2020 •

edited

Loading

hornc commented Jan 14, 2020

cdrini commented Jan 14, 2020

tfmorris commented Jan 21, 2020 •

edited

Loading

cclauss commented Jul 23, 2020 •

edited

Loading

imskr commented Jul 30, 2020

cclauss commented Aug 15, 2020 •

edited

Loading

cclauss commented Aug 21, 2020

cclauss commented Feb 1, 2021 •

edited

Loading

dherbst commented Feb 2, 2021

cclauss commented Feb 2, 2021 •

edited

Loading

dherbst commented Feb 4, 2021

RayBB commented Apr 19, 2024

Replace all urllib / urllib2 imports and uses with requests #2852

Replace all urllib / urllib2 imports and uses with requests #2852

Comments

hornc commented Jan 13, 2020

Describe the problem that you'd like solved

Proposal & Constraints

Additional context

Stakeholders

tfmorris commented Jan 14, 2020

hornc commented Jan 14, 2020 • edited Loading

hornc commented Jan 14, 2020

cdrini commented Jan 14, 2020

tfmorris commented Jan 21, 2020 • edited Loading

cclauss commented Jul 23, 2020 • edited Loading

imskr commented Jul 30, 2020

cclauss commented Aug 15, 2020 • edited Loading

cclauss commented Aug 21, 2020

cclauss commented Feb 1, 2021 • edited Loading

dherbst commented Feb 2, 2021

cclauss commented Feb 2, 2021 • edited Loading

dherbst commented Feb 4, 2021

RayBB commented Apr 19, 2024

Replace all urllib / urllib2 imports and uses with `requests` #2852

Replace all urllib / urllib2 imports and uses with `requests` #2852

hornc commented Jan 14, 2020 •

edited

Loading

tfmorris commented Jan 21, 2020 •

edited

Loading

cclauss commented Jul 23, 2020 •

edited

Loading

cclauss commented Aug 15, 2020 •

edited

Loading

cclauss commented Feb 1, 2021 •

edited

Loading

cclauss commented Feb 2, 2021 •

edited

Loading