
Error 403 - Forbidden for url: https://www.craigslist.org/about/sites #105

Open
luisandrecunha opened this issue Jan 5, 2021 · 22 comments

Comments

@luisandrecunha

Hi Julio,

I have used your code before (early 2020), but now I'm getting the error below when trying to import CraigslistHousing, using "from craigslist import CraigslistHousing":

HTTPError: 403 Client Error: Forbidden for url: https://www.craigslist.org/about/sites
[screenshot of the full traceback]

Not sure why; it seems it could be related to this issue: https://stackoverflow.com/questions/16627227/http-error-403-in-python-3-web-scraping.

Do you happen to know why this is happening?

Thanks,

@irahorecka
Contributor

irahorecka commented Jan 5, 2021

Seems like this works on my end. Did you upgrade python-craigslist to the latest version? I have a feeling this issue might be independent of the package version, but it doesn't hurt to try.

@luisandrecunha
Author

Yep, I did the upgrade and still have the same issue. I'm on v1.1.0 with Python 3.6, using Google Colab notebooks.

@irahorecka
Contributor

Ah, this looks to be a problem with the requests library in your environment, not python-craigslist, per se.
I'm guessing the same exception would be thrown if you executed this:

import requests
requests.get("https://www.craigslist.org/about/sites")

@luisandrecunha
Author

You are completely right; I also tried in a new Colab notebook and got "<Response [403]>".

If I run the code below, I get a successful response and the page source. I believe it's related to the web-scraping issue from the Stack Overflow question above.

from urllib.request import Request, urlopen
req = Request('https://www.craigslist.org/about/sites', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()

print(webpage)

@juliomalegria
Owner

Thanks for reporting @luisandrecunha.

Interesting. Seems like Craigslist is blocking requests coming from your IP (or Google's Colab IPs). I'm guessing the IP hit a max number of requests per day/hour/minute.

Do you mind running the code suggested by @irahorecka but setting a User-Agent like you did with urllib:

import requests
requests.get("https://www.craigslist.org/about/sites", headers={'User-Agent': 'python-craigslist/1.1.0'})

If this works fine, I'll add a default User-Agent to all requests to prevent this from happening in the future.
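
For context, here's a minimal sketch of what such a default User-Agent wrapper could look like; the header value and function name are just illustrative, not necessarily what will end up in utils.py:

import requests

DEFAULT_HEADERS = {'User-Agent': 'python-craigslist/1.1.0'}  # illustrative UA string

def requests_get(*args, **kwargs):
    # behaves like requests.get, but always sends a User-Agent unless the caller overrides it
    headers = kwargs.pop('headers', {})
    kwargs['headers'] = {**DEFAULT_HEADERS, **headers}
    return requests.get(*args, **kwargs)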

Thanks!

@luisandrecunha
Author

Hi @juliomalegria ,

It seems that Google's Colab IPs are blocked by Craigslist... I successfully ran the code in a local Jupyter notebook and it worked like a charm.

I tried the code you suggested in Colab and continued to get the 403 response... However, I do receive the correct page if I use the code below; not sure if the library could somehow be adapted.

from urllib.request import Request, urlopen
req = Request('https://www.craigslist.org/about/sites', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()

print(webpage)
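
In case it helps, here's a rough sketch of how that urllib call could be wrapped into a reusable helper; the function name and User-Agent string are purely illustrative, not anything from python-craigslist itself:

from urllib.request import Request, urlopen

def fetch_page(url, user_agent='XYZ/3.0', timeout=10):
    # fetch a URL with urllib, sending an explicit User-Agent header
    req = Request(url, headers={'User-Agent': user_agent})
    with urlopen(req, timeout=timeout) as resp:
        # decode the raw bytes so the caller gets text, similar to requests' .text
        return resp.read().decode('utf-8', errors='replace')

print(fetch_page('https://www.craigslist.org/about/sites')[:200])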

Thank you again,

@jraVette

jraVette commented Feb 18, 2021

Just a heads up, I've got the exact same issue. I've been running my code for more than a year and this only started happening this week, so something must have changed on the Craigslist side? I'll have to dig into the code. I can cut and paste the URL into a browser and it works fine. Just wanted to let you know there's another user with the same issue.

>>> import requests
>>> requests.get('https://boston.craigslist.org')
<Response [200]>
>>> requests.get('https://boston.craigslist.org/search')
<Response [403]>
>>> requests.get('https://boston.craigslist.org/search',headers={'User-Agent': 'XYZ/3.0'})
<Response [403]>

I tried it on a couple of computers, so I don't think it's IP-related. My guess is it comes down to how the servers see the requests library versus a regular browser.
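
For what it's worth, the snippet below shows what requests identifies itself as by default (the exact version in the User-Agent string will differ per install):

import requests

# requests advertises itself as 'python-requests/<version>' unless a User-Agent is supplied
print(requests.utils.default_headers())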

Thanks!

@juliomalegria
Owner

Hey everyone! Sorry for the inactivity. I've released a new version (1.1.1) that adds a User-Agent to requests.get. Hopefully that solves the issue; please report back whether it does or doesn't. If it doesn't, I'll have to switch from requests to urllib.
Thanks!

@cwittwer

I am still getting the 403 error with the updated utils.py.

@KeeonTabrizi

KeeonTabrizi commented Feb 21, 2021

+1, same behavior here: 403s on /search paths from a plain requests.get() call, so the library/class is not functioning either.

Also note I tried taking the headers from a cURL request to /search (which loads fine in a regular browser) and using them in the requests call, but that was blocked too.

I used a Selenium driver I had with some mods I've used in the past and was able to load /search just fine, so I don't suspect they are doing anything super sophisticated to block the request.
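
Roughly the kind of Selenium check I mean, minus the mods; this is only a sketch and assumes chromedriver is installed and on PATH:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get('https://boston.craigslist.org/search')
print(len(driver.page_source))  # a non-trivial length suggests the page actually loaded
driver.quit()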

@KeeonTabrizi

Okay, I've dug into it a bit more. I don't think this has anything to do with user agents or that kind of blocking. I recommend upgrading both the requests and urllib3 libraries: pip install urllib3 --upgrade and pip install requests --upgrade. Once I did that, things started working again. I'm not sure what the actual issue is, since older versions of those libraries were working before, but with the updates it looks fine to me.

After that I verified that utils.requests_get (which is effectively requests.get()) works:

>>> import requests
>>> import urllib3
>>> from craigslist import utils
>>> requests.__version__
'2.25.1'
>>> urllib3.__version__
'1.26.3'
>>> utils.requests_get('https://boston.craigslist.org/search')
<Response [200]>

@juliomalegria
Owner

Thanks @KeeonTabrizi! That's a very good point.
I've updated the requirements to include minimum versions for the dependencies (requests and beautifulsoup4).
Can anyone having issues try upgrading the library (pip install python-craigslist --upgrade) and let me know if this fixes the issue?
Thanks again!
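
For anyone curious, the change amounts to pinning minimum versions in setup.py; the sketch below is illustrative only, the version numbers are assumptions, check the repo for the actual pins:

# setup.py excerpt (illustrative)
from setuptools import setup

setup(
    name='python-craigslist',
    version='1.1.1',  # placeholder
    install_requires=[
        'requests>=2.25.0',       # assumed minimum, not the actual pin
        'beautifulsoup4>=4.9.0',  # assumed minimum, not the actual pin
    ],
)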

@usctzen

usctzen commented Feb 23, 2021 via email

@jraVette

jraVette commented Feb 23, 2021

Hey y'all, thanks so much for taking the time to fix this! It could just be how my packages were managed, but when I ran pip install python-craigslist --upgrade, it updated requests but not urllib3 (I guess urllib3 is used by requests), so upgrading just python-craigslist did not work for me. After updating both requests and urllib3 to the latest versions, I'm back up and running! Maybe consider adding urllib3 to the requirements? Thanks again!!

These versions are what got my code working:

>>> requests.__version__
'2.25.1'
>>> urllib3.__version__
'1.26.3'

PS: great module, it's helped me get some great deals on Craigslist!

@cwittwer

+1 this fixed everything. Good catch!

@irahorecka
Contributor

irahorecka commented Mar 30, 2021

@cwittwer, @jraVette, @usctzen, @KeeonTabrizi, @luisandrecunha If you guys are interested in an alternative Craigslist API, check out pycraigslist.
I enjoy python-craigslist, but there were some features I wanted to implement right away. Additional features are in the works.

@usctzen

usctzen commented Mar 30, 2021 via email

@usctzen

usctzen commented Mar 30, 2021 via email

@irahorecka
Contributor

Hey @usctzen, I always appreciate your feedback. Could you post the same issue in pycraigslist issues? I’ll address it there :)

@usctzen

usctzen commented Mar 30, 2021 via email

@juliomalegria
Owner

Hey everyone! Sorry for the delay. I've updated the requirements in 88a6b73 and pushed a new version to PyPI. Could anyone confirm whether this fixes the issue?
Thanks for all the patience!

@Agwebberley

I am still having this issue.
