Error 403 - Forbidden for url: https://www.craigslist.org/about/sites #105
Comments
Seems like this works on my end. Did you upgrade |
Yep, I did the upgrade and continue to have the same issue. I'm using v1.1.0 and Python 3.6 in Google's Colab notebooks. |
Ah, this looks to be a problem with the import requests
requests.get("https://www.craigslist.org/about/sites") |
You are completely right. I also tried in a new Colab notebook and got "<Response [403]>". If I run the code below, I get a successful response and the page source. I believe it's related to the web scraping issue in this page.
|
Thanks for reporting @luisandrecunha. Interesting. Seems like Craigslist is blocking requests coming from your IP (or Google's Colab IPs). I'm guessing the IP hit a max number of requests per day/hour/minute. Do you mind running the code suggested by @irahorecka but setting a User-Agent like you did with import requests
requests.get("https://www.craigslist.org/about/sites", headers={'User-Agent': 'python-craigslist/1.1.0'}) If this works fine, I'll add a default Thanks! |
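A minimal sketch for comparing the two cases discussed above, a request with and without an explicit User-Agent. It uses the standard library's `urllib` rather than `requests`, which is a swapped-in choice so the check does not depend on the installed `requests` version. The helper names `build_request` and `fetch_status` are hypothetical; the URL and the User-Agent string are the ones quoted in this thread.

```python
import urllib.error
import urllib.request

# URL and User-Agent string quoted in this thread.
SITES_URL = "https://www.craigslist.org/about/sites"
USER_AGENT = "python-craigslist/1.1.0"


def build_request(url, user_agent=None):
    """Build a GET request, optionally with an explicit User-Agent header.

    Hypothetical helper for illustration; not part of python-craigslist.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_status(request, timeout=10):
    """Send the request and return the HTTP status code.

    HTTP error responses (such as 403) are returned as codes instead of
    being raised, so the two cases can be compared side by side.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


# Example usage (performs real network calls, so it is left commented out):
# print(fetch_status(build_request(SITES_URL)))
# print(fetch_status(build_request(SITES_URL, USER_AGENT)))
```

If both calls return 403 from one machine but succeed from another, that would point toward IP-based blocking rather than the header, which is the distinction being tested in this exchange.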
Hi @juliomalegria, it seems that Google's Colab IPs are blocked by Craigslist... I successfully ran the code in a local Jupyter notebook and it worked like a charm. I tried the code you suggested in Colab and continued to get the 403 response... However, I receive the right page if I use the code below; I'm not sure whether the code could somehow be adapted.
Thank you again, |
Just a heads up, I've got the exact same issue. I've been running my code for more than a year and this just happened this week. So, something must have changed on the craigslist side? I'll have to dig into the code. I can cut and paste the url into a browser and it works fine. Just wanted to let you know of another user with the same issues.
I tried it on a couple of computers, so I don't think it's IP related. My guess is that it has to do with how the servers are seeing the 'requests' library versus a regular library. Thanks! |
Hey everyone! Sorry for the inactivity. I've released a new version ( |
I am still getting the 403 error with the updated utils.py. |
+1 Having the same behavior - 403s on Also note I tried taking the headers object from the cURL to I used a selenium driver I had with some mods I've used in the past and I was able to load |
Okay I've dug into it a bit more - I don't think this has anything to do with user agents or anything they are blocking like that. I recommend upgrading both the After I did that I tested the request function (which is effectively
|
Thanks @KeeonTabrizi! That's a very good point. |
Hey guys.
I am not a power user, but I have found that the latest *idna* version is
incompatible with *requests*. If you installed the latest *idna*, just
upgrade *requests* and it will revert the *idna* version. I don't know
whether this is the cause of your trouble, but it could be a factor.
Hope this helps.
On Tue, Feb 23, 2021 at 13:15, Julio M. Alegria <[email protected]>
wrote:
… Thanks @KeeonTabrizi <https://github.com/KeeonTabrizi>! That's a very
good point.
I've updated the requirements to include minimum versions for the
dependencies (requests and beautifulsoup4).
Can anyone having issues try updating their library (pip install
python-craigslist --upgrade) and let me know if this fixes the issue?
Thanks again!
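After upgrading, it can help to confirm which versions actually ended up installed. The following is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library; it is not available on the Python 3.6 mentioned earlier in the thread). The helper names `installed_version` and `report_versions` are hypothetical, and no minimum version numbers are checked because the thread does not state them.

```python
from importlib import metadata

# Distribution names mentioned in this thread.
PACKAGES = ("python-craigslist", "requests", "beautifulsoup4")


def installed_version(package_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(package_names=PACKAGES):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in package_names}


# Print one line per package so the output can be pasted into the issue.
for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```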
|
Hey y'all, thanks so much for taking the time to fix this! So, it could just be how my packages were managed, but, when I performed ( These versions are what got my code working:
PS. Great module, it's helped me get some great deals on Craigslist! |
+1 this fixed everything. Good catch! |
@cwittwer, @jraVette, @usctzen, @KeeonTabrizi, @luisandrecunha If you guys are interested in a new Craigslist API format, check out pycraigslist. |
Thanks, I'll check it out.
On Tue, Mar 30, 2021 at 18:42, Ira Horecka ***@***.*** wrote:
… @cwittwer <https://github.com/cwittwer>, @jraVette
<https://github.com/jraVette>, @usctzen <https://github.com/usctzen>,
@KeeonTabrizi <https://github.com/KeeonTabrizi>, @luisandrecunha
<https://github.com/luisandrecunha> If you guys are interested in a new
Craigslist API format, check out pycraigslist
<https://github.com/irahorecka/pycraigslist>.
I enjoy python-craigslist, but there were some features I wanted to
implement immediately.
|
Ira,
Just gave it a quick try and I am getting an error. The script finds
forsale.mca but does not recognize forsale.mcy.
mca is motorcycle all and mcy is motorcycles by owner.
Traceback (most recent call last):
  File "C:/Users/mgpd/PycharmProjects/molivo/py_clist.py", line 3, in <module>
    print(pycraigslist.forsale.mcy.get_filters())
AttributeError: type object 'forsale' has no attribute 'mcy'
Marc @usctzen
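One generic way to see which names a class actually exposes before calling into it, sketched here with a hypothetical stand-in class. The `forsale` class below is NOT the real pycraigslist code, and the helpers `public_attributes` and `has_subcategory` are made up for illustration; only `dir` and `hasattr` are standard Python.

```python
class forsale:
    """Hypothetical stand-in for a category class; not the real pycraigslist class."""

    class mca:
        """Stand-in for the subcategory that the script above was able to find."""


def public_attributes(obj):
    """List attribute names on obj that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


def has_subcategory(category, code):
    """Check for an attribute up front instead of hitting AttributeError later."""
    return hasattr(category, code)


print(public_attributes(forsale))
print(has_subcategory(forsale, "mca"), has_subcategory(forsale, "mcy"))
```

Running the same two helpers against the imported class would show whether a given subcategory code is defined in the installed version.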
|
Hey @usctzen, I always appreciate your feedback. Could you post the same issue in pycraigslist issues? I’ll address it there :) |
Sure thing!
|
Hey everyone! Sorry for the delay, I've updated the requirements in 88a6b73 and pushed a new version in PyPI. Could anyone confirm if the issue is fixed with this? |
I am still having this issue |
Hi Julio,
I have used your code before (early 2020), but now I'm getting the error below when trying to import CraigslistHousing, using "from craigslist import CraigslistHousing":
HTTPError: 403 Client Error: Forbidden for url: https://www.craigslist.org/about/sites
I'm not sure why; it seems it could be related to this issue: https://stackoverflow.com/questions/16627227/http-error-403-in-python-3-web-scraping.
Do you happen to know why this is happening?
Thanks,