-
-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot fetch url with more than 2 redirections #450
Labels
enhancement
New feature or request
Comments
Hi @julienlambert42, the maximum number of redirects is currently set to 2. This page needs a few more, that's why the download fails. We could put this number in the |
adbar
changed the title
Cannot fetch url redirections
Cannot fetch url with more than 2 redirections
Dec 1, 2023
vbarbaresi
added a commit
to vbarbaresi/trafilatura
that referenced
this issue
Dec 30, 2023
…dling Fixes issue adbar#450 After setting `MAX_REDIRECTS` to 5, I could fetch the original URL from the issue: `trafilatura -u https://www.hydrogeninsight.com/production/breaking-us-reveals-the-seven-regional-hydrogen-hubs-to-receive-7bn-of-government-funding/2-1-1534596` I also fixed this old issue: adbar#128 The underlying urllib3 bug has not been fixed: urllib3/urllib3#2475 I had to pass the retry strategy to the actual request method: it doesn't propagate from the pool maanger
adbar
added a commit
that referenced
this issue
Jan 4, 2024
…dling (#461) * introduce `MAX_REDIRECTS` config setting and fix urllib3 redirect handling Fixes issue #450 After setting `MAX_REDIRECTS` to 5, I could fetch the original URL from the issue: `trafilatura -u https://www.hydrogeninsight.com/production/breaking-us-reveals-the-seven-regional-hydrogen-hubs-to-receive-7bn-of-government-funding/2-1-1534596` I also fixed this old issue: #128 The underlying urllib3 bug has not been fixed: urllib3/urllib3#2475 I had to pass the retry strategy to the actual request method: it doesn't propagate from the pool maanger * use httpbin everywhere instead of httpbun * restore MAX_REDIRECTS defaut config value to 2 * reset global objects after test_fetch to avoid side-effects * pin lxml to < 5 * test no_ssl True & False to fix coverage * move _reset_global_objects() to the test file * rewrite assignment in multiple lines for readability --------- Co-authored-by: Adrien Barbaresi <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hello,
The fetch_url() function returns None when a redirection url is used as argument.
For example :
Redirecting url returns None because this url redirects to Actual url
The requests.get(url).text function from requests library allow redirections.
--> Is this behaviour done on purpose (for security reasons for example) ?
--> If yes, could you add an argument to the function to allow redirections (actually it would even be better to have a num_redirection argument set to 0 by default, so we can set a threshold to avoid scam redirections)
Thanks !
The text was updated successfully, but these errors were encountered: