Enable user to use .export for PDF download #87
Comments
Out of curiosity, did you run into rate-limiting yourself? Do you know when it kicked in (roughly)? There's an export.arxiv.org record for every result from the API, so it should be safe to add the [...] We also need to confirm the download behavior when a PDF does not already exist for the [...] These cases must be handled gracefully.
I honestly think this library should default to using [...]
@brandonrobertz this library does use [...] (Line 513 in 678ba9f)
The difference is that it receives download URLs from the API instead of building them. Digression: let's chat limits.
The default [...] If you're interpreting the [...]
Did you call [...]
Interesting, sorry about the bad assumption, I didn't realize this used the export site. That's even more perplexing, then. And no, I didn't call download_pdf 300k times. I got 403 after attempting to do [...] I can open a separate PR.
@brandonrobertz No worries! Happy to advise.
Motivation
The arxiv library uses the export.arxiv.org subdomain to query papers, but downloads the PDFs directly from arxiv.org. As a result, a user who downloads too many papers can get blocked by arXiv.
Solution
A solution would be to modify the paper's PDF URL to point to the corresponding export subdomain. In the code for my personal use I simply use:
[...]
where paper is a Result instance. This solution is lacking, though, since the URL on the export subdomain does not have to exist. This would need to be checked. I would add this functionality to the _get_pdf_url method. A boolean flag user_export could be introduced in case some users wish to download directly from arxiv.org, even though this is not advised according to the "Play Nice" section of https://arxiv.org/help/bulk_data.
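The snippet referred to in the Solution section is not shown above. As a rough illustration of the idea only, and not the author's code or the library's implementation, the URL rewrite could look something like the sketch below. The helper name to_export_url and the constant EXPORT_HOST are hypothetical; the user_export flag mirrors the name proposed in this issue. The sketch only swaps the host and does not check that the PDF actually exists on the export subdomain, which the issue notes would still need to be handled.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the export mirror serves PDFs under the same path as arxiv.org.
EXPORT_HOST = "export.arxiv.org"


def to_export_url(pdf_url: str, user_export: bool = True) -> str:
    """Hypothetical helper: point an arxiv.org PDF URL at the export subdomain.

    URLs on any other host, including ones already on the export
    subdomain, are returned unchanged.
    """
    if not user_export:
        # Caller explicitly asked to download from arxiv.org directly.
        return pdf_url
    parts = urlsplit(pdf_url)
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        return pdf_url
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(netloc=EXPORT_HOST))
```

Calling to_export_url(url) on a URL taken from a Result would then give the address to download from, while to_export_url(url, user_export=False) leaves the original URL untouched.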