medrxiv biorxiv download
petermr edited this page May 5, 2020
These *rxivs have no API, but they can be accessed by RESTful queries. The download process is shown in `AMIDownloadTool` and `AMIDownloadTest`. These examples should be seen as a proof of concept (PoC); our current strategy is to use Ferret if possible.
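Here a "RESTful query" amounts to an ordinary HTTP GET on a search URL. The minimal sketch below builds the URLs for the successive pages of a hitlist. The URL template, the 0-based `page` parameter and the helper name `hitlist_page_urls` are hypothetical, chosen for illustration; they are not the documented medRxiv/bioRxiv URL layout, which should be checked in a browser first.

```python
import math
from urllib.parse import quote

# HYPOTHETICAL URL template: the real medRxiv/bioRxiv search URL layout
# may differ; check it in a browser before relying on it.
SEARCH_TEMPLATE = "https://www.{site}.org/search/{query}?page={page}"


def hitlist_page_urls(query: str, site: str, limit: int, page_size: int = 25) -> list[str]:
    """Return the URLs of the hitlist pages needed to collect `limit` hits."""
    n_pages = math.ceil(limit / page_size)  # e.g. 100 hits / 25 per page -> 4 pages
    encoded = quote(query)  # percent-encode the query, e.g. spaces become %20
    return [
        SEARCH_TEMPLATE.format(site=site, query=encoded, page=page)
        for page in range(n_pages)  # pages assumed to be numbered from 0
    ]


urls = hitlist_page_urls("my query", "medrxiv", limit=100)
print(len(urls))  # 4
print(urls[0])    # https://www.medrxiv.org/search/my%20query?page=0
```

Keeping the template in one constant means only a single line changes if the real URL scheme turns out to be different.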
When used by a human, these *rxivs work in 3 or 4 steps:
- a search/query generates a paged hitlist (e.g. 25 hits per page);
- for each link in the hitlist, retrieve its landing page;
- for each landing page, retrieve (a) `fulltext.html` and (b) `fulltext.pdf`;
- (optional) for each `fulltext.html`, retrieve the supplemental files.
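Steps 2 and 3 can be sketched with Python's standard `html.parser`, working on a hitlist page that has already been fetched. This is a minimal sketch under stated assumptions: the landing-page link prefix `/content/10.1101/`, the `.full` and `.full.pdf` suffixes for the full text, and the names `HitlistParser` and `fulltext_urls` are all hypothetical, and the HTML fragment and DOI in the example are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# ASSUMPTION: landing pages are linked as /content/10.1101/... and the full
# text lives at <landing>.full (HTML) and <landing>.full.pdf (PDF).
LANDING_PREFIX = "/content/10.1101/"


class HitlistParser(HTMLParser):
    """Step 2 (hypothetical helper): collect landing-page links from one hitlist page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # attribute values may be None
        if href.startswith(LANDING_PREFIX):
            url = urljoin(self.base_url, href)  # make the link absolute
            if url not in self.links:  # skip duplicates if a paper is linked twice
                self.links.append(url)


def fulltext_urls(landing_url: str) -> dict[str, str]:
    """Step 3 (hypothetical helper): derive fulltext.html and fulltext.pdf URLs."""
    return {
        "fulltext.html": landing_url + ".full",
        "fulltext.pdf": landing_url + ".full.pdf",
    }


# Made-up hitlist fragment for illustration only.
page = """
<a href="/content/10.1101/2020.04.01.20050062v1">Paper A</a>
<a href="/content/10.1101/2020.04.01.20050062v1">Paper A (again)</a>
<a href="/about">About</a>
"""
parser = HitlistParser("https://www.medrxiv.org/search/my%20query")
parser.feed(page)
parser.close()
for link in parser.links:
    print(fulltext_urls(link))
```

Separating link extraction from URL derivation keeps the site-specific guesses (prefix and suffixes) in two small places that are easy to correct.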
This should work in a similar way to getpapers:

```
download -q "my query" -o myproject --site medrxiv --limit 100
```
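As a sketch of how the proposed command line could be parsed, here is a version using Python's `argparse`. It is hypothetical: the long option names `--query` and `--output` and the site choices are illustrative assumptions, and the real AMI download tool may name or handle its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # HYPOTHETICAL: mirrors the proposed command line; the long names
    # --query/--output are assumptions, not options of an existing tool.
    parser = argparse.ArgumentParser(prog="download")
    parser.add_argument("-q", "--query", required=True, help="search query")
    parser.add_argument("-o", "--output", required=True, help="project directory")
    parser.add_argument("--site", choices=["medrxiv", "biorxiv"], required=True)
    parser.add_argument("--limit", type=int, help="maximum number of hits to download")
    return parser


args = build_parser().parse_args(
    ["-q", "my query", "-o", "myproject", "--site", "medrxiv", "--limit", "100"]
)
print(args.query, args.output, args.site, args.limit)
# my query myproject medrxiv 100
```

Restricting `--site` with `choices` makes a typo such as `--site medarxiv` fail immediately instead of producing an empty download.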
It should generate a directory `myproject` containing:
- `metadata.json` and a logfile;
- 100 subdirectories (named from URLs), each containing:
  - `fulltext.pdf`, if it exists;
  - `metadata.json`, if it exists.
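The output layout can be sketched as follows, assuming the hits have already been downloaded into memory. Everything here is an illustrative assumption: the URL-to-directory naming rule, the logfile name `download.log`, the contents of the top-level `metadata.json`, and the helper names `dir_name_from_url` and `write_project`.

```python
import json
import re
import tempfile
from pathlib import Path


def dir_name_from_url(url: str) -> str:
    """HYPOTHETICAL naming rule: keep the URL path and make it filesystem-safe."""
    path = url.split("://", 1)[-1].split("/", 1)[-1]  # drop scheme and host
    return re.sub(r"[^A-Za-z0-9._-]+", "_", path)  # e.g. "/" -> "_"


def write_project(project: Path, hits: dict[str, dict]) -> None:
    """Lay out `project` as one subdirectory per hit plus top-level files.

    `hits` maps each landing-page URL to already-downloaded items, e.g.
    {"metadata": {...}, "pdf": b"..."}; missing items are simply skipped.
    """
    project.mkdir(parents=True, exist_ok=True)
    # ASSUMPTION: the top-level metadata.json just lists the hit URLs.
    (project / "metadata.json").write_text(
        json.dumps({"hits": sorted(hits)}, indent=2), encoding="utf-8"
    )
    log_lines = []
    for url, items in hits.items():
        cdir = project / dir_name_from_url(url)
        cdir.mkdir(exist_ok=True)
        if "pdf" in items:  # fulltext.pdf, if it exists
            (cdir / "fulltext.pdf").write_bytes(items["pdf"])
        if "metadata" in items:  # metadata.json, if it exists
            (cdir / "metadata.json").write_text(
                json.dumps(items["metadata"]), encoding="utf-8"
            )
        log_lines.append(f"{url} -> {cdir.name}")
    (project / "download.log").write_text("\n".join(log_lines) + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "myproject"
    write_project(project, {
        "https://www.medrxiv.org/content/10.1101/2020.04.01.20050062v1": {
            "metadata": {"title": "Paper A"}, "pdf": b"%PDF-1.4",
        },
    })
    print(sorted(p.name for p in project.iterdir()))
    # ['content_10.1101_2020.04.01.20050062v1', 'download.log', 'metadata.json']
```

Writing each optional file only when it is present matches the "if it exists" wording above: a hit without a PDF still gets its own directory and a log entry.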