medrxiv biorxiv download

medrxiv and biorxiv

These *rxivs have no API but can be accessed with a RESTful query. The download process is shown in AMIDownloadTool and AMIDownloadTest. These examples should be read as proofs of concept (PoC); our current strategy is to use Ferret where possible.

overview

These *rxivs work in 3 or 4 steps when run by a human:

1. a search/query generates a paged hitlist (e.g. 25 hits per page).
2. for each hitlist link, create a landingpage.
3. for each landingpage, retrieve (a) fulltext.html and (b) fulltext.pdf.
4. (optional) for each fulltext.html, retrieve the supplemental files.
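
The steps above can be sketched as a small loop. This is only an illustration, not the AMIDownloadTool code: `crawl`, `fetch_hitlist_page` and `fetch_landing_page` are hypothetical names, and the two fetchers are passed in as functions because the real query URLs and page scraping are not described here.

```python
from typing import Callable, Dict, List


def crawl(
    query: str,
    fetch_hitlist_page: Callable[[str, int], List[str]],
    fetch_landing_page: Callable[[str], Dict[str, str]],
    limit: int = 100,
) -> List[Dict[str, str]]:
    """Walk the paged hitlist and collect one record per landing page.

    fetch_hitlist_page(query, page) is assumed to return the landing page
    links on that page (an empty list once the hitlist is exhausted).
    fetch_landing_page(url) is assumed to return whatever was found there,
    e.g. {"html": ..., "pdf": ...}; missing files are simply absent.
    """
    records: List[Dict[str, str]] = []
    page = 0
    while len(records) < limit:
        links = fetch_hitlist_page(query, page)  # step 1: one hitlist page
        if not links:
            break  # no more hits
        for link in links:
            if len(records) >= limit:
                break
            found = fetch_landing_page(link)  # steps 2-3: landing page and files
            records.append({"url": link, **found})
        page += 1
    return records
```

Passing the fetchers as arguments keeps the paging and limit logic testable without network access; a real implementation would plug in functions that issue the actual queries.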

desired operation

This should work in a similar way to getpapers:

download -q "my query" -o myproject --site medrxiv --limit 100

should generate a directory myproject containing

metadata.json and a logfile
and 100 subdirectories (named from URLs), each containing
1. fulltext.pdf if it exists,
2. metadata.json if it exists
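
A minimal sketch of how the command line and output layout could look, under several assumptions: the long option names `--query` and `--output`, the logfile name `download.log`, the helper names, and the rule for turning a URL into a directory name are all hypothetical, since the page only specifies `-q`, `-o`, `--site` and `--limit` and says the subdirectories are "named from URLs".

```python
import argparse
import json
import re
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Mirrors: download -q "my query" -o myproject --site medrxiv --limit 100
    parser = argparse.ArgumentParser(prog="download")
    parser.add_argument("-q", "--query", required=True)   # long name is an assumption
    parser.add_argument("-o", "--output", required=True)  # long name is an assumption
    parser.add_argument("--site", choices=["medrxiv", "biorxiv"], default="medrxiv")
    parser.add_argument("--limit", type=int, default=100)
    return parser


def dirname_from_url(url: str) -> str:
    # Assumed naming rule: drop the scheme, replace unsafe characters with "_".
    name = re.sub(r"^https?://", "", url)
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")


def write_project(output, query, hits):
    """Write the project tree; each hit is {"url", "metadata", "pdf"}."""
    root = Path(output)
    root.mkdir(parents=True, exist_ok=True)
    summary = {"query": query, "urls": [hit["url"] for hit in hits]}
    (root / "metadata.json").write_text(json.dumps(summary, indent=2))
    (root / "download.log").write_text("%d hits for %r\n" % (len(hits), query))
    for hit in hits:
        sub = root / dirname_from_url(hit["url"])
        sub.mkdir(exist_ok=True)
        if hit.get("pdf") is not None:       # fulltext.pdf if it exists
            (sub / "fulltext.pdf").write_bytes(hit["pdf"])
        if hit.get("metadata") is not None:  # metadata.json if it exists
            (sub / "metadata.json").write_text(json.dumps(hit["metadata"], indent=2))
    return root
```

For example, `build_parser().parse_args(["-q", "my query", "-o", "myproject", "--site", "medrxiv", "--limit", "100"])` yields the arguments for the command shown above.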
