
medrxiv biorxiv download

petermr edited this page May 5, 2020 · 8 revisions

medrxiv and biorxiv

These *rxivs have no API but can be accessed by a RESTful query. The download process is shown in AMIDownloadTool and AMIDownloadTest. These examples should be treated as a proof of concept (PoC); our current strategy is to use Ferret where possible.

overview

These rxivs work in 3 or 4 steps when run by a human:

  1. a search/query generates a paged hitlist (e.g. 25 hits per page).
  2. for each link in the hitlist, create a landingpage.
  3. for each landingpage, retrieve (a) fulltext.html and (b) fulltext.pdf.
  4. (optional) for each fulltext.html, retrieve the supplemental files.
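
The first three steps above can be sketched in Python as follows. This is a minimal illustration and not the AMIDownloadTool implementation. The base URL, the `/search/...?page=N` query pattern, the `/content/` link prefix and the `.full` / `.full.pdf` suffixes are all assumptions about the site layout and should be checked against the live pages before use. The function names are hypothetical. The sketch only builds URLs and parses HTML; it makes no network requests.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin

# Assumption: base URL of the server (medrxiv would use its own host).
BASE = "https://www.biorxiv.org"


def search_url(query, page=0, per_page=25):
    """Step 1: build the URL for one page of the hitlist.

    Assumption: the site accepts a path of the form
    /search/<query> numresults:<n>?page=<p>.
    """
    term = quote(f"{query} numresults:{per_page}")
    return f"{BASE}/search/{term}?page={page}"


class HitlistParser(HTMLParser):
    """Collects hrefs that look like article links from hitlist HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: article links start with /content/.
        if href.startswith("/content/") and href not in self.links:
            self.links.append(href)


def landing_pages(hitlist_html):
    """Step 2: turn hitlist links into absolute landing page URLs."""
    parser = HitlistParser()
    parser.feed(hitlist_html)
    return [urljoin(BASE, href) for href in parser.links]


def fulltext_urls(landing_url):
    """Step 3: derive the fulltext URLs from a landing page URL.

    Assumption: the HTML and PDF are reached by appending suffixes.
    """
    return {
        "fulltext.html": landing_url + ".full",
        "fulltext.pdf": landing_url + ".full.pdf",
    }
```

A caller would fetch `search_url(query, page)` for each page, pass the returned HTML to `landing_pages`, and then download the two URLs from `fulltext_urls` for each landing page. Step 4 (supplemental files) is not covered here because it depends on the structure of each fulltext.html.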