AlphaTest (petermr) 2

petermr edited this page Jul 25, 2021 · 8 revisions

Alphatest continued

Not covered in first alphatest

  -z, --zip             download files from ftp endpoint if available (only eupmc supported)
  -l LOGLEVEL, --loglevel LOGLEVEL
  -f LOGFILE, --logfile LOGFILE
  -k LIMIT, --limit LIMIT
  -r RESTART, --restart RESTART
  -u UPDATE, --update UPDATE
  --onlyquery           Saves json file containing the result of the query in storage. (only eupmc
  -c, --makecsv         Stores the per-document metadata as csv.
  --makehtml            Stores the per-document metadata as html.
  --synonym             Results contain synonyms as well.
  --startdate STARTDATE
  --enddate ENDDATE     Gives papers till given date. Format: YYYY-MM-DD
  --terms TERMS         Location of the txt file which contains terms separated by a comma which
  --api API             API to search [eupmc, crossref, arxiv, biorxiv, medrxiv, rxivist] (default:
  --filter FILTER       filter by key value pair (only crossref supported)

### zip

Not easily testable (needs some pointers to PMCIDs that have files on the ftp endpoint).

### -k LIMIT

    pygetpapers -q 'TPS30' -k 10 -o TPS30
    INFO: Final query is TPS30
    INFO: Total Hits are 18
    0it [00:00, ?it/s]WARNING: Keywords not found for paper 4
    WARNING: Keywords not found for paper 5
    WARNING: Keywords not found for paper 10
    1it [00:00, 268.38it/s]
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00, 2.28it/s]
    (base) pm286macbook:petermr pm286$ tree TPS30/
    TPS30/
    ├── PMC4457800
    │   └── eupmc_result.json
    ├── PMC5122590
    │   └── eupmc_result.json
    ├── PMC5161391
    │   └── eupmc_result.json
    ├── PMC5655044
    │   └── eupmc_result.json
    ├── PMC6266747
    │   └── eupmc_result.json
    ├── PMC6742361
    │   └── eupmc_result.json
    ├── PMC7305226
    │   └── eupmc_result.json
    ├── PMC7600171
    │   └── eupmc_result.json
    ├── PMC8036305
    │   └── eupmc_result.json
    ├── PMC8201348
    │   └── eupmc_result.json
    └── eupmc_results.json

    10 directories, 11 files
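To check the effect of `-k` without reading the tree by eye, the per-paper directories can be counted with a few lines of Python. This is only a sketch: the helper name `count_downloaded_papers` is made up for this page, and it assumes the layout shown in the listing (one directory per paper, each holding an `eupmc_result.json`).

```python
from pathlib import Path


def count_downloaded_papers(corpus_dir):
    """Count subdirectories that contain a per-paper eupmc_result.json.

    Hypothetical helper, not part of pygetpapers. It assumes the layout
    from the tree listing: <corpus>/<PMCID>/eupmc_result.json.
    """
    corpus = Path(corpus_dir)
    return sum(
        1
        for child in corpus.iterdir()
        if child.is_dir() and (child / "eupmc_result.json").is_file()
    )


# Usage (run from the directory that contains TPS30):
#   count_downloaded_papers("TPS30")
```

For the listing above this should report 10, matching `-k 10`. The top-level `eupmc_results.json` is not counted because it is a file, not a paper directory.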

### update
This should take the partial output of the previous search (limited to 10) and extend it with a limit of 100 (there are actually only 18 hits). I couldn't get it to work.

    $ pygetpapers -q 'TPS30' --update TPS30/eupmc_results.json -k 100 -o TPS30
    INFO: Final query is TPS30
    INFO: Please ensure that you are providing the same --api as the one in the corpus or you may get errors
    Traceback (most recent call last):
      File "/opt/anaconda3/bin/pygetpapers", line 8, in <module>
        sys.exit(main())
      File "/opt/anaconda3/lib/python3.8/site-packages/pygetpapers/pygetpapers.py", line 583, in main
        callpygetpapers.handlecli()
      File "/opt/anaconda3/lib/python3.8/site-packages/pygetpapers/pygetpapers.py", line 564, in handlecli
        self.handle_update(args)
      File "/opt/anaconda3/lib/python3.8/site-packages/pygetpapers/pygetpapers.py", line 73, in handle_update
        self.europe_pmc.eupmc_update(args)
      File "/opt/anaconda3/lib/python3.8/site-packages/pygetpapers/europe_pmc.py", line 190, in eupmc_update
        os.chdir(os.path.dirname(args.update))
    FileNotFoundError: [Errno 2] No such file or directory: 'TPS30'

I don't understand the error, because TPS30 exists in the current directory. Current files:

    ls
    2021_07_24_19_18_09  README.md  TPS31            tps_terms_3.txt
    2021_07_25_18_21_12  TPS30      tps_terms_2.txt  tps_terms_50.txt
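One possible reading of the traceback, not checked against the pygetpapers source: the failing call is `os.chdir(os.path.dirname(args.update))`, and `os.path.dirname("TPS30/eupmc_results.json")` is the relative path `TPS30`. A relative path is resolved against the process's current working directory at the moment of the call. If the program had already changed into the output directory (here `-o TPS30`) before reaching that line, `TPS30` would be looked up inside `TPS30` and not found, even though it exists where the command was started. The sketch below reproduces only that general behaviour in a temporary directory; `chdir_to_parent` and `demo` are names made up for this page, and the assumption about an earlier directory change is a guess.

```python
import os
import tempfile


def chdir_to_parent(relative_file):
    """Mimic the call from the traceback: chdir to the file's directory."""
    os.chdir(os.path.dirname(relative_file))


def demo():
    """Show that the same relative chdir works once and fails the second time."""
    start = os.getcwd()
    outcomes = []
    with tempfile.TemporaryDirectory() as root:
        # Build <root>/TPS30/eupmc_results.json, like the corpus above.
        os.makedirs(os.path.join(root, "TPS30"))
        open(os.path.join(root, "TPS30", "eupmc_results.json"), "w").close()
        try:
            os.chdir(root)
            # From <root>, the relative directory "TPS30" exists.
            chdir_to_parent("TPS30/eupmc_results.json")
            outcomes.append("first chdir ok")
            try:
                # Now the cwd is <root>/TPS30, so "TPS30" means
                # <root>/TPS30/TPS30, which does not exist.
                chdir_to_parent("TPS30/eupmc_results.json")
                outcomes.append("second chdir ok")
            except FileNotFoundError:
                outcomes.append("second chdir failed")
        finally:
            # Leave the temporary tree before it is deleted.
            os.chdir(start)
    return outcomes


if __name__ == "__main__":
    print(demo())
```

If this is what happens, giving `--update` an absolute path might avoid the error, but that is untested here.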