AlphaTest (petermr) 2

petermr edited this page Jul 25, 2021 · 8 revisions

Alphatest continued

Options not covered in the first alphatest:

  -z, --zip             download files from ftp endpoint if available (only eupmc supported)
  -l LOGLEVEL, --loglevel LOGLEVEL
  -f LOGFILE, --logfile LOGFILE
  -k LIMIT, --limit LIMIT
  -r RESTART, --restart RESTART
  -u UPDATE, --update UPDATE
  --onlyquery           Saves json file containing the result of the query in storage. (only eupmc
  -c, --makecsv         Stores the per-document metadata as csv.
  --makehtml            Stores the per-document metadata as html.
  --synonym             Results contain synonyms as well.
  --startdate STARTDATE
  --enddate ENDDATE     Gives papers till given date. Format: YYYY-MM-DD
  --terms TERMS         Location of the txt file which contains terms separated by a comma which
  --api API             API to search [eupmc, crossref,arxiv,biorxiv,medrxiv,rxivist] (default:
  --filter FILTER       filter by key value pair (only crossref supported)

zip

Not easily testable (I need pointers to some suitable PMCIDs).

-k LIMIT

$ pygetpapers -q 'TPS30' -k 10 -o TPS30
INFO: Final query is TPS30
INFO: Total Hits are 18
0it [00:00, ?it/s]WARNING: Keywords not found for paper 4
WARNING: Keywords not found for paper 5
WARNING: Keywords not found for paper 10
1it [00:00, 268.38it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.28it/s]
$ tree TPS30/
TPS30/
├── PMC4457800
│   └── eupmc_result.json
├── PMC5122590
│   └── eupmc_result.json
├── PMC5161391
│   └── eupmc_result.json
├── PMC5655044
│   └── eupmc_result.json
├── PMC6266747
│   └── eupmc_result.json
├── PMC6742361
│   └── eupmc_result.json
├── PMC7305226
│   └── eupmc_result.json
├── PMC7600171
│   └── eupmc_result.json
├── PMC8036305
│   └── eupmc_result.json
├── PMC8201348
│   └── eupmc_result.json
└── eupmc_results.json

10 directories, 11 files
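
A quick way to confirm that the limit was honoured is to count the per-paper directories instead of reading the tree by eye. The sketch below is only an illustration: `count_paper_dirs` and `build_fake_corpus` are hypothetical helpers, not part of pygetpapers, and the only thing taken from the run above is the layout (one directory per PMCID holding `eupmc_result.json`, plus a top-level `eupmc_results.json`). The stand-in corpus is empty test data, not real output.

```python
import os
import tempfile


def count_paper_dirs(output_dir, result_name="eupmc_result.json"):
    """Count subdirectories of output_dir that contain a per-paper result file."""
    count = 0
    for entry in sorted(os.listdir(output_dir)):
        path = os.path.join(output_dir, entry)
        # Only directories holding the per-paper JSON count as downloaded papers.
        if os.path.isdir(path) and os.path.isfile(os.path.join(path, result_name)):
            count += 1
    return count


def build_fake_corpus(root, pmcids):
    """Create an empty stand-in for an output directory (test data only)."""
    for pmcid in pmcids:
        os.makedirs(os.path.join(root, pmcid))
        with open(os.path.join(root, pmcid, "eupmc_result.json"), "w") as handle:
            handle.write("{}")
    # The top-level summary file must not be counted as a paper.
    with open(os.path.join(root, "eupmc_results.json"), "w") as handle:
        handle.write("{}")


with tempfile.TemporaryDirectory() as root:
    build_fake_corpus(root, ["PMC4457800", "PMC5122590", "PMC5161391"])
    found = count_paper_dirs(root)
    print(found)
```

Pointed at the real `TPS30` directory from the run above, `count_paper_dirs("TPS30")` should agree with the `-k` value, assuming the layout is as shown in the tree.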

update

This should take the partial output of the previous search (limited to 10 papers) and raise the limit to 100 (in practice there are only 18 hits). I couldn't get it to work.

$ pygetpapers -q 'TPS30' --update TPS30/eupmc_results.json -k 100 -o TPS30
INFO: Final query is TPS30
INFO: Please ensure that you are providing the same --api as the one in the corpus or you may get errors
Traceback (most recent call last):
  File "/opt/anaconda3/bin/pygetpapers", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/lib/python3.8/site-packages/pygetpapers/pygetpapers.py", line 583, in main
    callpygetpapers.handlecli()
  File "/opt/anaconda3/lib/python3.8/site-packages/pygetpapers/pygetpapers.py", line 564, in handlecli
    self.handle_update(args)
  File "/opt/anaconda3/lib/python3.8/site-packages/pygetpapers/pygetpapers.py", line 73, in handle_update
    self.europe_pmc.eupmc_update(args)
  File "/opt/anaconda3/lib/python3.8/site-packages/pygetpapers/europe_pmc.py", line 190, in eupmc_update
    os.chdir(os.path.dirname(args.update))
FileNotFoundError: [Errno 2] No such file or directory: 'TPS30'

I don't understand the error: TPS30 is present in the current directory:

$ ls
2021_07_24_19_18_09	README.md		TPS31			tps_terms_3.txt
2021_07_25_18_21_12	TPS30			tps_terms_2.txt		tps_terms_50.txt
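
One possible explanation, not checked against the pygetpapers source: the traceback shows `os.chdir(os.path.dirname(args.update))`, and `os.path.dirname('TPS30/eupmc_results.json')` is the relative path `TPS30`. If the process had already changed its working directory before that line (for example into the `-o` output directory, which is an assumption), the relative path would resolve to `TPS30/TPS30`, which does not exist. The sketch below reproduces that behaviour in a temporary directory using only the standard library; `demo_relative_update_path` is a hypothetical name.

```python
import os
import tempfile


def demo_relative_update_path():
    """Return (relative_failed, absolute_worked) for a simulated --update path."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as work:
        try:
            os.chdir(work)
            os.makedirs("TPS30")
            with open(os.path.join("TPS30", "eupmc_results.json"), "w") as handle:
                handle.write("{}")

            update_arg = os.path.join("TPS30", "eupmc_results.json")
            # Resolve to an absolute path while the relative path is still valid.
            absolute_dir = os.path.dirname(os.path.abspath(update_arg))

            # ASSUMPTION: something has already moved into the output directory.
            os.chdir("TPS30")

            try:
                # Relative to the new working directory this means TPS30/TPS30.
                os.chdir(os.path.dirname(update_arg))
                relative_failed = False
            except FileNotFoundError:
                relative_failed = True

            # The absolute path is unaffected by the earlier chdir.
            os.chdir(absolute_dir)
            absolute_worked = os.path.samefile(os.getcwd(), absolute_dir)
        finally:
            # Leave the temporary directory before it is cleaned up.
            os.chdir(start)
    return relative_failed, absolute_worked


print(demo_relative_update_path())
```

If this is what happens, giving `--update` an absolute path might avoid the error; that is untested here.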