getpapers
https://github.com/ContentMine/getpapers
getpapers queries a repository through its RESTful API and downloads content in bulk.
Overview and installation: https://github.com/ContentMine/getpapers/blob/master/README.md
For example: https://github.com/petermr/tigr2ess/blob/master/getpapers/OVERVIEW.md
Installation of getpapers involves steps specific to each operating system.
Instructions followed: https://github.com/ContentMine/getpapers/blob/master/README.md
See also: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
Simple: https://github.com/ContentMine/getpapers/blob/master/README.md
For a full example: https://github.com/petermr/tigr2ess/tree/master/getpapers
getpapers uses a headless browser (PhantomJS), which still works but is no longer maintained. It is customised for EPMC, IEEE, Crossref and possibly arXiv. It needs a RESTful API.
- The query syntax differs between sites, as do the escape characters (" or ').
- The default query format is EPMC.
getpapers helps download large numbers of full-text files in bulk in a short time.
Users can run into various problems while installing getpapers. Follow the instructions and, if an installation problem occurs, post an issue in the issue section, or refer to an existing issue if it matches the problem.
Users facing usage problems in getpapers can likewise create an issue or refer to the existing ones.
(please add some queries involving DATE, OR, AND, NOT)
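As a starting point for such queries, here is a minimal Python sketch that assembles a Europe PMC style query string and the matching getpapers argument list. The helper names `epmc_query` and `getpapers_argv` are hypothetical, and the `FIRST_PDATE:[... TO ...]` date-range field and the `AND`/`OR`/`NOT` operators are assumed from the Europe PMC search syntax rather than taken from this page. Only the `-q`, `-k`, `-n`, `-o` and `-x` options described on this page are used.

```python
def epmc_query(all_of=(), any_of=(), none_of=(), date_from=None, date_to=None):
    """Combine search terms into one query string (hypothetical helper).

    Assumes Europe PMC syntax: AND / OR / NOT operators and a
    FIRST_PDATE:[YYYY-MM-DD TO YYYY-MM-DD] date range.
    """
    def term(text):
        # Wrap multi-word phrases in double quotes so they stay together.
        return f'"{text}"' if " " in text else text

    parts = [term(t) for t in all_of]
    if any_of:
        parts.append("(" + " OR ".join(term(t) for t in any_of) + ")")
    if date_from and date_to:
        parts.append(f"FIRST_PDATE:[{date_from} TO {date_to}]")
    if not parts:
        raise ValueError("at least one positive search term or date range is needed")

    query = " AND ".join(parts)
    for t in none_of:
        query += f" NOT {term(t)}"
    return query


def getpapers_argv(query, limit, outdir=None, count_only=False):
    """Build the getpapers argument list (hypothetical helper)."""
    argv = ["getpapers", "-q", query, "-k", str(limit)]
    if count_only:
        argv.append("-n")              # only report the number of hits
    else:
        if outdir is None:
            raise ValueError("outdir is required when downloading")
        argv += ["-o", outdir, "-x"]   # download full-text XML
    return argv


query = epmc_query(
    all_of=["COVID-19"],
    any_of=["masks", "viral epidemics"],
    none_of=["influenza"],
    date_from="2020-01-01",
    date_to="2020-06-30",
)
print(query)
print(getpapers_argv(query, 100, count_only=True))
```

Keeping the arguments as a list means the query can be handed to `subprocess.run` as a single argument, which sidesteps the shell escaping differences between the Windows command prompt and UNIX shells mentioned above. On Windows the `getpapers` executable may first need resolving with `shutil.which`.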
Kareena Singh
Windows 10
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Go to the downloads page and download latest version of nvm-setup.zip.
Unzip the downloaded file and run the included installer.
successfully installed and run
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Open your command prompt, and run the following commands one after the other.
nvm install 7
nvm use 7.10.1
successful
The following installation problem occurred when I entered the node installation command at the command line:
Error: Access to the registry path is denied
- Reason: insufficient privileges to install (requires "root", i.e. administrator, permission on Windows).
- Solution: https://medium.com/@github.gkarthiks/how-to-install-nodejs-and-npm-in-non-admin-access-windows-machines-102fd461b54c
The problem occurred because the installation required "root" or "superuser" permission. If this is your own machine you can probably get this permission. Following the instructions at the link above resolved the issue.
Commands issued:
npm install --global getpapers
successful
node --version
version 11.11.0
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Run the following command at command prompt:
npm install --global getpapers
Now run the command getpapers at the command prompt, and you should see something as below:
none reported
You can test the installation by running the command getpapers --version. If you get the following, then the installation is successful:
0.4.17
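The installation checks in this report (node --version, getpapers --version) can also be scripted. The following is a minimal sketch, not part of the official instructions: the function names `locate_tools` and `report_versions` are hypothetical, and it assumes that each tool accepts `--version` and that a missing tool simply means it is not on PATH.

```python
import shutil
import subprocess


def locate_tools(names=("node", "npm", "getpapers")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def report_versions(names=("node", "npm", "getpapers")):
    """Print the version reported by each tool that was found."""
    for name, path in locate_tools(names).items():
        if path is None:
            print(f"{name}: not found on PATH")
            continue
        # Run "<tool> --version" and show whatever it printed.
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=60
        )
        print(f"{name}: {result.stdout.strip() or result.stderr.strip()}")
```

Calling `report_versions()` after installation should print one line per tool. A "not found on PATH" line usually points to the kind of installation problem described above, or to a command prompt that was opened before the installation finished.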
Lakshmi Devi Priya
Windows 10
Instructions from: https://github.com/ContentMine/blob/master/README.md
Successful
Successful
C:\Users>node -v
v12.16.3
Instructions from: https://github.com/ContentMine/blob/master/README.md
Successful
C:\Users>$ npm install --global getpapers
`$` is not recognized as an internal or external command,
operable program or batch file.
- '$' is not part of the command (it is the UNIX prompt).
- So run just:
npm install --global getpapers
- getpapers is installed.
Use
C:\Users>getpapers --help
- The command options available for getpapers are displayed.
- Installed getpapers.
Followed example from:
https://github.com/petermr/tigr2ess/tree/master/getpapers
To run a search for a given query:
getpapers -q <query> -n -k 100
-q, --query : search query (required)
-n, --noexecute : only reports how many results match the query, but does not download anything
For example, for the query COVID-19, use:
getpapers -q COVID-19 -n -k <int>
The results will be shown as below:
https://drive.google.com/file/d/1DP0_xcjC5GMQ2CflM7TQUoyIO3J3MyCW/view?usp=drivesdk
Output: found 46887 open access results. This many results cannot practically be downloaded, so the number of downloads should be limited.
-k, --limit : limits the number of hits and downloads
<int> is an integer giving the number of files to be downloaded.
getpapers -q <query> -k <int> -o <path> -x -p
-o, --outdir : output directory(required - will be created if not found).
This option gives the path to the directory that is created on the system for the downloaded files.
-p, --pdf : downloads fulltext PDFs if available.
-x, --xml : downloads fulltext XMLs if available.
Thus, for the query COVID-19 the syntax
getpapers -q COVID-19 -k 100 -o covid -x -p
gives the result as follows.
https://drive.google.com/file/d/1H5k8ZooTFD1dHnMOK6-eTxckc95iJ6lG/view?usp=drivesdk
The .xml files in the resulting folder are both machine-readable and human-readable.
As expected, 100 .xml files were downloaded, but only 76 .pdf files were downloaded, since -p downloads a PDF only if one is available.
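To see which papers came without a PDF, the output directory can be inspected with a short script. This is a minimal sketch under stated assumptions: the function name `summarise_output` is hypothetical, and it assumes that getpapers writes one subfolder per paper (as reported elsewhere on this page) and that the full-text files in each subfolder carry the extensions .xml and .pdf.

```python
from pathlib import Path


def summarise_output(outdir):
    """Count paper folders and full-text files under a getpapers output directory.

    Assumes one subfolder per paper, with any full text stored inside it
    as files ending in .xml or .pdf.
    """
    root = Path(outdir)
    folders = sorted(p for p in root.iterdir() if p.is_dir())
    with_xml = {f.name for f in folders if any(f.glob("*.xml"))}
    with_pdf = {f.name for f in folders if any(f.glob("*.pdf"))}
    return {
        "folders": len(folders),
        "xml": len(with_xml),
        "pdf": len(with_pdf),
        # Papers for which no PDF was downloaded.
        "missing_pdf": [f.name for f in folders if f.name not in with_pdf],
    }
```

For the run above, `summarise_output("covid")` would be expected to report 100 folders with XML and 76 with PDF, and `missing_pdf` would list the folders to check by hand.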
Pruthiv rajan
Windows 10
Instructions from: https://github.com/ContentMine/blob/master/README.md
Successful
Successful
C:\Users>node -v
v12.16.3
Instructions from: https://github.com/ContentMine/blob/master/README.md
Successful
No problems.
Use
C:\Users>getpapers --help
- The command options available for getpapers are displayed.
- Installed getpapers.
Followed example from:
https://github.com/petermr/tigr2ess/tree/master/getpapers
To run a search for a given query:
getpapers -q <query> -n -k 100
-q, --query : search query (required)
-n, --noexecute : only reports how many results match the query, but does not download anything
For example, for the query COVID-19, use:
getpapers -q COVID-19 -n -k <int>
The results will be shown as below:
https://drive.google.com/file/d/1DP0_xcjC5GMQ2CflM7TQUoyIO3J3MyCW/view?usp=drivesdk
Output: found 46887 open access results. This many results cannot practically be downloaded, so the number of downloads should be limited.
-k, --limit : limits the number of hits and downloads
<int> is an integer giving the number of files to be downloaded.
getpapers -q <query> -k <int> -o <path> -x -p
-o, --outdir : output directory(required - will be created if not found).
This option gives the path to the directory that is created on the system for the downloaded files.
-p, --pdf : downloads fulltext PDFs if available.
-x, --xml : downloads fulltext XMLs if available.
Thus, for the query Human genome project the syntax
getpapers -q "human genome project" -k 100 -o covid -x -hgp
gives the result as follows.
As expected, 100 .xml files were downloaded, but only 84 .pdf files were downloaded.
Name: Ambreen Hamadani
Operating System: Windows 10
Preinstalled on the System
Source of Instruction: ContentMine / getpapers
Steps in the Installation:
- Open Command Prompt
- Run the command
https://github.com/ContentMine/getpapers
Installation: Successful
Test of the Installation:
- Type getpapers in Command Prompt
- Usage and options displayed
The tool was used to retrieve 100 papers on the topic, 'masks' with the output directory specified as 'test1'
Command used:
getpapers --query 'masks ' --limit 100 --outdir test1
Results
- A new directory (test1) created within the home directory
- 100 folders (PMC###) created within 'test1' each containing a JSON file (eupmc_result)
- 1 text file (eupmc_fulltext_html_urls) containing the URLs of all downloaded documents
- 1 JSON file (eupmc_results) created
**Command line output**
- 0 error messages
- No warnings
The tool was used to retrieve 200 papers on the topic, 'viral epidemics' with the output directory specified as 'test3'
Command used:
getpapers --query 'viral epidemics' --limit 200 --outdir test3
Results
- A new directory (test3) created within the home directory
- 200 folders (PMC###) created within 'test3' each containing a JSON file (eupmc_result)
- 1 text file (eupmc_fulltext_html_urls) containing the URLs of all downloaded documents
- 1 JSON file (eupmc_results) created
**Command line output**
- 0 error messages
- 2 warnings (warn: This version of getpapers wasn't built with this version of the EuPMC api in mind; warn: getpapers EuPMCVersion: 5.3.2 vs. 6.3 reported by api)
- Installation of Node.js. Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
- Installation of getpapers. Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
Successful
I. Type getpapers in Command Prompt
1. **Downloaded 100 papers on the topic 'COVID-19' (PDF files)**
getpapers -q "COVID-19" -p -k 100 -o covid_19
Successfully downloaded 100 papers, with 1 .json file and 1 .txt file
- 0 error messages
- 2 Warnings
2. **Downloaded 100 (.xml) files on 'COVID deaths' with the output directory cdeaths**
getpapers -q "COVID deaths" -o cdeaths -x -k 100
with 1 .json file and 1 .txt file
- 0 error messages
- 2 Warnings
Reference: https://github.com/petermr/tigr2ess/blob/master/getpapers/TUTORIAL.md