getpapers
https://github.com/ContentMine/getpapers
Queries a repository through its RESTful API and downloads content in bulk.
Overview and installation: https://github.com/ContentMine/getpapers/blob/master/README.md
For an example, see https://github.com/petermr/tigr2ess/blob/master/getpapers/OVERVIEW.md
Installation of getpapers involves different steps for each operating system.
Instructions followed: https://github.com/ContentMine/getpapers/blob/master/README.md
See also: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
Simple: https://github.com/ContentMine/getpapers/blob/master/README.md
For a full example: https://github.com/petermr/tigr2ess/tree/master/getpapers
getpapers uses a headless browser (Phantom.js), which still works but is no longer maintained. It is customised for EPMC, IEEE, Crossref and possibly arXiv. The repository being queried must provide a RESTful API.
- The query syntax differs from site to site. Quote characters (" or ') inside a query may also need escaping.
- The default query format is EPMC.
getpapers helps to download large amounts of full-text content in bulk within a short time.
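Quote characters in a query are easy to lose when a command is typed into a shell. One way to keep them intact is to pass the query to getpapers as a single argument from a script. The following is a minimal sketch in Python and is not part of getpapers. The helper name `getpapers_argv` and the sample query are invented for illustration. Only the `-q` option described on this page is used.

```python
import shlex


def getpapers_argv(query):
    """Return an argument list that passes `query` to getpapers as ONE argument.

    Hypothetical helper: it only builds the list and does not run anything.
    """
    return ["getpapers", "-q", query]


# A sample query containing a double-quoted term and spaces (made up).
argv = getpapers_argv('"COVID-19" AND vaccine')

# shlex.join (Python 3.8+) shows how the command could be typed in a POSIX
# shell; quoting rules in the Windows command prompt are different.
posix_command = shlex.join(argv)
print(posix_command)

# Splitting the quoted command again gives back the original arguments,
# so the quotes inside the query survive the round trip.
assert shlex.split(posix_command) == argv
```

A list of this form can be handed to Python's `subprocess.run` once getpapers is installed, which avoids shell quoting on POSIX systems. Behaviour on Windows may differ.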
Users can run into errors while installing getpapers. Follow the instructions and, if an installation problem occurs, post an issue about it in the issue section, or refer to an existing issue if it matches the problem.
Users facing usage problems with getpapers can likewise create an issue or refer to an existing one.
(please add some queries involving DATE, OR, AND, NOT)
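As a starting point for the requested examples, the sketch below composes a few queries using AND, OR, NOT and a date restriction. It assumes the default EPMC (Europe PMC) query format, with Boolean operators written in capitals and a `FIRST_PDATE` field for publication-date ranges. That field name comes from the Europe PMC search syntax, not from this page, so check it against the Europe PMC documentation. The search terms and the helper `published_between` are invented for illustration.

```python
def published_between(start, end):
    """Build a date-range clause (assumed Europe PMC field FIRST_PDATE)."""
    return f"FIRST_PDATE:[{start} TO {end}]"


# Sample queries; the search terms are made up for illustration.
example_queries = {
    "AND": "COVID-19 AND vaccine",
    "OR": "(COVID-19 OR SARS-CoV-2)",
    "NOT": "COVID-19 NOT influenza",
    "DATE": "COVID-19 AND " + published_between("2020-01-01", "2020-03-31"),
}

for label, query in example_queries.items():
    # -n only reports the number of hits, so nothing is downloaded.
    print(f'{label}: getpapers -q "{query}" -n')
```

Running each query with `-n` first shows how many results it matches before anything is downloaded.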
Kareena Singh
Windows 10
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Go to the downloads page and download latest version of nvm-setup.zip.
Unzip the downloaded file and run the included installer.
successfully installed and run
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Open your command prompt, and run the following commands one after the other.
nvm install 7
nvm use 7.10.1
successful
The following installation problem occurred when I entered the node installation command at the command line:
Error: Access to the registry path is denied
- reason: insufficient privileges to install (requires "root", i.e. administrator, permission in Windows)
- solution: https://medium.com/@github.gkarthiks/how-to-install-nodejs-and-npm-in-non-admin-access-windows-machines-102fd461b54c
The problem occurred because I needed "root" or "superuser" permission. If this is your own machine you can probably get this permission. Following the instructions at the link above resolved the issue.
- commands issued:
npm install --global getpapers
successful
node --version
version 11.11.0
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Run the following command at command prompt:
npm install --global getpapers
Now run the command getpapers at the command prompt and check that it responds with output rather than an error.
none reported
You can test the installation by running the command
getpapers --version
If it prints the following, the installation is successful:
0.4.17
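The same check can be scripted. The sketch below is a minimal illustration in Python and is not part of getpapers. The helper names `parse_version` and `installed_getpapers_version` are invented, and the sketch assumes that `getpapers --version` prints a bare dotted version number such as the 0.4.17 reported above.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a version string such as '0.4.17' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_getpapers_version():
    """Return the version tuple reported by getpapers, or None if unavailable."""
    exe = shutil.which("getpapers")  # looks for getpapers on the PATH
    if exe is None:
        return None  # getpapers is not installed or not on the PATH
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    try:
        return parse_version(result.stdout)
    except ValueError:
        return None  # output was not a plain dotted version number


print(parse_version("0.4.17"))  # prints (0, 4, 17)
print(installed_getpapers_version())
```

A result of `None` from `installed_getpapers_version` suggests that getpapers is not on the PATH, or that its output did not look like a version number.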
Lakshmi Devi Priya
Windows 10
Instructions from: https://github.com/ContentMine/getpapers/blob/master/README.md
Successful
Successful
C:\Users>node -v
v12.16.3
Instructions from: https://github.com/ContentMine/getpapers/blob/master/README.md
Successful
C:\Users>$ npm install --global getpapers
`$` is not recognized as an internal or external command,
operable program or batch file.
- '$' is not part of the command (it is the UNIX prompt).
- So run just:
npm install --global getpapers
- getpapers is installed.
Use
C:\Users>getpapers --help
- The command options available for getpapers are displayed.
- Installed getpapers.
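Commands copied from UNIX-style instructions often carry the leading `$` prompt that caused the error above. As a small illustration, the sketch below removes such a prompt before a command is pasted into the Windows command prompt. The helper name `strip_prompt` is invented for this example.

```python
def strip_prompt(line):
    """Remove a leading UNIX-style '$' prompt from a copied command line."""
    stripped = line.strip()
    if stripped.startswith("$"):
        # Drop the '$' and any spaces that follow it.
        return stripped[1:].lstrip()
    return stripped


print(strip_prompt("$ npm install --global getpapers"))
# prints: npm install --global getpapers
```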
Followed example from:
https://github.com/petermr/tigr2ess/tree/master/getpapers
To run a search query on a specified topic:
getpapers -q <query> -n -k 100
-q, --query : search query(required)
-n, --noexecute : only reports how many results match the query, but doesn't actually download anything
For example, for the query COVID-19, use
getpapers -q COVID-19 -n -k <int>
The results will be shown as below:
https://drive.google.com/file/d/1DP0_xcjC5GMQ2CflM7TQUoyIO3J3MyCW/view?usp=drivesdk
Output: Found 46887 open access results. This many results cannot reasonably be downloaded, so the number of downloads should be limited.
-k, --limit : limits the number of hits and downloads
<int> stands for an integer: the maximum number of files to be downloaded.
getpapers -q <query> -k <int> -o <path> -x -p
-o, --outdir : output directory(required - will be created if not found).
This option gives the path of the directory on the system in which the downloaded files are stored.
-p, --pdf : downloads fulltext PDFs if available.
-x, --xml : downloads fulltext XMLs if available.
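The options above can be combined programmatically. The following minimal sketch in Python assembles a getpapers argument list from the options described on this page (`-q`, `-n`, `-k`, `-o`, `-x`, `-p`). The function name `build_getpapers_command` and its parameter names are invented for illustration. The function only builds the list and does not run getpapers.

```python
def build_getpapers_command(query, limit=None, outdir=None,
                            xml=False, pdf=False, noexecute=False):
    """Assemble a getpapers argument list from the options described above."""
    argv = ["getpapers", "-q", query]       # -q: search query (required)
    if noexecute:
        argv.append("-n")                   # -n: report hit count only
    if limit is not None:
        argv += ["-k", str(limit)]          # -k: limit hits and downloads
    if outdir is not None:
        argv += ["-o", outdir]              # -o: output directory
    if xml:
        argv.append("-x")                   # -x: fulltext XML if available
    if pdf:
        argv.append("-p")                   # -p: fulltext PDF if available
    return argv


command = build_getpapers_command("COVID-19", limit=100, outdir="covid",
                                  xml=True, pdf=True)
print(" ".join(command))
# prints: getpapers -q COVID-19 -k 100 -o covid -x -p
```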
For the query COVID-19, the command
getpapers -q COVID-19 -k 100 -o covid -x -p
gives the result as follows.
https://drive.google.com/file/d/1H5k8ZooTFD1dHnMOK6-eTxckc95iJ6lG/view?usp=drivesdk
The .xml files in the resulting folder are both machine-readable and human-readable.
As expected, 100 .xml files were downloaded, but only 76 .pdf files were downloaded, presumably because -p downloads a PDF only if one is available.
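Counts like these can be checked with a short script instead of by hand. The sketch below counts files by extension anywhere under an output directory. It relies only on file extensions and makes no assumption about how getpapers arranges its folders. The directory layout and file names in the demonstration are made up, and the helper name `count_by_suffix` is invented for illustration.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_by_suffix(outdir):
    """Count files under `outdir` (recursively) by file extension."""
    return Counter(p.suffix.lower()
                   for p in Path(outdir).rglob("*") if p.is_file())


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("paper1/fulltext.xml", "paper1/fulltext.pdf",
                 "paper2/fulltext.xml"):
        path = Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    counts = count_by_suffix(tmp)

print(counts[".xml"], counts[".pdf"])  # prints: 2 1
```

For the download above, `count_by_suffix("covid")` would be expected to report the .xml and .pdf totals for the `covid` output directory.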