getpapers
https://github.com/ContentMine/getpapers
Queries a repository through its RESTful API and downloads content in bulk.
Overview and installation: https://github.com/ContentMine/getpapers/blob/master/README.md (for an example, see https://github.com/petermr/tigr2ess/blob/master/getpapers/OVERVIEW.md)
Installation of getpapers involves steps specific to each operating system.
Instructions followed: https://github.com/ContentMine/getpapers/blob/master/README.md
See also: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
Simple: https://github.com/ContentMine/getpapers/blob/master/README.md
For a full example: https://github.com/petermr/tigr2ess/tree/master/getpapers
- getpapers uses a headless browser (PhantomJS), which still works but is no longer maintained.
- It is customised for EPMC, IEEE, Crossref and (?) arXiv.
- It needs a RESTful API.
- The query syntax differs between sites, as do the escape characters (`"` or `'`).
- The default query format is EPMC.
- It helps download large amounts of full-text content in bulk in a short time.
Users may run into errors while installing getpapers. Follow the instructions; if an installation problem occurs, post an issue in the issue section, or refer to an existing issue if it matches the problem.
Users who face problems using getpapers can likewise create an issue or refer to the existing ones.
On macOS (OS X or later), a profile needs to be created before installing nvm. See Tester 8: Charles Li's section.
(please add some queries involving DATE, OR, AND, NOT)
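As a starting point for such queries, the sketch below builds one EPMC-format query that combines OR, AND, NOT and a date range, then asks getpapers only for the hit count (-n). The field name FIRST_PDATE and the bracketed range are taken from the Europe PMC search syntax and are an assumption here, not something tested on this page; the search terms are purely illustrative. The query is kept in a shell variable so that its inner double quotes survive; quoting rules on the Windows command prompt differ.

```shell
# Illustrative EPMC-format query (terms chosen for this example only).
# Boolean operators are written in upper case; FIRST_PDATE is assumed to be
# the Europe PMC field for the first publication date.
QUERY='("viral epidemics" OR masks) AND NOT influenza AND FIRST_PDATE:[2020-01-01 TO 2020-06-30]'

# -n (--noexecute) only reports the number of hits; nothing is downloaded.
if command -v getpapers >/dev/null 2>&1; then
  getpapers -q "$QUERY" -n
else
  echo "getpapers not found; would run: getpapers -q '$QUERY' -n"
fi
```

Running with -n first gives a sense of how many results a query returns before choosing a value for -k.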
Kareena Singh
Windows 10
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Go to the downloads page and download the latest version of nvm-setup.zip.
Unzip the downloaded file and run the included installer.
successfully installed and run
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Open your command prompt, and run the following commands one after the other.
nvm install 7
nvm use 7.10.1
successful
The following installation problem occurred when I entered the node installation command at the command line:
Error: Access to the registry path is denied
- Reason: insufficient privileges to install (requires administrator permission on Windows, the equivalent of "root").
- Solution: https://medium.com/@github.gkarthiks/how-to-install-nodejs-and-npm-in-non-admin-access-windows-machines-102fd461b54c The problem occurred because I needed "root" or "superuser" rights. If this is your own machine you can probably get this permission. Following the instructions at the link resolved the issue.
- Commands issued:
npm install --global getpapers
successful
node --version
version 11.11.0
https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
Run the following command at command prompt:
npm install --global getpapers
Now run the command getpapers at the command prompt; the usage and options should be displayed.
none reported
To test the installation, run getpapers --version
If you get the following, the installation was successful.
0.4.17
Lakshmi Devi Priya
Windows 10
Instructions from: https://github.com/ContentMine/blob/master/README.md
Successful
Successful
C:\Users>node -v
v12.16.3
Instructions from: https://github.com/ContentMine/blob/master/README.md
Successful
C:\Users>$ npm install --global getpapers
`$` is not recognized as an internal or external command,
operable program or batch file.
- `$` is not part of the command (it is the UNIX prompt), so run just
npm install --global getpapers
- getpapers is installed.
Use
C:\Users>getpapers --help
- The command options for getpapers are displayed.
- Installed getpapers.
Followed example from:
https://github.com/petermr/tigr2ess/tree/master/getpapers
To search for a query on a given topic:
getpapers -q <query> -n -k 100
-q, --query : search query (required)
-n, --noexecute : only reports how many results match the query, without downloading anything
For example, for the query COVID-19, use
getpapers -q COVID-19 -n -k <int>
The results will be shown as below:
https://drive.google.com/file/d/1DP0_xcjC5GMQ2CflM7TQUoyIO3J3MyCW/view?usp=drivesdk
Output: found 46887 open access results. That many results cannot reasonably be downloaded at once, so the number of downloads should be limited.
-k, --limit : limits the number of hits and downloads
<int> is an integer giving the maximum number of files to download.
getpapers -q <query> -k <int> -o <path> -x -p
-o, --outdir : output directory (required; will be created if not found).
This option gives the path to the directory in which the downloaded files are stored.
-p, --pdf : downloads fulltext PDFs if available.
-x, --xml : downloads fulltext XMLs if available.
Thus, for the query COVID-19, the command
getpapers -q COVID-19 -k 100 -o covid -x -p
gives the result as follows.
https://drive.google.com/file/d/1H5k8ZooTFD1dHnMOK6-eTxckc95iJ6lG/view?usp=drivesdk
The .xml files in the resulting folder are both machine-readable and human-readable.
The expected 100 .xml files were downloaded, but only 76 .pdf files were downloaded.
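The difference between the XML and PDF counts can be checked directly in the output directory. The following is a minimal sketch, assuming getpapers stores each paper's files as fulltext.xml and fulltext.pdf inside its PMC folder (these file names are an assumption; adjust them if your output differs). The helper name count_files is hypothetical.

```shell
# count_files DIR NAME: number of files called NAME anywhere below DIR.
count_files() {
  find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

outdir=covid   # the directory that was passed to -o
if [ -d "$outdir" ]; then
  echo "XML: $(count_files "$outdir" fulltext.xml)"
  echo "PDF: $(count_files "$outdir" fulltext.pdf)"
fi
```

Since -p downloads PDFs only "if available", a lower PDF count would be consistent with some papers not offering a PDF.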
Pruthiv rajan
Windows 10
Instructions from: https://github.com/ContentMine/blob/master/README.md
Successful
Successful
C:\Users>node -v
v12.16.3
Instructions from: https://github.com/ContentMine/blob/master/README.md
Successful
No problems.
Use
C:\Users>getpapers --help
- The command options for getpapers are displayed.
- Installed getpapers.
Followed example from:
https://github.com/petermr/tigr2ess/tree/master/getpapers
To search for a query on a given topic:
getpapers -q <query> -n -k 100
-q, --query : search query (required)
-n, --noexecute : only reports how many results match the query, without downloading anything
For example, for the query COVID-19, use
getpapers -q COVID-19 -n -k <int>
The results will be shown as below:
https://drive.google.com/file/d/1DP0_xcjC5GMQ2CflM7TQUoyIO3J3MyCW/view?usp=drivesdk
Output: found 46887 open access results. That many results cannot reasonably be downloaded at once, so the number of downloads should be limited.
-k, --limit : limits the number of hits and downloads
<int> is an integer giving the maximum number of files to download.
getpapers -q <query> -k <int> -o <path> -x -p
-o, --outdir : output directory (required; will be created if not found).
This option gives the path to the directory in which the downloaded files are stored.
-p, --pdf : downloads fulltext PDFs if available.
-x, --xml : downloads fulltext XMLs if available.
Thus, for the query Human genome project, the command
getpapers -q "human genome project" -k 100 -o covid -x -p
gives the result as follows.
The expected 100 .xml files were downloaded, but only 84 .pdf files were downloaded.
Name: Ambreen Hamadani
Operating System: Windows 10
Preinstalled on the System
Source of Instruction: ContentMine / getpapers
Steps in the Installation:
- Open Command Prompt
- Run the command
https://github.com/ContentMine/getpapers
Installation: Successful
Test of the Installation:
- Type getpapers in Command Prompt
- Usage and options displayed
The tool was used to retrieve 100 papers on the topic, 'masks' with the output directory specified as 'test1'
Command used:
getpapers --query 'masks ' --limit 100 --outdir test1
Results
- A new directory (test1) created within the home directory
- 100 folders (PMC###) created within 'test1' each containing a JSON file (eupmc_result)
- 1 text file (eupmc_fulltext_html_urls) containing the URLs of all downloaded documents
- 1 JSON file (eupmc_results) created
**Command line output**
- 0 error messages
- No warnings
The tool was used to retrieve 200 papers on the topic, 'viral epidemics' with the output directory specified as 'test3'
Command used:
getpapers --query 'viral epidemics' --limit 200 --outdir test3
Results
- A new directory (test3) created within the home directory
- 200 folders (PMC###) created within 'test3' each containing a JSON file (eupmc_result)
- 1 text file (eupmc_fulltext_html_urls) containing the URLs of all downloaded documents
- 1 JSON file (eupmc_results) created
**Command line output**
- 0 error messages
- 2 warnings (warn: This version of getpapers wasn't built with this version of the EuPMC api in mind; warn: getpapers EuPMCVersion: 5.3.2 vs. 6.3 reported by api)
- Installation of Node.js
Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
- Installation of getpapers
Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
Successful
I. Type getpapers in Command Prompt
1. **Downloaded 100 papers on the topic 'COVID-19' (PDF files)**
getpapers -q "COVID-19" -p -k 100 -o covid_19
Successfully downloaded 100 papers, with 1 .json file and 1 .txt file
- 0 error messages
- 2 Warnings
RESULTS: https://drive.google.com/file/d/1rKgNGojNacMPLeViSPykpXgGsJg0zFUk/view?usp=sharing
**Downloaded 100 .xml files on 'COVID deaths' into the directory cdeaths**
getpapers -q "COVID deaths" -o cdeaths -x -k 100
with 1 .json file and 1 .txt file
- 0 error messages
- 2 Warnings
Reference: https://github.com/petermr/tigr2ess/blob/master/getpapers/TUTORIAL.md
Vanisha Arora
Windows 10
Source of instructions: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
https://github.com/ContentMine/blob/master/README.md
Successful
Run getpapers --version in the command prompt.
Getting 0.4.17 confirms installation.
getpapers -q "query" -n -k 50 (to limit the search to 50 articles; -n only reports the hit count)
For example, for the query viral epidemics, use
getpapers -q "viral epidemics" -n -k 50
-p, --pdf : downloads PDFs
-x, --xml : downloads .xml files
Thus, for the query viral epidemics, the command
getpapers -q "viral epidemics" -k 50 -o "viral epidemics" -x -p
downloaded 50 articles (PDF and XML files) on viral epidemics into the directory viral epidemics.
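Queries and directory names that contain spaces need to be quoted, otherwise the shell splits them into separate arguments and getpapers sees only the first word after -q or -o. The sketch below illustrates the splitting with a hypothetical helper, count_args, which simply prints how many arguments it received; it does not call getpapers.

```shell
# count_args: print the number of arguments the shell passed in.
count_args() {
  echo "$#"
}

# Unquoted: "viral" and "epidemics" arrive as two separate arguments.
count_args -q viral epidemics      # prints 3

# Quoted: the whole phrase arrives as a single argument after -q.
count_args -q "viral epidemics"    # prints 2
```

Double quotes are used here because they are understood by both the Windows command prompt and UNIX shells.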
SANA SAIFI
WINDOWS 10
SOURCE: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
A. Scroll to the section Software Installation and click on the appropriate link for your operating system.
B. Go to the download page (https://github.com/coreybutler/nvm-windows/releases) and download the latest version of nvm-setup.zip.
C. Run the file and install it on your Windows machine.
SOURCE: https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
A. Run the command getpapers in the command prompt.
B. Various usage options are displayed with their meanings.
Successful
4 warnings, 2 errors
- Why are we using it?
To search for a query on a given topic and download a chosen number of research papers from an open-access source.
- How to use it?
To download 100 PDF/.xml files on viral epidemics, open the command prompt and type
getpapers -q "viral epidemics" -k 100 -o "viral epidemics" -x -p
Downloaded 77 files out of 100 from the open-access source into the directory Viral Epidemics.
Followed instructions in https://github.com/blahah/installing-node-tools#macos
Xcode installed successfully, but the nvm installation showed problems.
I typed in
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.30.1/install.sh | bash
curl ships with macOS, so the command itself was found. If curl is missing on your system (for example on Debian or Ubuntu Linux, where apt-get is available), install it first using
sudo apt-get install curl
However, the curl command failed with a connection error for the URL above (Error 1 in shared folder).
I then copied and pasted the whole content of https://raw.githubusercontent.com/creationix/nvm/v0.30.1/install.sh into the terminal, without the curl command. This let nvm download, but the installer did not find a profile (Error 2).
So a profile is needed before installing nvm:
touch ~/.bash_profile
This creates a profile for nvm; after that, the content of https://raw.githubusercontent.com/creationix/nvm/v0.30.1/install.sh can be copied and pasted into the terminal to run the nvm installation again.
After following the instructions in the terminal, you can test nvm with
nvm --version
which should print the version of nvm installed on your computer (Final in shared folder).
You can then test whether Node.js is installed with
node -v
If you have other problems installing nvm on macOS, see this discussion page
https://github.com/nvm-sh/nvm/issues/576
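The individual checks can also be run in one pass before installing getpapers. This is a minimal sketch for a UNIX-style shell such as the macOS terminal; the helper name check_tool is hypothetical, and it only reports whether each command can be found on the PATH. nvm is left out because it is loaded as a shell function from the profile rather than installed as an executable.

```shell
# check_tool NAME: report whether NAME can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Tools needed before getpapers itself can be installed.
for tool in curl node npm; do
  check_tool "$tool" || true
done
```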
npm install --global getpapers
(remove the $ at the start of the line)
During installation, 9 warnings were shown and no errors occurred.
getpapers was then tested with a test query in EPMC search format:
getpapers --query 'viral epidemic' -k 50 --outdir test
https://drive.google.com/drive/folders/1d3PJM-bpBco0kmeyTB-kG3FTP-XDpciL?usp=sharing
Existing Error: EPMC timeout when fetching papers
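One possible workaround for intermittent timeouts is to retry the command a few times with a pause in between. The sketch below is a generic shell helper, not a getpapers feature; the name retry, the number of attempts and the delay are all assumptions chosen for illustration. It only helps if getpapers exits with a non-zero status when the timeout occurs, which has not been verified here.

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds,
# at most ATTEMPTS times, sleeping DELAY seconds between tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (not run here): up to 3 tries, 30 seconds apart.
# retry 3 30 getpapers --query "viral epidemic" -k 50 --outdir test
```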