sci-paper-miner

Generate datasets of academic research papers, including full text, from papers text-mined by CORE[1].

Setup

Run pip install -r requirements.txt
Obtain an API key for CORE.

Usage

Set the target query in crawl_core.py. The query can currently specify the repository ID (arXiv by default), a range of years, and topics (topics are arXiv-specific).
By default, the query covers all computer science (CS) papers on arXiv from 2006 to 2018.
Run python crawl_core.py <your-api-key>
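The query settings and the command-line API key described above could be modeled roughly as follows. This is a minimal sketch, not the code in crawl_core.py: the TargetQuery class, the main function, the placeholder ARXIV_REPO_ID, and the Lucene-style field names (repositories.id, year, topics) are all hypothetical and should be checked against crawl_core.py and the CORE API documentation before use.

```python
import sys
from dataclasses import dataclass, field
from typing import List

# Hypothetical placeholder: replace with CORE's actual repository ID for arXiv.
ARXIV_REPO_ID = 0


@dataclass
class TargetQuery:
    """Hypothetical container for the query settings (not from crawl_core.py)."""
    repository_id: int
    start_year: int
    end_year: int
    topics: List[str] = field(default_factory=list)  # arXiv-specific, e.g. "cs"

    def to_query_string(self) -> str:
        # Assumed Lucene-style syntax; field names are illustrative only.
        parts = [
            f"repositories.id:{self.repository_id}",
            f"year:[{self.start_year} TO {self.end_year}]",
        ]
        if self.topics:
            topic_terms = " OR ".join(f"topics:{t}" for t in self.topics)
            parts.append(f"({topic_terms})")
        return " AND ".join(parts)


def main(argv: List[str]) -> int:
    # Expect exactly one positional argument: the CORE API key.
    if len(argv) != 2:
        print(f"usage: {argv[0]} <your-api-key>", file=sys.stderr)
        return 1
    api_key = argv[1]
    # Mask all but the last four characters before echoing the key.
    masked = "*" * max(len(api_key) - 4, 0) + api_key[-4:]
    # Mirrors the described default: CS papers from 2006 to 2018.
    query = TargetQuery(ARXIV_REPO_ID, 2006, 2018, ["cs"])
    print(f"API key: {masked}")
    print(f"query: {query.to_query_string()}")
    # A real crawler would now send authenticated requests to CORE here.
    return 0
```

To run the sketch from the command line, call sys.exit(main(sys.argv)) under an if __name__ == "__main__": guard and invoke it as python your_script.py <your-api-key>. Keeping the query in a small data class makes the repository, year range, and topics easy to change in one place.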

Future

Extract citation information. Citations are not currently extracted for arXiv papers. According to CORE, citation extraction is limited at present by computational constraints but is expected to be completed over the next few months.

[1] Knoth, P. and Zdrahal, Z. (2012). CORE: Three Access Levels to Underpin Open Access. D-Lib Magazine, 18(11/12). Corporation for National Research Initiatives.

ronentk/sci-paper-miner