ackextract

Extraction of acknowledgement sections and their named entities from scholarly papers.

Catalog

Covid-19Ground_truth: 100 COVID-19 papers. Each paper has a JSON file (full paper in JSON format), a txt file (acknowledgement section) and an ann file (entity annotations); some also have a PDF file (full paper in PDF format).

SBSpaper200Ground_truth: 100 SBS (Social Behavior Science) papers. Each paper has a PDF file (full paper in PDF format), a txt file (acknowledgement section), a TEI file (full paper in XML format, generated from the PDF by GROBID) and an ann file (entity annotations).

Testdata: test data for evaluating NER tools, sentence segmentation tools and sentence classification tools.

Result: entities extracted from all COVID-19 papers, plus some further analyses; all results are CSV files.
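The ground-truth ann files can be used to score an NER tool against the annotations. The sketch below assumes the ann files follow the brat standoff format (entity lines such as "T1<TAB>ORG 15 42<TAB>National Science Foundation"); that format and the label names in the example are assumptions, not confirmed by this README. The functions parse_ann and prf are hypothetical helpers, and the scoring is a simple exact-match comparison on (label, text) pairs.

```python
def parse_ann(text):
    """Parse entity ("T") lines of a brat-style standoff annotation.

    Assumes lines like 'T1<TAB>ORG 15 42<TAB>National Science Foundation'.
    Relation, event and note lines are skipped.
    Returns a list of (label, start, end, text) tuples.
    """
    entities = []
    for line in text.splitlines():
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        label, *offsets = parts[1].split(" ")
        # Discontinuous spans look like '0 5;10 15'; keep the outer bounds.
        spans = " ".join(offsets).split(";")
        start = int(spans[0].split()[0])
        end = int(spans[-1].split()[1])
        entities.append((label, start, end, parts[2]))
    return entities


def prf(gold, predicted):
    """Exact-match precision, recall and F1 over (label, text) pairs.

    Duplicates are counted once because both inputs are turned into sets.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

For example, with gold entities taken from parse_ann (keeping only label and text) and one correct plus one spurious prediction, prf returns (0.5, 0.5, 0.5). Matching on text rather than character offsets keeps the comparison independent of how each tool tokenizes the input.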

Code:

ackextract_stanfordNLP_pysbd_JSON.py: StanfordNLP NER and Pragmatic Segmenter on JSON files.

ackextract_stanfordNLP_pysbd_XML.py: StanfordNLP NER and Pragmatic Segmenter on XML files.

ackextract_stanza_JSON.py: Stanza NER and Stanza sentence segmenter on JSON files.

ackextract_stanza_XML.py: Stanza NER and Stanza sentence segmenter on XML files.

ackextract_stanza_relationfilter_XML.py: Stanza NER with a word-relation-based filter and Stanza sentence segmenter on XML files.

ackextract_stanza_relationfilter_JSON.py: Stanza NER with a word-relation-based filter and Stanza sentence segmenter on JSON files.
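Each script first has to locate the acknowledgement text before segmenting and tagging it. The sketch below is not the scripts' own code; it shows one way to do that step under two assumptions: GROBID's TEI output puts acknowledgements in div elements with type="acknowledgement", and the JSON papers follow a CORD-19-style layout (paragraph objects with "text" and "section" keys under "body_text" or "back_matter"). The function names are hypothetical, and the section-name match is a heuristic.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def ack_from_tei(xml_text):
    """Return acknowledgement paragraphs from GROBID TEI XML.

    Assumes GROBID places them inside <div type="acknowledgement">.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    for div in root.iterfind(".//tei:div[@type='acknowledgement']", TEI_NS):
        for p in div.iterfind(".//tei:p", TEI_NS):
            # itertext() also collects text inside inline tags such as <hi>.
            paragraphs.append("".join(p.itertext()).strip())
    return paragraphs


def ack_from_json(json_text):
    """Return acknowledgement paragraphs from a CORD-19-style JSON paper.

    Heuristic: keep paragraphs whose section title contains "acknowledg",
    which matches both "Acknowledgments" and "Acknowledgements".
    """
    paper = json.loads(json_text)
    paragraphs = []
    for key in ("body_text", "back_matter"):
        for para in paper.get(key, []):
            if "acknowledg" in para.get("section", "").lower():
                paragraphs.append(para["text"])
    return paragraphs
```

The returned paragraphs can then be passed to a sentence segmenter and an NER tool, as the scripts above do.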

Operating System:

The code has been tested on Windows and Linux; it should also work on macOS.

Installation

Dependencies: Make sure that both Python 3.0+ and Java 1.8+ are installed on your system.
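A quick way to confirm both requirements is a small check script. This is a sketch, not part of the repository; it assumes java is on the PATH and relies on the fact that java -version prints to stderr. Java 1.8 reports a version like "1.8.0_292", while Java 9 and later report "11.0.2" or "17", so the parser handles both schemes. It uses subprocess.run with capture_output, which needs Python 3.7 or newer.

```python
import re
import shutil
import subprocess
import sys


def java_major_version(version_output):
    """Extract the Java major version from `java -version` output.

    '1.8.0_292' -> 8 (legacy scheme); '11.0.2' -> 11; '17' -> 17.
    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2):
        return int(match.group(2))
    return major


def check_dependencies():
    """Print whether the Python and Java requirements look satisfied."""
    py_ok = sys.version_info >= (3, 0)
    print("Python", sys.version.split()[0], "OK" if py_ok else "too old")
    if shutil.which("java") is None:
        print("Java: not found on PATH")
        return
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    major = java_major_version(result.stderr)
    java_ok = major is not None and major >= 8
    print("Java", major, "OK" if java_ok else "too old or unknown")


if __name__ == "__main__":
    check_dependencies()
```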

Requirement:

Install StanfordNLP (https://github.com/stanfordnlp).

Install Pragmatic Segmenter (https://pypi.org/project/pysbd/#files).

Install GROBID (https://github.com/kermitt2/grobid/releases).

GROBID 0.5.5 has worked best so far. Go to the GROBID directory (for example C:\downloads\grobid-0.5.5\grobid-0.5.5) and run (./gradlew on Linux):

gradlew clean install

Install Stanza (https://stanfordnlp.github.io/stanza/installation_usage.html).
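Once Stanza is installed, the Stanza-based scripts combine its sentence segmenter with its NER. The sketch below shows that combination in its simplest form; it is an illustration, not the repository's code. It assumes pip install stanza, and build_stanza_pipeline and sentence_entities are hypothetical helper names. It uses Stanza's Pipeline, Document.sentences, Sentence.text and Sentence.ents (entities with .text and .type).

```python
def build_stanza_pipeline():
    """Build an English Stanza pipeline with tokenization and NER.

    Requires `pip install stanza`; stanza.download fetches the English models.
    """
    import stanza  # imported lazily so sentence_entities works without it

    stanza.download("en")
    return stanza.Pipeline(lang="en", processors="tokenize,ner")


def sentence_entities(doc):
    """Group named entities by sentence for a processed Stanza document.

    Returns a list of (sentence_text, [(entity_text, entity_type), ...]).
    """
    return [
        (sent.text, [(ent.text, ent.type) for ent in sent.ents])
        for sent in doc.sentences
    ]
```

Typical use is sentence_entities(build_stanza_pipeline()(ack_text)), where ack_text is an acknowledgement section. Building the pipeline once and reusing it avoids reloading the models for every paper.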

Preparation:

Before importing stanfordcorenlp, make sure the Stanford CoreNLP server is running:

Go to the StanfordNLP directory and run:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
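With the server running, the stanfordcorenlp wrapper's ner method returns one (word, tag) pair per token, so a multi-word name such as "National Science Foundation" comes back as three tagged tokens. The sketch below merges consecutive tokens with the same tag into entities. It is one simple approach, not necessarily what the scripts do (CoreNLP's entitymentions annotator is an alternative), and merge_ner_tokens is a hypothetical name.

```python
def merge_ner_tokens(tagged_tokens):
    """Merge consecutive tokens sharing a CoreNLP NER tag into entities.

    `tagged_tokens` is a list of (word, tag) pairs; tag "O" marks
    non-entity tokens. Two adjacent entities of the same type are merged
    into one, a known limitation of this simple scheme.
    """
    entities = []
    current_words, current_tag = [], "O"
    for word, tag in tagged_tokens:
        if tag == current_tag and tag != "O":
            current_words.append(word)
            continue
        # Tag changed: close the running entity, if any, and start anew.
        if current_tag != "O":
            entities.append((" ".join(current_words), current_tag))
        current_words, current_tag = [word], tag
    if current_tag != "O":
        entities.append((" ".join(current_words), current_tag))
    return entities
```

For example, after nlp = StanfordCoreNLP("http://localhost", port=9000), calling merge_ner_tokens(nlp.ner("We thank the National Science Foundation.")) should yield an ORGANIZATION entity for the full name.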

Go to the GROBID directory and run:

gradlew run
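Once gradlew run has started the service, PDFs can be converted to TEI XML over its REST API. The sketch below uses only the standard library; it assumes GROBID's default port 8070 and its /api/isalive and /api/processFulltextDocument endpoints (the PDF is sent as the multipart form field "input"). The helper names and the GROBID_URL constant are hypothetical.

```python
import urllib.error
import urllib.request
import uuid
from pathlib import Path

GROBID_URL = "http://localhost:8070"  # GROBID's default port; adjust if changed


def grobid_is_alive(base_url=GROBID_URL, timeout=5):
    """Return True if the GROBID service answers its /api/isalive endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def grobid_fulltext(pdf_path, base_url=GROBID_URL, timeout=120):
    """Send a PDF to /api/processFulltextDocument and return the TEI XML."""
    boundary = uuid.uuid4().hex
    pdf = Path(pdf_path)
    # Hand-built multipart/form-data body with a single "input" file field.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; filename="{pdf.name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode() + pdf.read_bytes() + f"\r\n--{boundary}--\r\n".encode()
    request = urllib.request.Request(
        f"{base_url}/api/processFulltextDocument",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Checking grobid_is_alive() before a batch run avoids a long series of connection errors when the service is not up.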

Functions

See the code and its comments for details.

Repository: lamps-lab/ackextract

Folders and files

Latest commit

History

Repository files navigation

ackextract

Catalog

Operating System:

Installation

Dependencies: Make sure that both Python 3.0+ and Java 1.8+ are installed on your system.

Requirement:

Install StanfordNLP(https://github.com/stanfordnlp),

Install Pragmatic Segmenter(https://pypi.org/project/pysbd/#files,)

Install Grobid(https://github.com/kermitt2/grobid/releases),

Install Stanza(https://stanfordnlp.github.io/stanza/installation_usage.html)

Preparation:

go to stanfordnlp directory

go to grobid directory

Functions

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages