This repository provides a tool for scraping Wikipedia on any topic and generating a knowledge graph from the scraped articles.
The new Neuralcoref from explosion.ai uses the state-of-the-art clustering algorithm MentionRank to cluster mentions in a document. This algorithm is considerably more accurate than the previous one, but it is also much slower.
The new version of Neuralcoref is not compatible with the old one, so the code in this repository has been adapted to the new version.
However, the end results differ from before: the new version of Neuralcoref currently produces lower-quality clusters, so the resulting graphs are not as good as they used to be. This is a known issue, and the developers are working on it.
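For context, "clustering mentions" means grouping every span of text that refers to the same underlying entity. Below is a deliberately naive sketch of the *output shape* of such clustering, using exact string matching; it is only an illustration and has nothing to do with Neuralcoref's actual neural mention-scoring model:

```python
# Naive mention clustering: group mentions by normalized surface form.
# Real coreference systems (like Neuralcoref) also link pronouns and
# paraphrases, which exact-string matching cannot do.

def cluster_mentions(mentions):
    """Group mentions whose lowercased text matches exactly."""
    clusters = {}
    for mention in mentions:
        key = mention.lower()
        clusters.setdefault(key, []).append(mention)
    # Keep only groups with more than one mention, mirroring how
    # coreference clusters always contain at least two spans.
    return [group for group in clusters.values() if len(group) > 1]

mentions = ["The Federal Reserve", "the federal reserve", "Congress", "it"]
print(cluster_mentions(mentions))
```

Note that the pronoun "it" is left unclustered here, which is exactly the gap a real coreference model fills.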
- Python 3.7
- Wikipedia-API
- spaCy
- Neuralcoref
- NetworkX
- spaCy en_core_web_lg
We recommend using conda. Create a new environment from the environment.yml file in the root of this repository:
conda env create -f environment.yml
Then, activate the environment:
conda activate spacy_pos_kg
Alternatively, you can use virtualenv. Create a new environment from the requirements.txt file in the root of this repository:
virtualenv -p python3.7 venv
source venv/bin/activate
pip install -r requirements.txt
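If you want to confirm that the core dependencies are importable before running the demo, a small helper like the one below can report what is missing. This snippet is not part of the repository; the module names are assumed from the requirements list above (the Wikipedia-API package imports as `wikipediaapi`):

```python
import importlib.util

# Module names assumed from the requirements list; adjust if yours differ.
REQUIRED = ["spacy", "neuralcoref", "networkx", "wikipediaapi"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print(missing_packages(REQUIRED))  # an empty list means everything is installed
```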
Below is an example of how to run the code. It will scrape Wikipedia for the text query "2008 recession", generate a knowledge graph, and plot it.
python demo.py --target "2008 recession" --sub-graph-target "The federal reserve"
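Internally, a knowledge graph like this can be represented as a directed graph of (subject, relation, object) triples. Here is a minimal sketch using NetworkX; the triples are invented for illustration, while the repository's actual code extracts them from scraped text using spaCy's POS tags and coreference resolution:

```python
import networkx as nx

# Hypothetical triples standing in for the output of the extraction step.
triples = [
    ("2008 recession", "caused by", "subprime mortgage crisis"),
    ("The federal reserve", "responded to", "2008 recession"),
    ("The federal reserve", "lowered", "interest rates"),
]

G = nx.DiGraph()
for subj, rel, obj in triples:
    # Store the relation as an edge attribute so it can be drawn as a label.
    G.add_edge(subj, obj, relation=rel)

print(G.number_of_nodes(), G.number_of_edges())
print(G["The federal reserve"]["2008 recession"]["relation"])
```

Edge attributes like `relation` can later be rendered as edge labels by matplotlib via `nx.draw_networkx_edge_labels`.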
Output:
The graph was generated using the en_core_web_lg model from spaCy and plotted with NetworkX and matplotlib.
The sub-graph was generated for the entity "The federal reserve".
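The `--sub-graph-target` option presumably keeps only the neighborhood of the named entity. With NetworkX, such a sub-graph can be extracted with `ego_graph`; this is a sketch under that assumption, using a toy graph, and the demo's exact logic may differ:

```python
import networkx as nx

# Toy graph standing in for the full knowledge graph.
G = nx.Graph()
G.add_edges_from([
    ("The federal reserve", "interest rates"),
    ("The federal reserve", "2008 recession"),
    ("2008 recession", "subprime mortgage crisis"),
])

# Keep the target node and everything within one hop of it.
sub = nx.ego_graph(G, "The federal reserve", radius=1)
print(sorted(sub.nodes()))
# → ['2008 recession', 'The federal reserve', 'interest rates']
```

Increasing `radius` widens the neighborhood, which is useful when the one-hop sub-graph around an entity is too sparse to be informative.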