Use NLP to extract facts from GOV.UK content. The aim is to build up an extensive number of facts about the world from GOV.UK content - as it's trusted and reliable. It'll then be possible to put it into the Knowledge Graph For example:
- "HMRC is a government department" ->
(HMRC)-[:INSTANCE_OF]->(department)
- "Council tax is a tax" ->
(Council tax)-[:INSTANCE_OF]->(tax)
- "Council tax is a tax" ->
(Council tax)-[:INSTANCE_OF]->(tax)
- "Boris Johnson became Prime Minister on 24 July 2019" ->
(Boris Johnson)-[:IS { from: "24 July 2019" }]->(Prime Minister)
There are many other potential relationships that are far more interesting and powerful. One thing at a time though
Knowledge is power
You will need the python interpreter version Python 3.7.0 or greater. You may need pyenv
to specify the python version; which you can get using pip install pyenv
.
Create a new python 3.7.0 virtual environment. For up to date python 3 installations, you can do this by running the following from the command line:
python -m venv ~/.venv/govuk-knowledge-graph
This will create a new virtual environment in the ~/.venv/govuk-knowledge-graph
. It will create the directory if it does not yet exist.
Activate your virtual environment from the command line:
. ~/.venv/govuk-knowledge-graph/bin/activate
An alternative if new to python, is to use the PyCharm community edition. Open this repo as a project and then specify what python interpreter to use.
You'll need to download the govuk-production-mirror-replica mirrors. This can only be done if you're a member of GOV.UK. It's also 25gb unzipped!
Then install required python packages:
pip install -r requirements.txt
You'll need a copy of the govuk-knowledge-graph
repo as a sibling folder to this repo.
You'll also need to install it's requirements.
It's used as a (current) source of the GovNER NER model