REL: Radboud Entity Linker

REL is a modular Entity Linking package, available both as a Python package and as a web API. The name has several meanings: it first reads as short for relation, a fitting name for the problems this package tackles. In Dutch, a 'rel' is also a disturbance of the public order, which is exactly what we aim to achieve with the release of this package.

REL utilizes English Wikipedia as a knowledge base and can be used for the following tasks:

  • Entity linking (EL): Given a text, the system outputs a list of mention-entity pairs, where each mention is an n-gram from the text and each entity is an entity in the knowledge base.
  • Entity Disambiguation (ED): Given a text and a list of mentions, the system assigns an entity (or NIL) to each mention.
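
The two tasks differ mainly in what the caller supplies. The toy sketch below is not how REL works internally; it only illustrates the input and output shapes. It assumes a hypothetical two-entry lookup table, `TOY_KB`, in place of the Wikipedia knowledge base, and the function names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a knowledge base: surface form -> entity title.
TOY_KB: Dict[str, str] = {
    "Charles Bukowski": "Charles_Bukowski",
    "Nijmegen": "Nijmegen",
}


def toy_disambiguate(mentions: List[str]) -> List[Tuple[str, Optional[str]]]:
    """ED shape: mentions are given; each gets an entity, or None for NIL."""
    return [(mention, TOY_KB.get(mention)) for mention in mentions]


def toy_link(text: str) -> List[Tuple[str, str]]:
    """EL shape: only the text is given, so mentions must be found first."""
    found = [mention for mention in TOY_KB if mention in text]
    return [(m, e) for m, e in toy_disambiguate(found) if e is not None]


text = "If you're going to try, go all the way - Charles Bukowski"
print(toy_link(text))
# [('Charles Bukowski', 'Charles_Bukowski')]
print(toy_disambiguate(["Charles Bukowski", "Henry Chinaski"]))
# [('Charles Bukowski', 'Charles_Bukowski'), ('Henry Chinaski', None)]
```

In the ED call the second mention has no entry in the toy table, so it maps to `None`, which plays the role of NIL.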

Setup API

This section elaborates on how a user may utilize our API. Steps include obtaining an API key and querying our API.

Obtaining a key

At the moment we do not require obtaining a key; please continue to the next step.

Querying our API

Users may access our API by using the example script below. For EL, the spans field is not required. For ED, the spans field should be a list of tuples, where each tuple gives the start position (a zero-based character offset) and the length of a mention.

import requests

IP_ADDRESS = "https://rel.cs.ru.nl/api"
text_doc = "If you're going to try, go all the way - Charles Bukowski"

# Example EL: no spans are needed.
el_document = {
    "text": text_doc,
}

# Example ED: one span, (start position, length), covering "Charles Bukowski".
ed_document = {
    "text": text_doc,
    "spans": [(41, 16)]
}

# Send one of the two documents; here, the ED request.
API_result = requests.post(IP_ADDRESS, json=ed_document).json()
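
Counting character offsets by hand is error-prone, so it can help to derive the spans from the mention strings. The helper below is a minimal sketch, not part of REL; `spans_for` is a hypothetical name, and it assumes each mention occurs in the text and that its first occurrence is the one you mean.

```python
from typing import List, Tuple


def spans_for(text: str, mentions: List[str]) -> List[Tuple[int, int]]:
    """Return a (start, length) tuple for the first occurrence of each mention."""
    spans = []
    for mention in mentions:
        start = text.find(mention)
        if start == -1:
            raise ValueError(f"mention not found in text: {mention!r}")
        spans.append((start, len(mention)))
    return spans


text_doc = "If you're going to try, go all the way - Charles Bukowski"
spans = spans_for(text_doc, ["Charles Bukowski"])
print(spans)  # [(41, 16)]

# Sanity check: slicing the text with a span recovers the mention.
start, length = spans[0]
assert text_doc[start:start + length] == "Charles Bukowski"
```

The result matches the span used in the ED example, and the slice check is a cheap way to catch off-by-one errors before sending a request.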

Setup package

This section describes how to deploy REL on a local machine and set up the API. If you want to do more than simply run our API locally, you can skip the Docker steps and continue with installation from source.

Installation using Docker

Prebuilt images

To use our prebuilt default images, run:

# Pull the image for Wikipedia 2014:
docker pull informagi/rel:2014
# Or Wikipedia 2019:
docker pull informagi/rel:2019

To run the API locally:

# Map container port 5555 to local port 5555, and use Wikipedia 2019
docker run -p 5555:5555 --rm -it informagi/rel:2019
# Or automatically generate port mapping
docker run -P --rm -it informagi/rel:2019

Now you can make requests to http://localhost:5555 (or another port if you use a different mapping) in the format described in the example above.
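
As a sketch of what such a request could look like against a local container, the snippet below uses only the Python standard library. The helper names `build_request` and `send` are hypothetical, and the default URL assumes the `-p 5555:5555` mapping shown above; adjust it if Docker assigned a different port.

```python
import json
import urllib.request
from typing import List, Optional, Tuple


def build_request(text: str,
                  spans: Optional[List[Tuple[int, int]]] = None,
                  base_url: str = "http://localhost:5555") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a local REL container."""
    payload = {"text": text}
    if spans is not None:
        # Only ED requests carry spans; EL requests omit the field.
        payload["spans"] = spans
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request(
    "If you're going to try, go all the way - Charles Bukowski",
    spans=[(41, 16)],
)
print(req.get_method(), req.full_url)  # POST http://localhost:5555
```

Calling `send(req)` performs the actual request and needs the container to be running. Building and sending are kept separate so the payload can be inspected without a server.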

Build your own

To build the Docker image yourself, run:

# Clone the repository
git clone https://github.com/informagi/REL && cd REL
# Build the Docker image
docker build - -t informagi/rel < Dockerfile

The build process will automatically download all necessary files. Wikipedia version 2019 is used by default. To specify the Wikipedia version (either 2019 or 2014), pass e.g. --build-arg WIKI_YEAR=2014 to the docker build command:

docker build - -t informagi/rel --build-arg WIKI_YEAR=2014 < Dockerfile

To run the API locally, use the same commands as mentioned in the previous section.
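
If you script the build, for instance in a setup tool, it can be useful to validate the Wikipedia year before invoking Docker. The following is a minimal sketch under that assumption; `docker_build_command` and `build_image` are hypothetical helpers that only assemble the same command line shown above.

```python
import subprocess
from typing import List

SUPPORTED_WIKI_YEARS = (2014, 2019)  # the two versions named in this section


def docker_build_command(wiki_year: int = 2019,
                         tag: str = "informagi/rel") -> List[str]:
    """Assemble the `docker build` arguments for the given Wikipedia year."""
    if wiki_year not in SUPPORTED_WIKI_YEARS:
        raise ValueError(f"WIKI_YEAR must be one of {SUPPORTED_WIKI_YEARS}")
    return ["docker", "build", "-", "-t", tag,
            "--build-arg", f"WIKI_YEAR={wiki_year}"]


def build_image(wiki_year: int = 2019, dockerfile: str = "Dockerfile") -> None:
    """Run the build, feeding the Dockerfile on stdin like `< Dockerfile`."""
    with open(dockerfile, "rb") as handle:
        subprocess.run(docker_build_command(wiki_year), stdin=handle, check=True)


print(" ".join(docker_build_command(2014)))
# docker build - -t informagi/rel --build-arg WIKI_YEAR=2014
```

`build_image(2014)` would run the build from the repository root; it is not called here because it requires Docker and the cloned repository.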

Installation from source

Run the following command in a terminal to install REL:

pip install git+https://github.com/informagi/REL

You will also need to manually download the files described in the next section.

Download

The files used for this project can be divided into three categories. The first is a generic set of documents and embeddings that was used throughout the project. This folder includes the GloVe embeddings used by Le et al. and the unprocessed datasets that were used to train the ED model. The second and third categories are Wikipedia corpus related files, which in our case originate from either a 2014 or a 2019 corpus. Alternatively, users may use their own corpus, for which we refer to the tutorials.

Download generic files

Download Wikipedia corpus (2014)

Download ED model 2014

Download Wikipedia corpus (2019)

Download ED model 2019

Tutorials

To promote usage of this package we developed various tutorials. If you simply want to use our API, then we refer to the section above. The first two tutorials are for users who simply want to use our package for EL/ED and will be using the data files that we provide. The remainder of the tutorials are optional and for users who wish to e.g. train their own embeddings. If you feel a tutorial is missing or unclear, then please create an issue, which is much appreciated :)!

  1. How to get started (project folder and structure).
  2. End-to-End Entity Linking.
  3. Evaluate on GERBIL.
  4. Deploy REL for a new Wikipedia corpus:
    1. Extracting a new Wikipedia corpus and creating a p(e|m) index.
    2. Training your own Embeddings.
    3. Generating training, validation and test files.
    4. Training your own Entity Disambiguation model.
  5. Reproducing our results
  6. REL as systemd service
  7. Notes on using custom models

Cite

If you are using REL, please cite the following paper:

@inproceedings{vanHulst:2020:REL,
 author =    {van Hulst, Johannes M. and Hasibi, Faegheh and Dercksen, Koen and Balog, Krisztian and de Vries, Arjen P.},
 title =     {REL: An Entity Linker Standing on the Shoulders of Giants},
 booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series =    {SIGIR '20},
 year =      {2020},
 publisher = {ACM}
}

Contact

Please email your questions or comments to Mick van Hulst.

Acknowledgements

Our thanks go out to the authors who open-sourced their code, enabling us to create this package that can hopefully be of service to many.