The entity linking task aims to identify all the small text fragments in a document that refer to an entity contained in a given knowledge base, e.g., Wikipedia. The annotation is usually organized in three steps. First, given an input document, the fragments that could refer to an entity are discovered. Second, since a mention could refer to multiple entities, a disambiguation step selects the correct entity among the candidates. Finally, the discovered entities are ranked by some measure of relevance. Many entity linking algorithms have been proposed, but unfortunately only a few authors have released their source code or an API. As a result, it is currently difficult to evaluate the performance of a method on a single subtask or to compare different techniques.
For these reasons we developed Dexter, a framework that implements some popular algorithms and provides all the tools needed to develop any entity linking technique. We believe that a shared framework is fundamental for performing fair comparisons and improving the state of the art.
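To make the three steps concrete (discovering fragments, disambiguating, ranking), here is a minimal sketch in Python. It is not Dexter's API or any published algorithm: the knowledge base `TOY_KB`, the `RELATEDNESS` table, every number in them, and all function names are hypothetical and invented purely for illustration. The scoring rule (prior probability plus the best relatedness to each other mention) is a simplified stand-in for the voting schemes used by real systems.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: surface form -> {entity: prior probability}.
# All names and numbers are invented for illustration.
TOY_KB = {
    "jaguar": {"Jaguar_Cars": 0.6, "Jaguar_(animal)": 0.4},
    "amazon": {"Amazon_(company)": 0.7, "Amazon_rainforest": 0.3},
    "ford": {"Ford_Motor_Company": 1.0},
}

# Hypothetical symmetric relatedness between pairs of entities.
RELATEDNESS = {
    frozenset({"Jaguar_(animal)", "Amazon_rainforest"}): 0.8,
    frozenset({"Jaguar_Cars", "Ford_Motor_Company"}): 0.9,
}


@dataclass
class Annotation:
    mention: str   # normalized surface form found in the text
    start: int     # character offset of the mention
    entity: str    # entity chosen by disambiguation
    score: float   # score used for the final ranking


def relatedness(a, b):
    """Look up the relatedness of two entities (0.0 if unknown)."""
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def spot(text):
    """Step 1: find the fragments whose surface form is in the knowledge base."""
    spots, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)
        pos = start + len(token)
        key = token.strip(".,;:!?").lower()
        if key in TOY_KB:
            spots.append((key, start))
    return spots


def disambiguate(spots):
    """Step 2: choose one entity per mention using prior + context support."""
    annotations = []
    for i, (key, start) in enumerate(spots):
        best_entity, best_score = None, -1.0
        for entity, prior in TOY_KB[key].items():
            # Each other mention contributes its best-related candidate.
            support = sum(
                max(relatedness(entity, other) for other in TOY_KB[other_key])
                for j, (other_key, _) in enumerate(spots)
                if j != i
            )
            score = prior + support
            if score > best_score:
                best_entity, best_score = entity, score
        annotations.append(Annotation(key, start, best_entity, best_score))
    return annotations


def rank(annotations):
    """Step 3: order the linked entities by decreasing score."""
    return sorted(annotations, key=lambda a: a.score, reverse=True)


def annotate(text):
    """Run the three steps in sequence."""
    return rank(disambiguate(spot(text)))
```

With this toy data, `annotate("The jaguar lives in the Amazon.")` links both mentions to the animal and the rainforest, even though their priors favor the car maker and the company, because the two candidates support each other. Keeping each step in its own function mirrors the point made above: a single subtask can be swapped out or evaluated in isolation.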
For more information about the team and the framework please refer to the website.
A simple demo of the system is running at this address. The tagger used in the demo is our implementation of TAGME. Please note that some annotations could differ from those produced by the original TAGME, since the two frameworks use different Wikipedia dumps and different methods for extracting the spots.
We are currently working on improving the quality of the code, which we plan to release publicly (under the Apache License V2) in mid-September. If you can't wait, send us an email and we will give you access to our internal repo.
In the meantime, you can download the binary jar containing all the resources needed to run Dexter.