Using the Semantic Web to Understand Persons’ Networks extracted from Text

This project stems from work describing a methodology for interpreting large networks of persons extracted from text by classifying cliques with the DBpedia ontology. The approach combines NLP, Semantic Web technologies, and network analysis. The classification first assigns categories to single nodes and then generalises to cliques; it performs well and can also handle nodes that are not linked to Wikipedia. A manually built gold standard, used for evaluation, shows that groups of co-occurring entities in most cases share a category that can be assigned automatically. This holds for both languages considered in the study, English and Italian. The results can help make large networks more readable by adding a semantic layer on top of cliques. The method also offers an unsupervised way to automatically extend DBpedia starting from a corpus.
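
The idea of moving from single-node categories to a clique-level category can be illustrated with a small sketch. This is not the code in this repository: the class name `CliqueCategorySketch`, the toy class hierarchy, and the rule "take the most specific class shared by all classified members" are assumptions made purely for illustration. Nodes without a Wikipedia link are represented as `null` and are simply skipped here, which is only one possible way of handling them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CliqueCategorySketch {

    // Toy child -> parent hierarchy, loosely modelled on ontology classes.
    // Illustrative assumption only; it is not loaded from DBpedia.
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("Senator", "Politician");
        PARENT.put("President", "Politician");
        PARENT.put("Politician", "Person");
        PARENT.put("Journalist", "Person");
        PARENT.put("Person", "Agent");
    }

    // Returns the class followed by its ancestors, most specific first.
    static List<String> pathToRoot(String cls) {
        List<String> path = new ArrayList<>();
        for (String c = cls; c != null; c = PARENT.get(c)) {
            path.add(c);
        }
        return path;
    }

    // Returns the most specific class shared by every classified member of
    // the clique, or null if no member is classified or nothing is shared.
    static String sharedCategory(List<String> memberClasses) {
        Set<String> common = null;
        for (String cls : memberClasses) {
            if (cls == null) {
                continue; // node not linked to Wikipedia: skipped in this sketch
            }
            List<String> path = pathToRoot(cls);
            if (common == null) {
                // Insertion order keeps "most specific first".
                common = new LinkedHashSet<>(path);
            } else {
                common.retainAll(path);
            }
        }
        if (common == null || common.isEmpty()) {
            return null;
        }
        return common.iterator().next();
    }
}
```

Under this toy hierarchy, a clique whose members map to `Senator`, `President`, and one unlinked node is assigned `Politician`, while a clique mixing `Senator` and `Journalist` falls back to the more general `Person`.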

Datasets

Cliques gold standard (English)
Cliques gold standard (Italian)
Dataset containing the original Nixon and Kennedy speech transcriptions (released under the NARA public domain license) along with the linguistic annotations applied in the pre-processing step (in NAF format)

dkmfbk/cliques

Folders and files

Latest commit

History

Repository files navigation

Using the Semantic Web to Understand Persons’ Networks extracted from Text

Datasets

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages