songjiang0909/awesome-knowledge-graph-construction

Awesome Knowledge Graph Construction

A collection of knowledge graph construction resources. [Last update: Jan 2020]

Contents

Research Trends and Surveys

  • From Information to Knowledge: Harvesting Entities and Relationships from Web Sources (Weikum et al, 2010) [paper]
  • Advances in Automated Knowledge Base Construction (Suchanek et al, 2012) [paper]
  • TAC-Knowledge Base Population challenge (Ji et al) [2019] [2017] [2016] [2015]
  • A Survey on Open Information Extraction (Niklaus et al., 2018) [paper]

Papers

Curated Approaches

Triples are collected by domain experts.

  • CYC: A Large-scale Investment in Knowledge Infrastructure [paper]
    • Brief introduction: A universal schema of roughly 10^5 general concepts spanning human reality.
    • Authors: Douglas B. Lenat
    • Venue: Communications of the ACM, 1995
  • WordNet: A Lexical Database for English [paper]
    • Brief introduction: WordNet is an online lexical database designed for use under program control.
    • Authors: GA Miller (Princeton University)
    • Venue: Communications of the ACM, 1995
  • The Unified Medical Language System (UMLS): integrating biomedical terminology [paper]
    • Brief introduction: A repository of biomedical vocabularies developed by the US National Library of Medicine. The UMLS integrates over 900,000 concepts, as well as 12 million relations among these concepts.
    • Authors: Olivier Bodenreider (Lister Hill National Center for Biomedical Communications)
    • Venue: Nucleic acids research, 2004

Collaborative Approaches

Triples are collected by volunteers.

  • Wikidata: a free collaborative knowledgebase [paper]
    • Brief introduction: Wikidata is a collaborative knowledge base that collects structured data to support Wikipedia and Wikimedia Commons.
    • Authors: Denny Vrandečić and Markus Krötzsch
    • Venue: Communications of the ACM, 2014
  • Freebase: a collaboratively created graph database for structuring human knowledge [paper]
    • Brief introduction: Freebase is a tuple knowledge base used to structure general human knowledge, which is collaboratively created, structured, and maintained.
    • Authors: Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor (Metaweb Technologies, Inc)
    • Venue: SIGMOD'08

Automated Semi-structured Approaches

Triples are collected from semi-structured data sources via rule-based methods.

  • YAGO: A Core of Semantic Knowledge [paper]
    • Brief introduction: Triples are automatically extracted from Wikipedia and unified with WordNet, using a combination of rule-based and heuristic methods.
    • Authors: Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum (Max-Planck-Institut)
    • Venue: WWW'07
  • YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia [paper]
    • Brief introduction: An extension of the YAGO knowledge base, in which triples are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet.
    • Authors: Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich and Gerhard Weikum (Max-Planck-Institut)
    • Venue: Artificial Intelligence, 2013
  • DBpedia: A Nucleus for a Web of Open Data [paper]
    • Brief introduction: Extract triples from Wikipedia using a template-based pattern matching method.
    • Authors: S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives (University of Pennsylvania & Universität Leipzig)
    • Venue: The Semantic Web'07
  • CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web [paper]
    • Brief introduction: Propose an automatic knowledge extraction framework that improves the distant supervision assumption for triples extraction.
    • Authors: Colin Lockard, Xin Luna Dong, Arash Einolghozati and Prashant Shiralkar
    • Venue: VLDB'18
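
The rule-based idea behind the approaches in this section can be illustrated with a minimal Python sketch. It is not the DBpedia or YAGO extraction code: the infobox markup, the field names, and the `extract_triples` helper are all hypothetical, and real Wikipedia templates are far messier than this example assumes.

```python
import re

# Hypothetical infobox-style markup (an assumption for illustration only).
INFOBOX = """{{Infobox person
| name = Ada Lovelace
| birth_place = London
| occupation = Mathematician
}}"""

# One rule: a line of the form "| attribute = value".
FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)


def extract_triples(markup: str) -> list[tuple[str, str, str]]:
    """Turn "| attribute = value" lines into (subject, attribute, value) triples.

    The "name" field is assumed to hold the subject of every triple.
    """
    fields = dict(FIELD.findall(markup))
    subject = fields.pop("name", None)
    if subject is None:
        return []
    return [(subject, attribute, value) for attribute, value in fields.items()]


triples = extract_triples(INFOBOX)
for triple in triples:
    print(triple)
```

Each remaining field becomes one triple whose subject is the value of the `name` field. A production extractor would also need to resolve links, units, and nested templates, which this sketch ignores.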

Automated Unstructured Approaches

Triples are extracted from unstructured data via data-driven techniques.

Schema-based Approaches

  • NELL: Toward an Architecture for Never-Ending Language Learning [paper]
    • Brief introduction: Continuously extract new knowledge from the Web through self-learning on a small number of samples.
    • Authors: Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell (CMU)
    • Venue: AAAI'10
  • PROSPERA: Scalable knowledge harvesting with high precision and high recall [paper]
    • Brief introduction: Reconcile precision, recall and scalability by extended n-gram pattern matching.
    • Authors: Ndapandula Nakashole, Martin Theobald, Gerhard Weikum (Max Planck Institute)
    • Venue: WSDM'11
  • DeepDive/Elementary: Large-scale knowledge-base construction via machine learning and statistical inference [paper]
    • Brief introduction: Propose a Markov logic-based model and architecture for knowledge base construction (KBC) by integrating different kinds of data resources and KBC techniques.
    • Authors: Feng Niu, Ce Zhang, Christopher Ré, and Jude Shavlik (University of Wisconsin-Madison, Stanford University)
    • Venue: IJSWIS'12
  • Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion [paper]
    • Brief introduction: Build Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content with prior knowledge derived from existing knowledge repositories, based on a distant supervision method.
    • Authors: Xin Luna Dong et al (Google)
    • Venue: KDD'14
  • Sealing Pipeline Leaks and Understanding Chinese [paper]
    • Brief introduction: Propose a combined system consisting of several rule-based relation extractors and a distantly supervised extractor.
    • Authors: Yuhao Zhang, Arun Chaganty, Ashwin Paranjape, Danqi Chen, Jason Bolton, Peng Qi, Christopher D. Manning (Stanford University)
    • Venue: TAC'16
  • CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases [paper]
    • Brief introduction: Joint extraction of typed entities and relations with labeled data obtained from knowledge bases with distant supervision.
    • Authors: Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han (UIUC & Army Research Laboratory)
    • Venue: WWW'17
  • Discovering Implicit Knowledge with Unary Relations [paper]
    • Brief introduction: Extract implicit relations in text by converting binary relations to unary relations.
    • Authors: Michael Glass, Alfio Gliozzo (IBM Research)
    • Venue: ACL'18
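
Several entries above rely on distant supervision, where sentences are labeled automatically by aligning them with facts already in a knowledge base. The Python sketch below shows the basic labeling heuristic only. The tiny `KB`, the example sentences, and the `distant_label` helper are hypothetical, and plain substring matching stands in for real entity linking.

```python
# Hypothetical knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Google", "Larry Page"): "founded_by",
}

# Hypothetical corpus.
SENTENCES = [
    "Barack Obama was born in Honolulu .",
    "Barack Obama visited Honolulu last week .",
    "Larry Page started Google with Sergey Brin .",
    "Honolulu is in Hawaii .",
]


def distant_label(sentences, kb):
    """Label a sentence with a KB relation when it mentions both entities.

    Substring matching is a simplifying assumption; real systems use
    entity linking to find mentions.
    """
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


labeled = distant_label(SENTENCES, KB)
for example in labeled:
    print(example)
```

The second sentence receives the `born_in` label even though it does not express that relation. This kind of label noise is the weakness of the distant supervision assumption that later methods try to reduce.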

Open Information Extraction

  • Open Information Extraction from the Web [paper]
    • Brief introduction: The first paper on open information extraction, using a rule-based method.
    • Authors: Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni (University of Washington)
    • Venue: IJCAI'07
  • Identifying relations for open information extraction [paper]
    • Brief introduction: Introduce syntactic and lexical constraints on binary relations expressed by verbs to reduce the uninformative and incoherent extractions.
    • Authors: Anthony Fader, Stephen Soderland, and Oren Etzioni (University of Washington)
    • Venue: EMNLP'11
  • Open Language Learning for Information Extraction [paper]
    • Brief introduction: An extension of OpenIE that adds noun- and adjective-mediated relations and takes context into consideration.
    • Authors: Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni (University of Washington)
    • Venue: EMNLP'12
  • Neural Open Information Extraction [paper]
    • Brief introduction: Propose a neural encoder-decoder OpenIE framework. The model is trained with highly confident binary extractions bootstrapped from a state-of-the-art Open IE system, and can therefore generate high-quality tuples without any hand-crafted patterns.
    • Authors: Lei Cui, Furu Wei, and Ming Zhou (MSRA)
    • Venue: ACL'18
  • COMET: Commonsense Transformers for Automatic Knowledge Graph Construction [paper]
    • Brief introduction: Commonsense knowledge graph construction by using existing tuples as a seed set of knowledge for training. Using this seed set, a pre-trained language model (GPT) learns to adapt its learned representations to knowledge generation, and produces novel tuples.
    • Authors: Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz and Yejin Choi (University of Washington)
    • Venue: ACL'19
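
The syntactic constraint mentioned in the "Identifying relations for open information extraction" entry requires a relation phrase to be a verb, a verb followed by a preposition, or a verb followed by other words and ending in a preposition. The Python sketch below is a heavily simplified illustration of that pattern, not the published system. The single-letter tag set, the pre-tagged input, and the `extract` helper are assumptions, and everything to the left and right of the relation phrase is taken as an argument.

```python
import re

# Hypothetical coarse tags, one letter per token:
# V verb, N noun, D determiner, A adjective or adverb, P preposition.
# Pattern: a verb, optionally followed by words that end in a preposition.
RELATION = re.compile(r"V(?:[NDA]*P)?")


def extract(tagged):
    """Return (arg1, relation, arg2) for the first relation phrase, or None.

    Simplification: all tokens before the phrase form arg1 and all tokens
    after it form arg2; no noun-phrase chunking is attempted.
    """
    tags = "".join(tag for _, tag in tagged)
    match = RELATION.search(tags)
    if match is None or match.start() == 0 or match.end() == len(tagged):
        return None
    words = [word for word, _ in tagged]
    return (
        " ".join(words[: match.start()]),
        " ".join(words[match.start() : match.end()]),
        " ".join(words[match.end() :]),
    )


sentence = [
    ("Faust", "N"), ("made", "V"), ("a", "D"), ("deal", "N"),
    ("with", "P"), ("the", "D"), ("devil", "N"),
]
print(extract(sentence))
```

Matching the longest phrase that ends in a preposition keeps "made a deal with" together as one relation, rather than extracting the uninformative relation "made".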

Lectures

Tutorials

  • Mining Knowledge Graphs from Text. [link]
    • Jay Pujara (USC), Sameer Singh (UCI)
    • WSDM'18
  • Constructing Domain-specific Knowledge Graphs. [link]
    • Craig Knoblock (USC), Pedro Szekely (USC), Mayank Kejriwal (USC)
    • AAAI'18

Videos and Slides

Datasets

  • New York Times (NYT) Corpus [paper] [download]
    • This dataset was generated by aligning Freebase relations with the NYT corpus, with sentences from the years 2005-2006 used as the training corpus and sentences from 2007 used as the testing corpus.
  • FewRel: Few-Shot Relation Classification Dataset [paper] [Website]
    • This dataset is a supervised few-shot relation classification dataset. The corpus is Wikipedia and the knowledge base used to annotate the corpus is Wikidata.
  • TupleInf Open IE Dataset [Website]
    • The TupleInf Open IE dataset contains Open IE tuples extracted from 263K sentences that were used by the solver in "Answering Complex Questions Using Open Information Extraction".

Systems and Tools

  • DeepDive (Christopher Ré et al., Stanford University) [paper] [System]
  • Open Information Extraction (Stanford University NLP) [System]

References
