A collection of knowledge graph construction resources. [Last update: Jan 2020]
- From Information to Knowledge: Harvesting Entities and Relationships from Web Sources (Weikum et al, 2010) [paper]
- Advances in Automated Knowledge Base Construction (Suchanek et al, 2012) [paper]
- TAC-Knowledge Base Population challenge (Ji et al) [2019] [2017] [2016] [2015]
- A Survey on Open Information Extraction (Niklaus et al, 2018) [paper]
Triples are collected by domain experts.
- CYC: A Large-scale Investment in Knowledge Infrastructure [paper]
- Brief introduction: A universal schema of roughly 10^5 general concepts spanning human reality.
- Authors: Douglas B. Lenat
- Venue: Communications of the ACM, 1995
- WordNet: A Lexical Database for English [paper]
- Brief introduction: WordNet is an online lexical database designed for use under program control.
- Authors: GA Miller (Princeton University)
- Venue: Communications of the ACM, 1995
- The Unified Medical Language System (UMLS): integrating biomedical terminology [paper]
- Brief introduction: A repository of biomedical vocabularies developed by the US National Library of Medicine. The UMLS integrates over 900,000 concepts, as well as 12 million relations among these concepts.
- Authors: Olivier Bodenreider (Lister Hill National Center for Biomedical Communications)
- Venue: Nucleic acids research, 2004
Triples are collected by volunteers.
- Wikidata: a free collaborative knowledgebase [paper]
- Brief introduction: Wikidata is a collaborative knowledge base that collects structured data to support Wikipedia and Wikimedia Commons.
- Authors: Denny Vrandečić and Markus Krötzsch
- Venue: Communications of the ACM, 2014
- Freebase: a collaboratively created graph database for structuring human knowledge [paper]
- Brief introduction: Freebase is a tuple knowledge base used to structure general human knowledge, which is collaboratively created, structured, and maintained.
- Authors: Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor (Metaweb Technologies, Inc)
- Venue: SIGMOD'08
Triples are collected from semi-structured data sources via rule-based methods.
- YAGO: A Core of Semantic Knowledge [paper]
- Brief introduction: Triples are automatically extracted from Wikipedia and unified with WordNet, using a combination of rule-based and heuristic methods.
- Authors: Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum (Max-Planck-Institut)
- Venue: WWW'07
- YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia [paper]
- Brief introduction: An extension of the YAGO knowledge base, in which triples are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet.
- Authors: Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich and Gerhard Weikum (Max-Planck-Institut)
- Venue: Artificial Intelligence, 2013
- DBpedia: A Nucleus for a Web of Open Data [paper]
- Brief introduction: Extract triples from Wikipedia using a template-based pattern matching method.
- Authors: S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives (University of Pennsylvania & Universität Leipzig)
- Venue: The Semantic Web'07
- CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web [paper]
- Brief introduction: Propose an automatic knowledge extraction framework that improves the distant supervision assumption for triple extraction.
- Authors: Colin Lockard, Xin Luna Dong, Arash Einolghozati and Prashant Shiralkar
- Venue: VLDB'18
Triples are extracted from unstructured data via data-driven techniques.
- NELL: Toward an Architecture for Never-Ending Language Learning [paper]
- Brief introduction: Continuously extract new knowledge from the Web through self-learning on a small number of samples.
- Authors: Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell (CMU)
- Venue: AAAI'10
- PROSPERA: Scalable knowledge harvesting with high precision and high recall [paper]
- Brief introduction: Reconcile precision, recall, and scalability through extended n-gram pattern matching.
- Authors: Ndapandula Nakashole, Martin Theobald, Gerhard Weikum (Max Planck Institute)
- Venue: WSDM'11
- DeepDive/Elementary: Large-scale knowledge-base construction via machine learning and statistical inference [paper]
- Brief introduction: Propose a Markov logic-based model and architecture for knowledge base construction (KBC) that integrates different kinds of data resources and KBC techniques.
- Authors: Feng Niu, Ce Zhang, Christopher Ré, and Jude Shavlik (University of Wisconsin-Madison, Stanford University)
- Venue: IJSWIS'12
- Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion [paper]
- Brief introduction: Build Knowledge Vault, a Web-scale probabilistic knowledge base that uses distant supervision to combine extractions from Web content with prior knowledge derived from existing knowledge repositories.
- Authors: Xin Luna Dong et al (Google)
- Venue: KDD'14
- Sealing Pipeline Leaks and Understanding Chinese [paper]
- Brief introduction: Propose a combined system consisting of several rule-based relation extractors and a distantly supervised extractor.
- Authors: Yuhao Zhang, Arun Chaganty, Ashwin Paranjape, Danqi Chen, Jason Bolton, Peng Qi, Christopher D. Manning (Stanford University)
- Venue: TAC'16
- CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases [paper]
- Brief introduction: Joint extraction of typed entities and relations with labeled data obtained from knowledge bases with distant supervision.
- Authors: Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han (UIUC & Army Research Laboratory)
- Venue: WWW'17
- Discovering Implicit Knowledge with Unary Relations [paper]
- Brief introduction: Extract implicit relations in text by converting binary relations to unary relations.
- Authors: Michael Glass, Alfio Gliozzo (IBM Research)
- Venue: ACL'18
- Open Information Extraction from the Web [paper]
- Brief introduction: The first paper on open information extraction, using a rule-based method.
- Authors: Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni (University of Washington)
- Venue: AAAI'07
- Identifying relations for open information extraction [paper]
- Brief introduction: Introduce syntactic and lexical constraints on binary relations expressed by verbs to reduce uninformative and incoherent extractions.
- Authors: Anthony Fader, Stephen Soderland, and Oren Etzioni (University of Washington)
- Venue: EMNLP'11
- Open Language Learning for Information Extraction [paper]
- Brief introduction: An extension of Open IE that adds relations mediated by nouns and adjectives and takes context into consideration.
- Authors: Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni (University of Washington)
- Venue: EMNLP'12
- Neural Open Information Extraction [paper]
- Brief introduction: Propose a neural encoder-decoder Open IE framework. The model is trained on highly confident binary extractions bootstrapped from a state-of-the-art Open IE system, and can therefore generate high-quality tuples without any hand-crafted patterns.
- Authors: Lei Cui, Furu Wei, and Ming Zhou (MSRA)
- Venue: ACL'18
- COMET: Commonsense Transformers for Automatic Knowledge Graph Construction [paper]
- Brief introduction: Construct commonsense knowledge graphs by using existing tuples as a seed set of knowledge for training. Using this seed set, a pre-trained language model (GPT) learns to adapt its learned representations to knowledge generation and produces novel tuples.
- Authors: Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz and Yejin Choi (University of Washington)
- Venue: ACL'19
- Mining Knowledge Graphs from Text. [link]
- Jay Pujara (USC), Sameer Singh (UCI)
- WSDM'18
- Constructing Domain-specific Knowledge Graphs. [link]
- Craig Knoblock (USC), Pedro Szekely (USC), Mayank Kejriwal (USC)
- AAAI'18
- Stanford University: CS124, Dan Jurafsky
- University of Washington: CSE517, Luke Zettlemoyer
- (Slide) Relation Extraction 1
- (Slide) Relation Extraction 2
- New York University: CSCI-GA.2590, Ralph Grishman
- University of Michigan: Coursera, Dragomir R. Radev
- (Video) Lecture 48: Relation Extraction
- New York Times (NYT) Corpus [paper] [download]
- This dataset was generated by aligning Freebase relations with the NYT corpus, with sentences from the years 2005-2006 used as the training corpus and sentences from 2007 used as the testing corpus.
- FewRel: Few-Shot Relation Classification Dataset [paper] [Website]
- This dataset is a supervised few-shot relation classification dataset. The corpus is Wikipedia and the knowledge base used to annotate the corpus is Wikidata.
- TupleInf Open IE Dataset [Website]
- The TupleInf Open IE dataset contains Open IE tuples extracted from 263K sentences that were used by the solver in "Answering Complex Questions Using Open Information Extraction".
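
The alignment used to build the NYT corpus above (distant supervision) can be sketched roughly as follows. This is a minimal illustration, not the pipeline that produced the dataset: the toy `kb` triples, the `sentences` list, and the `label_sentences` helper are all hypothetical, and entity matching is reduced to naive substring search.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
kb: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "place_of_birth",
    ("Google", "Mountain View"): "headquarters",
}

# Hypothetical toy corpus.
sentences: List[str] = [
    "Barack Obama was born in Honolulu in 1961.",
    "Barack Obama visited Honolulu last week.",
    "Google opened a new office in Zurich.",
]


def label_sentences(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label a sentence with a KB relation when it mentions both entities."""
    labeled = []
    for sent in sentences:
        for (head, tail), relation in kb.items():
            # Distant supervision assumption: any sentence containing both
            # entities of a known triple is taken to express that relation.
            if head in sent and tail in sent:
                labeled.append((sent, head, tail, relation))
    return labeled


labeled = label_sentences(sentences, kb)
for example in labeled:
    print(example)
```

In this toy run the second sentence also receives the `place_of_birth` label even though it does not express that relation. That kind of label noise is what several of the papers listed above (for example CERES and CoType) try to reduce.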
- DeepDive (Christopher Ré et al, Stanford University) [paper] [System]
- Open Information Extraction (Stanford University NLP) [System]
- This repo is built based on Sargur N. Srihari's slides. Many thanks!