The assignments I completed and the materials I used in the USC DSCI-558 Building Knowledge Graphs course, offered by the USC Knowledge Graph Center.
# | Subject | Library | Technique | Description
---|---|---|---|---
1 | Web Scraping | Scrapy | | Using Scrapy, crawl 10k pages from IMDB, extract attributes from each page, and store the results in JSON Lines files.
2 | Information Extraction | spaCy | NLP | Using spaCy, build one lexical extractor and one syntactic extractor for each attribute in actors' biography text.
3 | Entity Resolution, Blocking & Knowledge Representation | The Record Linkage ToolKit (RLTK), RDFLib | | Given two datasets (IMDB and AFI) and a dev dataset, match records across the two datasets (record linkage). Use blocking to reduce the number of pairs that need to be compared. Design a model in RDF Schema and store the results in a Turtle file using that model.
4 | RDF Query | Apache Jena | SPARQL, Wikidata Query | Write SPARQL queries to answer several intricate requests.
5 | IE - Revisiting Weak Supervision and Distant Supervision | Snorkel | Weak Supervision, Distant Supervision | Hand-label a small dev set, write labeling functions with Snorkel, and combine them with distant supervision to label the training set. Output a generative model.
6 | PSL and OWL | PSL, Protégé | Probabilistic Soft Logic, OWL | Write a PSL model to link records that refer to the same paper. Use Protégé to build an OWL ontology and try some reasoning.
7 | Tabular Data & Knowledge Graph Embedding | AmpliGraph | RDF Data Cube, KG Embedding |
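Assignment 3 uses blocking to cut down the number of record pairs to compare. As a rough illustration of the idea, here is a minimal token-blocking sketch in plain Python: records are grouped by shared title tokens, and only pairs that land in a common block become candidates. It does not use RLTK's API; the function names, the stopword list, and the tiny IMDB/AFI title samples are hypothetical.

```python
import re

# Hypothetical stopword list: very common words make poor blocking keys
# because they would put almost every record into the same block.
STOPWORDS = {"the", "a", "an", "of", "and"}


def tokens(title: str) -> set[str]:
    """Lower-cased alphanumeric tokens of a title, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", title.lower()) if t not in STOPWORDS}


def token_blocking(left: dict[str, str], right: dict[str, str]) -> set[tuple[str, str]]:
    """Return candidate (left_id, right_id) pairs whose titles share at least one token."""
    # Inverted index: token -> ids of right-hand records containing it.
    index: dict[str, set[str]] = {}
    for rid, title in right.items():
        for tok in tokens(title):
            index.setdefault(tok, set()).add(rid)
    # Only pairs that co-occur in some block are kept for detailed comparison.
    pairs: set[tuple[str, str]] = set()
    for lid, title in left.items():
        for tok in tokens(title):
            for rid in index.get(tok, ()):
                pairs.add((lid, rid))
    return pairs


# Hypothetical sample records (id -> title).
imdb = {"i1": "The Godfather", "i2": "Casablanca", "i3": "Vertigo"}
afi = {"a1": "Godfather, The", "a2": "Casablanca", "a3": "Psycho"}

candidates = token_blocking(imdb, afi)
print(sorted(candidates))  # [('i1', 'a1'), ('i2', 'a2')]
print(len(imdb) * len(afi), "pairs without blocking")  # 9 pairs without blocking
```

Blocking trades some recall for speed: a true match whose titles share no token (for example, a translated title) is never compared, so the choice of blocking key matters.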
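Assignment 7 covers KG embedding with AmpliGraph, which offers TransE among its models. To show what such a model scores, here is the textbook TransE scoring function (here with the L2 norm), which treats a relation as a translation so that h + r ≈ t holds for true triples. This is plain Python, not AmpliGraph's API; the entity names and 2-d vectors are hypothetical and hand-picked, whereas a real model learns them from the graph.

```python
import math

Vector = list[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE plausibility of (h, r, t): the negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hypothetical 2-d embeddings, hand-picked rather than learned.
entity = {
    "coppola": [0.0, 1.0],
    "the_godfather": [1.0, 1.0],
    "casablanca": [1.0, -1.0],
}
relation = {"director_of": [1.0, 0.0]}

# Rank candidate tail entities for (coppola, director_of, ?): higher score = more plausible.
candidates = ["the_godfather", "casablanca"]
ranked = sorted(
    candidates,
    key=lambda e: transe_score(entity["coppola"], relation["director_of"], entity[e]),
    reverse=True,
)
print(ranked)  # ['the_godfather', 'casablanca']
```

Link prediction ranks candidate entities by this score; training pushes true triples toward a distance of 0 and corrupted (negative) triples further away.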
The W3C documents are hard to read: the typography is poor and they are impossible to mark up. Knowledge graphs are a rapidly growing domain that still lacks well-written documentation, so I have put my organized W3C documents and other useful materials here.