This page shares Ziqi Zhang's research datasets. Please follow the links below to find the datasets you need. All data are distributed under the Creative Commons CC-BY Licence unless otherwise stated. Please also read the 'readme' file that comes with each dataset. I would be grateful if you cite our work (see below) when using data shared on this site. Thanks.
NOTE: we recommend that you do NOT check out the entire repository. Instead, navigate to the specific dataset you need and download it from there, because some datasets are very large and may be irrelevant to your research.
- Hate Speech
- Ontology Mapping
- Procedural Knowledge
- Scholarly Data Linking
- Terminology Extraction
- Webtable Entity Linking
If you use the RM dataset within this collection, please cite: Zhang, Z., Robinson, D., Tepper, J. (2018). Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network. Proceedings of the 2018 Extended Semantic Web Conference. For the other datasets included in the collection, please give credit to their original distributors.
NOTE: Due to a recent change in our University's research data sharing policy, we can no longer share the 'RM' dataset (refugees and Muslims) described in this paper.
Description: dataset used for evaluating hate speech detection on Twitter.
Keywords: hate speech, Twitter, social media, abusive language, classification
Related code/project: chase
Data folder: /hate speech
If you use this dataset, please cite: Z. Zhang, A. Gentile, E. Blomqvist, I. Augenstein, F. Ciravegna. 2016. An unsupervised data driven method to discover equivalent relations in large Linked Datasets. Semantic Web 8 (2), 197-223
Description: dataset used for evaluating the mapping of equivalent relations collected from DBpedia.
Keywords: ontology mapping, ontology alignment, DBpedia
Related Wikipedia page: Ontology alignment
Related code/project: LODIE
Data folder: /ontology mapping
If you use this dataset, please cite: Z. Zhang, P. Webster, V. Uren, A. Varga, F. Ciravegna. 2012. Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing. LREC 2012, 520-527
Description: dataset containing annotated instructions that describe procedures (e.g., how to cook a recipe, how to mount snow chains on wheels, etc.).
Keywords: procedure, instruction, annotation, classification
Related Wikipedia page: Procedural knowledge
Data folder: /procedural knowledge
If you use this dataset, please cite: Z. Zhang, A. N. Nuzzolese, and A. L. Gentile. Entity Deduplication on ScholarlyData. In Proceedings of ESWC 2017, pp. 85-100, Lecture Notes in Computer Science. Springer, 2017.
Description: dataset used for evaluating author name and organisation linking in scholarly data.
Keywords: author name disambiguation, link discovery, entity linking, entity disambiguation
Related Wikipedia page: Author name disambiguation
Related code/project: scholarlydata
Data folder: /scholarly data linking
If you use this dataset, please cite: Z. Zhang, J. Gao, F. Ciravegna. 2018. SemRe-Rank: Improving Automatic Term Extraction By Incorporating Semantic Relatedness With Personalised PageRank. Accepted for publication in ACM Transactions on Knowledge Discovery from Data
Description: dataset used for evaluating automatic term extraction/recognition.
Keywords: automatic term extraction or recognition, ATE, ATR, text mining, terminology, thesaurus, glossary, ontology engineering
Related Wikipedia page: Terminology extraction
Related code/project: SemRe-Rank
Data folder: /terminology extraction
If you use this dataset, please cite: Zhang, Z. 2017. Effective and efficient semantic table interpretation using TableMiner+. Semantic Web 8 (6), 921-957
Description: dataset used for evaluating entity linking in webtables, as well as table header classification and relation annotation. It contains 16,000+ annotated relational tables that can support many other studies related to webtables.
Keywords: webtable, web table, entity linking, classification, relation extraction
Related Wikipedia page: Entity linking
Related code/project: sti
Data folder: /webtable entity linking