
luoweiqi7/MAKG

 
 


Copyright 2024 by Nanjing University of Posts and Telecommunications & Southeast University

Time: 20/7/2024 Authors: Heng Zhou & Weizhuo Li & Buye Zhang & Weiqi Luo

Mail: [email protected] & [email protected] & [email protected] & [email protected]

Description: MAKG is a mobile application knowledge graph: a high-quality knowledge graph covering millions of applications that provides an open data resource to researchers in the Semantic Web and cybersecurity communities. You can use its original resources and visit the website to use its services.

1. Introduction:

In this work, we present a mobile application knowledge graph, namely MAKG, which merges comprehensive resources (e.g., application markets, encyclopedias, news) to construct a high-quality knowledge graph about millions of applications.

We present a comprehensive framework for constructing a mobile application knowledge graph for cybersecurity, in which a lightweight ontology of apps is defined and concrete steps (App Crawling, Knowledge Extraction, Knowledge Alignment) are instantiated with promising algorithms. The framework obtains more structured triples and more correspondences among entities from different resources. In addition, we list three use-cases of MAKG that help provide better services for security analysts and users.

2. Usage:

MAKG consists of five main resources: Ontology (Knowledge Schema), AppMarket-Triples, Encyclopedias, AppMarket-Alignments, and Extraction-Triples.

A technical report with details of these resources and the related evaluations can be downloaded from the same address.

Ontology:

We design a lightweight ontology of apps. It provides a well-defined schema for the collected apps so that they can share more links with each other. It contains 26 basic classes, 11 relations, and 45 properties.

We provide two files (appOntology.owl and appSchema.xlsx) for researchers. Opening the former requires Protégé to be installed.

AppMarket-Triples:

These datasets contain raw triples crawled from Huawei AppGallery, Xiaomi App Store, Google Play, and App Store.

All triple files from the application markets are provided in the .nt (N-Triples) format.
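Since the triples ship as .nt files, a small reader helps when inspecting them. The following is a minimal sketch using only the Python standard library; it handles a simplified subset of N-Triples (IRI subjects and predicates, IRI or plain quoted-literal objects) and skips anything else. The sample URIs and property names are hypothetical and are not taken from the MAKG files. For full N-Triples support, a dedicated RDF library would be a better choice.

```python
import re
from collections import defaultdict

# Simplified N-Triples pattern: <subject> <predicate> followed by either an
# <IRI> object or a "quoted literal" with an optional language tag or datatype.
# Blank nodes are not handled, and escape sequences are left as written.
TRIPLE_RE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def parse_nt_line(line):
    """Return (subject, predicate, object), or None for blank, comment, or unsupported lines."""
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    subj, pred, obj_iri, obj_literal = match.groups()
    return (subj, pred, obj_iri if obj_iri is not None else obj_literal)


def index_by_subject(lines):
    """Group (predicate, object) pairs under each subject."""
    index = defaultdict(list)
    for line in lines:
        triple = parse_nt_line(line)
        if triple is not None:
            subj, pred, obj = triple
            index[subj].append((pred, obj))
    return index


# Hypothetical sample lines; real MAKG URIs and properties will differ.
SAMPLE = [
    '<http://example.org/app/1> <http://example.org/prop/name> "Example App"@en .',
    '<http://example.org/app/1> <http://example.org/prop/developer> <http://example.org/dev/42> .',
    '# a comment line',
]

index = index_by_subject(SAMPLE)
for predicate, obj in index["http://example.org/app/1"]:
    print(predicate, "->", obj)
```

To read a real file, pass an open file handle instead of `SAMPLE`, for example `index_by_subject(open(path, encoding="utf-8"))`, where `path` points at one of the downloaded .nt files.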

Encyclopedias:

These datasets contain the triples of apps crawled from Baidu Baike, Toutiao Baike, and Wikipedia.

As few apps are covered by Wikipedia, we only provide the triples of apps extracted from Baidu Baike and Toutiao Baike.

AppMarket-Alignments:

These datasets contain the alignments of apps, which make it possible to share and reuse the descriptive information of apps and thereby provide better MAKG-based services for security analysts and users. We use two kinds of entity alignment techniques (a rule miner method and a knowledge graph embedding-based platform) to obtain the best alignment results.

We provide all the manually labeled alignments among the four mainstream application markets for evaluation.

In addition, we provide the correspondences automatically generated by RuleMiner and by KG embedding methods (i.e., MultiKE, RDGCN, NMN).
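One common way to use the manually labeled alignments is as a gold standard for scoring the automatically generated correspondences. The sketch below computes precision, recall, and F1 over sets of aligned pairs. It assumes each alignment can be represented as a pair of app identifiers; the identifiers shown are hypothetical, and the actual file layout of the alignment datasets may differ.

```python
def evaluate_alignments(predicted, gold):
    """Return (precision, recall, f1) of predicted pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical identifiers for illustration only.
gold = {
    ("huawei:app1", "xiaomi:appA"),
    ("huawei:app2", "xiaomi:appB"),
}
predicted = {
    ("huawei:app1", "xiaomi:appA"),  # correct
    ("huawei:app2", "xiaomi:appC"),  # wrong target
    ("huawei:app3", "xiaomi:appD"),  # not in gold
}

precision, recall, f1 = evaluate_alignments(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Pairs are compared as ordered tuples, so both files need to list the two markets in the same order before scoring.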

Extraction-Triples:

These datasets contain the triples extracted from textual descriptions of apps crawled from application markets. We utilize three strategies (i.e., Infobox-based Method, Named entity recognition, Relation extraction platforms including OpenNRE, DeepKE, FewRel) and select the best models to extract basic triples.

Labeled corpus of few-shot relation extraction models:

This dataset contains the labeled corpus for training few-shot relation extraction methods with different ratios (9:1, 8:2, 7:3).

Labeled corpus of NER models:

This dataset contains the labeled corpus for training named entity recognition models with different ratios (9:1, 8:2, 7:3).

Labeled corpus of relation extraction models:

This dataset contains the labeled corpus for training relation extraction methods with different ratios (9:1, 8:2, 7:3).
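The three labeled corpora above are each described with the ratios 9:1, 8:2, and 7:3. The README does not say exactly what the ratios denote; the sketch below assumes they are train/test split ratios and shows one way to reproduce such splits from a list of labeled samples. The function name, the seed, and the placeholder samples are all illustrative.

```python
import random


def split_corpus(samples, train_parts, test_parts, seed=0):
    """Shuffle samples deterministically and split them train_parts:test_parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = len(items) * train_parts // (train_parts + test_parts)
    return items[:cut], items[cut:]


# Placeholder samples; a real corpus would hold labeled sentences.
corpus = [f"sentence-{i}" for i in range(100)]

splits = {
    f"{a}:{b}": split_corpus(corpus, a, b)
    for a, b in [(9, 1), (8, 2), (7, 3)]
}

for ratio, (train, test) in splits.items():
    print(ratio, len(train), len(test))
```

A fixed seed keeps the three splits reproducible across runs, which matters when comparing models trained on different ratios.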

Applications-Relevance-Discovery:

These are two bilingual datasets (Chinese and English) derived from MAKG, denoted by MAKG-$S$ and MAKG-$S^+$. MAKG-$S$ contains sensitive apps, i.e., apps whose properties trigger at least one heuristic principle tailored for sensitivity detection. MAKG-$S^+$ is an extended version that employs the TextRank algorithm to select important tokens, integrates external triples through these tokens, and adds the external relation "relatedTo". To balance the number and quality of the tokens selected by TextRank, we set the threshold $\theta$ to 0.5.
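To illustrate how TextRank with a threshold can select important tokens, here is a minimal sketch in plain Python. It is not the authors' implementation: the co-occurrence window, the damping factor, the iteration count, and in particular the way $\theta$ is applied (keeping tokens whose score is at least $\theta$ times the highest score) are assumptions made for this example, and the token list is invented.

```python
from collections import defaultdict


def textrank_scores(tokens, window=2, damping=0.85, iterations=50):
    """Score tokens with TextRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        # Link each token to the distinct tokens that follow it within the window.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != tok:
                neighbors[tok].add(tokens[j])
                neighbors[tokens[j]].add(tok)
    nodes = sorted(set(tokens))
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbor shares its score equally among its own neighbors.
            rank = sum(scores[m] / len(neighbors[m]) for m in neighbors[node])
            updated[node] = (1 - damping) + damping * rank
        scores = updated
    return scores


def important_tokens(tokens, theta=0.5, window=2):
    """Keep tokens whose score is at least theta times the top score (assumed rule)."""
    scores = textrank_scores(tokens, window=window)
    if not scores:
        return []
    top = max(scores.values())
    return sorted(t for t, s in scores.items() if s / top >= theta)


# Invented tokens from a hypothetical app description.
tokens = ["pay", "loan", "pay", "cash", "pay", "credit", "pay", "bonus"]
print(important_tokens(tokens, theta=0.5, window=1))
```

Under this rule, raising `theta` keeps fewer, more central tokens, while lowering it admits more tokens of lower rank, which is the number-versus-quality trade-off the threshold is meant to control.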

3. Use-Cases:

We list the main cybersecurity use-cases of MAKG, which are available on our website.

  • MAKG can provide semantic retrieval for users and security analysts. For example, if a user queries an app, MAKG can present more comprehensive information to the user than application markets do.

  • MAKG can link apps to the textual descriptions in which they appear (e.g., news) using entity linking techniques. Building on this and the previous use-case, users can fully understand the information about apps and avoid downloading invalid apps.

  • MAKG can help security analysts detect sensitive apps, i.e., apps that are more likely than normal apps to become hotbeds for cybercrime. With the comprehensive relations and properties of apps, analysts can induce more prior rules and employ promising algorithms to evaluate the sensitivity of apps. This makes it possible to lower the risk posed by sensitive apps in advance and to delay their publication in the application markets.

  • MAKG can recommend similar apps, using our hybrid method, to users and security analysts when they request related services, which can further reduce potential risks and help maintain the security of the mobile internet.

4. Citation:

If you use this dataset, please cite our paper as follows:

Normal:

Heng Zhou, Weizhuo Li, Buye Zhang, Qiu Ji, Yiming Tan, and Chongning Na. MAKG: A Mobile Application Knowledge Graph for the Research of Cybersecurity. In: Proceedings of China Conference on Knowledge Graph and Semantic Computing, Guangzhou, China, Springer, 2021, pp. 321–328.

BibTeX:

@inproceedings{MAKG2021,
  author = {Heng Zhou and Weizhuo Li and Buye Zhang and Qiu Ji and Yiming Tan and Chongning Na},
  title = {MAKG: A Mobile Application Knowledge Graph for the Research of Cybersecurity},
  booktitle = {Proceedings of China Conference on Knowledge Graph and Semantic Computing, Guangzhou, China},
  pages = {321--328},
  year = {2021},
  publisher = {Springer}
}
