This is the repository for the paper "Detect Camouflaged Spam Content via StoneSkipping: Graph and Text Joint Embedding for Chinese Character Variation Representation", which was accepted at EMNLP 2019 (http://arxiv.org/abs/1908.11561). Because the data and code used in this paper belong to Alibaba Group, this repository publishes, for data-privacy reasons, only the CCHIN (Chinese Character Heterogeneous Information Network) and the Taobao review spam dataset.
@inproceedings{jiang2019detect,
title={Detect Camouflaged Spam Content via StoneSkipping: Graph and Text Joint Embedding for Chinese Character Variation Representation},
author={Jiang, Zhuoren and Gao, Zhe and He, Guoxiu and Kang, Yangyang and Sun, Changlong and Zhang, Qiong and Si, Luo and Liu, Xiaozhong},
booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP2019)},
year={2019},
organization={Association for Computational Linguistics}
}