DISCO: Comprehensive and Explainable Disinformation Detection

"DISCO" is a disinformation detection toolkit. An online demo video is available here, and a preprint paper is available here.

1. Function of DISCO

Input: A batch of suspicious information
Output:
- The fake news probability and real news probability for a news article query
- Misleading degree ranking of each word in that query article

2. Required Library

numpy 1.20.1
scipy 1.6.2
pandas 1.2.4
nltk 3.6.2
gensim 4.0.1
scikit-learn 0.24.1
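
To compare an existing environment against the versions listed above, a small check such as the following may help. It is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the sklearn entry corresponds to the "scikit-learn" distribution.

```python
from importlib import metadata

# Pinned versions copied from the list above. "scikit-learn" is assumed to be
# the distribution name behind the "sklearn" entry.
REQUIRED = {
    "numpy": "1.20.1",
    "scipy": "1.6.2",
    "pandas": "1.2.4",
    "nltk": "3.6.2",
    "gensim": "4.0.1",
    "scikit-learn": "0.24.1",
}


def check_versions(required=REQUIRED):
    """Map each package to (required, installed); installed is None if missing."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: required {wanted}, installed {installed} ({status})")
```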

3. Quick Start

Download the code.
Download the pre-trained word2vec model (here or here) and put it in the "pretrained-word2vec" folder.
Run "gui_disco.py" to launch the software shown in the demo video.
[Optional]: You can train DISCO from scratch as follows:
- First, put the raw fake news data and raw real news data in the "raw-dataset" folder and run "data_preprocessing.py". feature_matrix.pkl and label_matrix.pkl will then be saved automatically in the "preprocessed-dataset" folder.
- Then, run "model_training.py" to train the inner classifier of DISCO, which will be saved automatically in the "trained-classifier" folder.
- You now have the complete DISCO and can run "gui_disco.py" to launch the software shown in the demo video.
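
After the preprocessing step, the two saved files can be inspected before training. The snippet below is a sketch under assumptions: the helper `load_preprocessed` is hypothetical, and the type of the stored objects (for example NumPy arrays) is not stated in this README, so the code only unpickles and returns them.

```python
import pickle
from pathlib import Path


def load_preprocessed(folder="preprocessed-dataset"):
    """Unpickle feature_matrix.pkl and label_matrix.pkl from `folder`."""
    folder = Path(folder)
    with open(folder / "feature_matrix.pkl", "rb") as f:
        features = pickle.load(f)
    with open(folder / "label_matrix.pkl", "rb") as f:
        labels = pickle.load(f)
    return features, labels
```

Calling `load_preprocessed()` from the repository root returns the pair `(features, labels)`; checking that both have the same number of rows is a quick sanity test before running "model_training.py".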

4. Technical Logic of DISCO

Building Word Graph. We construct an undirected word graph for each input news article. Briefly, if two words co-occur within a sliding window of a specified length, an edge connects them. For example, given the sentence "I eat an apple" and a window length of 3, the edges are {I-eat, I-an, eat-an, eat-apple, an-apple} (with stop words kept). More details on constructing a word graph can be found in TextRank.
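
The sliding-window construction can be sketched as follows. This is an illustrative version, not the code in "utils.py": the function name `build_word_graph` is hypothetical, tokenization is a plain whitespace split, and repeated words are merged into a single node.

```python
from itertools import combinations


def build_word_graph(tokens, window=3):
    """Return the set of undirected co-occurrence edges among tokens."""
    edges = set()
    # Slide a fixed-length window; a text shorter than the window is one window.
    last_start = max(len(tokens) - window, 0)
    for start in range(last_start + 1):
        for a, b in combinations(tokens[start:start + window], 2):
            if a != b:  # skip self-loops caused by repeated words
                edges.add(frozenset((a, b)))
    return edges


# The example from the text: two windows, [I, eat, an] and [eat, an, apple].
edges = build_word_graph("I eat an apple".split(), window=3)
print(sorted("-".join(sorted(e)) for e in edges))
```

Storing each edge as a `frozenset` makes the graph undirected by construction, so "eat-an" and "an-eat" are the same edge.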
Geometric Feature Extraction. We use the idea of SDG to obtain node embeddings. Briefly, a node's representation is the aggregation of its neighbors' features, weighted by the node's personalized PageRank vector. We then apply a pooling function (such as sum pooling or mean pooling) to aggregate the node embeddings into a graph-level representation vector for each constructed word graph.
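
The aggregation described above can be illustrated with a plain power-iteration version of personalized PageRank. This is a sketch under assumptions and not SDG itself: the function names, the restart probability `alpha=0.15`, and the toy 2-dimensional features (standing in for word2vec vectors) are all chosen for illustration.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for the PPR vector of `source`.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    nodes = list(adj)
    ppr = {v: 0.0 for v in nodes}
    ppr[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass returns to the source
        for u in nodes:
            if adj[u]:
                share = (1.0 - alpha) * ppr[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # An isolated node has nowhere to walk; send its mass back.
                nxt[source] += (1.0 - alpha) * ppr[u]
        ppr = nxt
    return ppr


def node_embeddings(adj, features, alpha=0.15):
    """Embed each node as the PPR-weighted sum of all node features."""
    dim = len(next(iter(features.values())))
    emb = {}
    for u in adj:
        ppr = personalized_pagerank(adj, u, alpha)
        emb[u] = [sum(ppr[v] * features[v][k] for v in adj) for k in range(dim)]
    return emb


def mean_pool(emb):
    """Average node embeddings into one graph-level vector."""
    vecs = list(emb.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Word graph of "I eat an apple" with window length 3.
adj = {
    "I": {"eat", "an"},
    "eat": {"I", "an", "apple"},
    "an": {"I", "eat", "apple"},
    "apple": {"eat", "an"},
}
# Toy features; a real run would look these up in the word2vec model.
features = {"I": [1.0, 0.0], "eat": [0.0, 1.0], "an": [0.5, 0.5], "apple": [1.0, 1.0]}
graph_vector = mean_pool(node_embeddings(adj, features))
```

Swapping `mean_pool` for a sum over the same columns gives sum pooling.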
Neural Detection. We train a model-agnostic classification module as the inner classifier of DISCO.
Misleading Degree Analysis. With the support of SDG, we can mask any word node in the constructed word graph and quickly track the new personalized PageRank to obtain the new graph-level embedding vector. Without fine-tuning the inner classifier of DISCO, we can then measure each word's contribution (positive or negative) to the predicted probability of the ground-truth label.
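
The masking idea can be sketched as a leave-one-word-out loop. Note the assumptions: this version recomputes the prediction from scratch for every masked word, whereas DISCO relies on SDG to track the personalized PageRank change; `misleading_degrees`, `predict_proba`, and `toy_predict` are hypothetical names, and the toy scorer merely stands in for the full pipeline of word graph, embedding, and inner classifier.

```python
def misleading_degrees(tokens, predict_proba, label):
    """Rank words by how much masking them changes the probability of `label`.

    predict_proba is a hypothetical callable: list of tokens -> dict mapping
    each class name to a probability.
    """
    base = predict_proba(tokens)[label]
    scores = {}
    for word in set(tokens):
        masked = [t for t in tokens if t != word]
        # Positive score: the word pushed the prediction toward `label`.
        scores[word] = base - predict_proba(masked)[label]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def toy_predict(tokens):
    """Toy stand-in scorer: more flagged words means a higher fake probability."""
    flagged = {"shocking", "secret"}
    hits = sum(t in flagged for t in tokens)
    p_fake = (hits + 1) / (len(tokens) + 2)
    return {"fake": p_fake, "real": 1.0 - p_fake}


ranking = misleading_degrees(["shocking", "secret", "report", "today"], toy_predict, "fake")
```

Words with a positive score supported the "fake" prediction, while words with a negative score worked against it. The classifier itself is never retrained in this loop.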
[Optional]: You can access our additional repository for a more thorough disinformation study, such as different inner classifiers, truncated feature dimensions, label noise injection, etc.

Reference

If you use the materials from this repository, please cite our paper.

@inproceedings{DBLP:conf/cikm/FuBTMH22,
  author    = {Dongqi Fu and
               Yikun Ban and
               Hanghang Tong and
               Ross Maciejewski and
               Jingrui He},
  editor    = {Mohammad Al Hasan and
               Li Xiong},
  title     = {{DISCO:} Comprehensive and Explainable Disinformation Detection},
  booktitle = {Proceedings of the 31st {ACM} International Conference on Information
               {\&} Knowledge Management, Atlanta, GA, USA, October 17-21, 2022},
  pages     = {4848--4852},
  publisher = {{ACM}},
  year      = {2022},
  url       = {https://doi.org/10.1145/3511808.3557202},
  doi       = {10.1145/3511808.3557202},
  timestamp = {Wed, 19 Oct 2022 17:09:02 +0200},
  biburl    = {https://dblp.org/rec/conf/cikm/FuBTMH22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
