fake-news-detector

Combating fake news and detecting false information is extremely important because of the huge potential for manipulation. It is therefore not surprising that this area is the subject of research for many scientists.

The goal of this project was to build a binary classifier that indicates whether a piece of news is false. Fake news can be detected from article content or from its social context; in this project we focused on prediction from the news content. Our solution is based on contextual word embeddings from Flair, one of the best language models currently available.
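
As a rough illustration of the approach, contextual embeddings for a title (or URL) can be obtained with Flair and fed to a downstream classifier. This is only a minimal sketch: the choice of 'news-forward'/'news-backward' embeddings and the logistic-regression classifier are assumptions, not the exact setup used in the notebooks.

```python
# Minimal sketch: turn a title/URL string into a fixed-size vector with
# Flair contextual embeddings, then train a simple classifier on top.
# The embedding choice and the downstream classifier are illustrative
# assumptions, not the project's exact configuration.
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, DocumentPoolEmbeddings

embeddings = DocumentPoolEmbeddings([
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

def embed_text(text: str):
    """Return a document-level embedding for one piece of text."""
    sentence = Sentence(text)
    embeddings.embed(sentence)
    return sentence.get_embedding().detach().cpu().numpy()

# Example downstream use (hypothetical variable names):
# from sklearn.linear_model import LogisticRegression
# X_train = [embed_text(t) for t in train_titles]
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```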

The presented mechanism was evaluated on the FakeNewsNet dataset, and the results were compared with other published results. FakeNewsNet contains sets of real and fake news from the political and gossip domains; in this project we focused only on political news from PolitiFact.

Project created by Radomir Krawczykiewicz and Grzegorz Wątor

Results

We have achieved very good results on PolitiFact from FakeNewsNet:

| Model         | Accuracy | Precision | Recall | F1    |
|---------------|----------|-----------|--------|-------|
| Flair Title   | 0.816    | 0.761     | 0.805  | 0.782 |
| Flair Url     | 0.804    | 0.733     | 0.860  | 0.791 |
| Flair Mix     | 0.759    | 0.648     | 0.908  | 0.756 |
| AutoKeras Mix | 0.863    | 0.811     | 0.918  | 0.861 |

Documentation

One can get more information from a short presentation (in English) or the full documentation (in Polish).

Installation

One needs to have Python installed; then the dependencies can be installed with pip:

pip install -r requirements.txt

Starting

Run Jupyter and open one of the project notebooks:

jupyter notebook

Dataset

We have used FakeNewsNet as our dataset.

To evaluate on the dataset, one needs to download it first.

For the full data, one needs to use the [script](https://github.com/KaiDMML/FakeNewsNet/blob/master/code/main.py) provided by FakeNewsNet.

If one wants to use just the CSV data, one can download it from GitHub, separately for fake and real news.

For the Jupyter notebooks to work properly (without changing the paths in them), the user needs to put:

  • the CSV files into the fakenewsnet_dataset/dataset folder
  • the full dataset into the fakenewsnet_dataset/politifact folder, separated into fake and real folders

The Colab notebook has a cell which downloads the CSV files from GitHub.
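
For local runs, a small script can do the same thing as that Colab cell. This is only a sketch: the raw-file URLs below are assumptions about the current layout of the FakeNewsNet repository and should be verified before use.

```python
# Sketch: download the PolitiFact CSV splits from the FakeNewsNet GitHub
# repository into the folder the notebooks expect.
# The raw URLs are assumptions about the repo layout - verify them first.
import os
import urllib.request

BASE = "https://raw.githubusercontent.com/KaiDMML/FakeNewsNet/master/dataset"
TARGET_DIR = "fakenewsnet_dataset/dataset"
os.makedirs(TARGET_DIR, exist_ok=True)

for name in ("politifact_fake.csv", "politifact_real.csv"):
    urllib.request.urlretrieve(f"{BASE}/{name}", os.path.join(TARGET_DIR, name))
```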

Database

  1. Start a PostgreSQL database instance. Depending on the needs, it can run locally or on a remote server (in this project the database ran on an AWS instance).
  2. Execute the SQL commands from the table_create.sql file. Three tables should be created: for articles, users and tweets.
  3. Open data_extractor.py.
  4. Set the PATH_TO_DATA_DIRECTORY variable to the folder containing the data from the PolitiFact portal (the path should point exactly to the politifact folder!). A guide on how to download the data can be found in the Dataset section above.
  5. Set the CONN variable according to the access details of the database you are running (host, dbname, user and password); see the sketch after this list.
  6. Run the script.
  7. As a result, the data should be uploaded to the database.
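
As a minimal sketch of step 5, assuming data_extractor.py opens its connection with psycopg2, the settings might look like this; the host, credentials, table name and the exact format of CONN are placeholders, not the project's actual values.

```python
# Sketch of the database connection from step 5, assuming a psycopg2-style
# DSN string; host, database name, credentials and the queried table name
# are placeholders.
import psycopg2

CONN = "host=localhost dbname=fakenews user=postgres password=secret"

with psycopg2.connect(CONN) as conn:
    with conn.cursor() as cur:
        # 'articles' is one of the three tables created by table_create.sql
        cur.execute("SELECT count(*) FROM articles;")
        print(cur.fetchone())
```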

References

Papers

Libraries
