Fake-News-Article

Some fake articles make relatively frequent use of terms seemingly intended to inspire outrage, and the writing quality in such articles is generally considerably lower than in standard news.
This project detects fake news articles by analyzing patterns in how the articles are written.
The model is built by fine-tuning BERT.
It reaches an accuracy of 80% on the custom dataset.

Installation

All the code required to get started.

Clone

Clone this repo to your local machine using https://github.com/abhilashreddys/Fake-News-Article.git

Setup

Install these libraries/packages.

$ pip3 install pandas numpy scikit-learn bs4
$ pip3 install torch
$ pip3 install keras
$ pip3 install pytorch_pretrained_bert
$ pip3 install transformers

Dataset

Data is collected by scraping the websites of popular news publishing sources.
The collected news articles are judged using the score, quality, and bias metrics collected from PolitiFact and Media Charts.
Some basic preprocessing is also done on the text collected from scraping websites.

Preprocessing

BeautifulSoup is used to scrape articles from the web. Beautiful Soup is a Python library designed for quick-turnaround projects like screen scraping.
Some custom functions are also used for cleanup such as removing punctuation.

Scraping from websites listed in politifact_data.csv:

$ python3 scrape_politifact.py

Scraping from websites listed in Interactive Media Bias Chart - Ad Fontes Media.csv:

$ python3 scrape_media.py

Data after scraping and preprocessing: politifact_text.csv, pre_media.csv
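
The punctuation-removal step mentioned above is not shown in this README. As a minimal sketch of what such a cleanup function might look like, assuming only ASCII punctuation needs to be stripped and whitespace collapsed (the name `clean_text` is hypothetical and not taken from the repo's scripts):

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Strip ASCII punctuation and collapse runs of whitespace (hypothetical helper)."""
    no_punct = text.translate(_PUNCT_TABLE)
    # Replace any run of spaces, tabs, or newlines with a single space.
    return re.sub(r"\s+", " ", no_punct).strip()


print(clean_text("Shocking!!!  You won't   believe this..."))
# → Shocking You wont believe this
```

Note that `string.punctuation` covers ASCII only, so typographic quotes and dashes from scraped pages would pass through this sketch unchanged.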

Model

The model is trained by fine-tuning BERT.
It follows the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
BERT stands for Bidirectional Encoder Representations from Transformers.
BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

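import torch.nn as nn
from pytorch_pretrained_bert import BertModel


class BertBinaryClassifier(nn.Module):
    """BERT encoder followed by dropout and a single-unit sigmoid head."""

    def __init__(self, dropout=0.1):
        super(BertBinaryClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(dropout)
        # 768 is the hidden size of bert-base; one output unit for the binary label.
        self.linear = nn.Linear(768, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, tokens, masks=None):
        # pytorch_pretrained_bert returns (encoded_layers, pooled_output).
        _, pooled_output = self.bert(tokens, attention_mask=masks, output_all_encoded_layers=False)
        dropout_output = self.dropout(pooled_output)
        linear_output = self.linear(dropout_output)
        proba = self.sigmoid(linear_output)
        return proba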
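
The classifier ends in a sigmoid, so it returns one probability per article. How that probability is turned into a label and scored is not shown in this README. The following is a rough, dependency-free sketch of the usual approach, assuming a 0.5 decision threshold and binary cross-entropy as the training loss. Both are assumptions, as are the helper names and the toy logits; which class the label 1 denotes is also not stated here.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def predict_labels(probs, threshold=0.5):
    """Assign label 1 when the probability reaches the (assumed) threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(predicted, labels):
    """Fraction of predictions that match the reference labels."""
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


# Toy logits standing in for the output of the linear layer.
logits = [2.0, -1.0, 0.3, -3.0]
labels = [1, 0, 0, 0]
probs = [sigmoid(z) for z in logits]
preds = predict_labels(probs)
print(preds, accuracy(preds, labels))
# → [1, 0, 1, 0] 0.75
```

With the PyTorch model above, the equivalent loss would typically be `nn.BCELoss` applied to the returned probabilities.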

Weights

Download here : Link

Inference

Run inference.py and pass the URL of the article you want to test on the command line.

$ python3 inference.py url
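
The contents of inference.py are not reproduced in this README. As a hedged sketch of how a script with this command-line shape could read and validate its single URL argument, assuming `argparse` is used (the function names `article_url` and `parse_args` are hypothetical, and the real script may handle its arguments differently):

```python
import argparse
from urllib.parse import urlparse


def article_url(value: str) -> str:
    """argparse type check: accept only absolute http(s) URLs."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise argparse.ArgumentTypeError(f"not a valid http(s) URL: {value!r}")
    return value


def parse_args(argv=None):
    """Parse the single positional argument: the article URL."""
    parser = argparse.ArgumentParser(description="Classify a news article by URL.")
    parser.add_argument("url", type=article_url, help="URL of the article to test")
    return parser.parse_args(argv)


# Example with an explicit argument list; a real script would call parse_args().
args = parse_args(["https://example.com/news/story"])
print(args.url)
# → https://example.com/news/story
```

Rejecting malformed URLs up front gives a clear usage error before any scraping or model loading takes place.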

Cautions & Suggestions

Check the file locations in the scripts and change them if required.
If you face any problems with the script files, use the notebooks transfrom_spam.ipynb for training and fake_article.ipynb for inference.
The model was trained for only 5 epochs; work on a better model with more data is ongoing.

References

Data sources: PolitiFact and Media Charts
Keras: The Python Deep Learning library
A library of state-of-the-art pretrained models for Natural Language Processing
Pytorch Deep Learning framework
Pytorch BERT usage example
Attention Is All You Need
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

@article{Wolf2019HuggingFacesTS,
  title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
  author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R{\'e}mi Louf and Morgan Funtowicz and Jamie Brew},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.03771}
}

@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}

Other Implementations

Triple Branch BERT Siamese Network for fake news classification on LIAR-PLUS dataset
Fake News Detection by Learning Convolution Filters through Contextualized Attention
Based on Click-Baits
Fake News Web
Fake News Pipeline Project, Explained article here.
