This repo contains the Toxic Comment Classification project, which is part of my data science portfolio. [Google](https://www.technologyreview.com/s/603735/its-easy-to-slip-toxic-language-past-alphabets-toxic-comment-detector) defines a toxic comment as "a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion." The objective of this project is to identify and classify toxic comments to help make online discussions more productive and respectful.

The toxicity scores are generated by a machine learning model trained on a dataset of comments from Wikipedia’s talk page edits, downloaded from [Kaggle](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge). The scores are akin to probabilities and range from 0 (non-toxic) to 1 (highly toxic).

The model is deployed as a REST API using Flask, a micro web framework written in Python. Through the API, you send a comment as data and receive a prediction as the response.

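
To make the request/response flow concrete, here is a minimal sketch of what such a Flask endpoint could look like. It is not the code from this repo: the `/predict` route, the `comment` and `toxicity` JSON fields, and the `score_comment` function are all hypothetical names chosen for illustration, and `score_comment` is a toy placeholder standing in for the trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def score_comment(text: str) -> float:
    """Hypothetical stand-in for the trained model; returns a score in [0, 1].

    Placeholder logic only: a real deployment would load the trained model
    and return its predicted probability instead of counting keywords.
    """
    flagged = {"stupid", "idiot"}  # toy word list, assumption for the sketch
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(word.strip(".,!?") in flagged for word in words)
    return min(1.0, hits / len(words) * 5)


@app.route("/predict", methods=["POST"])  # route name is an assumption
def predict():
    # Expect a JSON body such as {"comment": "some text"}.
    payload = request.get_json(silent=True) or {}
    comment = payload.get("comment")
    if not isinstance(comment, str):
        return jsonify(error="expected a JSON body with a string 'comment' field"), 400
    return jsonify(comment=comment, toxicity=score_comment(comment))


if __name__ == "__main__":
    # Starts Flask's development server; not intended for production use.
    app.run(debug=True)
```

With the server running, a client would POST a JSON body containing the comment and read the score from the JSON response. Keeping the scoring logic in its own function means the placeholder can be swapped for the real model without touching the route.
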
Examples of toxic comments in the dataset:
- "Fuck you, block me, you faggot pussy!"
- "Stupid peace of shit stop deleting my stuff asshole go die and fall in a hole go to hell!"
- "Well I dont give a fuck what you think you bitch ass motherfucker"
- "Mine dispeared, somebody wax my ass."
- "You are a raging faggot. Kill yourself."