panjek26/sentiment-analys
Sentiment Analysis using Naive Bayes

Introduction

A dataset of sample tweets from the NLTK library is used to build a sentiment analysis model. The model is a Naive Bayes classifier trained on a preprocessed dataset of positive and negative tweets. It takes a list of text tokens (the words that make up a comment) as input and predicts whether the corresponding comment is positive or negative.
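As a rough sketch of this pipeline, the toy example below trains a bag-of-words Naive Bayes model on hand-made token lists and predicts a label from a list of tokens. The tiny training data, the class labels, and the `classify` helper are illustrative assumptions, not the repository's actual code; Laplace smoothing is added so unseen words do not zero out a score.

```python
from collections import Counter

# Tiny stand-in for the preprocessed NLTK tweet dataset (illustrative only).
positive_tweets = [["love", "this", "app"], ["great", "work"]]
negative_tweets = [["worst", "app"], ["bad", "update"]]

pos_counts = Counter(w for toks in positive_tweets for w in toks)
neg_counts = Counter(w for toks in negative_tweets for w in toks)
pos_total = sum(pos_counts.values())
neg_total = sum(neg_counts.values())
vocab = set(pos_counts) | set(neg_counts)

def classify(tokens):
    """Return 'Positive' or 'Negative' via Naive Bayes with Laplace smoothing."""
    n_pos, n_neg = len(positive_tweets), len(negative_tweets)
    pos_score = n_pos / (n_pos + n_neg)   # prior for the positive class
    neg_score = n_neg / (n_pos + n_neg)   # prior for the negative class
    for w in tokens:
        # Add-one smoothing: every word keeps a small nonzero likelihood.
        pos_score *= (pos_counts[w] + 1) / (pos_total + len(vocab))
        neg_score *= (neg_counts[w] + 1) / (neg_total + len(vocab))
    return "Positive" if pos_score >= neg_score else "Negative"

print(classify(["love", "this"]))     # Positive
print(classify(["worst", "update"]))  # Negative
```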

Here is a screenshot of the app:

Run the app

The main.py file is a Streamlit app and is deployed to Streamlit Share. Visit the following link to run the app and test it:

How Naive Bayes Works

For example, suppose we have a dataset of 10 good comments and 5 negative comments. First, let's count how often each word occurs in the good comments and in the negative comments:

Good Comments
“Dear” : 20 times
“Precious” : 15 times
“Donation” : 1 time
“Love” : 15 times
“Hangout” : 3 times

Negative Comments
“Dear” : 2 times
“Bad” : 2 times
“Donation” : 15 times
“Worst” : 1 time
“Hangout” : 0 times

From the counts above, we can estimate the likelihood of each word (the conditional probability of the word given the class): each word's count divided by the total number of words in that class (54 for good, 20 for negative):

Good Comments
p(Dear|Good) : 20/54 = 0.370
p(Precious|Good) : 15/54 = 0.278
p(Donation|Good) : 1/54 = 0.0185
p(Love|Good) : 15/54 = 0.278
p(Hangout|Good) : 3/54 = 0.0556

Negative Comments
p(Dear|Negative) : 2/20 = 0.1
p(Bad|Negative) : 2/20 = 0.1
p(Donation|Negative) : 15/20 = 0.75
p(Worst|Negative) : 1/20 = 0.05
p(Hangout|Negative) : 0/20 = 0
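The likelihood tables above can be reproduced with a few lines of Python (the dictionary names here are just for illustration):

```python
# Word counts from the example dataset above.
good_counts = {"Dear": 20, "Precious": 15, "Donation": 1, "Love": 15, "Hangout": 3}
neg_counts = {"Dear": 2, "Bad": 2, "Donation": 15, "Worst": 1, "Hangout": 0}

good_total = sum(good_counts.values())  # 54 words in good comments
neg_total = sum(neg_counts.values())    # 20 words in negative comments

# Likelihood of each word = count / total words in that class.
good_likelihood = {w: c / good_total for w, c in good_counts.items()}
neg_likelihood = {w: c / neg_total for w, c in neg_counts.items()}

print(round(good_likelihood["Dear"], 3))    # 0.37
print(round(neg_likelihood["Donation"], 2)) # 0.75
```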

We now have the likelihood of each word in the good and negative comments. We also need the overall probability that a comment is good or negative, called the prior probability.

p(G) = (number of good comments)/(number of good comments + number of negative comments)
p(G) = 10/(10+5) = 0.667

p(N) = 5/(5+10) = 0.333

Case Study

Now suppose we have the comment “Dear Bad Donation”. To decide whether it is good or negative, we multiply the prior probability of a good comment by the likelihoods of the words “Dear”, “Bad”, and “Donation” in the good comments:

p(G) x p(Dear|Good) x p(Bad|Good) x p(Donation|Good) = 0.667 x 0.37 x 0 x 0.0185 = 0

(“Bad” never occurs in the good comments, so p(Bad|Good) = 0 and the whole product is zero.)

We do the same for the negative comments:

p(N) x p(Dear|Negative) x p(Bad|Negative) x p(Donation|Negative) = 0.333 x 0.1 x 0.1 x 0.75 = 0.0025

So we classify “Dear Bad Donation” as a negative comment, because its good score is 0 while its negative score is 0.0025; these products are proportional to the posterior probabilities p(G|”Dear Bad Donation”) and p(N|”Dear Bad Donation”). In practice, zero likelihoods like p(Bad|Good) = 0 are usually avoided with Laplace smoothing, which adds a small count to every word in each class.
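The whole case study can be checked in Python. This follows the hand calculation exactly, so words absent from a class get likelihood 0 (no smoothing); the variable names are illustrative.

```python
# Counts and priors from the worked example: 10 good and 5 negative comments.
good_counts = {"Dear": 20, "Precious": 15, "Donation": 1, "Love": 15, "Hangout": 3}
neg_counts = {"Dear": 2, "Bad": 2, "Donation": 15, "Worst": 1, "Hangout": 0}
good_total = sum(good_counts.values())  # 54
neg_total = sum(neg_counts.values())    # 20
p_good, p_neg = 10 / 15, 5 / 15         # prior probabilities

def score(tokens, counts, total, prior):
    """Prior times the product of word likelihoods (unseen words score 0)."""
    s = prior
    for w in tokens:
        s *= counts.get(w, 0) / total
    return s

tokens = ["Dear", "Bad", "Donation"]
good_score = score(tokens, good_counts, good_total, p_good)
neg_score = score(tokens, neg_counts, neg_total, p_neg)

print(good_score)           # 0.0  ("Bad" never appears in good comments)
print(round(neg_score, 5))  # 0.0025
```

Since 0 < 0.0025, the comment is labeled negative, matching the hand calculation above.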
