A dataset of sample tweets is taken from the NLTK library and used to create a sentiment analysis model. The model is built using a Naive Bayes Classifier trained on a dataset of positive and negative tweets after preprocessing. The model takes a list of text tokens (that make up a comment) as input and predicts whether the corresponding comment is positive or negative.
Here is a screenshot of the app:
The main.py file is a Streamlit app and is deployed to Streamlit Share. Visit the following link to run the app and test it:
For example, suppose we have a dataset of 10 good comments and 5 negative comments. First, let's count how many times each word occurs in the good comments and in the negative comments, as in the example below:
Good Comment
“Dear” : 20 times
“Precious” : 15 times
“Donation” : 1 time
“Love” : 15 times
“Hangout” : 3 times
Negative Comment
“Dear” : 2 times
“Bad” : 2 times
“Donation” : 15 times
“Worst” : 1 time
“Hangout”: 0 times
From the counts above, the likelihood of each word (for discrete data, these conditional probabilities are called likelihoods) is each word's count divided by the total number of words in its class:
Good Comment
p(Dear|Good) : 20/54 = 0.370
p(Precious|Good) : 15/54 = 0.278
p(Donation|Good) : 1/54 = 0.0185
p(Love|Good) : 15/54 = 0.278
p(Hangout|Good) : 3/54 = 0.0556
Negative Comment
p(Dear|Negative): 2/20 = 0.1
p(Bad|Negative): 2/20 =0.1
p(Donation|Negative): 15/20 = 0.75
p(Worst|Negative): 1/20 = 0.05
p(Hangout|Negative) : 0/20 = 0
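The likelihoods above can be computed directly from the word counts. A minimal sketch in Python, using the counts from the example (the variable names are ours, not from the app):

```python
# Word counts from the example dataset above.
good_counts = {"Dear": 20, "Precious": 15, "Donation": 1, "Love": 15, "Hangout": 3}
negative_counts = {"Dear": 2, "Bad": 2, "Donation": 15, "Worst": 1, "Hangout": 0}

def likelihoods(counts):
    """Divide each word's count by the total number of words in the class."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

good_likelihood = likelihoods(good_counts)          # e.g. p(Dear|Good) = 20/54
negative_likelihood = likelihoods(negative_counts)  # e.g. p(Donation|Negative) = 15/20
```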
We now have the likelihood of each word given a Good or a Negative comment. We also need the probability that a comment is Good or Negative in general; this is called the prior probability.
p(G) = (number of good comments)/(number of good comments + number of negative comments)
p(G) = 10/(10+5) = 0.667
p(N) = 5/(5+10) = 0.333
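The priors are just the class proportions of the dataset; in code:

```python
# Class counts from the example dataset above.
n_good, n_negative = 10, 5

p_good = n_good / (n_good + n_negative)          # 10/15 = 0.667
p_negative = n_negative / (n_good + n_negative)  # 5/15 = 0.333
```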
Now suppose we have the comment “Dear Bad Donation”. To determine whether this comment is good, we multiply the prior probability of a good comment by the likelihoods of the words “Dear”, “Bad”, and “Donation” given a good comment. Since “Bad” never occurs in the good comments, p(Bad|Good) = 0:
p(G) x p(Dear|Good) x p(Bad|Good) x p(Donation|Good) = 0.667 x 0.37 x 0 x 0.0185 = 0
We do the same for the Negative class:
p(N) x p(Dear|Negative) x p(Bad|Negative) x p(Donation|Negative) = 0.333 x 0.1 x 0.1 x 0.75 = 0.0025
These products are proportional to p(G|”Dear Bad Donation”) and p(N|”Dear Bad Donation”). Since 0 < 0.0025, we classify “Dear Bad Donation” as a Negative comment.
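Putting the priors and likelihoods together, the whole classification can be sketched as below. Note that a word with a zero count in a class, like “Bad” in the good comments, zeroes out the entire product for that class; real implementations typically avoid this with Laplace smoothing, which is not shown here.

```python
# Word counts from the example dataset above.
good_counts = {"Dear": 20, "Precious": 15, "Donation": 1, "Love": 15, "Hangout": 3}
negative_counts = {"Dear": 2, "Bad": 2, "Donation": 15, "Worst": 1, "Hangout": 0}

def score(tokens, counts, prior):
    """Multiply the prior by the likelihood of each token.
    A word unseen in the class has likelihood 0 and zeroes the product."""
    total = sum(counts.values())
    result = prior
    for token in tokens:
        result *= counts.get(token, 0) / total
    return result

comment = ["Dear", "Bad", "Donation"]
p_good = score(comment, good_counts, 10 / 15)          # 0.667 x 0.37 x 0 x 0.0185 = 0
p_negative = score(comment, negative_counts, 5 / 15)   # 0.333 x 0.1 x 0.1 x 0.75 = 0.0025
label = "Good" if p_good > p_negative else "Negative"  # "Negative"
```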