An Arabic chatbot that detects the sentiment of a query and replies accordingly.

Pipeline Steps:

Read and merge the train and test datasets.
Collapse all context labels into two sentiment classes: positive or negative.
Use the Arabic library ("qalsadi.lemmatizer") for tokenization, stop-word removal, and lemmatization.
We copy the sentence column into a new column and add it to the dataset shifted up by 1, so that each row pairs a sentence with the sentence that follows it (its reply).
We remove the last sentence of every conversation, since it has no reply (the next sentence belongs to another conversation).
We divide the dataset into training/testing datasets.
Train a Logistic Regression model on the training dataset.
Run the trained model on the entered query to classify its sentiment.
Compute TF-IDF vectors for all sentences that have the same sentiment as the query.
Compute the TF-IDF vector for the entered query.
Calculate cosine similarity between the entered query and all sentences that have the same sentiment.
Choose the sentence with the highest cosine similarity.
Output the sentence that follows it, since that was the reply to the most similar sentence.
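
The reply-pairing and retrieval steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. The toy rows, the function names (`build_pairs`, `tfidf_vectors`, `cosine`, `reply`), the whitespace tokenizer (standing in for the qalsadi preprocessing), and the smoothed IDF formula are all assumptions. The query's sentiment is passed in directly; in the real pipeline it would come from the trained Logistic Regression model.

```python
import math
from collections import Counter

# Hypothetical toy data: (conversation_id, sentence, sentiment), in dialogue order.
ROWS = [
    (1, "نجحت في الامتحان اليوم", "positive"),
    (1, "مبروك هذا خبر رائع", "positive"),
    (1, "شكرا لك", "positive"),
    (2, "خسرت وظيفتي اليوم", "negative"),
    (2, "يؤسفني سماع ذلك", "negative"),
    (3, "اشتريت سيارة جديدة", "positive"),
    (3, "مبروك على السيارة", "positive"),
]


def build_pairs(rows):
    """Pair each sentence with the next one (the column shifted up by 1)."""
    pairs = []
    for current, following in zip(rows, rows[1:]):
        conv_id, sentence, sentiment = current
        # Skip the last sentence of a conversation: the next row belongs
        # to another conversation, so it is not a reply.
        if following[0] == conv_id:
            pairs.append((sentence, following[1], sentiment))
    return pairs


def tfidf_vectors(token_lists):
    """Return sparse TF-IDF vectors plus a function to vectorize new text."""
    n_docs = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    # Smoothed IDF (an assumed weighting; other variants are possible).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

    def vectorize(tokens):
        counts = Counter(tokens)
        # Tokens unseen in the candidate sentences are ignored.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    return [vectorize(toks) for toks in token_lists], vectorize


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reply(query, query_sentiment, pairs):
    """Return the reply paired with the most similar same-sentiment sentence."""
    candidates = [(s, r) for s, r, label in pairs if label == query_sentiment]
    if not candidates:
        return None
    # Whitespace split stands in for tokenization and lemmatization.
    vectors, vectorize = tfidf_vectors([s.split() for s, _ in candidates])
    query_vec = vectorize(query.split())
    scores = [cosine(query_vec, v) for v in vectors]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best][1]


pairs = build_pairs(ROWS)
print(reply("نجحت في الامتحان", "positive", pairs))
```

Restricting the candidates to the query's sentiment before building the TF-IDF vectors means the IDF weights are computed only over same-sentiment sentences, which matches the order of the steps listed above.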

Team Members: