This project uses machine learning and artificial intelligence techniques to analyze social media user engagement based on factors such as sentiment, likes, retweets, and hashtags. The goal is to understand user behavior and identify patterns that contribute to higher engagement on social media platforms.
This project can be run either locally using Jupyter Notebook or in Google Colab.
- Go to the following link to use Google Colab.
- Clone the repository.
- Go to the File option in the menu bar and click "Open Notebook" to open the ipynb file saved on your device. Alternatively, use the keyboard shortcut Ctrl+O.
- Click on "Upload" and select the downloaded ipynb file from the cloned repository.
- While executing the following code snippet, you will be presented with a prompt. For the snippet to work, allow Google Colab to access your Google Drive so that it can read the CSV dataset file stored there.
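A Drive mount in Colab typically looks like the sketch below. `drive.mount` is the standard Colab call that triggers the permission prompt. The helper name `resolve_dataset_path` and the assumption that the CSV sits at the top level of "My Drive" are illustrative and not taken from the notebook.

```python
import os


def resolve_dataset_path(filename="sentimentdataset.csv"):
    """Return a path to the dataset, mounting Google Drive when run in Colab.

    Hypothetical helper: assumes the CSV is stored at the top level of
    "My Drive". Outside Colab it falls back to the current directory.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        # Not running in Colab: expect the file next to the notebook.
        return filename
    drive.mount("/content/drive")  # this call shows the access prompt
    return os.path.join("/content/drive/MyDrive", filename)


dataset_path = resolve_dataset_path()
print(dataset_path)
```

If the file lives in a subfolder of your Drive, adjust the joined path accordingly.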
- Install Jupyter Notebook using the default installation method.
- Clone the repository and open the ipynb file in Jupyter Notebook.
- Execute the cells sequentially.
- Place the `sentimentdataset.csv` file containing the dataset in the same directory as the Python Notebook.
- Execute the Python Notebook `social_media_engagement_analyzer.ipynb`.
- Follow the prompts and instructions provided in the notebook to preprocess the data, train the machine learning model, and visualize the results.
The dataset used in this project (`sentimentdataset.csv`) contains social media posts with information such as timestamp, sentiment, likes, retweets, and hashtags. The dataset was taken from Kaggle. Link for the dataset used is given here.
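As a rough illustration of working with such a file, the sketch below loads a small inline stand-in for the CSV with pandas. The column names (`Timestamp`, `Sentiment`, `Likes`, `Retweets`, `Hashtags`), the sample rows, the `load_posts` helper, and the definition of engagement as likes plus retweets are all assumptions made for this example. Check them against the actual file.

```python
import io

import pandas as pd

# Tiny stand-in for sentimentdataset.csv; column names and rows are assumed.
SAMPLE_CSV = """Timestamp,Sentiment,Likes,Retweets,Hashtags
2023-01-15 12:30:00, Positive ,30,15,#Nature #Park
2023-01-15 22:45:00,Negative,10,5,#Traffic
2023-01-16 08:15:00,Positive,40,20,#Fitness #Workout
"""


def load_posts(source):
    """Read the CSV, tidy the labels, and add a combined engagement column."""
    df = pd.read_csv(source, parse_dates=["Timestamp"])
    # Strip stray whitespace around the sentiment labels.
    df["Sentiment"] = df["Sentiment"].str.strip()
    # Assumed definition: engagement = likes + retweets.
    df["Engagement"] = df["Likes"] + df["Retweets"]
    return df


posts = load_posts(io.StringIO(SAMPLE_CSV))
print(posts[["Timestamp", "Sentiment", "Engagement"]])
```

To read the real file, pass its path instead of the in-memory buffer, for example `load_posts("sentimentdataset.csv")`.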
The project employs a RandomForestClassifier model to analyze user engagement based on various features extracted from social media posts.
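A minimal sketch of fitting a `RandomForestClassifier` follows. It runs on synthetic data rather than the real dataset. The feature names (`Hour`, `SentimentCode`, `HashtagCount`), the binary `HighEngagement` target (above-median engagement), and the hyperparameters are assumptions for illustration, not the notebook's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features come from the dataset.
rng = np.random.default_rng(0)
n = 200
posts = pd.DataFrame({
    "Hour": rng.integers(0, 24, size=n),
    "SentimentCode": rng.integers(0, 3, size=n),
    "HashtagCount": rng.integers(0, 5, size=n),
})

# Made-up rule that generates an engagement score for the example.
engagement = (
    10 * posts["HashtagCount"]
    + 5 * posts["SentimentCode"]
    + rng.integers(0, 10, size=n)
)
# Assumed target: 1 if engagement is above the median, else 0.
posts["HighEngagement"] = (engagement > engagement.median()).astype(int)

X = posts[["Hour", "SentimentCode", "HashtagCount"]]
y = posts["HighEngagement"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# Per-feature importances, the data behind a feature importance plot.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(f"accuracy: {accuracy:.2f}")
print(importances.sort_values(ascending=False))
```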
The performance of the model is evaluated using classification report metrics such as precision, recall, and F1-score.
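To show what these metrics look like in practice, here is a small sketch using scikit-learn's `classification_report` on hand-written labels. The labels and the class names `low` and `high` are invented for the example and do not reflect the project's results.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Invented labels: 1 = high engagement, 0 = low engagement.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Text report with per-class precision, recall, F1-score, and support.
print(classification_report(y_true, y_pred, target_names=["low", "high"]))

# The same per-class numbers as arrays, ordered by the labels argument.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1]
)
```

In a real run, `y_true` would be the held-out test labels and `y_pred` the output of the model's `predict` on the test features.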
- Time of Day Distribution: Shows the distribution of user posts based on time of day, categorized as day or night.
- Correlation Heatmap: Visualizes the correlation between different features of the dataset.
- Feature Importance Plot: Illustrates the importance of various features in the RandomForestClassifier model.
- Top Hashtags with Maximum User Engagement: Displays the top hashtags with the highest average combined user engagement scores.
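The aggregations behind two of these plots can be sketched with pandas as shown below. The sample rows, the column names, the day/night cutoff (06:00 to 17:59 counts as day), and engagement defined as likes plus retweets are assumptions for illustration. The notebook may use different rules.

```python
import pandas as pd

# Invented sample posts; column names are assumed.
posts = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2023-01-15 12:30:00",
        "2023-01-15 22:45:00",
        "2023-01-16 08:15:00",
        "2023-01-16 03:10:00",
    ]),
    "Hashtags": ["#Nature #Park", "#Traffic", "#Nature #Fitness", "#Traffic #Night"],
    "Likes": [30, 10, 40, 6],
    "Retweets": [15, 5, 20, 2],
})

# Time of day: hours 6 through 17 are "day", everything else "night" (assumed rule).
hour = posts["Timestamp"].dt.hour
posts["TimeOfDay"] = hour.between(6, 17).map({True: "day", False: "night"})
time_of_day_counts = posts["TimeOfDay"].value_counts()

# Combined engagement per post (assumed: likes + retweets).
posts["Engagement"] = posts["Likes"] + posts["Retweets"]

# One row per (post, hashtag), then the average engagement per hashtag.
tags = posts.assign(Hashtag=posts["Hashtags"].str.split()).explode("Hashtag")
top_hashtags = (
    tags.groupby("Hashtag")["Engagement"]
    .mean()
    .sort_values(ascending=False)
    .head(3)
)

print(time_of_day_counts)
print(top_hashtags)
```

Either result can be passed to a bar plot, for example with `top_hashtags.plot(kind="bar")` when matplotlib is installed.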
- Anujesh Bansal - anujeshify
Contributions to this project are welcome! If you have any suggestions, bug fixes, or improvements, please feel free to open an issue or submit a pull request. For any queries, reach me at [email protected]
Special thanks to the contributors and developers of the libraries and tools used in this project, including pandas, scikit-learn, matplotlib, and seaborn.