Please see this link for instructions on how to download your own data from ChatGPT.
This project analyzes chat data exchanged between users and an AI assistant. The data consists of multiple conversations, each containing messages from both the user and the assistant. The key objectives of the analysis are:
- Understanding sentiment trends in the messages.
- Identifying the busiest times for user interaction.
- Analyzing assistant response times.
- Studying the distribution of user feedback ratings.
The insights gained from this analysis are intended to help improve the AI assistant's performance, the user experience, and future interaction strategies.
The project depends on the following Python libraries:
- pandas: Used for data manipulation and analysis.
- json: Used for parsing JSON files.
- zipfile: Used to extract chat data stored in a zip file.
- seaborn: Used for creating attractive and informative statistical graphics.
- textblob.TextBlob: Used for processing textual data and performing sentiment analysis.
- matplotlib.pyplot: Used for creating 2D graphics and visualizations.
The Jupyter notebook is structured into the following sections:
- Data Extraction and Preprocessing: This involves extracting data from a zip file, loading it into Python, and preprocessing it for analysis.
- Data Cleaning: This section covers the cleaning of the data to ensure its accuracy and readiness for analysis.
- Sentiment Analysis: Sentiment analysis is performed on user and assistant messages to understand the overall sentiment of the conversations.
- Data Visualization and Insights: This section contains various visualizations that provide insights into the chat data.
Key code snippets from the notebook include:
- Extraction of data from a zip file.
- Loading of JSON data.
- Data preprocessing and conversion into a pandas DataFrame.
- Data cleaning, including handling missing values, removing unwanted characters, converting timestamps, and removing duplicates.
These snippets form the backbone of the extraction, preprocessing, and cleaning stages and prepare the data for the analysis and visualization that follow.
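A minimal sketch of these stages (not the notebook's own code) might look like the following. It assumes the export archive contains a `conversations.json` file holding a list of conversations. Each conversation is assumed to have a `title` and a `mapping` of message nodes, and each node's message is assumed to carry `author.role`, `create_time` (Unix seconds), and `content.parts`. The function names and column names are hypothetical.

```python
import json
import zipfile

import pandas as pd


def load_conversations(zip_path, member="conversations.json"):
    """Read the conversations JSON directly from the export zip (path or file object)."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as fh:
            return json.load(fh)


def conversations_to_frame(conversations):
    """Flatten each conversation's message nodes into one row per message.

    The layout (mapping -> message -> author/content/create_time) is an assumption.
    """
    rows = []
    for conv in conversations:
        for node in conv.get("mapping", {}).values():
            msg = node.get("message")
            if not msg:  # placeholder nodes (e.g. the root) carry no message
                continue
            parts = msg.get("content", {}).get("parts") or []
            rows.append({
                "conversation": conv.get("title"),
                "role": msg.get("author", {}).get("role"),
                "create_time": msg.get("create_time"),
                "text": " ".join(p for p in parts if isinstance(p, str)),
            })
    return pd.DataFrame(rows, columns=["conversation", "role", "create_time", "text"])


def clean_frame(df):
    """Drop untimed, empty, and duplicate messages, tidy whitespace, parse timestamps."""
    df = df.dropna(subset=["create_time"])
    # Collapse runs of whitespace/newlines into single spaces.
    df = df.assign(text=df["text"].str.replace(r"\s+", " ", regex=True).str.strip())
    df = df[df["text"] != ""].copy()
    # Timestamps are assumed to be Unix epoch seconds.
    df["create_time"] = pd.to_datetime(df["create_time"], unit="s")
    return df.drop_duplicates().sort_values("create_time").reset_index(drop=True)
```

For example, `clean_frame(conversations_to_frame(load_conversations("export.zip")))` returns one cleaned row per message. The archive name `export.zip` is a placeholder.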
The notebook presents findings on sentiment trends, peak user interaction times, assistant response times, and the distribution of user feedback ratings. The detailed results come from running the notebook end to end on your own data.
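To illustrate how two of these findings could be computed, here is a sketch (an assumption, not the notebook's code). It expects a DataFrame with hypothetical `conversation`, `role`, and `create_time` (datetime) columns. Busy times are counted as user messages per hour of day. A response time is measured as the gap between a user message and the assistant message that directly follows it in the same conversation.

```python
import pandas as pd


def busiest_hours(df, top=3):
    """Return the `top` hours of the day with the most user messages."""
    user = df[df["role"] == "user"]
    return user["create_time"].dt.hour.value_counts().head(top)


def response_times(df):
    """Seconds between each user message and the assistant reply right after it."""
    df = df.sort_values(["conversation", "create_time"])
    grouped = df.groupby("conversation")
    prev_role = grouped["role"].shift()
    prev_time = grouped["create_time"].shift()
    # Keep only assistant messages whose predecessor in the conversation is a user message.
    is_reply = (df["role"] == "assistant") & (prev_role == "user")
    return (df["create_time"] - prev_time)[is_reply].dt.total_seconds()
```

Grouping by a title-based `conversation` column assumes titles are unique. If they repeat in the export, a per-conversation identifier would be the safer key.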
Potential future directions for this project include:
- Enhancing the AI assistant's performance based on the findings of the analysis.
- Implementing more sophisticated sentiment analysis techniques.
- Developing more detailed visualization methods to better understand user interaction patterns.
These enhancements would move the project closer to its goal of understanding and improving user interactions with the AI assistant.
Most of the notebook was generated using Code Interpreter and GPT-4. These tools greatly aided the development and analysis process by providing AI-assisted coding and analysis. (This note was also generated by Code Interpreter, as you may have expected.)