Script Analysis of 'The Office'. Sentiment Analysis, Topic Modelling and Dialogue Generator.

shantanu-555/The-Office-Script-Analysis

"That's what she said!" - The Office Script Analysis

by Luuk Boekestein, Shantanu Motiani and Eline Westerbeek

Abstract

This project presents a comprehensive analysis of the character dialogue in the popular US TV show "The Office." Through the application of sentiment analysis and topic modelling on the lines in the show, and the development of a dialogue generator using Markov chains, we aim to uncover insights into the differences in emotional undertones, character interactions and patterns of the show’s dialogue. The sentiment analysis revealed the diverse range of emotions expressed by the characters, providing a nuanced understanding of their changing interactions. The topic modelling identified recurring themes and topics within the dialogue, shedding light on the show's narrative structures. Finally, the dialogue generator utilized Markov chains to mimic the unique speech patterns of the characters, allowing us to generate simplistic new dialogues. This project offers a valuable contribution to the analysis of "The Office," showcasing the potential of computational techniques in analysing character dialogue in TV shows.

Research Questions

  • What are the interactions between characters throughout the seasons?
  • Sentiment analysis – What are the sentiments expressed in the dialogues, and how do they vary per character and season?
  • Topic modelling – Are there certain topics that certain characters tend to talk about? And can we identify the primary topics of an episode?
  • Dialogue generation – Can we generate new quotes in the style of each character?
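
As an illustration of the Markov-chain idea behind the dialogue generator, here is a minimal sketch, not the code used in this project. It builds a first-order word chain from a handful of lines and walks it to produce a new sentence. The function names `build_chain` and `generate` and the sample lines are hypothetical and do not come from the dataset or the repository.

```python
import random
from collections import defaultdict


def build_chain(lines, order=1):
    """Map each state (a tuple of `order` words) to the words seen right after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, max_words=15, seed=None):
    """Start from a random state and follow the chain until it ends or hits max_words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:  # dead end: this state was only seen at the end of a line
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical sample lines for one character (assumption, not real script lines)
sample_lines = [
    "I am the regional manager of this branch",
    "I am not the assistant to the manager",
    "this branch is the best branch of the company",
]

chain = build_chain(sample_lines)
quote = generate(chain, seed=42)
print(quote)
```

Building one chain per speaker, using only that speaker's lines, is one way to make the output imitate an individual character. A higher `order` gives more coherent but less varied text.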

Dataset

For this project, we used a dataset from Kaggle.com containing all the scripts of the American TV series "The Office". The dataset is a .csv file with almost 60,000 rows, one per line of dialogue, and 7 columns: season, episode, scene, speaker, line, deleted (whether the scene was deleted) and episode name. The dataset can be found online here, or can be retrieved from the data/dataset folder.

A simple overview of the dataset using ProfileReport from ydata_profiling can be found in the data_report.html file.
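
For a quick look at the data without the profiling report, a sketch along the following lines counts the lines of dialogue per speaker using only the standard library. The header names (`speaker`, `deleted`, and so on) and the `True`/`False` encoding of the deleted flag are assumptions based on the column description above, and the sample rows are made up; check them against the actual file before reusing this.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in the assumed layout; the real header names may differ
SAMPLE_CSV = """season,episode,scene,speaker,line,deleted
1,1,1,Michael,Hello everyone.,False
1,1,1,Jim,Hi Michael.,False
1,1,2,Michael,This scene was cut.,True
1,1,3,Dwight,Question.,False
"""


def lines_per_speaker(csv_file, include_deleted=False):
    """Count rows per speaker, skipping deleted scenes unless asked to keep them."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        is_deleted = row["deleted"].strip().lower() == "true"
        if is_deleted and not include_deleted:
            continue
        counts[row["speaker"]] += 1
    return counts


counts = lines_per_speaker(io.StringIO(SAMPLE_CSV))
print(counts.most_common())
```

To run it on the real data, pass an open file handle instead of the `io.StringIO` sample.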

Documentation

Report

A detailed report of the project can be found here. The 4-page report contains a detailed description of the project, the research questions, the methods used, the results, conclusions and references.

Code

The main code used for this project can be found in the src folder. Within the src folder there is a subfolder for each addressed research question:

  • the code for visualizing the data can be found in the notebook in the vizualizations folder,
  • the code for the sentiment analysis can be found in the notebook in the sentiment analysis folder,
  • and the code for the topic modelling can be found in the notebook in the topic modeling folder.

Furthermore, the code used to generate readability scores for each character is located in the gunning fog folder.
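
For reference, the Gunning fog index is commonly defined as 0.4 × (average words per sentence + percentage of complex words), where a complex word has three or more syllables. The sketch below is a simplified illustration of that formula, not the implementation in the gunning fog folder. It estimates syllables by counting vowel groups and ignores the usual exceptions for proper nouns, compound words and common suffixes. The function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough estimate: count groups of consecutive vowels (at least one per word)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def gunning_fog(text):
    """Approximate Gunning fog index of a text; returns 0.0 for empty input."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)


simple_score = gunning_fog("The cat sat. The dog ran.")
harder_score = gunning_fog("Paper is complicated.")
print(simple_score, harder_score)
```

Applying `gunning_fog` to all lines of one speaker joined together gives a single readability score for that character. Higher scores indicate text that is harder to read.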

The packages needed to run the code in this project (including the webapp) can be found in the requirements.txt file. To install these packages, run the following command in your terminal:

pip install -r requirements.txt

Webapp

To enable an easy exploration of our results, we created a simple webapp using the streamlit package. To run the webapp, run the following command in your terminal:

streamlit run .\app\Home.py

The code used to create the webapp can be found in the app folder.

Further documentation

The project updates and initial project plan can be found in the docs folder. Furthermore, the slides that we used for the final presentation can be found in the slides pdf.
