This is a project for personalizing the utterances of a conversational agent (SlugBot for UCSC).
For now, NLTK is the only dependency needed to run the code. Please use Python 3.5+.
Clone the repository to your machine.
The file clean.py cleans the original dataset provided. Its output is written to one file per character, named Chandler_all.txt, Ross_all.txt, etc. You can run it with:
py clean.py
The other files available now for feature extraction are extract_pos_bigrams.py (for extracting POS bigrams)
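
As a rough illustration of what a POS bigram feature looks like, here is a minimal sketch that counts adjacent part-of-speech tag pairs. It is not the code in extract_pos_bigrams.py; the function name pos_bigrams and the hand-tagged sample line are assumptions made for this example. In practice the (word, tag) pairs could come from nltk.pos_tag(nltk.word_tokenize(line)), which requires NLTK's tokenizer and tagger data to be downloaded first.

```python
from collections import Counter


def pos_bigrams(tagged_tokens):
    """Count adjacent POS-tag pairs in a list of (word, tag) tuples.

    Hypothetical helper for illustration; not taken from the repository.
    """
    tags = [tag for _word, tag in tagged_tokens]
    # Pair each tag with the tag that follows it.
    return Counter(zip(tags, tags[1:]))


# Hand-tagged sample (Penn Treebank tags), so the sketch runs without NLTK data.
tagged = [
    ("Could", "MD"),
    ("I", "PRP"),
    ("be", "VB"),
    ("more", "RBR"),
    ("sorry", "JJ"),
    ("?", "."),
]

counts = pos_bigrams(tagged)
for (first, second), n in counts.most_common():
    print(first, second, n)
```

Counting bigrams per character file (for example Chandler_all.txt) would give one feature vector per character that can then be compared across characters.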