Code for a 2019-2020 research project on applying information theory to language.
Includes:
- Scripts to parse a corpus and build an n-gram model
- Training a logistic regression model and testing its classification of long-form versus short-form word usage in a sentence
- Testing a pre-trained BERT model on next-word prediction
- Efficiently computing measures such as entropy, surprisal, and PMI (pointwise mutual information) on large corpus data
The project is still in progress.
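
As an illustration of the measures listed above, here is a minimal sketch of how entropy, surprisal, and PMI can be computed from n-gram counts. It is not the code from this repository: the function names (`ngram_counts`, `entropy`, `surprisal`, `pmi`) are hypothetical, the estimates are unsmoothed maximum-likelihood values, and the implementation favors clarity over the efficiency needed for a large corpus.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count n-grams in a token list; each n-gram is a tuple of n tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def surprisal(bigrams, context, word):
    """Surprisal -log2 P(word | context) under an unsmoothed bigram estimate.

    Assumption: no smoothing, so an unseen bigram raises ValueError.
    """
    # Total count of bigrams that start with the context token.
    context_total = sum(c for (first, _), c in bigrams.items() if first == context)
    return -math.log2(bigrams[(context, word)] / context_total)


def pmi(unigrams, bigrams, x, y):
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))) in bits."""
    p_xy = bigrams[(x, y)] / sum(bigrams.values())
    p_x = unigrams[(x,)] / sum(unigrams.values())
    p_y = unigrams[(y,)] / sum(unigrams.values())
    return math.log2(p_xy / (p_x * p_y))


# Toy corpus, purely for illustration.
tokens = "the cat sat on the mat".split()
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

print(entropy(unigrams))
print(surprisal(bigrams, "the", "cat"))
print(pmi(unigrams, bigrams, "the", "cat"))
```

For a large corpus, the per-context totals used in `surprisal` would normally be precomputed once rather than summed on every call, and some form of smoothing would be needed to handle unseen n-grams.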