infotheory-research

Code for a 2019-2020 research project on applying information theory to language.

Includes:

Scripts to parse a corpus and build an n-gram model
Training a logistic regression classifier and testing its classification of long- and short-form word usage in sentences
Testing a pre-trained BERT model for next-word prediction
Efficiently computing measures such as entropy, surprisal, and PMI on large corpus data
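
As a rough illustration of the measures listed above, here is a minimal sketch that computes bigram surprisal, next-word entropy, and PMI from raw counts. It is not the code in this repository: the function names (`count_ngrams`, `surprisal`, `entropy`, `pmi`) and the toy corpus are hypothetical, probabilities are unsmoothed maximum-likelihood estimates, and logs are base 2.

```python
import math
from collections import Counter


def count_ngrams(sentences):
    """Count unigrams, adjacent-word bigrams, and bigram contexts.

    `sentences` is an iterable of token lists (an assumed input format).
    `contexts[w]` counts how often w appears as the first word of a bigram.
    """
    unigrams, bigrams, contexts = Counter(), Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        pairs = list(zip(tokens, tokens[1:]))
        bigrams.update(pairs)
        contexts.update(prev for prev, _ in pairs)
    return unigrams, bigrams, contexts


def surprisal(prev, word, bigrams, contexts):
    """-log2 P(word | prev). Unsmoothed, so an unseen bigram raises an error."""
    return -math.log2(bigrams[(prev, word)] / contexts[prev])


def entropy(prev, bigrams, contexts):
    """Entropy in bits of the next-word distribution following `prev`."""
    total = contexts[prev]
    probs = [c / total for (p, _), c in bigrams.items() if p == prev]
    return -sum(q * math.log2(q) for q in probs)


def pmi(x, y, unigrams, bigrams):
    """Pointwise mutual information: log2 of P(x, y) / (P(x) * P(y))."""
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[(x, y)] / n_bi
    return math.log2(p_xy / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))


# Toy corpus, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
unigrams, bigrams, contexts = count_ngrams(corpus)
print(surprisal("the", "cat", bigrams, contexts))
print(entropy("cat", bigrams, contexts))
print(pmi("the", "cat", unigrams, bigrams))
```

On a large corpus the linear scan inside `entropy` would be slow; grouping bigram counts by context word once, up front, is one way to avoid it.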

Still in progress

Repository contents:

.ipynb_checkpoints
classification
data
ngram_models
ngram_stats
tools
.gitignore
Cosine Sim.ipynb
LICENSE
README.md
