Hidden-Markov-Models

Michael Collins's NLP Coursera Course - Lab #1

Goal: Accurately generate part-of-speech-style tags using a trigram Hidden Markov Model. In this application, I distinguish between gene names (tagged 'I-GENE') and normal words in biological text.
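
For reference, a trigram HMM is usually written as the joint probability below. The symbols q (tag transition) and e (word emission) follow common textbook notation and are an assumption here; the scripts in this repository may use different names.

```latex
% Joint probability of a sentence x_1..x_n and its tags y_1..y_{n+1},
% with the conventions y_{-1} = y_0 = * and y_{n+1} = STOP.
p(x_1 \ldots x_n,\; y_1 \ldots y_{n+1})
  = \prod_{i=1}^{n+1} q(y_i \mid y_{i-2}, y_{i-1})
    \prod_{i=1}^{n} e(x_i \mid y_i)
```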

Tasks:

Find the words that occur fewer than 5 times in the training data. Their counts can be pooled to estimate probabilities for rare words, which smooths the model. Replace those words in the counts file with 'RARE'. See replace_rare.py
Compute the unigram, bigram, and trigram probabilities. See emission_probs.py
Implement the Viterbi algorithm to generate the most likely tag sequence for each sentence, using the probabilities estimated from the training data. Also see emission_probs.py
Classify rare words into types based on capitalization, digits, etc. See adv_replace_rare.py
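
The tasks above can be sketched end to end in a few functions. This is a minimal illustration, not the code in replace_rare.py, emission_probs.py, or adv_replace_rare.py. The function names, the rare-word class names other than 'RARE', and the in-memory data layout (lists of (word, tag) pairs) are all assumptions made for the example. It uses plain maximum-likelihood estimates and raw probabilities. For long sentences, log probabilities would be safer against underflow.

```python
from collections import Counter


def rare_class(word):
    """Map a rare word to a coarse class (class names are hypothetical)."""
    if any(ch.isdigit() for ch in word):
        return "NUMERIC"
    if word.isupper():
        return "ALL_CAPS"
    if word[-1].isupper():
        return "LAST_CAP"
    return "RARE"


def replace_rare(sentences, threshold=5):
    """Replace words seen fewer than `threshold` times with their class."""
    counts = Counter(w for sent in sentences for w, _ in sent)
    return [[(w if counts[w] >= threshold else rare_class(w), t)
             for w, t in sent] for sent in sentences]


def estimate(sentences):
    """Maximum-likelihood emission e(word|tag) and trigram q(s|u,v)."""
    emit, tag_count, tri, bi = Counter(), Counter(), Counter(), Counter()
    for sent in sentences:
        tags = ["*", "*"] + [t for _, t in sent] + ["STOP"]
        for word, tag in sent:
            emit[(word, tag)] += 1
            tag_count[tag] += 1
        for i in range(2, len(tags)):
            tri[tuple(tags[i - 2:i + 1])] += 1
            bi[tuple(tags[i - 2:i])] += 1

    def e(word, tag):
        return emit[(word, tag)] / tag_count[tag] if tag_count[tag] else 0.0

    def q(u, v, s):
        return tri[(u, v, s)] / bi[(u, v)] if bi[(u, v)] else 0.0

    vocab = {w for w, _ in emit}
    return e, q, set(tag_count), vocab


def viterbi(words, e, q, tags, vocab):
    """Return the most likely tag sequence under the trigram HMM."""
    if not words:
        return []
    n = len(words)
    # Words unseen in training are mapped to the same classes as rare words.
    obs = [w if w in vocab else rare_class(w) for w in words]

    def K(k):  # allowed tags at position k (positions <= 0 are padding)
        return {"*"} if k <= 0 else tags

    pi = {(0, "*", "*"): 1.0}  # pi[(k, u, v)]: best score ending in tags u, v
    bp = {}                    # back-pointers to the tag two positions back
    for k in range(1, n + 1):
        for u in K(k - 1):
            for v in K(k):
                best, arg = max(
                    (pi.get((k - 1, w, u), 0.0) * q(w, u, v) * e(obs[k - 1], v), w)
                    for w in K(k - 2))
                pi[(k, u, v)], bp[(k, u, v)] = best, arg
    # Pick the best final tag pair, including the transition to STOP.
    _, u, v = max((pi[(n, u, v)] * q(u, v, "STOP"), u, v)
                  for u in K(n - 1) for v in K(n))
    y = [None] * (n + 1)       # 1-indexed tags; y[0] is padding
    y[n - 1], y[n] = u, v
    for k in range(n - 2, 0, -1):
        y[k] = bp[(k + 2, y[k + 1], y[k + 2])]
    return y[1:]
```

To use it, pass the training sentences through replace_rare, feed the result to estimate, and call viterbi on each list of words to tag. If a sentence has zero probability under the estimates (for example, a rare class never seen in training), the returned tags are an arbitrary tie-break rather than a meaningful prediction.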

