The probability of a word sequence (or of a single next word) is useful for: speech recognition, spelling correction, grammatical error correction, machine translation, augmentative and alternative communication (word suggestion)
N-grams
to calculate the probability of a given 5-word sentence: don't just count occurrences of that exact sentence and divide by the count of all 5-word sequences in the corpus — even a huge corpus is too sparse, since most sentences never appear in it
instead, we estimate the joint probability of a sequence of words by multiplying conditional probabilities
ex: a bigram approximates $P(w_i|w_{1:i-1})$ by just using $P(w_i|w_{i-1})$.
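Written out, the chain rule and its bigram approximation look like:

```latex
P(w_{1:n}) = \prod_{i=1}^{n} P(w_i \mid w_{1:i-1}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
```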
Markov assumption: the probability of the next thing only depends on some number of previous things (not looking too far in the past).
a bigram looks 1 word into the past, a trigram looks 2, so an n-gram looks n-1 words into the past to predict the probability of the nth word.
to compute the probability of a sequence of words, multiply all 2-word probabilities (for bigram)
estimate probabilities using maximum likelihood estimation (MLE) which is just relative frequency
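The two steps above — MLE estimation from counts, then multiplying bigram probabilities over a sequence — can be sketched as below. This is a minimal illustration, not a production model: the `<s>`/`</s>` sentence-boundary padding is a common convention I'm assuming, and unsmoothed MLE assigns probability 0 to any unseen bigram.

```python
from collections import Counter

def bigram_mle(corpus):
    """MLE bigram estimates: P(w | prev) = count(prev, w) / count(prev)."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        # pad with assumed sentence-boundary markers <s> and </s>
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])           # contexts (everything but </s>)
        bigrams.update(zip(tokens, tokens[1:]))
    # relative frequency = maximum likelihood estimate
    return {(prev, w): c / unigrams[prev] for (prev, w), c in bigrams.items()}

def sentence_prob(probs, sentence):
    """Multiply the conditional bigram probabilities across the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, w in zip(tokens, tokens[1:]):
        p *= probs.get((prev, w), 0.0)         # unseen bigram -> probability 0
    return p
```

For a toy corpus `["i am sam", "sam i am"]`, every word occurs after `i` is always `am`, so `P(am | i) = 1.0`, and `sentence_prob` for `"i am"` multiplies `P(i|<s>) * P(am|i) * P(</s>|am)`.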
trigram models are the most common in practice; 4-gram and 5-gram models are used less often
use log probabilities: multiplying many probabilities (each less than 1) produces a number so small it can underflow to 0, so instead we add log probabilities, which keeps the values in a manageable range (and adding is faster than multiplying)
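A quick demonstration of why logs matter: multiplying 200 probabilities of 0.01 underflows a double (the true value is 1e-400, below the smallest representable float), while summing their logs stays well within range.

```python
import math

probs = [0.01] * 200          # 200 conditional probabilities of 0.01

product = 1.0
for p in probs:
    product *= p              # underflows to 0.0 (true value is 1e-400)

# summing logs avoids underflow: 200 * ln(0.01) ≈ -921.03
log_prob = sum(math.log(p) for p in probs)
```

To recover a probability for comparison you'd exponentiate only at the end (or just compare log values directly, since log is monotonic).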