The probability of a word sequence (or of a single next word) is useful for: speech recognition, spelling correction, grammatical error correction, machine translation, augmentative and alternative communication (word suggestion)
N-grams
to calculate the probability of a given 5-word sentence: don't just count occurrences of that exact sentence and divide by the count of all 5-word sequences in the corpus — even a huge corpus is too sparse, since most sentences never appear in it
instead, we estimate the joint probability of a sequence of words by multiplying conditional probabilities
ex: a bigram approximates $P(w_i|w_{1:i-1})$ by just using $P(w_i|w_{i-1})$.
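Written out, the chain rule and its bigram approximation look like:

```latex
P(w_{1:n}) = \prod_{i=1}^{n} P(w_i \mid w_{1:i-1}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
```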
Markov assumption: the probability of the next thing only depends on some number of previous things (not looking too far in the past).
a bigram looks 1 word into the past, a trigram looks 2, so an n-gram looks n-1 words into the past to predict the probability of the nth word.
to compute the probability of a sequence of words, multiply all 2-word probabilities (for bigram)
estimate probabilities using maximum likelihood estimation (MLE) which is just relative frequency
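The two steps above — MLE estimation from counts, then multiplying bigram probabilities over a sequence — can be sketched as below. This is a minimal illustration, not a production model: the `<s>`/`</s>` sentence-boundary padding is a common convention I'm assuming, and unsmoothed MLE assigns probability 0 to any unseen bigram.

```python
from collections import Counter

def bigram_mle(corpus):
    """MLE bigram estimates: P(w | prev) = count(prev, w) / count(prev)."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        # pad with assumed sentence-boundary markers <s> and </s>
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])           # contexts (everything but </s>)
        bigrams.update(zip(tokens, tokens[1:]))
    # relative frequency = maximum likelihood estimate
    return {(prev, w): c / unigrams[prev] for (prev, w), c in bigrams.items()}

def sentence_prob(probs, sentence):
    """Multiply the conditional bigram probabilities across the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, w in zip(tokens, tokens[1:]):
        p *= probs.get((prev, w), 0.0)         # unseen bigram -> probability 0
    return p
```

For a toy corpus `["i am sam", "sam i am"]`, every word occurs after `i` is always `am`, so `P(am | i) = 1.0`, and `sentence_prob` for `"i am"` multiplies `P(i|<s>) * P(am|i) * P(</s>|am)`.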
trigram models are the most common in practice; 4-gram and 5-gram models are used less often
use log probabilities: multiplying many probabilities (each less than 1) produces a number so small it can underflow to 0, so instead we add log probabilities, which keeps the values in a manageable range (and adding is faster than multiplying)
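A quick demonstration of why logs matter: multiplying 200 probabilities of 0.01 underflows a double (the true value is 1e-400, below the smallest representable float), while summing their logs stays well within range.

```python
import math

probs = [0.01] * 200          # 200 conditional probabilities of 0.01

product = 1.0
for p in probs:
    product *= p              # underflows to 0.0 (true value is 1e-400)

# summing logs avoids underflow: 200 * ln(0.01) ≈ -921.03
log_prob = sum(math.log(p) for p in probs)
```

To recover a probability for comparison you'd exponentiate only at the end (or just compare log values directly, since log is monotonic).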