The Jupyter notebooks here use different language models for next-word prediction.
The user enters a sentence.
The model predicts a word that can follow the input sentence.
The language models use the following smoothing techniques, in combination with interpolation and backoff:
1. Add-1 (Laplace)
2. Good-Turing
3. Simple Kneser-Ney
4. Interpolated Kneser-Ney
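As an illustration of the simplest technique in the list, here is a minimal sketch of add-1 smoothing for bigram probabilities. It is not taken from the notebooks: the function name `add_one_bigram_prob` and the toy corpus are assumptions made for this example.

```python
from collections import Counter


def add_one_bigram_prob(tokens, prev_word, word):
    """Estimate P(word | prev_word) with add-1 (Laplace) smoothing.

    Hypothetical helper for illustration; not the notebooks' own code.
    """
    vocab_size = len(set(tokens))
    # Count how often each word occurs as a context (i.e. is followed by a word).
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Add 1 to every bigram count and the vocabulary size to the denominator,
    # so unseen bigrams get a small non-zero probability.
    return (bigram_counts[(prev_word, word)] + 1) / (context_counts[prev_word] + vocab_size)


tokens = "the cat sat on the mat".split()
p_seen = add_one_bigram_prob(tokens, "the", "cat")    # bigram occurs once in the corpus
p_unseen = add_one_bigram_prob(tokens, "the", "sat")  # bigram never occurs
```

With this toy corpus the seen bigram gets probability 2/7 and the unseen one 1/7, and the probabilities over the whole vocabulary for a given context still sum to 1. Good-Turing and Kneser-Ney redistribute probability mass in more refined ways and are not shown here.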
- Cleaning of the training corpus (removing punctuation, etc.)
- Creation of the language model:
i) Formation of n-grams (unigram, bigram, trigram, quadgram)
ii) Creation of a probability dictionary, with support for the various smoothing mechanisms
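The steps above can be sketched roughly as follows. This is a simplified illustration, not the notebooks' implementation: the names `clean`, `ngrams`, `build_model`, and `predict_next` are hypothetical, the probabilities are unsmoothed relative frequencies, and the backoff simply falls through to a shorter context when the longer one was never seen.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase the text, drop punctuation, and split into tokens."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def ngrams(tokens, n):
    """Return all n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_model(tokens, max_n=4):
    """Map each context (a tuple of up to max_n - 1 words) to next-word probabilities."""
    counts = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            counts[gram[:-1]][gram[-1]] += 1
    # Unsmoothed relative frequencies; a smoothing method would replace this step.
    return {
        ctx: {word: c / sum(nxt.values()) for word, c in nxt.items()}
        for ctx, nxt in counts.items()
    }


def predict_next(model, sentence, max_n=4):
    """Pick the most probable next word, backing off to shorter contexts."""
    words = clean(sentence)
    for k in range(max_n - 1, -1, -1):
        if k > len(words):
            continue  # the sentence is too short for a context of this length
        context = tuple(words[len(words) - k:])
        if context in model:
            candidates = model[context]
            return max(candidates, key=candidates.get)
    return None  # only reached when the model is empty


corpus = "I like green tea. I like green apples. I like green tea a lot."
model = build_model(clean(corpus))
prediction = predict_next(model, "I like green")
backed_off = predict_next(model, "We want green")
```

In this toy corpus "green" is followed by "tea" twice and "apples" once, so both calls return "tea"; the second one gets there only after backing off from the unseen three-word and two-word contexts to the single word "green".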