Please refer to my blog post Transliterated Queries 2 – Deep Learning for the implementation details.
This project is inspired by the following papers of mine:
- Language Identification and Disambiguation in Indian Mixed-Script.
- Construction of a Semi-Automated model for FAQ Retrieval via Short Message Service.
For an implementation of the above papers, refer to my blog post Simple Markov Model for correcting Transliterated Queries.
Install the following packages to use the project:
pip install nltk
pip install keras
pip install tensorflow
pip install h5py
import auto_correct as auto
model = auto.auto_correct()
model.run()
enter a query
hw to lrn pythn anddeeplearning eas ily
how to learn python and deep learning easily 11.2134873867
auto_correct(data=,re_train=,vocab_size=,step=,batch_size=,nb_epoch=,embed_dims=)
To retrain the model, set re_train = True and pass the queries as the data argument. The queries must be given in the following format:
queries=[]
queries = ['how to handle a 1.5 year old when hitting',
'how can i avoid getting sick in china',
'how do male penguins survive without eating for four months',
'how do i remove candle wax from a polar fleece jacket',
'how do i find an out of print book']
model = auto.auto_correct(re_train=True,data=queries)
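If the training queries come from raw text, they may need cleaning before they match the format above. The following is a minimal sketch, assuming that lowercase, whitespace-normalized strings are what the model expects (as the example list suggests); `prepare_queries` is a hypothetical helper and not part of this project.

```python
def prepare_queries(raw_lines):
    """Hypothetical helper: lowercase each line, collapse whitespace, drop empty lines."""
    queries = []
    for line in raw_lines:
        # split() with no arguments removes leading/trailing and repeated whitespace
        cleaned = " ".join(line.lower().split())
        if cleaned:
            queries.append(cleaned)
    return queries


raw = [
    "How do I find an out of print book\n",
    "   ",
    "How can I avoid  getting sick in China",
]
queries = prepare_queries(raw)
print(queries)
```

The resulting list can then be passed as data=queries together with re_train=True, as shown above.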
The other parameters to the model are:
- vocab_size - The size of the vocabulary used, i.e. the number of unique words
- step - The size of the sliding window
- batch_size - Number of training samples passed in one iteration
- nb_epoch - Total number of training iterations (epochs)
- embed_dims - Embedding dimension size
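To give a feel for what vocab_size and step control, here is a rough sketch of how a vocabulary cap and a sliding window could turn queries into training pairs. It is an illustration under assumptions, not the project's actual preprocessing: the functions `build_vocab` and `sliding_windows`, the use of id 0 for unknown words, and the "predict the next word from the previous `step` words" setup are all hypothetical.

```python
from collections import Counter


def build_vocab(queries, vocab_size):
    """Assumed scheme: map the most frequent words to ids 1..vocab_size-1; 0 means unknown."""
    counts = Counter(word for q in queries for word in q.split())
    most_common = [w for w, _ in counts.most_common(vocab_size - 1)]
    return {w: i + 1 for i, w in enumerate(most_common)}


def sliding_windows(queries, vocab, step):
    """Assumed scheme: slide a window of `step` words and pair it with the word that follows."""
    pairs = []
    for q in queries:
        ids = [vocab.get(w, 0) for w in q.split()]
        for i in range(len(ids) - step):
            pairs.append((ids[i:i + step], ids[i + step]))
    return pairs


queries = [
    "how do i find an out of print book",
    "how can i avoid getting sick in china",
]
vocab = build_vocab(queries, vocab_size=20)
pairs = sliding_windows(queries, vocab, step=3)
print(len(vocab), len(pairs))
```

In this sketch a larger step gives each training sample more context but yields fewer samples per query, and a smaller vocab_size maps more rare words to the unknown id.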