The objective of this project is to build a model that takes a sentence containing spelling mistakes as input and outputs the same sentence with the mistakes corrected. The data for this project is twenty popular books from Project Gutenberg. The model's architecture and hyperparameter values were chosen by grid search. The best results, as measured by sequence loss on a validation set of 15% of the data, came from a two-layer network with a bidirectional RNN in the encoding layer and Bahdanau attention in the decoding layer. FloydHub's GPU service was used to train the model.
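For orientation, here is a minimal sketch of that encoder/decoder shape in tf.keras. It is not the notebook's exact implementation, and the names and sizes (VOCAB_SIZE, EMBED_DIM, RNN_UNITS) are illustrative placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 80   # placeholder: size of the character vocabulary
EMBED_DIM = 64    # placeholder: character embedding size
RNN_UNITS = 128   # placeholder: units per RNN direction

# Encoder: character embedding followed by a bidirectional RNN.
enc_inputs = layers.Input(shape=(None,), dtype="int32")
enc_embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(enc_inputs)
enc_outputs = layers.Bidirectional(
    layers.GRU(RNN_UNITS, return_sequences=True))(enc_embed)

# Decoder: an RNN whose outputs query an additive (Bahdanau-style)
# attention mechanism over the encoder outputs.
dec_inputs = layers.Input(shape=(None,), dtype="int32")
dec_embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(dec_inputs)
dec_outputs = layers.GRU(2 * RNN_UNITS, return_sequences=True)(dec_embed)
context = layers.AdditiveAttention()([dec_outputs, enc_outputs])

# Project the decoder state plus attention context to character logits.
concat = layers.Concatenate()([dec_outputs, context])
logits = layers.Dense(VOCAB_SIZE)(concat)

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

During training, the decoder input would be the target sentence shifted right by one character (teacher forcing); at inference time, decoding proceeds one character at a time, feeding each prediction back in as the next input.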
All of the books that I used for training can be found in books.zip.
To view my work most easily, see the .ipynb file.
I wrote an article that explains how to create the input data (the sentences with spelling mistakes) for this model.
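To give a flavor of what the article covers, here is a minimal, self-contained sketch of character-level noise injection. The specific error types and the threshold parameter are illustrative assumptions, not the article's exact code:

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def add_noise(sentence, threshold=0.9):
    """Randomly corrupt a sentence with character-level typos.

    Each character is kept unchanged with probability `threshold`;
    otherwise one of three errors is applied: swap it with the next
    character, drop it, or insert a random letter before it.
    """
    noisy = []
    i = 0
    while i < len(sentence):
        if random.random() < threshold:
            noisy.append(sentence[i])   # keep the character as-is
            i += 1
        else:
            error = random.choice(["swap", "drop", "insert"])
            if error == "swap" and i + 1 < len(sentence):
                noisy.append(sentence[i + 1])
                noisy.append(sentence[i])
                i += 2
            elif error == "drop":
                i += 1                   # skip the character entirely
            else:
                noisy.append(random.choice(LETTERS))
                noisy.append(sentence[i])
                i += 1
    return "".join(noisy)

print(add_noise("The quick brown fox jumps over the lazy dog."))
```

Pairing each clean sentence with its noisy counterpart produced this way yields the (input, target) examples the model trains on.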