Please use these screenshots and notes to better understand how to use the application and navigate it easily.
As the digital age continues to boom, the ability to translate handwriting to text automatically is a necessary technology. Many older documents in an agency are handwritten. We aim to create a software tool to address this problem. Specifically, in this project we created a web application that uses a neural network to recognize a handwritten digit from 0 to 9.
The broader solution is image recognition, or more specifically, character recognition. This project takes a step toward character recognition by detecting digits. Our website consists of three pages: Home, Draw, and Profile. The Home page has general information about the neural network. Draw lets the user draw a number, with options to save or clear the pad. Saving runs the picture through our neural network, which gives a guess as to what digit was drawn. The image is then saved to the Profile page, where the user can view or delete the list of drawn images.
To set up a localhost web server for development purposes, install Python 3.6 with pip.
Git clone this repository and follow the steps below.
Open the cloned repo and install the required packages in a terminal:
pip install -U -r requirements.txt
To run the application:
python main.py
A neural network is both an algorithm and a data structure. Although a neural network resembles a graph, it lacks the explicit node objects present in a graph; instead it stores biases and weights, which are lists. It uses a backpropagation algorithm to train and adjust the biases and weights. The neural network itself is also often described as an algorithm for converting images to digits. Our neural network has a total of 4 layers: 1 input (784 nodes), 2 hidden (16 nodes each), and 1 output (10 nodes, one for each digit). The network has a total of 784*16 + 16*16 + 16*10 = 12960 weights and 16 + 16 + 10 = 42 biases. A handwritten explanation of the data and feedforward function can be found here.
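The layer sizes and weight/bias counts above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual code; the variable names are ours:

```python
import numpy as np

def sigmoid(z):
    """Squash activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

sizes = [784, 16, 16, 10]  # input layer, two hidden layers, output layer
rng = np.random.default_rng(0)
# one weight matrix and one bias vector per layer transition
weights = [rng.standard_normal((y, x)) for x, y in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((y, 1)) for y in sizes[1:]]

def feedforward(a):
    # propagate a 784x1 input column vector through every layer
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return a

total_weights = sum(w.size for w in weights)  # 784*16 + 16*16 + 16*10 = 12960
total_biases = sum(b.size for b in biases)    # 16 + 16 + 10 = 42
output = feedforward(np.zeros((784, 1)))      # 10x1 vector of digit scores
```

Counting the entries of the weight and bias lists reproduces the 12960 and 42 totals stated above.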
Aside from the neural network, we also used a variety of algorithms in our website. They are documented with Sphinx. To view a summary of each function, clone the repository and open ./docs/build/html/index.html in a browser.
The data structure we used most in our project is the list. Our neural network consists of three main lists: sizes, biases, and weights. We also used lists in the back end of the website to store the user-generated numbers (their images) and the guessed values, so the user could view and remove them later.
This neural network contained many functions, many of which have their own complexities.
Let:
- b: number of batches
- d: size of test data
- e: epoch number
- m: number of layers in the neural network
- n: average number of weights in each layer
- s: size of a batch
This function performs several duties, including feeding the test data forward and adjusting the nodes of the neural network based on the result of the feedforward portion. The time complexity of the feedforward portion is O(mn), and the time complexity of the backward pass is also O(mn); thus the time complexity of this function is O(mn). Unsurprisingly, the two passes have the same complexity: the feedforward and backward passes are just different methods of stepping through the neural network. Both incur O(n) from the dot product at each layer and O(m) from traversing the layers.
This function updates the neural network's weights and biases based on the data from the batch. The time complexity of upbatch is O(mns): backprop has a time complexity of O(mn), and the for loop executes it once for each of the s elements of the batch.
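A batch update of this shape can be sketched as follows. This is a hedged illustration, not the project's actual upbatch: the gradient routine is passed in as a callable, and the demo uses a toy 2-node layer with a zero-gradient stand-in for backprop:

```python
import numpy as np

def upbatch(batch, weights, biases, eta, backprop):
    # accumulate gradients over the batch, then take one averaged step
    nabla_w = [np.zeros_like(w) for w in weights]
    nabla_b = [np.zeros_like(b) for b in biases]
    for x, y in batch:                       # s iterations
        dw, db = backprop(x, y)              # each call is O(mn)
        nabla_w = [nw + g for nw, g in zip(nabla_w, dw)]
        nabla_b = [nb + g for nb, g in zip(nabla_b, db)]
    # average the gradients and step against them at learning rate eta
    weights = [w - (eta / len(batch)) * nw for w, nw in zip(weights, nabla_w)]
    biases = [b - (eta / len(batch)) * nb for b, nb in zip(biases, nabla_b)]
    return weights, biases

# toy demonstration: a zero gradient leaves the parameters unchanged
w0 = [np.ones((2, 2))]
b0 = [np.ones((2, 1))]
zero_grad = lambda x, y: ([np.zeros((2, 2))], [np.zeros((2, 1))])
w1, b1 = upbatch([(None, None)] * 4, w0, b0, 3.0, zero_grad)
```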
This function is called to train the neural network. The time complexity of training is O(mnsbe). Within this function, there is a loop that runs once per epoch, hence the e. Inside that loop is a shuffle, which is O(d), and a loop that passes each smaller batch into upbatch. The batch loop contributes O(mnsb) from that function.
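The epoch/batch structure described above can be sketched like this. It is an illustrative outline rather than the project's actual train function; the per-batch update is passed in as a stand-in callable:

```python
import random

def train(training_data, epochs, batch_size, step):
    """Shuffle each epoch, split into batches, apply one update per batch."""
    calls = 0
    for _ in range(epochs):                    # e iterations
        random.shuffle(training_data)          # O(d) per epoch
        for k in range(0, len(training_data), batch_size):
            batch = training_data[k:k + batch_size]
            step(batch)                        # stand-in for upbatch, O(mns)
            calls += 1
    return calls

# 100 samples, batches of 10, 3 epochs -> 3 * 10 = 30 update calls
n_updates = train(list(range(100)), epochs=3, batch_size=10, step=lambda b: None)
```

Counting the update calls makes the b*e factor in the overall O(mnsbe) bound concrete.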
Feedforward is the primary function called when a user wants to see what digit was drawn. Its time complexity is O(mn), the same as the feedforward pass in backprop.
Evaluate is used in the training function to determine how much of the test data was classified correctly. The time complexity of this function is O(mnd): it runs feedforward for every element of the test data of size d.
__init__ is the constructor of the neural network. The default size of the neural network is [784,16,16,10]: 1 input layer of size 784, 2 hidden layers of size 16, and an output layer of size 10. The time complexity of this function is O(mn), which comes from setting up the weights.
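A constructor of this shape can be sketched as below; this is an assumption about the layout, not the project's exact code:

```python
import numpy as np

class Network:
    def __init__(self, sizes=None):
        # default architecture: 784 input, two 16-node hidden, 10 output
        self.sizes = sizes or [784, 16, 16, 10]
        rng = np.random.default_rng()
        # O(mn): every entry of every weight matrix is initialized once
        self.weights = [rng.standard_normal((y, x))
                        for x, y in zip(self.sizes[:-1], self.sizes[1:])]
        self.biases = [rng.standard_normal((y, 1)) for y in self.sizes[1:]]

net = Network()  # uses the default [784, 16, 16, 10] architecture
```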
PIL stands for Python Imaging Library and is also known as Pillow. PIL comes with a variety of imaging tools, including the ones we used to scale our image down to 20x20 and back up to 28x28.
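The 20x20-to-28x28 rescaling can be done with Pillow roughly as follows; the canvas size and offsets here are illustrative assumptions, not the project's exact values:

```python
from PIL import Image

drawing = Image.new("L", (280, 280), 0)   # stand-in for the user's canvas drawing
digit = drawing.resize((20, 20))          # scale the drawing down to 20x20
canvas = Image.new("L", (28, 28), 0)      # blank 28x28 MNIST-sized frame
canvas.paste(digit, (4, 4))               # center the 20x20 digit in the frame
```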
NumPy is a Python library that comes with utilities for linear algebra. It provides more efficient methods of array manipulation. In our neural network, we used it for shuffling, randomizing, dot products, and finding the index of the maximum value.
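The four uses just listed map onto standard NumPy calls; a small illustrative sketch (the arrays here are toy values, not the project's data):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 784))      # randomized weight matrix
x = np.ones((784, 1))
z = np.dot(w, x)                        # dot product from the feedforward step
data = np.arange(10)
rng.shuffle(data)                       # in-place shuffle of training indices
scores = np.array([0.1, 0.05, 0.7, 0.15])
guess = int(np.argmax(scores))          # index of the max activation = guessed digit
```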
Matplotlib was used in the initial testing of our resizing function and provided a better way of visualizing our image rescaling results.
MNIST provided a dataset of 10,000 handwritten digits, which we used to train our neural network. We used it in mnist_loader.py to return the test cases.
We used Flask, a Python package that serves as a micro web framework. We utilized it as the controller for our website so the user could navigate and route through different web pages with ease.
We utilized Bootstrap and its dependencies to make our website look much prettier.
We deployed our website on Heroku; it was successfully deployed here.
We used Sphinx to document our functions and code.
This project really tested the abilities of our group. The concept was new to all of us, since only two of us had even recently been exposed to machine learning. In the end, our website worked as intended, with a Home, Draw, and Profile page. Our neural network struggled a little: it was unable to detect some (literally) edge cases where the number was drawn at the edge of the pad. The digit had to be centered, and even then the neural network was not able to consistently detect the correct digit.
- Researched the neural network
- Created the mnist loader and took apart the test cases
- Created the neural network
- Saved the neural network nodes for the website
- Finished register/logout functionality
- Brainstormed ideas and methods of carrying out tasks
- Ensured each group member was on task and working
- Tested each feature and helped fix bugs
- Did documentation on neural network
- Designed and created the pages for the frontend
- Created drawing canvas for user to draw on (w/ save and clear functionality)
- Connected the frontend & backend to dynamically display results
- Helped save images/outputs of the user into the database
- Created initial loader for mnist data
- Created flask template, routes, and databases for website
- Created login/register form functionality
- Displayed cards (and delete all functionality) onto profile page
- Deployed project to heroku