Program 3: image classification
In this project your goal is to create an image classification system using the torch module for neural networks in Python. For the first part you will do a subtrain/validation split with two different neural networks: compute gradients and network weight update steps using the subtrain set, then compute the subtrain/validation loss and plot it against the number of epochs.
Tutorial explaining PyTorch installation under Anaconda.
The command I used to install was:
conda install pytorch tensorboard torchvision cpuonly -c pytorch
After that you should be able to import torch in Python.
To test your installation, try running the following command, which runs my sample code and visualizes the results in TensorBoard:
rm -r runs && python single_hidden_layer.py && tensorboard --logdir=runs
Here is some torch code that implements the regression model we saw in the class lecture slides:
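For reference, a minimal sketch of such a model (my reconstruction, not the posted script; the activation and number of hidden units are my choices):

import torch

class SingleHiddenLayer(torch.nn.Module):
    def __init__(self, hidden_units=100):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.Linear(1, hidden_units),   # one input for 1d regression
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_units, 1))   # one output for regression
    def forward(self, x):
        return self.seq(x)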
Here are tutorials that show you how to load, visualize, and transform the Fashion MNIST data set.
Here is a tutorial that shows you how to visualize data from training runs on TensorBoard.
Please read the torch docs!
You should create a Python script:
- begin by initializing two SummaryWriter instances, one for subtrain and one for validation (even before that, your script should remove the log directories so that you have fresh logs every time you re-run it).
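For example, a sketch (the runs log directory and set names are my choices):

import shutil
from torch.utils.tensorboard import SummaryWriter
shutil.rmtree("runs", ignore_errors=True)  # remove old logs for a fresh start
writer = {
    set_name: SummaryWriter("runs/" + set_name)
    for set_name in ("subtrain", "validation")}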
- implement two neural networks:
- LeNet, as described here.
- Fully connected network, as described here, but with the first step in forward using torch.flatten so that each input is represented as a 784-vector (1d tensor with that many elements).
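For example, hedged sketches of the two architectures (apart from the 1x28x28 inputs and 10 outputs, the layer sizes and activations are my choices; the linked descriptions take precedence):

import torch

class LeNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.Conv2d(1, 6, kernel_size=5, padding=2),  # (1,28,28) -> (6,28,28)
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),                            # -> (6,14,14)
            torch.nn.Conv2d(6, 16, kernel_size=5),            # -> (16,10,10)
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),                            # -> (16,5,5)
            torch.nn.Flatten(),                               # -> 400
            torch.nn.Linear(16 * 5 * 5, 120),
            torch.nn.ReLU(),
            torch.nn.Linear(120, 84),
            torch.nn.ReLU(),
            torch.nn.Linear(84, 10))
    def forward(self, x):
        return self.seq(x)

class FullyConnected(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 300),
            torch.nn.ReLU(),
            torch.nn.Linear(300, 10))
    def forward(self, x):
        x = torch.flatten(x, start_dim=1)  # each (1,28,28) image becomes a 784-vector
        return self.seq(x)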
- set up a transform using Compose, Resize, ToTensor from torchvision.transforms, which will convert each input into a 28x28 Tensor.
- download the Fashion MNIST train data using torchvision.datasets.FashionMNIST (use train=True and transform=the transformation you created in the previous step). This will automatically rescale inputs to [0,1], so you don't need any additional step to do that (also no need to transform the outputs to a one-hot encoding: the torch CrossEntropyLoss accepts integer-valued target outputs for classification problems).
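For example (a sketch; root="data" is my choice of download directory):

from torchvision import datasets, transforms
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.ToTensor()])  # ToTensor rescales pixel values to [0,1]
train_set = datasets.FashionMNIST(
    root="data", train=True, download=True, transform=transform)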
- randomly split the train set into 1/6 validation, 5/6 subtrain (as in 6-fold cross-validation). If you use the full 60,000 observations in the train set (extra credit), there should be 10,000 validation, 50,000 subtrain. An easy way to do that is with torch.utils.data.random_split (if you use smaller sizes you need to create a third set other than subtrain/validation with the ignored/leftover data). If using the full data is slow on your computer, it is OK to reduce the data sizes (e.g., for debugging try 100 validation, 500 subtrain). To reduce repetition in your code, try defining a dictionary of sizes and then referring to the set names and sizes defined therein, e.g., {"subtrain":5000, "validation":1000}.
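For example (a sketch using reduced sizes; the third "ignored" set holds the leftover data):

sizes = {"subtrain": 5000, "validation": 1000}
sizes["ignored"] = len(train_set) - sum(sizes.values())
split_data = dict(zip(
    sizes, torch.utils.data.random_split(train_set, list(sizes.values()))))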
- Make a torch.utils.data.DataLoader for each set (subtrain/validation), which you will use later to compute the loss in each epoch with respect to all examples in that set. Also make a DataLoader for the subtrain set that you will use for computing gradients and weight update steps; in this one you can specify a batch_size argument, which determines how many subtrain observations are used to compute the gradient for one step (the subtrain DataLoaders for computing gradients and for computing loss can be the same or different, your choice).
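For example (a sketch; the batch sizes are my choices):

loss_loaders = {
    set_name: torch.utils.data.DataLoader(split_data[set_name], batch_size=100)
    for set_name in ("subtrain", "validation")}
gradient_loader = torch.utils.data.DataLoader(
    split_data["subtrain"], batch_size=100, shuffle=True)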
- Make an instance of the loss function you should use for classification, torch.nn.CrossEntropyLoss.
- use a for loop over the two networks, to simplify your code.
- For each network, begin by instantiating an optimizer. If you want to use a network-specific optimizer (e.g., torch.optim.SGD for the fully connected network and torch.optim.Adam for the convolutional network), you can make the optimizer instance an attribute of the network class.
- use a for loop over epochs of learning. An epoch is defined as the
number of steps it takes for the learning algorithm to see the
entire subtrain set. For batch size = 1 and subtrain set size = N
there are N steps per epoch; for batch size k there are
approximately N/k steps per epoch. In each epoch you should
- take one or more steps using one of the optimizers (e.g., SGD) and
a batch size of your choice.
- this means writing a for loop over batches using your DataLoader (for x,y in data_loader).
- for each batch, first compute predictions, then compute loss, then zero the gradient via optimizer.zero_grad(), then call loss.backward() to compute gradients, then call optimizer.step() to update the neural network weights using the gradients.
- compute loss with respect to entire subtrain and validation sets. You should have a for loop over sets (subtrain/validation) and batches within each set. Compute the total or average loss over all batches, for each set.
- print these values on the screen and log them to the TensorBoard writers, e.g.,
# I used a dictionary with three keys (epoch/subtrain/validation) to
# store and then print these values.
print('epoch=%(epoch)4d subtrain=%(subtrain)f validation=%(validation)f' % loss_dict)
# log to SummaryWriter instance for visualization on TensorBoard.
writer[set_name].add_scalar(network._get_name()+'/loss', loss, epoch)
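Putting the steps above together, here is a hedged sketch of the training loop, reusing names from the earlier sketches (max_epochs, the optimizer, and the learning rate are my choices):

loss_fun = torch.nn.CrossEntropyLoss()
max_epochs = 100
for network in (LeNet(), FullyConnected()):
    optimizer = torch.optim.Adam(network.parameters(), lr=0.01)
    for epoch in range(max_epochs):
        # gradient/weight update steps over the whole subtrain set = one epoch.
        for x, y in gradient_loader:
            loss = loss_fun(network(x), y)  # predictions, then loss
            optimizer.zero_grad()           # zero the gradient
            loss.backward()                 # compute gradients
            optimizer.step()                # update the network weights
        # average loss with respect to each entire set, printed and logged.
        loss_dict = {"epoch": epoch}
        for set_name, loader in loss_loaders.items():
            total_loss = 0.0
            with torch.no_grad():
                for x, y in loader:
                    total_loss += loss_fun(network(x), y).item() * len(y)
            loss_dict[set_name] = total_loss / len(loader.dataset)
            writer[set_name].add_scalar(
                network._get_name() + '/loss', loss_dict[set_name], epoch)
        print('epoch=%(epoch)4d subtrain=%(subtrain)f validation=%(validation)f' % loss_dict)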
After that, load the saved data into TensorBoard for visualization, e.g.
rm -r runs && python your_script.py && tensorboard --logdir=runs
Then save/screenshot/export the subtrain/validation loss plots (one for fully connected network, one for convolutional network), as a function of the number of epochs.
IMPORTANT: the subtrain loss should always decrease, whereas the validation loss should be U-shaped.
- If the subtrain loss is not always decreasing then you probably need to decrease the step size (learning rate = lr parameter of optimizers).
- If the validation loss is not U-shaped, then you probably need to increase the number of iterations/epochs, or increase the step size.
Deliverable should be a PDF uploaded to bblearn with
- cover page
- result figures along with your comments / interpretation. For each
network,
- Show the plot of subtrain/validation loss versus number of epochs.
- What was the optimization algorithm (SGD, Adam, etc) / batch size / learning rate / max number of epochs you used?
- What was the number of epochs that minimized the validation loss?
- If you did the extra credit (used all 60,000 observations in train set) then please mention that.
- Python code.
If you are adapting my Python script for 1d regression with one hidden layer:
- The loss function is different: mean squared error for regression, cross-entropy loss for classification.
- There is a for loop over three data sets (pattern variable), which you don’t need for part 1, but you may want to keep for part 2 (running your models on both MNIST/digits and FashionMNIST data).
- Use DataLoader and FashionMNIST instead of loading data using pandas.
- The number of inputs to the neural network is different: 1 input for 1d regression, 28x28 inputs for FashionMNIST.
- The number of outputs is different: 1 output for 1d regression, 10 outputs for FashionMNIST.
- How to use the GPU for training?
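One common approach (a sketch reusing names from the earlier sketches; note it assumes a CUDA-enabled PyTorch install, not the cpuonly package shown above) is to move both the network and each batch of data to the GPU device:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
network = network.to(device)  # move the model parameters to the GPU
for x, y in gradient_loader:
    x, y = x.to(device), y.to(device)  # move each batch to the same device
    loss = loss_fun(network(x), y)     # forward/backward proceed as before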
- Do outputs need to be one-hot encoded (a binary vector indicating the class)? Actually NO: the CrossEntropyLoss docs say that the target should be an integer vector of class indices, so there is no need to one-hot encode.
- How can I speed up my calculations? Instead of using the full data, try reducing the data size, e.g., 1000 validation, 5000 subtrain. The important thing is to see the characteristic decreasing subtrain loss, and U shaped validation loss.
- What is a good value for max epochs? It depends on a lot of other variables (data size, learning rate, batch size, optimizer). In general, the more data you have the more epochs you will need to learn, and the lower the learning rate the more epochs you will need. If the validation loss is still decreasing at the end (not U-shaped) then you are underfitting and you need to increase the max epochs. I found that with a subset of the data (5000 subtrain), max epochs = 100, a batch size of 100, and learning rate = 0.01 with the Adam optimizer was sufficient to see the U-shaped validation loss curve for both networks.
- How to fix RuntimeError: Expected 4-dimensional input for 4-dimensional weight [6, 1, 5, 5], but got 3-dimensional input of size [1, 28, 28] instead? This refers to an incompatibility between the Conv2d operator and the input (here a single image with one channel, 28x28 pixels, whereas Conv2d also expects a batch dimension). Fix by adding a dimension to the input for observations, e.g. by using x = x[None, :, :, :] (equivalently x.unsqueeze(0)) to turn a [1, 28, 28] image into a [1, 1, 28, 28] batch.
- How to fix RuntimeError: mat1 and mat2 shapes cannot be multiplied (2800x28 and 784x300)? This is because the fully connected network needs a flat (1d) input vector. Use torch.flatten as the first step in the forward method of your fully connected network (as in the fully connected sketch above).