This project presents a deep convolutional autoencoder that I developed together with fellow student Li Nguyen for an assignment in the Machine Learning Applications for Computer Graphics class at Tel Aviv University. To find out more about the assignment results, please read the report.
Install the required dependencies. Consider creating a virtual environment first.
pip install -r requirements.txt
Save your data in a directory of your choice. We used the celebrity dataset CelebA from the paper Deep Learning Face Attributes in the Wild, presented at ICCV 2015. If you would like to use your own data, the only things you need to adjust are the training, validation, and test splits in prepare_data().
The script supports the following command-line arguments.
python -h
usage: [-h] [--data_root DATA_ROOT] [--log_dir LOG_DIR] [--num_workers NUM_WORKERS] [--image_size IMAGE_SIZE]
       [--max_epochs MAX_EPOCHS] [--batch_size BATCH_SIZE] [--nc NC] [--nz NZ] [--nfe NFE] [--nfd NFD] [--lr LR] [--beta1 BETA1]
       [--beta2 BETA2] [--gpus GPUS]

optional arguments:
  -h, --help            show this help message and exit
  --data_root DATA_ROOT
                        Data root directory
  --log_dir LOG_DIR     Logging directory
  --num_workers NUM_WORKERS
                        num_workers > 0 turns on multi-process data loading
  --image_size IMAGE_SIZE
                        Spatial size of training images
  --max_epochs MAX_EPOCHS
                        Number of maximum training epochs
  --batch_size BATCH_SIZE
                        Batch size during training
  --nc NC               Number of channels in the training images
  --nz NZ               Size of latent vector z
  --nfe NFE             Size of feature maps in encoder
  --nfd NFD             Size of feature maps in decoder
  --lr LR               Learning rate for optimizer
  --beta1 BETA1         Beta1 hyperparameter for Adam optimizer
  --beta2 BETA2         Beta2 hyperparameter for Adam optimizer
  --gpus GPUS           Number of GPUs. Use 0 for CPU mode
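For readers adapting the script, the help output above could be produced by an argparse setup along the following lines. This is only a minimal sketch: the function name build_parser is hypothetical, and every default value is an assumption chosen for illustration (the help text does not show the project's actual defaults).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the flags listed in the help output.

    NOTE: all default values are assumptions, not the project's settings.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_root", type=str, default="data", help="Data root directory")
    parser.add_argument("--log_dir", type=str, default="logs", help="Logging directory")
    parser.add_argument("--num_workers", type=int, default=0,
                        help="num_workers > 0 turns on multi-process data loading")
    parser.add_argument("--image_size", type=int, default=128, help="Spatial size of training images")
    parser.add_argument("--max_epochs", type=int, default=10, help="Number of maximum training epochs")
    parser.add_argument("--batch_size", type=int, default=64, help="Batch size during training")
    parser.add_argument("--nc", type=int, default=3, help="Number of channels in the training images")
    parser.add_argument("--nz", type=int, default=256, help="Size of latent vector z")
    parser.add_argument("--nfe", type=int, default=64, help="Size of feature maps in encoder")
    parser.add_argument("--nfd", type=int, default=64, help="Size of feature maps in decoder")
    parser.add_argument("--lr", type=float, default=1e-3, help="Learning rate for optimizer")
    parser.add_argument("--beta1", type=float, default=0.9, help="Beta1 hyperparameter for Adam optimizer")
    parser.add_argument("--beta2", type=float, default=0.999, help="Beta2 hyperparameter for Adam optimizer")
    parser.add_argument("--gpus", type=int, default=0, help="Number of GPUs. Use 0 for CPU mode")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--data_root", "data", "--gpus", "2"])
print(args.data_root, args.gpus)
```

Passing an explicit list to parse_args() keeps the example independent of how it is launched; a real script would call parse_args() without arguments to read the command line.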
To train the autoencoder, simply pass in the path where your data is located and the number of GPUs you would like to train on.
python --data_root data --gpus 2
Fire up TensorBoard to visualize the training progress of your network in your browser at http://localhost:6006/
tensorboard --logdir logs
The input images with shape 3 × 128 × 128 (49,152 values) are encoded into a 1D bottleneck of size 256. Counted in values, this corresponds to a compression of 99.48% (1 − 256/49,152). We decode the images such that the reconstructed images match the original images as closely as possible, using an MSE reconstruction loss. We tried several different architectures and hyperparameters. The following is our best-performing model, and below we show some visual results (original images in top row, reconstructed images in bottom row).
Layer (type)        Output Shape           Param #
Conv2d-1            [-1, 64, 64, 64]         3,072
BatchNorm2d-2       [-1, 64, 64, 64]           128
LeakyReLU-3         [-1, 64, 64, 64]             0
Conv2d-4            [-1, 128, 32, 32]      131,072
BatchNorm2d-5       [-1, 128, 32, 32]          256
LeakyReLU-6         [-1, 128, 32, 32]            0
Conv2d-7            [-1, 256, 16, 16]      524,288
BatchNorm2d-8       [-1, 256, 16, 16]          512
LeakyReLU-9         [-1, 256, 16, 16]            0
Conv2d-10           [-1, 512, 8, 8]      2,097,152
BatchNorm2d-11      [-1, 512, 8, 8]          1,024
LeakyReLU-12        [-1, 512, 8, 8]              0
Conv2d-13           [-1, 1024, 4, 4]     8,388,608
BatchNorm2d-14      [-1, 1024, 4, 4]         2,048
LeakyReLU-15        [-1, 1024, 4, 4]             0
Conv2d-16           [-1, 256, 1, 1]      4,194,304
BatchNorm2d-17      [-1, 256, 1, 1]            512
LeakyReLU-18        [-1, 256, 1, 1]              0
ConvTranspose2d-19  [-1, 1024, 4, 4]     4,194,304
BatchNorm2d-20      [-1, 1024, 4, 4]         2,048
ReLU-21             [-1, 1024, 4, 4]             0
ConvTranspose2d-22  [-1, 512, 8, 8]      8,388,608
BatchNorm2d-23      [-1, 512, 8, 8]          1,024
ReLU-24             [-1, 512, 8, 8]              0
ConvTranspose2d-25  [-1, 256, 16, 16]    2,097,152
BatchNorm2d-26      [-1, 256, 16, 16]          512
ReLU-27             [-1, 256, 16, 16]            0
ConvTranspose2d-28  [-1, 128, 32, 32]      524,288
BatchNorm2d-29      [-1, 128, 32, 32]          256
ReLU-30             [-1, 128, 32, 32]            0
ConvTranspose2d-31  [-1, 64, 64, 64]       131,072
BatchNorm2d-32      [-1, 64, 64, 64]           128
ReLU-33             [-1, 64, 64, 64]             0
ConvTranspose2d-34  [-1, 3, 128, 128]        3,072
Tanh-35             [-1, 3, 128, 128]            0
Total params: 30,685,440
Trainable params: 30,685,440
Non-trainable params: 0
Input size (MB): 0.19
Forward/backward pass size (MB): 24.01
Params size (MB): 117.06
Estimated Total Size (MB): 141.25
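The shapes and parameter counts in the summary above can be checked with plain arithmetic. The sketch below is not the project's model code: the kernel size, stride, and padding of each layer are assumptions inferred from the listed output shapes and parameter counts (4 × 4 kernels without bias, stride 2 and padding 1 everywhere except the two bottleneck layers), and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a Conv2d layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding):
    """Spatial output size of a ConvTranspose2d layer (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_params(c_in, c_out, kernel):
    """Weight count of a bias-free convolution with a square kernel."""
    return c_in * c_out * kernel * kernel


def bn_params(channels):
    """BatchNorm2d has one scale and one shift parameter per channel."""
    return 2 * channels


# (out_channels, kernel, stride, padding) per layer; assumed, not read from the code.
ENCODER = [(64, 4, 2, 1), (128, 4, 2, 1), (256, 4, 2, 1),
           (512, 4, 2, 1), (1024, 4, 2, 1), (256, 4, 1, 0)]
DECODER = [(1024, 4, 1, 0), (512, 4, 2, 1), (256, 4, 2, 1),
           (128, 4, 2, 1), (64, 4, 2, 1), (3, 4, 2, 1)]


def summarize(in_channels=3, image_size=128):
    """Return the (C, H, W) shape after every conv layer and the total parameter count."""
    channels, size, total = in_channels, image_size, 0
    shapes = []
    for out_c, k, s, p in ENCODER:
        # Every encoder convolution is followed by BatchNorm2d in the summary.
        total += conv_params(channels, out_c, k) + bn_params(out_c)
        channels, size = out_c, conv_out(size, k, s, p)
        shapes.append((channels, size, size))
    for i, (out_c, k, s, p) in enumerate(DECODER):
        total += conv_params(channels, out_c, k)
        if i < len(DECODER) - 1:
            # The last decoder layer is followed by Tanh, not BatchNorm2d.
            total += bn_params(out_c)
        channels, size = out_c, deconv_out(size, k, s, p)
        shapes.append((channels, size, size))
    return shapes, total


shapes, total = summarize()
print(shapes[5])   # bottleneck shape: (256, 1, 1)
print(shapes[-1])  # reconstruction shape: (3, 128, 128)
print(total)       # 30685440
```

Under these assumptions the computed total matches the 30,685,440 parameters reported in the summary, and at 4 bytes per float32 parameter it also reproduces the listed parameter size of about 117.06 MB.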