Implementing Different Types of Deep Convolutional Generative Adversarial Networks using different datasets

Theoretical underpinnings of GAN and DCGAN

As detailed in the book Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch by Ivan Vasilev, the DCGAN stems from the landmark 2014 paper Generative Adversarial Nets. The implementation of the paper's algorithm comes from a TensorFlow tutorial titled DCGAN.

The generator:

To learn the generator’s distribution pg over data x, we define a prior on input noise variables pz(z), then represent a mapping to data space as G(z; θg), where G is a differentiable function represented by a multilayer perceptron with parameters θg 1. This is implemented by the following code:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential

# function that builds the generator
def build_generator(latent_input, weight_initialization, channel):
  model = Sequential(name='generator')
  # first fully connected layer takes in the 1D latent vector/tensor z
  # and outputs a 1D tensor of size 7*7*256 = 12,544
  model.add(keras.layers.Dense(7*7*256, input_shape=(latent_input,)))
  # batch normalization keeps the mean output close to 0 and the output standard deviation close to 1,
  # stabilizing training; applied after the layer and before the activation function
  model.add(keras.layers.BatchNormalization())
  # activation function
  model.add(keras.layers.ReLU())
  # reshape the previous layer into a 3D tensor
  model.add(keras.layers.Reshape((7, 7, 256)))
  # first layer of upsampling (i.e. deconvolution); the stride of 1 keeps the 7x7 feature map
  model.add(keras.layers.Conv2DTranspose(filters=128, kernel_size=(5,5), strides=(1,1), padding='same', kernel_initializer=weight_initialization))
  model.add(keras.layers.BatchNormalization())
  model.add(keras.layers.ReLU())
  # second layer of upsampling (i.e. deconvolution) reduces the volume depth to 64
  # and outputs a 14x14 feature map as determined by the stride
  model.add(keras.layers.Conv2DTranspose(filters=64, kernel_size=(5,5), strides=(2,2), padding='same', kernel_initializer=weight_initialization))
  model.add(keras.layers.BatchNormalization())
  model.add(keras.layers.ReLU())
  # third layer of upsampling (i.e. deconvolution) reduces the volume depth to 1 and outputs the 28x28x1 image
  model.add(keras.layers.Conv2DTranspose(filters=channel, kernel_size=(5,5), strides=(2,2), padding='same', activation='tanh'))
  return model
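A minimal usage sketch follows (the 100-dimensional latent vector and the normal weight initializer are illustrative assumptions, not values fixed by the code above):

import tensorflow as tf
from tensorflow import keras

# build a generator that maps a 100-dimensional latent vector to 28x28x1 images
generator = build_generator(latent_input=100,
                            weight_initialization=keras.initializers.RandomNormal(stddev=0.02),
                            channel=1)
# sample a batch of latent vectors from a standard Gaussian and generate fake images
z = tf.random.normal([16, 100])
fake_images = generator(z, training=False)  # shape (16, 28, 28, 1), values in [-1, 1]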

This architecture is depicted in the image below, taken from the 2016 paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: alt text 2.

The discriminator:

We also define a second multilayer perceptron D(x; θd) that outputs a single scalar. D(x) represents the probability that x came from the data rather than pg 1. This is implemented by the following code:

def build_discriminator(width, height, depth, alpha=0.2):
  model = Sequential(name='discriminator')
  input_shape = (height, width, depth)
  # first layer of the discriminator network downsamples the image to 14x14 as determined by the stride
  # and increases the depth to 64
  model.add(keras.layers.Conv2D(filters=64, kernel_size=(5, 5), strides=(2,2), padding='same', input_shape=input_shape))
  model.add(keras.layers.BatchNormalization())
  model.add(keras.layers.LeakyReLU(alpha=alpha))
  # second layer of the discriminator network downsamples the image to 7x7 and increases the depth to 128
  model.add(keras.layers.Conv2D(filters=128, kernel_size=(5, 5), strides=(2,2), padding='same'))
  model.add(keras.layers.BatchNormalization())
  model.add(keras.layers.LeakyReLU(alpha=alpha))
  # flatten the 3D tensor to a 1D tensor of size 7*7*128 = 6,272
  model.add(keras.layers.Flatten())
  # apply dropout of 30% before feeding into the dense layer
  model.add(keras.layers.Dropout(0.3))
  # single sigmoid output: the probability that the input image is real
  model.add(keras.layers.Dense(1, activation='sigmoid'))
  return model

GAN Loss function:

We train D to maximize the probability of assigning the correct label to both training examples and samples from G. We simultaneously train G to minimize log(1 − D(G(z))). In other words, D and G play the following two-player minimax game with value function V(G, D): alt text 1. However, as the authors of the paper note, this objective function does not perform well in practice, since it may not provide sufficient gradients for the generator to actually learn, especially during the early stages of training when the discriminator is very accurate (i.e. D(G(z)) is close to 0, so log(1 − D(G(z))) saturates, the gradient is near 0, and the generator's weights barely move). So rather than training the generator to minimize log(1 − D(G(z))), training is done to maximize log D(G(z)) 1.
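A minimal sketch of how these two objectives are typically expressed in Keras (following the structure of the TensorFlow DCGAN tutorial, adapted here to the sigmoid-output discriminator above):

import tensorflow as tf
from tensorflow import keras

# binary cross-entropy over the sigmoid output of the discriminator above
cross_entropy = keras.losses.BinaryCrossentropy()

def discriminator_loss(real_output, fake_output):
    # D is trained to assign the label 1 to real images and 0 to generated images
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # non-saturating loss: maximize log D(G(z)) by minimizing the
    # cross-entropy of the fake outputs against the "real" label
    return cross_entropy(tf.ones_like(fake_output), fake_output)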

Example 1: Deep Convolutional GAN using the MNIST Fashion Dataset

Using a deep convolutional GAN to create fashion clothes from a Gaussian distribution trained on the MNIST Fashion dataset. Click below on the picture of Daphne to show the video of the transformation from random noise into actual fashion clothes that I think Daphne would include in her wardrobe! 👗 (especially if it is a White Party)

CLICK HERE

The source code for this example can be found here: DCGAN-MNIST Fashion

Example 2: Deep Convolutional GAN using the Celeb-A Faces Dataset:

Using a deep convolutional GAN to create new faces from a Gaussian distribution trained on the Celeb-A Faces dataset. After training for just five epochs, these were the fake faces that were generated (note that some of them look realistic, especially the woman with the blond hair, lol):

alt text alt text alt text alt text alt text

The source code for this example can be found here: DCGAN-Celeb-A Faces


Theoretical underpinnings of Conditional GAN and Supervised Pix2Pix

As detailed in the book Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch by Ivan Vasilev, the Pix2Pix paper Image-to-Image Translation with Conditional Adversarial Networks builds upon ideas from the paper titled Conditional Generative Adversarial Nets introduced in 2014. The implementation of the Pix2Pix algorithm comes from a TensorFlow tutorial titled pix2pix: Image-to-image translation with a conditional GAN.

Conditional GAN:

Generative adversarial nets can be extended to a conditional model if both the generator and discriminator are conditioned on some extra information y. y could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding y into both the discriminator and generator as an additional input layer. In the generator, the prior input noise pz(z) and y are combined in a joint hidden representation/distribution, and the adversarial training framework allows for considerable flexibility in how this joint hidden representation/distribution is composed 3. A minimal sketch of this conditioning follows the figure below.

alt text 4.
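A minimal sketch of the label conditioning on the generator side, in Keras (the embedding size, number of classes, and layer sizes are illustrative assumptions, not the repository's code):

import tensorflow as tf
from tensorflow import keras

num_classes = 10   # e.g. the ten MNIST Fashion classes (illustrative assumption)
latent_dim = 100

# generator inputs: the latent vector z and the class label y as an extra input layer
z = keras.Input(shape=(latent_dim,))
y = keras.Input(shape=(1,), dtype='int32')

# embed the label and concatenate it with z to form the joint hidden representation
y_embedding = keras.layers.Embedding(num_classes, 50)(y)   # (batch, 1, 50)
y_embedding = keras.layers.Flatten()(y_embedding)          # (batch, 50)
joint = keras.layers.Concatenate()([z, y_embedding])       # (batch, 150)

# the rest of the generator (dense + reshape + Conv2DTranspose stack) proceeds as before
h = keras.layers.Dense(7 * 7 * 256)(joint)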

Conditional GAN Loss function:

D and G play the following two-player minimax game with the following value function V(G, D):

alt text3.

Supervised Pix2Pix:

Supervised Pix2Pix is a conditional GAN with an additional loss constraining the generator, which, as the paper outlines in section 3.1, is an L1 loss rather than the traditional L2 loss, since L1 encourages less blurring 5.

Supervised Pix2Pix Loss function:

alt text5.

alt text5.
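A minimal sketch of the combined generator objective (adversarial term plus weighted L1 term), assuming a PatchGAN discriminator that outputs raw logits; the LAMBDA default of 100 follows the paper, and Example 3 below raises it to 1,000:

import tensorflow as tf
from tensorflow import keras

LAMBDA = 100  # weight on the L1 term (the paper's default; Example 3 below uses 1,000)
bce = keras.losses.BinaryCrossentropy(from_logits=True)

def pix2pix_generator_loss(disc_generated_output, gen_output, target):
    # adversarial term: the generator tries to make the PatchGAN discriminator output "real"
    gan_loss = bce(tf.ones_like(disc_generated_output), disc_generated_output)
    # L1 term: pixel-wise deviation from the target image (encourages less blurring than L2)
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    return gan_loss + LAMBDA * l1_loss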

The architecture for the generator is described by the paper as the following:

To give the generator a means to circumvent the bottleneck for information like this, we add skip connections, following the general shape of a “U-Net”... Specifically, we add skip connections between each layer i and layer n − i, where n is the total number of layers. Each skip connection simply concatenates all channels at layer i with those at layer n − i 5.

The architecture for the discriminator is described by the paper as the following: [To motivate the GAN discriminator to only model high-frequency structure in the generated images] it is sufficient to restrict our attention to the structure in local image patches. Therefore, we design a discriminator architecture – which we term a PatchGAN – that only penalizes structure at the scale of patches. This discriminator tries to classify whether each N × N patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D 5.
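A minimal sketch of such a skip connection in the Keras functional API (the layer sizes and depth are illustrative assumptions, not the repository's exact architecture; deeper encoder/decoder layers are omitted):

import tensorflow as tf
from tensorflow import keras

inp = keras.Input(shape=(256, 256, 3))
# encoder layer i: downsample to 128x128x64
e1 = keras.layers.Conv2D(64, (4, 4), strides=(2, 2), padding='same')(inp)
e1 = keras.layers.LeakyReLU(0.2)(e1)
# ...deeper encoder layers and the bottleneck are omitted; e2 stands in for them...
e2 = keras.layers.Conv2D(128, (4, 4), strides=(2, 2), padding='same', activation='relu')(e1)
# decoder layer n - i: upsample back to 128x128x64
d1 = keras.layers.Conv2DTranspose(64, (4, 4), strides=(2, 2), padding='same', activation='relu')(e2)
# the skip connection concatenates all channels at layer i with those at layer n - i
d1 = keras.layers.Concatenate()([d1, e1])
# final upsampling back to the input resolution
out = keras.layers.Conv2DTranspose(3, (4, 4), strides=(2, 2), padding='same', activation='tanh')(d1)
unet_sketch = keras.Model(inp, out)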

Example 3: Supervised Pix2Pix for Image Segmentation on the Cityscapes Dataset

The objective of this task is to transform a set of real-world images from the Cityscapes dataset 6 into semantic segmentations. The dataset contains 5,000 finely annotated images; the training and validation sets (a 2,975/500 split) are used here. The dense annotation covers 30 common classes such as road, person, and car, as detailed in the following figure 7:

alt text

Pix2Pix Model: The Pix2Pix generator was trained for 25,000 steps and used a lambda value of 1,000 for the L1 loss term. Since the L1 loss regularizes the generator to output predicted images that are plausible translations of the source image, I decided to weight it one order of magnitude higher than in 5, especially when it came to segmenting riders (it seemed to help). The following five test results were output, detailing some of the predictions of the semantic segmentation generator, i.e. the predicted images 8.

alt text
alt text
alt text alt text

The following colab notebook can be found here: Pix2Pix. The segmentation generator model weights can be found here: Segmentation Generator Model


Theoretical underpinnings of Cycle-Consistent Adversarial Networks (CycleGAN)

Unlike Pix2Pix, which requires paired training data (i.e. both input and target pairs), CycleGAN works on unpaired data, i.e. no information is provided as to which input matches which target 9.

Unsupervised CycleGAN Loss functions:

Adversarial loss

Our objective contains two types of terms: adversarial losses for matching the distribution of generated images to the data distribution in the target domain; and cycle consistency losses to prevent the learned mappings G and F from contradicting each other.

alt text

We apply adversarial losses to both mapping functions. For the mapping function G : X → Y and its discriminator DY, we express the objective as:

alt text

where G tries to generate images G(x) that look similar to images from domain Y, while DY aims to distinguish between translated samples G(x) and real samples y. G aims to minimize this objective against an adversary D that tries to maximize it, i.e., min_G max_DY L_GAN(G, DY, X, Y). We introduce a similar adversarial loss for the mapping function F : Y → X and its discriminator DX as well: i.e., min_F max_DX L_GAN(F, DX, Y, X).

Cycle loss

Adversarial training can, in theory, learn mappings G and F that produce outputs identically distributed as the target domains Y and X respectively. However, with large enough capacity, a network can map the same set of input images to any random permutation of images in the target domain, where any of the learned mappings can induce an output distribution that matches the target distribution. Thus, adversarial losses alone cannot guarantee that the learned function can map an individual input xi to a desired output yi. So, to reduce the space of possible mapping functions, a constraint is introduced in which the mapping functions should be cycle-consistent in the forward direction, i.e. x → G(x) → F(G(x)) ≈ x, and in the backward direction, y → F(y) → G(F(y)) ≈ y.

alt text

Identity loss

Furthermore, for mapping paintings to photos (and thus also, photos to paintings), we find that it is helpful to introduce an additional loss to encourage the mapping to preserve color composition between the input and output. In particular, we adopt the technique of Taigman et al. and regularize the generator to be near an identity mapping when real samples of the target domain are provided as the input to the generator:

alt text

Without Lidentity, the generators G and F are free to change the tint of input images when there is no need to. For example, when learning the mapping between Monet’s paintings and Flickr photographs, the generator often maps paintings of daytime to photographs taken during sunset, because such a mapping may be equally valid under the adversarial loss and cycle consistency loss.

9
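A minimal sketch of these three loss terms in TensorFlow (assuming PatchGAN discriminators that output raw logits; the LAMBDA weighting is a commonly used default and not taken from the text above):

import tensorflow as tf
from tensorflow import keras

bce = keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 10  # weight on the cycle-consistency term (assumed default)

def adversarial_loss(disc_real, disc_fake):
    # e.g. D_Y tries to output 1 for real samples y and 0 for translated samples G(x)
    return bce(tf.ones_like(disc_real), disc_real) + bce(tf.zeros_like(disc_fake), disc_fake)

def cycle_consistency_loss(real_image, cycled_image):
    # forward cycle x -> G(x) -> F(G(x)) should reconstruct x (and likewise backward for y)
    return LAMBDA * tf.reduce_mean(tf.abs(real_image - cycled_image))

def identity_loss(real_image, same_image):
    # regularize the generator to be near an identity mapping when it is fed
    # real samples of its target domain (preserves color composition)
    return 0.5 * LAMBDA * tf.reduce_mean(tf.abs(real_image - same_image))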

Example 4: Unsupervised CycleGAN w/ ResNet backbone for Image Segmentation

The objective of this task is to transform a set of real-world images from the Cityscapes dataset 6 into semantic segmentations. The dataset contains 5,000 finely annotated images; the training and validation sets (a 2,975/500 split) are used here. The dense annotation covers 30 common classes such as road, person, and car, as detailed in the following figure 7:

alt text

CycleGAN Model: After reading and implementing the TensorFlow tutorial on CycleGAN, I decided to implement CycleGAN with a ResNet backbone, as was done in 9. The implementation basically follows Jason Brownlee's implementation in his article How to Implement CycleGAN Models From Scratch With Keras. The image buffer portion was taken from Xiaowei-hu and can be found here: CycleGAN (I could have used Jason's image buffer implementation, but I was working with EagerTensors at the time rather than NumPy arrays, and I am lazy lol). At first I wanted to follow 9 and train for 200 epochs; however, I did not realize how intensive the training would be for this model. So instead I trained the model for 50 epochs using Adam as the optimizer with a learning rate of 0.0002 and 0.5 for the exponential decay rate of the first moment estimate. The following five test images were generated for this portion of the training:

alt text alt text alt text alt text

The following image shows the translations from photos to segmentations and vice versa:

alt text

And to see which of the two discriminators was less fooled on the image translation task, one 16x16 output patch was generated, in which values closer to one mean that the discriminator was being fooled, while values closer to zero mean that the discriminator was not being fooled by the generator:

alt text

Then I trained the model for another 50 epochs using stochastic gradient descent with the same learning rate but with linear rate decay, in which the learning rate was decayed over the number of epochs (i.e. 50). The following five test images were generated:

alt text alt text alt text alt text alt text

The following image shows the translations from photos to segmentations and vice versa:

alt text

And to see which of the two discriminators was less fooled on the image translation task, one 16x16 output patch was generated, in which values closer to one mean that the discriminator was being fooled, while values closer to zero mean that the discriminator was not being fooled by the generator:

alt text

The following colab notebook can be found here: CycleGAN


Theoretical underpinnings of Neural Style Transfer Algorithms

alt text

Neural style transfer algorithms use Convolutional Neural Networks (CNNs) to do content reconstructions and style reconstructions by computing correlations between the different features in the different layers of the CNN 10. As 10 states, the VGG-19 CNN was used, but only its 16 convolutional and 5 pooling layers; none of the fully connected layers were used 10. Furthermore, the max pooling operations were replaced by average pooling operations, since the authors found that this improved the gradient flow, leading to better results 10. Shown in the figure below is the complete VGG-19 network:

alt text
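A minimal sketch of one way to build this modified network in Keras (an assumed implementation, not the repository's exact code): reuse the pretrained convolutional layers of VGG-19 and rebuild the graph with average pooling in place of max pooling.

import tensorflow as tf
from tensorflow import keras

# load only the convolutional part of VGG-19 (no fully connected layers), weights frozen
vgg = keras.applications.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False

# rebuild the network layer by layer, swapping each max pooling operation
# for average pooling while reusing the pretrained convolutional weights
x = inputs = keras.Input(shape=(None, None, 3))
for layer in vgg.layers[1:]:
    if isinstance(layer, keras.layers.MaxPooling2D):
        x = keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')(x)
    else:
        x = layer(x)
avg_pool_vgg = keras.Model(inputs, x)
# intermediate activations of avg_pool_vgg are then used for the content and style representations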

Neural Style Transfer Loss functions:

Content loss

alt text10

Style loss

alt text10 alt text10

Total loss

alt text10
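A minimal sketch of these losses (assuming feature tensors of shape (height, width, channels) extracted from layers of the network above; alpha and beta are the content and style weights from the total-loss figure):

import tensorflow as tf

def content_loss(content_features, generated_features):
    # squared-error loss between the feature responses of the content and generated images at one layer
    return 0.5 * tf.reduce_sum(tf.square(generated_features - content_features))

def gram_matrix(features):
    # correlations between feature maps: flatten the spatial dimensions and take inner products
    channels = int(features.shape[-1])
    flat = tf.reshape(features, [-1, channels])
    return tf.matmul(flat, flat, transpose_a=True)

def style_loss(style_features, generated_features):
    # mean-squared difference between Gram matrices, normalized by the layer size
    size = tf.cast(tf.size(style_features), tf.float32)
    return tf.reduce_sum(tf.square(gram_matrix(style_features) - gram_matrix(generated_features))) / (4.0 * size ** 2)

def total_loss(content_l, style_l, alpha, beta):
    # weighted combination of the two objectives
    return alpha * content_l + beta * style_l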

Example 5: Applying a Neural Style Transfer Model to my Face!!!

After reading and implementing the TensorFlow tutorial on Neural style transfer, my implementation basically followed the tutorial and the original paper. Instead of using the VGG-19 model from the tutorial, I used the VGG-19 model from the paper; namely, instead of max pooling I used average pooling, as stated in 10. I also used different weighting than the tutorial and followed the paper for the ratio amounts; namely, I used 1 for beta and 1000 for alpha, which gave a ratio of 1x10-3. At first I tried to implement L-BFGS as the optimization method, but it was a lot harder than expected, since both TensorFlow's implementation lbfgs_minimize and SciPy's implementation fmin_l_bfgs_b rely on the data being one dimensional!!! So I just used Adam, a first-order gradient method, rather than L-BFGS, which is a second-order (quasi-Newton) method, and trained for 10 epochs. This was the content image I used:

alt text

These were the style images I used; the first one is by Pierre-Auguste Renoir, titled Portrait of Claude Monet, and the second one is Ip Man from Tekken 7:

alt text

alt text

These were the resulting images generated by the model after training:

alt text

alt text

The following colab notebook can be found here: NeuralStyleTransfer



Footnotes

  1. Generative Adversarial Nets

  2. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

  3. Conditional Generative Adversarial Nets

  4. Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch

  5. pix2pix: Image-to-image translation with a conditional GAN

  6. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)

  7. ICNet for Real-Time Semantic Segmentation

  8. Pretty good results, showing that increasing the L1 loss term provides significant improvements, especially when it comes to identifying pedestrians

  9. Cycle-Consistent Adversarial Networks (CycleGAN)

  10. A Neural Algorithm of Artistic Style
