As detailed in the book Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch by Ivan Vasilev, the DCGAN builds on the landmark 2014 paper titled Generative Adversarial Nets. The implementation of the paper's algorithm comes from the TensorFlow tutorial titled dcgan.
To learn the generator’s distribution pg over data x, we define a prior on input noise variables pz(z), then represent a mapping to data space as G(z;θg), where G is a differentiable function represented by a multilayer perceptron with parameters θg.1 This is represented by the following code:
# imports assumed by the snippets in this post
from tensorflow import keras
from tensorflow.keras import Sequential

# function that builds the generator
def build_generator(latent_input, weight_initialization, channel):
    model = Sequential(name='generator')
    # first fully connected layer takes in the 1D latent vector/tensor z
    # and outputs a 1D tensor of size 7*7*256 = 12,544
    model.add(keras.layers.Dense(7*7*256, input_shape=(latent_input,)))
    # batch normalization keeps the mean output close to 0 and the output standard deviation close to 1;
    # it stabilizes training after the dense/conv layer and before the activation function
    model.add(keras.layers.BatchNormalization())
    # activation function
    model.add(keras.layers.ReLU())
    # reshape the previous layer into a 3D tensor
    model.add(keras.layers.Reshape((7, 7, 256)))
    # first upsampling (i.e. transposed convolution) layer; the 7x7 feature map is preserved by the stride of 1
    model.add(keras.layers.Conv2DTranspose(filters=128, kernel_size=(5, 5), strides=(1, 1), padding='same', kernel_initializer=weight_initialization))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.ReLU())
    # second upsampling (i.e. transposed convolution) layer; the volume depth is reduced to 64
    # and the feature map is upsampled to 14x14 by the stride of 2
    model.add(keras.layers.Conv2DTranspose(filters=64, kernel_size=(5, 5), strides=(2, 2), padding='same', kernel_initializer=weight_initialization))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.ReLU())
    # third upsampling (i.e. transposed convolution) layer; the volume depth is reduced to 1 and the image is output as 28x28x1
    model.add(keras.layers.Conv2DTranspose(filters=channel, kernel_size=(5, 5), strides=(2, 2), padding='same', activation='tanh'))
    return model
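As a minimal usage sketch (not from the tutorial), the generator can be built and fed random noise as follows; the latent size of 100 and the RandomNormal(stddev=0.02) initializer are assumptions for illustration:

# minimal usage sketch; latent size and initializer are assumptions
import tensorflow as tf
from tensorflow import keras

latent_dim = 100                                                    # assumed size of the noise vector z
weight_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)

generator = build_generator(latent_dim, weight_init, channel=1)
z = tf.random.normal([16, latent_dim])                              # a batch of 16 noise vectors
fake_images = generator(z, training=False)                          # shape (16, 28, 28, 1), values in [-1, 1]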
This architecture is represented by the image below, as found in the 2016 paper titled UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS.2
We also define a second multilayer perceptron D(x;θd) that outputs a single scalar. D(x) represents the probability that x came from the data rather than pg.1 This is represented by the following code:
def build_discriminator(width, height, depth, alpha=0.2):
    model = Sequential(name='discriminator')
    input_shape = (height, width, depth)
    # first layer of the discriminator network; downsamples the image to 14x14 (stride of 2) and
    # increases the depth to 64
    model.add(keras.layers.Conv2D(filters=64, kernel_size=(5, 5), strides=(2, 2), padding='same', input_shape=input_shape))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.LeakyReLU(alpha=alpha))
    # second layer of the discriminator network; downsamples the image to 7x7 and increases the depth to 128
    model.add(keras.layers.Conv2D(filters=128, kernel_size=(5, 5), strides=(2, 2), padding='same'))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.LeakyReLU(alpha=alpha))
    # flatten the 3D tensor to a 1D tensor of size 7*7*128 = 6,272
    model.add(keras.layers.Flatten())
    # apply dropout of 30% before feeding it to the dense layer
    model.add(keras.layers.Dropout(0.3))
    # single sigmoid unit that outputs the probability that the input image is real
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    return model
We train D to maximize the probability of assigning the correct label to both training examples and samples from G. We simultaneously train G to minimize log(1 − D(G(z))). In other words, D and G play the following two-player minimax game with value function V(G, D):1

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]

However, as the authors of the paper note, this objective does not perform well in practice, since it may not provide sufficient gradients for the generator to actually learn, especially during the early stages of learning when the discriminator is very accurate (i.e., D(G(z)) is close to 0, so log(1 − D(G(z))) saturates, the gradient is nearly 0, and the weights of the generator barely move). So rather than training the generator to minimize log(1 − D(G(z))), training is done to maximize log D(G(z)).1
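Assuming the two models above and a sigmoid-output discriminator (hence from_logits=False), the non-saturating losses can be sketched with Keras's binary cross-entropy as follows; this mirrors the structure of the TensorFlow DCGAN tutorial but is not a verbatim copy:

# minimal sketch of the non-saturating GAN losses, assuming the sigmoid-output discriminator above
import tensorflow as tf
from tensorflow import keras

bce = keras.losses.BinaryCrossentropy(from_logits=False)

def discriminator_loss(real_output, fake_output):
    # D is trained to label real samples as 1 and generated samples as 0
    real_loss = bce(tf.ones_like(real_output), real_output)
    fake_loss = bce(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # instead of minimizing log(1 - D(G(z))), G maximizes log D(G(z)),
    # which is equivalent to minimizing the cross-entropy against a target of 1
    return bce(tf.ones_like(fake_output), fake_output)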
Using a deep convolutional GAN to create fashion clothes from a Gaussian distribution trained on the Fashion-MNIST dataset. Click below on the picture of Daphne to see the video of the transformation from random noise into actual fashion clothes that I think Daphne would include in her wardrobe! 👗 (especially if it is a White Party)
The source code for this example can be found here: DCGAN-MINST Fashion
Using a deep convolutional GAN to create new faces from a Gaussian distribution trained on the Celeb-A Faces dataset. After training for just five epochs, these were the fake faces that were generated (note that some of them look realistic, especially the women with the blond hair, lol):
The source code for this example can be found here: DCGAN-Celeb-A Faces
As detailed in the book Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch by Ivan Vasilev, the Pix2Pix paper Image-to-Image Translation with Conditional Adversarial Networks builds upon ideas from the paper Conditional Generative Adversarial Nets, introduced in 2014. The implementation of the Pix2Pix algorithm comes from the TensorFlow tutorial titled pix2pix: Image-to-image translation with a conditional GAN.
Generative adversarial nets can be extended to a conditional model if both the generator and discriminator are conditioned on some extra information y. y could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding y into both the discriminator and generator as an additional input layer. In the generator, the prior input noise pz(z) and y are combined in a joint hidden representation/distribution, and the adversarial training framework allows for considerable flexibility in how this joint hidden representation/distribution is composed.3
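As a hedged illustration (not the Pix2Pix or cGAN paper code), conditioning on a class label y can be sketched in Keras by embedding y and concatenating it with the noise vector z; the latent size, number of classes, and embedding size below are assumptions:

# minimal sketch of conditioning a generator on a class label y (all sizes are assumptions)
from tensorflow import keras

latent_dim, num_classes = 100, 10

# generator side: the noise z and the label y are combined into a joint hidden representation
z_in = keras.Input(shape=(latent_dim,))
y_in = keras.Input(shape=(1,), dtype='int32')
y_embed = keras.layers.Embedding(num_classes, 50)(y_in)            # (batch, 1, 50) learned label embedding
y_flat = keras.layers.Flatten()(y_embed)                           # (batch, 50)
joint = keras.layers.Concatenate()([z_in, y_flat])                 # joint representation of z and y
hidden = keras.layers.Dense(7 * 7 * 256, activation='relu')(joint)
# ... the rest of the generator (reshape + transposed convolutions) would follow as in the DCGAN above
conditional_generator_stub = keras.Model([z_in, y_in], hidden)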
D and G play the following two-player minimax game with the following value function V(G, D):3

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x | y)] + E_{z∼p_z(z)}[log(1 − D(G(z | y)))]
Supervised Pix2Pix is a conditional GAN with an additional loss constraining the generator, which the paper outlines in section 3.1 as an L1 loss rather than the traditional L2 loss, since L1 encourages less blurring:5

L_L1(G) = E_{x,y,z}[||y − G(x, z)||_1]5

The final objective weights this term with λ:

G* = arg min_G max_D L_cGAN(G, D) + λ L_L1(G)5
The architecture for the generator is described by the paper as the following:
To give the generator a means to circumvent the bottleneck for information like this, we add skip connections, following the general shape of a “U-Net”... Specifically, we add skip connections between each layer i and layer n − i, where n is the total number of layers. Each skip connection simply concatenates all channels at layer i with those at layer n − i.5

The architecture for the discriminator is described by the paper as the following: [To motivate the GAN discriminator to only model high-frequency structures in the generated image] it is sufficient to restrict our attention to the structure in local image patches. Therefore, we design a discriminator architecture – which we term a PatchGAN – that only penalizes structure at the scale of patches. This discriminator tries to classify if each N×N patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.5
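As a hedged illustration of these two ideas (not the tutorial's exact code), a U-Net skip connection and a PatchGAN-style convolutional output can be sketched in Keras as follows; the filter counts, kernel sizes, and the 256x256x3 input are assumptions:

# minimal sketch of a U-Net skip connection and a PatchGAN-style discriminator head
# (filter counts, kernel sizes, and the 256x256x3 input are assumptions for illustration)
from tensorflow import keras

def tiny_unet(input_shape=(256, 256, 3)):
    inp = keras.Input(shape=input_shape)
    # encoder (downsampling path)
    d1 = keras.layers.Conv2D(64, 4, strides=2, padding='same', activation='relu')(inp)          # 128x128
    d2 = keras.layers.Conv2D(128, 4, strides=2, padding='same', activation='relu')(d1)          # 64x64
    # decoder (upsampling path) with a skip connection: concatenate channels from layer i with layer n - i
    u1 = keras.layers.Conv2DTranspose(64, 4, strides=2, padding='same', activation='relu')(d2)  # 128x128
    u1 = keras.layers.Concatenate()([u1, d1])                                                   # the skip connection
    out = keras.layers.Conv2DTranspose(3, 4, strides=2, padding='same', activation='tanh')(u1)  # 256x256
    return keras.Model(inp, out)

def tiny_patchgan(input_shape=(256, 256, 3)):
    inp = keras.Input(shape=input_shape)
    x = keras.layers.Conv2D(64, 4, strides=2, padding='same')(inp)
    x = keras.layers.LeakyReLU(0.2)(x)
    x = keras.layers.Conv2D(128, 4, strides=2, padding='same')(x)
    x = keras.layers.LeakyReLU(0.2)(x)
    # 1-channel convolutional output: each spatial position scores one NxN receptive-field patch
    patch_scores = keras.layers.Conv2D(1, 4, padding='same')(x)     # shape (64, 64, 1), one score per patch
    return keras.Model(inp, patch_scores)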
The objective of this task is to transform a set of real-world images from the Cityscapes dataset6 into semantic segmentations. The dataset contains 5,000 finely annotated images, of which the training and validation splits (2,975 and 500 images, respectively) were used. The dense annotations cover 30 common classes such as road, person, and car, as detailed in the following figure:7
Pix2Pix Model: The Pix2Pix generator was trained for 25,000 steps and used a lambda value of 1000 for the L1 loss term. Since the L1 loss regularizes the generator to output predicted images that are plausible translations of the source image, I decided to weight it one order of magnitude higher than in 5, which seemed to help, especially when it came to segmenting riders (a sketch of this weighted loss is shown after the notebook links below). The following five test results were output, detailing some of the predictions of the semantic segmentation generator, i.e. the predicted images:8
The Colab notebook for this example can be found here: Pix2Pix. The segmentation generator model weights can be found here: Segmentation Generator Model
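As referenced above, here is a minimal sketch of the generator loss with the weighted L1 term, following the structure of the TensorFlow pix2pix tutorial; LAMBDA = 1000 reflects the weighting described earlier, and from_logits=True is an assumption about the discriminator outputting raw logits:

# minimal sketch of the Pix2Pix generator loss: adversarial term + LAMBDA * L1 term
import tensorflow as tf
from tensorflow import keras

LAMBDA = 1000  # weighting described above
bce_logits = keras.losses.BinaryCrossentropy(from_logits=True)  # assumes a logits-output discriminator

def pix2pix_generator_loss(disc_generated_output, gen_output, target):
    # adversarial term: the generator wants the discriminator to label its output as real (1)
    gan_loss = bce_logits(tf.ones_like(disc_generated_output), disc_generated_output)
    # L1 term: pixel-wise mean absolute error between the generated and target images
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    total_loss = gan_loss + LAMBDA * l1_loss
    return total_loss, gan_loss, l1_loss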
Unlike Pix2Pix, in which paired training data is required (i.e., both input and target pairs are needed), CycleGAN works on unpaired data, i.e., no information is provided as to which input matches which target.9
Adversarial loss
Our objective contains two types of terms: adversarial losses for matching the distribution of generated images to the data distribution in the target domain; and cycle consistency losses to prevent the learned mappings G and F from contradicting each other.
We apply adversarial losses to both mapping functions. For the mapping function G : X → Y and its discriminator D_Y, we express the objective as:

L_GAN(G, D_Y, X, Y) = E_{y∼p_data(y)}[log D_Y(y)] + E_{x∼p_data(x)}[log(1 − D_Y(G(x)))]

where G tries to generate images G(x) that look similar to images from domain Y, while D_Y aims to distinguish between translated samples G(x) and real samples y. G aims to minimize this objective against an adversary D that tries to maximize it, i.e., min_G max_{D_Y} L_GAN(G, D_Y, X, Y). We introduce a similar adversarial loss for the mapping function F : Y → X and its discriminator D_X as well: i.e., min_F max_{D_X} L_GAN(F, D_X, Y, X).
Cycle loss
Adversarial training can, in theory, learn mappings G and F that produce outputs identically distributed as the target domains Y and X, respectively. However, with large enough capacity, a network can map the same set of input images to any random permutation of images in the target domain, where any of the learned mappings can induce an output distribution that matches the target distribution. Thus, adversarial losses alone cannot guarantee that the learned function can map an individual input xi to a desired output yi. To reduce the space of possible mapping functions, a constraint is therefore introduced in which the mapping functions should be cycle-consistent in the forward direction, i.e. x → G(x) → F(G(x)) ≈ x, and in the backward direction, y → F(y) → G(F(y)) ≈ y.
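A minimal sketch of the cycle-consistency term, assuming the cycled images F(G(x)) and G(F(y)) have already been computed; the weight of 10 is a commonly used value and is an assumption here:

# minimal sketch of the cycle-consistency loss: x -> G(x) -> F(G(x)) should recover x,
# and y -> F(y) -> G(F(y)) should recover y (the weight is an assumption)
import tensorflow as tf

def cycle_consistency_loss(real_x, cycled_x, real_y, cycled_y, lambda_cycle=10.0):
    forward_loss = tf.reduce_mean(tf.abs(real_x - cycled_x))    # ||F(G(x)) - x||_1
    backward_loss = tf.reduce_mean(tf.abs(real_y - cycled_y))   # ||G(F(y)) - y||_1
    return lambda_cycle * (forward_loss + backward_loss)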
Identity loss
Furthermore, for mapping paintings to photos (and thus also, photos to paintings), we find that it is helpful to introduce an additional loss to encourage the mapping to preserve color composition between the input and output. In particular, we adopt the technique of Taigman et al. and regularize the generator to be near an identity mapping when real samples of the target domain are provided as the input to the generator:

L_identity(G, F) = E_{y∼p_data(y)}[||G(y) − y||_1] + E_{x∼p_data(x)}[||F(x) − x||_1]

Without L_identity, the generators G and F are free to change the tint of input images when there is no need to. For example, when learning the mapping between Monet’s paintings and Flickr photographs, the generator often maps paintings of daytime to photographs taken during sunset, because such a mapping may be equally valid under the adversarial loss and cycle consistency loss.
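Similarly, a minimal sketch of the identity term; the weight of 5 is an assumption:

# minimal sketch of the identity loss: when G is fed a real sample from its own target
# domain, it should return it (nearly) unchanged (the weight is an assumption)
import tensorflow as tf

def identity_loss(real_image, same_image, lambda_identity=5.0):
    # same_image = G(real_y) or F(real_x); penalize any change from the input
    return lambda_identity * tf.reduce_mean(tf.abs(real_image - same_image))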
As in the Pix2Pix example, the objective of this task is to transform a set of real-world images from the Cityscapes dataset6 into semantic segmentations. The dataset contains 5,000 finely annotated images, of which the training and validation splits (2,975 and 500 images, respectively) were used. The dense annotations cover 30 common classes such as road, person, and car, as detailed in the following figure:7
CycleGAN Model: After reading and implementing the TensorFlow tutorial on CycleGAN, I decided to implement CycleGAN with a ResNet backbone as was done in 9. The implementation basically follows Jason Brownlee's implementation in his article: How to Implement CycleGAN Models From Scratch With Keras. The image buffer portion was taken from Xiaowei-hu and can be found here: CycleGAN (I could have used Jason's image buffer implementation, but I was working with EagerTensors at the time, rather than NumPy arrays, and I am lazy lol). At first I wanted to follow 9 and train for 200 epochs; however, I did not realize how intensive the training would be for this model. So, instead, I trained the model for 50 epochs using Adam as the optimizer with a learning rate of 0.0002 and 0.5 for the exponential decay rate of the first moment estimate (beta_1). The following five test images were generated for this portion of the training:
The following image shows the translations from photos to segmentations and vice versa:
And to see which of the two discriminators was fooled less on the image translation task, one 16x16 output patch was generated, in which values closer to one meant that the discriminator was being fooled, while values closer to zero meant that the discriminator was not being fooled by the generator:
Then I trained the model for another 50 epochs using stochastic gradient descent with the same learning rate, but with linear rate decay in which the learning rate was decayed over the number of epochs (i.e., 50); a sketch of such a schedule is shown after the notebook link below. The following five test images were generated:
The following image shows the translations from photos to segmentations and vice versa:
And to see which of the two discriminators was fooled less on the image translation task, one 16x16 output patch was generated, in which values closer to one meant that the discriminator was being fooled, while values closer to zero meant that the discriminator was not being fooled by the generator:
The Colab notebook for this example can be found here: CycleGAN
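As referenced above, the linear learning-rate decay used in the second training phase can be sketched as a custom Keras schedule; the steps_per_epoch value is a placeholder assumption, not the value actually used:

# hypothetical sketch of a linear learning-rate decay schedule (not the exact code used):
# the rate decays linearly from initial_lr to zero over decay_epochs epochs
import tensorflow as tf

class LinearDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, initial_lr=2e-4, steps_per_epoch=1000, decay_epochs=50):
        super().__init__()
        self.initial_lr = initial_lr
        self.total_steps = steps_per_epoch * decay_epochs   # steps_per_epoch is a placeholder assumption

    def __call__(self, step):
        # fraction of training remaining, clipped at zero
        remaining = 1.0 - tf.cast(step, tf.float32) / float(self.total_steps)
        return self.initial_lr * tf.maximum(remaining, 0.0)

# usage sketch: optimizer = tf.keras.optimizers.SGD(learning_rate=LinearDecay())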
Neural style transfer algorithms use Convolutional Neural Networks (CNNs) to perform content reconstructions and style reconstructions, the latter by computing correlations between the features in the different layers of the CNN.10 As 10 states, the VGG-19 network was used, but only its 16 convolutional and 5 pooling layers; none of the fully connected layers were used.10 Furthermore, the max pooling operations were replaced by average pooling operations, since the authors found that this improved the gradient flow, leading to better results.10 Shown in the figure below is the complete VGG-19 network:
Content loss
Style loss
Total loss
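Since the loss equations above are shown as figures, here is a minimal sketch of how the content loss, the Gram-matrix style loss, and the weighted total loss are commonly written in TensorFlow; the single-layer formulation and the alpha/beta values (matching the weighting discussed below) are simplifying assumptions:

# minimal sketch of the neural style transfer losses (content, Gram-matrix style, total);
# single-layer formulation and the alpha/beta weights are assumptions for illustration
import tensorflow as tf

def gram_matrix(features):
    # features: (batch, height, width, channels) -> channel-by-channel correlations
    gram = tf.einsum('bijc,bijd->bcd', features, features)
    num_locations = tf.cast(tf.shape(features)[1] * tf.shape(features)[2], tf.float32)
    return gram / num_locations

def content_loss(content_features, generated_features):
    # squared error between feature maps of the content image and the generated image
    return tf.reduce_mean(tf.square(generated_features - content_features))

def style_loss(style_features, generated_features):
    # squared error between the Gram matrices of the style image and the generated image
    return tf.reduce_mean(tf.square(gram_matrix(generated_features) - gram_matrix(style_features)))

def total_loss(c_feats, s_feats, g_c_feats, g_s_feats, alpha=1000.0, beta=1.0):
    # weighted sum: alpha * content + beta * style (weights follow the ratio discussed below)
    return alpha * content_loss(c_feats, g_c_feats) + beta * style_loss(s_feats, g_s_feats)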
After reading and implementing the TensorFlow tutorial on Neural style transfer, my implementation basically followed the tutorial and the original paper. Instead of using the VGG-19 model from the tutorial, I used the VGG-19 model from the paper; namely, instead of using max pooling, I used average pooling as stated in 10. I also used different weighting than the tutorial and followed the paper for the ratio amounts; namely, I used 1 for beta and 1000 for alpha, which gave a ratio of 1×10⁻³ (β/α). At first I tried to use L-BFGS as the optimization method, but it was a lot harder than expected, since both TensorFlow's implementation lbfgs_minimize and SciPy's implementation fmin_l_bfgs_b rely on the data being one-dimensional! So I just used Adam, a first-order method, rather than L-BFGS, a quasi-Newton second-order method, and trained for 10 epochs. This was the content image I used:
These were the style images I used: the first is by Pierre-Auguste Renoir, titled Portrait of Claude Monet, and the second is Ip Man from Tekken 7:
These were the resulting images generated by the model after training:
The Colab notebook for this example can be found here: NeuralStyleTransfer
Footnotes
- UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS
- Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch
- pix2pix: Image-to-image translation with a conditional GAN
- Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes Dataset for Semantic Urban Scene Understanding. In: CVPR (2016)
- Pretty good results; shows that increasing the L1 loss term provides significant improvements, especially when it comes to identifying pedestrians
- A Neural Algorithm of Artistic Style