In this project, we implement supervised conditional generation on the CartoonSet dataset. The simplest approach is an ACGAN trained jointly with an adversarial loss and an auxiliary loss. In addition, we apply different GAN losses and several useful techniques to improve performance.
- Clone the repository and install the required packages
git clone https://github.com/hsiehjackson/Cartoon-Face-Generation.git
pip install -r requirements
- Download files and extract zip file
bash download.sh
unzip download.zip
- Train ACGAN with a different model (default: concat and SN)
python src/acgan_train.py [folder_name] --generator_type=[concat|noconcat] --discriminator_type=[noSN|SN|SNPJ]
- Train ACGAN with a different loss (default: WGGP)
python src/acgan_train.py [folder_name] --loss_type=[MM|NS|WGCP|WGGP|WGDIV]
- Files Saved
./saves/ (folder for all the files we save)
./saves/[folder_name]/models/ (folder for model checkpoints)
./saves/[folder_name]/train_sample/ (folder for some sample images when training)
./saves/[folder_name]/plot.json (model training procedure)
- Test Your ACGAN
python src/acgan_test.py ./saves/[folder_name] [epoch_num] --seed=[YOUR SEED]
- Test My Best ACGAN
python src/acgan_best.py --seed=[YOUR SEED]
- Test FID Score
cd test/FID_evaluation
python run_fid.py ../../saves/[folder_name]/test_images/ep-[epoch_num]/
- Files Saved
./saves/[folder_name]/test_sample/ (sample images with specific model checkpoints)
./saves/[folder_name]/test_images/ep-[epoch_num]/ (folder for test FID images with specific model checkpoints)
- Plot training progress
python src/plot.py ./saves/[folder_name]/plot.json
The original dataset has many attributes (10 artwork, 4 color, and 4 proportion), which may be too complicated to learn. We therefore use preprocessed, smaller images with only 4 attributes: hair color, eye color, face color, and with/without glasses. A sample image is shown below, and the label for each attribute is a one-hot vector.
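As a rough illustration of the label format, the sketch below builds one one-hot vector per attribute in plain Python. The class counts in `ATTRIBUTE_SIZES` and the helper names are assumptions made for the example; the real counts depend on the preprocessed label files.

```python
# Hypothetical class counts per attribute; check the preprocessed labels for the real ones.
ATTRIBUTE_SIZES = {"hair": 6, "eye": 4, "face": 3, "glasses": 2}


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_labels(labels):
    """Map each attribute name to the one-hot vector of its class index."""
    return {
        name: one_hot(labels[name], size)
        for name, size in ATTRIBUTE_SIZES.items()
    }


encoded = encode_labels({"hair": 2, "eye": 0, "face": 1, "glasses": 1})
for name, vector in encoded.items():
    print(name, vector)
```

Each image then carries four short vectors, one per attribute, with exactly one active entry each.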
Our baseline network is ACGAN, shown below. On top of it, we apply several techniques that help GAN training. Besides real images and one-hot conditions, we also need Gaussian noise throughout the training procedure.
For the auxiliary loss, we use binary cross-entropy, simply concatenating all one-hot encodings into a single 1D condition vector. For the adversarial loss, we use several WGAN tricks.
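The auxiliary loss described above can be sketched without any framework. The snippet below is only an illustration under assumptions: the target is a toy condition made of two concatenated one-hot vectors, and the predictions stand in for the sigmoid outputs of the discriminator's classification head. The function name is hypothetical.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


# Toy condition: two attributes with 3 and 2 classes, concatenated into one vector.
target = [0.0, 1.0, 0.0] + [1.0, 0.0]
good_prediction = [0.1, 0.9, 0.1, 0.9, 0.1]
bad_prediction = [0.9, 0.1, 0.9, 0.1, 0.9]

aux_good = binary_cross_entropy(good_prediction, target)
aux_bad = binary_cross_entropy(bad_prediction, target)
print(aux_good, aux_bad)
```

Treating the concatenated vector as independent binary targets keeps the loss simple, at the cost of not enforcing that exactly one class per attribute is active.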
Generator |
---|
Discriminator |
---|
With concatenated conditions, the generator is better able to generate images that match a specific condition. Repeating the concatenation at hidden layers prevents the condition information from disappearing.
Generator with hidden concatenation conditions |
---|
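One possible way to concatenate a condition to a hidden activation is to broadcast each condition entry over the spatial dimensions and append it as an extra channel. The sketch below shows that idea on nested lists; the `[C][H][W]` layout and the function name are assumptions for illustration, and the actual generator may combine the condition differently.

```python
def concat_condition(feature_map, condition):
    """Append one constant channel per condition entry to a [C][H][W] feature map."""
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    condition_channels = [
        [[value] * width for _ in range(height)] for value in condition
    ]
    # List concatenation builds a new list, so the input is left unchanged.
    return feature_map + condition_channels


# Toy hidden activation with 2 channels of size 2x2, and a 3-entry condition.
hidden = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
condition = [0.0, 1.0, 0.0]
augmented = concat_condition(hidden, condition)
print(len(augmented))  # 2 feature channels + 3 condition channels
```

Applying the same step before several layers gives every layer direct access to the condition instead of relying on it surviving from the input.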
Unlike concatenating the condition, projection lets the discriminator use only the relevant condition information when deciding real/fake. This method may be more powerful because each condition has different features that distinguish real from fake.
Discriminator with conditions projection for adversarial loss |
---|
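A common formulation of a projection discriminator adds an inner product between an embedded condition and the image features to an unconditional score. The sketch below follows that formulation in plain Python; the toy weights and all names are assumptions, and the SNPJ discriminator in this repository may be structured differently.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def projection_score(features, condition, embedding, weight, bias=0.0):
    """Unconditional score plus the projection of the features on the embedded condition."""
    unconditional = dot(weight, features) + bias
    # `embedding` has one row per condition entry; a multi-hot condition
    # therefore sums the rows of all active attribute classes.
    embedded = [
        sum(c * row[j] for c, row in zip(condition, embedding))
        for j in range(len(features))
    ]
    return unconditional + dot(embedded, features)


features = [1.0, 2.0]                              # toy image features
embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy embedding rows
weight = [0.5, -0.5]                               # toy unconditional head

score_a = projection_score(features, [1.0, 0.0, 0.0], embedding, weight)
score_b = projection_score(features, [0.0, 0.0, 1.0], embedding, weight)
print(score_a, score_b)
```

The same features receive different adversarial scores depending on the condition, which is how the condition enters the real/fake decision.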
Previous studies have shown that spectral normalization is effective at reducing mode collapse. We remove all batch normalization layers and add a spectral normalization layer after each convolution and linear layer of the discriminator.
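Spectral normalization divides a weight matrix by its largest singular value, which is usually estimated with power iteration. The plain-Python sketch below illustrates the idea on a small 2D matrix. It uses a fixed starting vector and many iterations for clarity, whereas training code typically keeps a persistent random vector and runs a single iteration per step. The function name is hypothetical.

```python
import math


def spectral_normalize(weight, iterations=50):
    """Return (weight / sigma, sigma), where sigma estimates the largest singular value."""
    rows, cols = len(weight), len(weight[0])
    v = [1.0 / math.sqrt(cols)] * cols  # fixed start vector for reproducibility
    u = [0.0] * rows
    for _ in range(iterations):
        # u <- W v, normalized
        u = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / u_norm for x in u]
        # v <- W^T u, normalized
        v = [sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v_norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / v_norm for x in v]
    # sigma = u^T W v
    sigma = sum(u[i] * weight[i][j] * v[j] for i in range(rows) for j in range(cols))
    if sigma == 0.0:
        return [row[:] for row in weight], sigma
    return [[w / sigma for w in row] for row in weight], sigma


normalized, sigma = spectral_normalize([[3.0, 0.0], [0.0, 1.0]])
print(sigma, normalized)
```

After normalization the largest singular value of the layer is approximately 1, which bounds how much the layer can stretch its input.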
Several variants of the Wasserstein distance objective make the training procedure more stable. We implement WG-CLIP, WG-DIV, and WG-GP to compare their performance. Our default setting is WG-GP.
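For reference, the three variants are commonly written as follows. This is a sketch of the standard formulations, not a transcription of this code base: the condition inputs and the auxiliary loss are omitted, sign conventions vary between implementations, and the clipping bound $c$, the penalty weight $\lambda$, and the WG-DIV constants $k$ and $p$ are hyperparameters whose values are not stated here. $\hat{x}$ denotes a random interpolation between a real and a generated image.

```latex
\begin{aligned}
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, G(z), \qquad \epsilon \sim U[0, 1] \\
L_D^{\text{CLIP}} &= \mathbb{E}_{z}\big[D(G(z))\big] - \mathbb{E}_{x}\big[D(x)\big],
  \qquad \text{weights of } D \text{ clipped to } [-c, c] \\
L_D^{\text{GP}} &= \mathbb{E}_{z}\big[D(G(z))\big] - \mathbb{E}_{x}\big[D(x)\big]
  + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big] \\
L_D^{\text{DIV}} &= \mathbb{E}_{z}\big[D(G(z))\big] - \mathbb{E}_{x}\big[D(x)\big]
  + k\, \mathbb{E}_{\hat{x}}\Big[\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2^{\,p}\Big] \\
L_G &= -\,\mathbb{E}_{z}\big[D(G(z))\big]
\end{aligned}
```

WG-CLIP constrains the critic by clipping its weights, WG-GP pushes the gradient norm toward 1, and WG-DIV penalizes the gradient norm directly.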
Ideally, the discriminator shows a higher fake loss and a lower real loss, while the generator shows lower adversarial and auxiliary losses.
The following results show that the generator with hidden concatenation conditions gives a more stable auxiliary loss.
G without hidden condition | G with hidden condition |
---|---|
With spectral normalization, training is stable and shows no loss explosion. However, the projection method may show some turbulence early in training because it is difficult to learn condition-specific information for the adversarial loss.
D with SN layer | D with SN layer + projection |
---|---|
With weight clipping, training is stable but converges quickly, which may limit learning. The divergence variant performs better than clipping but fluctuates more.
WG-CLIP | WG-DIV |
---|---|
The default setting for our structure uses the generator with hidden concatenation conditions and the discriminator with the WG-GP adversarial loss.
Setting | Default | without hidden | WG Clip | WG DIV | SN | SN+Proj |
---|---|---|---|---|---|---|
Epoch | 300 | 200 | 400 | 500 | 450 | 800 |
FID↓ | 89 | 131 | 216 | 55 | 62 | 44 |