The use of the loss function. #2
@huangzh13, please check out the latest commit 4522d6772dd1d56b9eb073e4ea23c51562064812, which fixes the memory leak in the "wgan-gp" loss calculation. Also, for WGAN-GP to work, you need to tune the learning rates of the Discriminator and Generator differently. Please let me know how it works out. 👍 Best regards,
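A minimal sketch of what tuning the two learning rates separately looks like in PyTorch. The stand-in modules and the concrete values below are illustrative assumptions, not the figures from the comment above (those did not survive in this thread); the common two-timescale heuristic gives the Discriminator a larger rate than the Generator:

```python
import torch
from torch import nn

# Stand-in modules; replace with the actual MSG-GAN Generator/Discriminator.
generator = nn.Linear(512, 3)
discriminator = nn.Linear(3, 1)

# Two-timescale update rule (TTUR): separate, unequal learning rates.
# These values are illustrative, not the ones originally suggested.
gen_optim = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.99))
dis_optim = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.99))
```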
Thanks for your reply!
@huangzh13, Yes, this is the desired behaviour. You get exactly these kinds of colour blocks at the high resolution, but could you please check the output of the lower resolutions? MSG-GAN indeed has this advantage over other GANs: you can watch the training at the lower resolutions as well, so you get highly informative feedback throughout training. Also, just have some patience, because these colour blocks will very soon convert into your required samples. For your reference, while the highest resolution still shows colour blocks, the 8x8 output at the same time already shows meaningful samples. Basically, the point is that the training progresses from the bottom up, and then synchronizes everywhere. Best regards,
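For inspecting those lower resolutions during training, a sketch like the following can help. It assumes, as in the MSG-GAN architecture, that the generator returns one image tensor per scale (4x4 up to the full resolution); the function name and the latent size of 512 are assumptions for illustration:

```python
import torch
from torchvision.utils import save_image

def dump_multiscale_samples(generator, latent_size=512, n=16):
    """Save one sample grid per output resolution of an MSG-GAN-style
    generator, which returns a list of images (4x4, 8x8, ..., full size)."""
    z = torch.randn(n, latent_size)
    with torch.no_grad():
        multi_scale = generator(z)  # list of [n, 3, res, res] tensors
    for img in multi_scale:
        res = img.shape[-1]
        save_image(img.clamp(-1, 1), f"samples_{res}x{res}.png",
                   nrow=4, normalize=True)
```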
Am I too impatient?
@huangzh13, As I understood it, you wrote that you were not getting results at all; it would have helped to mention that you had already obtained good results using another loss. About this, I have mentioned in the paper that with RaHinge, or with other relativistic versions of the losses, you don't have to tune the learning rate so much, and that's why we used the RaHinge loss. Could you please share your high-resolution results with RaHinge? It would be helpful for others as well. Best regards,
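For reference, a minimal sketch of the standard relativistic average hinge (RaHinge) formulation in PyTorch. This follows the textbook definition; the repo's own loss class may differ in small implementation details:

```python
import torch
import torch.nn.functional as F

def rahinge_dis_loss(real_scores, fake_scores):
    """Relativistic average hinge loss for the discriminator."""
    r_f = real_scores - fake_scores.mean()  # real relative to average fake
    f_r = fake_scores - real_scores.mean()  # fake relative to average real
    return F.relu(1.0 - r_f).mean() + F.relu(1.0 + f_r).mean()

def rahinge_gen_loss(real_scores, fake_scores):
    """Generator side: the same hinge with the roles reversed."""
    r_f = real_scores - fake_scores.mean()
    f_r = fake_scores - real_scores.mean()
    return F.relu(1.0 + r_f).mean() + F.relu(1.0 - f_r).mean()
```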
Hi,
Best Regards
@fumoffu947, Also, please do post your trained results. It will help others. Best regards,
I have updated to the latest code, but do not have access to the results so easily, as I have some restrictions on me. I also have some questions regarding the training time and loss. Thanks for the quick response.
@akanimax, can you comment more about this? What can one expect to see reported as "loss" for both the Generator and Discriminator networks while the higher resolutions still look like solid colour blocks? I'm assuming that there should be some wavering loss in both networks, but I'm experiencing a rapid drop to 0.0. Thanks,
@BlindElephants, Well, firstly please check the new training gif added to the readme; it explains more clearly how the training takes place. BTW, please note that our MSG-GAN uses the relativistic hinge loss, which is indeed a margin-adaptation loss at its heart. So please don't be discouraged by seeing a value of 0 for the discriminator loss. Also, there is, unfortunately, nothing that you can read off from the values of the losses here; they are just an indicator of the state of the two-player game. Hope this helps.
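To make the margin point concrete, here is a tiny numeric illustration (with made-up scores) of why the hinge clamps the reported discriminator loss to exactly 0 once the margin is satisfied:

```python
import torch
import torch.nn.functional as F

# Made-up scores where reals beat fakes by more than the hinge margin of 1.
real_scores = torch.tensor([3.0, 2.5])
fake_scores = torch.tensor([0.5, 0.0])

# Both hinge terms clip to zero, so the logged loss reads exactly 0.0 even
# though the two-player game is still being played.
d_loss = (F.relu(1.0 - (real_scores - fake_scores.mean())).mean()
          + F.relu(1.0 + (fake_scores - real_scores.mean())).mean())
print(d_loss)  # tensor(0.)
```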
@BlindElephants I had the same color blocks for about 20 epochs before any change.
@fumoffu947 Thanks for the reply. I ended up just letting it run for a while, despite seeing loss=0.0 on the discriminator side from the beginning. Here's a time-lapse video of training that I posted on Vimeo: https://vimeo.com/330681428. The source material is the movie Edge of Tomorrow (yes, I know... this great work of sci-fi action...), which was frame-dumped to produce about 38,000 images. I stopped this training roughly where the video ends, so things are still quite abstract and only just starting to form recognizable shapes. But a great test. Thanks @akanimax, this repo is great and super interesting.
That is really interesting. You'd have gotten even better results with a little more training. Also, please feel free to open a PR like @huangzh13 did, if you'd like to share your results (through the readme). Best regards,
I used raw frames dumped with ffmpeg from the original source video; I haven't played with additional preprocessing yet. I'm currently running a follow-up training session that is further along than where I ended this sample, and you're right, things are getting really interesting quickly. Will open a PR when appropriate to share findings.
@BlindElephants, Best regards,
@BlindElephants @akanimax @huangzh13 @fumoffu947 can I get a bit more info on your runs? I'm trying to run this on Colab with a 1x K80 GPU. I have a 10k-image dataset I want to train with, at 128x128. I've set the batch_size to 16 to see if the training moves a little faster (judging by the log output); I don't know if this is right. Should I increase the batch_size? How long does it take to train a model on 2x V100 GPUs, say at 128 or 256 image size? I'm thinking of firing up a Google Cloud instance, as I'm on the free tier, to train my model. Any recommendations on specs? vCPU, RAM, 2x V100, 4x V100? Thanks, and awesome work,
@talvasconcelos I can't really say what kind of hardware you should use. I used a Quadro P5000 16 GB graphics card with a batch size of 8 (because of the huge fluctuation in memory usage before it stabilizes). As you can see in my log, one epoch took about 11 hours. I got really good results after 5 days (when I ran on medical images of size 256x256). It should probably run for longer, which will result in better images. As for the batch size: use as large a batch size as the graphics card allows. Larger batch sizes give better gradients, because of the more varied images, and may speed up the convergence of the model. But the batch size should not be as large as the data set :D. A newer graphics card (than the Quadro P5000) will probably decrease the training time significantly, as they have become faster. But as far as I know, there is no good stopping criterion for a GAN; you will have to inspect the images and decide when they are good enough, so the model might have to train for a while. (If I am wrong about this, please correct me.) Best Regards
@talvasconcelos I've run training sessions with a few different data sets; currently one set is approx. 9,000 images and the other is approx. 60,000 images. I'm training at 1024x1024 resolution, on a machine that has two RTX 2080 Ti GPUs. With that, I can do batch size = 5. For the training set that has about 9,000 images, it takes just over an hour (about 65 minutes) per epoch. Also, probably most importantly, I'm not looking to generate 100% realistic images with this: I'm a working artist and use ML software to generate images/video for aesthetic and conceptual goals, so what I look for in the outcome of a training session may be different from what you look for. I've so far let this run to about epoch 200 with some really interesting results (200 epochs == approx. 200 hours == 8.33333 days). Obviously the training set with 60,000 images takes much longer per epoch, but it is also achieving some interesting visual outcomes on a similar time scale. I haven't fully explored this model yet, so I can't comment further. Make sure that you have
Wow guys, thanks a lot for the input. I've had some runs of GANs: I made a DCGAN in Keras and tried a few alternatives. The problem with my DCGAN is that it gives some "good" results at 64x64 (@BlindElephants I'm also trying to do some art stuff, so no realistic output either). But at 128x128, apart from taking forever, it collapses after a while. I've let BMSG-GAN run with default settings, other than my dataset of 10k images. It's been running for about 2 hours, with batch size 16 and a 256 latent. The second batch was done after 1h35m. Given that Colab stops the notebook after 12h, it's not going to be a pretty process... Might spin up the VM after all, with a couple of powerful GPUs... EDIT: I'm so stupid, I was training on the CPU... forgot to set the runtime to GPU. Now it's taking around 5m per epoch!
So... epoch 114, with hyperparameters: latent = 256, batch_size = 32 (for the first 100 epochs), now running at 48, on a 10k-image dataset. My dataset is not as homogeneous as the faces... @akanimax how long did the flowers test (the second one, with a bigger difference between pictures) run? What were your parameters?
@BlindElephants and @fumoffu947, Thank you so much for the help. Best regards,
@fumoffu947, You can monitor the FID scores of the training models. I am going to include code to monitor the FID during training itself later. Currently, you have to run a post-training script to calculate the FID of all the saved models and then use the one with the lowest FID score.
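A sketch of that post-training sweep, assuming the third-party pytorch-fid package and a hypothetical generate_samples helper that renders fake images from a checkpoint into a directory (both are assumptions, not part of this repo):

```python
import glob
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths  # pip install pytorch-fid

def fid_sweep(checkpoint_glob, real_dir, fake_dir, generate_samples):
    """Score every saved generator checkpoint and return the best one."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    scores = {}
    for ckpt in sorted(glob.glob(checkpoint_glob)):
        generate_samples(ckpt, fake_dir)  # hypothetical helper: ckpt -> images
        scores[ckpt] = calculate_fid_given_paths(
            [real_dir, fake_dir], batch_size=50, device=device, dims=2048)
    best = min(scores, key=scores.get)
    print(f"lowest FID {scores[best]:.2f} at {best}")
    return best
```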
@akanimax I am just worried that it might have already collapsed, and that putting any more training time in won't make much of a difference. The results between epoch 75 and epoch 100 at the higher res are not significantly better. Should I simply give it more epochs? Or would it be better to increase the dataset size? I am using the relativistic-hinge loss, btw. I also tried ProGAN with this dataset and it has a similar issue there as well. Which is quite interesting, because with ProGAN I also tried a very different but much smaller dataset (<5k) that does produce reasonable results in even fewer training epochs.
@Mut1nyJD, @fumoffu947, @BlindElephants, @talvasconcelos I hope this will give you more information about the training dynamics of MSG-GAN. But one thing is for sure: if you are getting good results at the lower resolutions, they always translate to the higher resolutions eventually. Best regards,
@talvasconcelos Use a larger latent size. You might also want to try something with a very small training set (<= 5,000, or even 1,000-2,000) just as a test to see what happens (for your own sake, I mean). If you provide a subset of training images that all conform to a particular type or subject, the model should converge quite quickly. If you observe this by outputting periodic samples, you should be able to get an idea of what to expect when you move to a much larger training set, which will possibly follow similar convergence behaviour, albeit on a longer time scale with much more varied output.
Okay, I am going to post some results soon. Indeed, things got better after waiting longer. I also increased the dataset size by a further 50%, but even after 350 epochs it still struggles. I wonder if that is simply because, unlike most GAN test datasets, it has far less homogeneity.
Hello @akanimax, I'm using your BMSG-GAN repo for a text-to-face task and the mode collapses. You had used ProGAN for this task. I've been stuck on it for a long time and am exhausted. Any advice for this? Thank you.
Hi @akanimax, I am trying your project on my data (>11k images of resolution higher than 512x512)
Hi @akanimax, I started learning about GANs recently and I found this model really cool, great job! I have a question regarding the loss function. I've been following this discussion closely, and you mention that with the RaHinge loss it's expected for the discriminator to reach 0.0 loss early in training. Could you comment a bit on how the generator loss should behave? I'm currently training a conditional version of this GAN for medical image synthesis, and I notice that the discriminator reaches 0.0 but the generator loss increases gradually, as shown below. It must be noted that in this plot I am showing the loss per epoch (averaging the loss over all batches). Despite this behavior in the learning curves, the images look reasonable on a quick visual inspection, so I am not sure whether there is some underlying issue like mode collapse or divergence. Is there a way to tell this from the learning curves? Thanks again!
When I use wgan-gp as a loss function, the training fails. Any explanation?