Distilling doesn't work as expected. #27
Maybe you could try to increase the weight of the reconstruction loss for the student training. For example, increase …
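For context, here is a minimal sketch of what "increasing the reconstruction loss" typically looks like in a distillation objective of this kind. The weight names and values below are placeholders, not necessarily this repo's actual options:

```python
import torch
import torch.nn.functional as F

# Hypothetical loss weights; the real option names and defaults in this repo may differ.
LAMBDA_RECON = 100.0    # raise this (e.g. from 10 to 100) to emphasize reconstruction
LAMBDA_DISTILL = 1.0
LAMBDA_GAN = 1.0

def student_objective(s_fake, t_fake, s_feats, t_feats, d_logits_fake):
    """Student generator loss: reconstruction + feature distillation + adversarial terms."""
    recon = F.l1_loss(s_fake, t_fake)  # match the teacher output (or ground truth, if paired)
    distill = sum(F.mse_loss(s, t) for s, t in zip(s_feats, t_feats))
    gan = F.mse_loss(d_logits_fake, torch.ones_like(d_logits_fake))  # LSGAN-style generator term
    return LAMBDA_RECON * recon + LAMBDA_DISTILL * distill + LAMBDA_GAN * gan

# Tiny usage example with random tensors.
s_img, t_img = torch.rand(1, 3, 512, 512), torch.rand(1, 3, 512, 512)
s_feat, t_feat = [torch.rand(1, 64, 128, 128)], [torch.rand(1, 64, 128, 128)]
d_logits = torch.rand(1, 1, 62, 62)
print(student_objective(s_img, t_img, s_feat, t_feat, d_logits).item())
```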
@alanspike Thanks. Let me give it a try!
Hi @alanspike,
This is the one I referred to from the Jupyter notebook tutorial: …
Thanks for sharing the results. It's a bit weird, since the …
Hello @alanspike, thanks for checking in. Yes, the teacher model works fine. This is the Tfake image from the same epoch. Below are the logs with different options. I noticed that G_recon increased while the D values decreased dramatically, to almost zero. Do you think increasing the learning rate could be helpful?
Could you try setting the weight of the adversarial loss to zero and see whether the reconstruction loss decreases?
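A rough sketch of that check with the adversarial weight zeroed out, using stand-in modules and weight names rather than this repo's actual models and options:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in networks and data; substitute the actual teacher/student and dataloader.
teacher = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh()).eval()
student = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
opt_g = torch.optim.Adam(student.parameters(), lr=2e-4)

LAMBDA_GAN = 0.0     # hypothetical weight: zero removes the adversarial term for this test
LAMBDA_RECON = 10.0

for step in range(201):
    x = torch.rand(2, 3, 64, 64)            # dummy input batch
    with torch.no_grad():
        t_fake = teacher(x)                 # teacher output, kept frozen
    s_fake = student(x)
    loss = LAMBDA_RECON * F.l1_loss(s_fake, t_fake)   # adversarial term dropped (weight is 0)
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    if step % 50 == 0:
        print(f"step {step:4d}  G_recon = {loss.item():.4f}")  # should trend downward
```

If G_recon still fails to decrease in this setting, the problem is with reconstruction or student capacity rather than with the discriminator.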
@alanspike
Could you maybe set this loss to zero and comment out the training of the discriminator here? I'm not sure about the reason, so I wonder whether we could try removing the discriminator (i.e., no adversarial training) and just use the reconstruction and distillation losses, to see whether the …
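Mechanically, "commenting out the training of the discriminator" amounts to guarding the D update with a flag, roughly like the sketch below (stand-in modules, not the repo's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

use_gan = False   # diagnostic switch: False = no adversarial training at all

disc = nn.Sequential(nn.Conv2d(3, 1, 4, stride=2, padding=1))   # stand-in discriminator
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

def maybe_update_discriminator(real, s_fake):
    """Run the usual LSGAN discriminator step only when adversarial training is enabled."""
    if not use_gan:
        return None   # discriminator untouched; student trains on reconstruction/distillation only
    d_real, d_fake = disc(real), disc(s_fake.detach())
    d_loss = 0.5 * (F.mse_loss(d_real, torch.ones_like(d_real)) +
                    F.mse_loss(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    return d_loss.item()

print(maybe_update_discriminator(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)))
```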
Thank you @alanspike! While waiting for the result, I'd like to share some images I obtained during the last distillation run.
I can see these patterns in every distillation run, regardless of the options.
@alanspike Hello, I could see that …
Maybe the resulting student network is too small when using the default target FLOPs at the larger resolution. Could you try compressing with a larger FLOPs target?
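The resolution matters here because convolutional FLOPs grow with the spatial size of the feature maps, so a FLOPs target tuned for 256x256 squeezes a 512x512 student much harder. A back-of-the-envelope check (generic conv arithmetic, not this repo's profiler):

```python
def conv2d_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate count of a single conv layer (bias ignored)."""
    return c_in * c_out * k * k * h_out * w_out

# The same conv block evaluated at two input resolutions:
ratio = conv2d_macs(64, 64, 3, 512, 512) / conv2d_macs(64, 64, 3, 256, 256)
print(ratio)  # 4.0 -- FLOPs scale with H*W

# So if the default compression target was chosen with 256x256 in mind, scaling the
# 512x512 target by roughly 4x keeps the student's relative capacity comparable.
```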
@alanspike Sure, thanks for the suggestion. Let me run with a larger FLOPs target and update the results here.
Hello,
Since my last question, #24, I have tried 512x512 resolution training for both the teacher and student models.
I found that the teacher model works fine at 512x512, but the student training is not working.
I wonder if I could get some hints as to why.
Tfake image:
Sfake image (epoch 274/1000):
Training options:
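One generic sanity check (a common debugging step, not part of this repo's pipeline) is to verify that the student can reproduce the teacher's output on a single fixed batch; if even that fails, the student is likely under-capacity or the optimizer settings are off:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in models; swap in the actual 512x512 teacher and student networks.
teacher = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh()).eval()
student = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
opt = torch.optim.Adam(student.parameters(), lr=2e-4)

x = torch.rand(1, 3, 512, 512)        # one fixed input
with torch.no_grad():
    t_fake = teacher(x)               # the Tfake image the student should reproduce

for step in range(301):
    loss = F.l1_loss(student(x), t_fake)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step:3d}  L1(Sfake, Tfake) = {loss.item():.4f}")
# If this loss does not approach zero even on a single batch, the failure is about the
# student's capacity or optimization settings rather than the data.
```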