64x64 hardwired crop limitation? #2
So, the hard-wired 64x64 is in the model. Getting around it is actually trivial. Let me write a more detailed post, with code references.
I look forward to that. I know that simply increasing net depth/size didn't necessarily help much in past work, but we hope that with more visible detail, maybe that will make a difference. (Perhaps our datasets are too heterogeneous or we're using the wrong hyperparameters, but we're having a hard time getting past fuzzy blobs and getting the fantastic results like the faces/rooms/flowers.)
Ok, so the data loader is pretty generic. It has two control variables, loadSize and fineSize. Now, coming to the next part: to do generations of size 128, all you have to do is make the following changes.
local netG = nn.Sequential()
-- input is Z, going into a convolution
netG:add(SpatialFullConvolution(nz, ngf * 16, 4, 4))
netG:add(SpatialBatchNormalization(ngf * 16)):add(nn.ReLU(true))
-- state size: (ngf*16) x 4 x 4
netG:add(SpatialFullConvolution(ngf * 16, ngf * 8, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 8)):add(nn.ReLU(true))
-- state size: (ngf*8) x 8 x 8
netG:add(SpatialFullConvolution(ngf * 8, ngf * 4, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 4)):add(nn.ReLU(true))
-- state size: (ngf*4) x 16 x 16
netG:add(SpatialFullConvolution(ngf * 4, ngf * 2, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 2)):add(nn.ReLU(true))
-- state size: (ngf * 2) x 32 x 32
netG:add(SpatialFullConvolution(ngf * 2, ngf, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf)):add(nn.ReLU(true))
-- state size: (ngf) x 64 x 64
netG:add(SpatialFullConvolution(ngf, nc, 4, 4, 2, 2, 1, 1))
netG:add(nn.Tanh())
-- state size: (nc) x 128 x 128
And change the discriminator similarly:
local netD = nn.Sequential()
-- input is (nc) x 128 x 128
netD:add(SpatialConvolution(nc, ndf, 4, 4, 2, 2, 1, 1))
netD:add(nn.LeakyReLU(0.2, true))
-- state size: (ndf) x 64 x 64
netD:add(SpatialConvolution(ndf, ndf * 2, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 2)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*2) x 32 x 32
netD:add(SpatialConvolution(ndf * 2, ndf * 4, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 4)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*4) x 16 x 16
netD:add(SpatialConvolution(ndf * 4, ndf * 8, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 8)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*8) x 8 x 8
netD:add(SpatialConvolution(ndf * 8, ndf * 16, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 16)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*16) x 4 x 4
netD:add(SpatialConvolution(ndf * 16, 1, 4, 4))
netD:add(nn.Sigmoid())
-- state size: 1 x 1 x 1
netD:add(nn.View(1):setNumInputDims(3))
-- state size: 1
You could write a function to generate both networks automatically for a given generation size, but to keep the code more readable, I defined them manually. As you can see, and unlike what you assumed, the generation size is determined by the architecture of the nets themselves, not by the ngf/ndf options. Hope this helps.
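(For reference, a rough sketch of what that size-generic helper could look like. This is not code from the repository; the function names buildGenerator/buildDiscriminator are invented here for illustration, and the sketch assumes the output size is a power of two of at least 16.)
require 'nn'
local SpatialConvolution = nn.SpatialConvolution
local SpatialFullConvolution = nn.SpatialFullConvolution
local SpatialBatchNormalization = nn.SpatialBatchNormalization
-- Sketch: build a DCGAN-style generator for any power-of-two size >= 16.
local function buildGenerator(size, nz, ngf, nc)
   local mult = size / 8                          -- 8 for 64x64, 16 for 128x128, 32 for 256x256
   local netG = nn.Sequential()
   -- input is Z, going into a convolution; state size: (ngf*mult) x 4 x 4
   netG:add(SpatialFullConvolution(nz, ngf * mult, 4, 4))
   netG:add(SpatialBatchNormalization(ngf * mult)):add(nn.ReLU(true))
   while mult > 1 do
      -- each block doubles the spatial size and halves the filter count
      netG:add(SpatialFullConvolution(ngf * mult, ngf * mult / 2, 4, 4, 2, 2, 1, 1))
      netG:add(SpatialBatchNormalization(ngf * mult / 2)):add(nn.ReLU(true))
      mult = mult / 2
   end
   -- state size: (ngf) x (size/2) x (size/2)
   netG:add(SpatialFullConvolution(ngf, nc, 4, 4, 2, 2, 1, 1))
   netG:add(nn.Tanh())
   -- state size: (nc) x size x size
   return netG
end
-- Sketch: matching discriminator for the same output size.
local function buildDiscriminator(size, ndf, nc)
   local netD = nn.Sequential()
   -- input is (nc) x size x size
   netD:add(SpatialConvolution(nc, ndf, 4, 4, 2, 2, 1, 1))
   netD:add(nn.LeakyReLU(0.2, true))
   local mult, res = 1, size / 2
   while res > 4 do
      -- each block halves the spatial size and doubles the filter count
      netD:add(SpatialConvolution(ndf * mult, ndf * mult * 2, 4, 4, 2, 2, 1, 1))
      netD:add(SpatialBatchNormalization(ndf * mult * 2)):add(nn.LeakyReLU(0.2, true))
      mult, res = mult * 2, res / 2
   end
   -- state size: (ndf*mult) x 4 x 4
   netD:add(SpatialConvolution(ndf * mult, 1, 4, 4))
   netD:add(nn.Sigmoid())
   netD:add(nn.View(1):setNumInputDims(3))
   return netD
end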
OK, I see. So to expand it we just need to add another base layer where the argument is the max-size and then we adjust each 'higher' layer to tweak the numbers appropriately. So if we wanted to try out not just 128x128 but 256x256, we would just add another line and tweak accordingly? (FWIW, I seem to be getting better fuzz from 128x128, but it's too soon for me to be sure that it'll get me nice images in the end when it finishes training. Maybe tomorrow morning I'll know.)
Oh. I was a little confused, because the Eyescream page mentions that if you penalize the discriminator's net size by giving it a fraction of the parameters that the generator gets, you get more stable training (presumably because the discriminator has the easier job and too often wins in my runs), and I found that keeping the discriminator the smaller of the two did seem to help here as well.
BTW, I couldn't help but wonder: discriminator vs generator reminds me a lot of actor-critic in reinforcement learning, and it seems to have many of the same problems. Has anyone ever tried to make DCGANs stabler by borrowing some of the techniques from there, like freezing networks and keeping experience-replay buffers? For example, if D's error drops to ~0.10, where training is about to collapse and abort, D's weights could be frozen with no more learning done until G gets better at producing fakes and D's error climbs back up to something more reasonable like 0.5/1/2; similarly, if G starts winning almost 100% of the time and is about to reach 0 error and destroy learning, it could be frozen until D learns enough to push G's error rate back up toward 1. Or a buffer of past images which fooled D could be saved and trained on occasionally (to prevent a total collapse when D becomes perfect and wins).
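(As a rough illustration of the freezing idea, here is how it might be wired into dcgan.torch's training loop. This is only a sketch, not something taken from the thread or the repository: it assumes the fDx/fGx closures, the parameters/optimState tables and the errD value defined in main.lua, and the 0.10/0.5 thresholds are arbitrary.)
-- Sketch only: skip the discriminator's parameter update while its error is tiny,
-- and resume once the generator has caught up enough to push errD back up.
local freezeD_below, unfreezeD_above = 0.10, 0.5   -- illustrative thresholds
local trainD = true
-- inside the per-batch loop of main.lua:
if errD ~= nil then
   if errD < freezeD_below then trainD = false end
   if errD > unfreezeD_above then trainD = true end
end
if trainD then
   optim.adam(fDx, parametersD, optimStateD)       -- normal D update
else
   fDx(parametersD)                                -- still evaluate D (for errD and G's gradients) without stepping it
end
optim.adam(fGx, parametersG, optimStateG)          -- G update as usual
-- a symmetric check on errG could freeze the generator the same way, and a small
-- buffer of past fakes mixed into D's batches would approximate experience replay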
It does indeed seem very similar to actor-critic. There have been some attempts at network freezing / iterative scheduled optimization; I tried it in Eyescream with no luck, and it did not help things overall. But a lot of progress has happened since Eyescream, and DCGANs might be a good candidate to try this stuff on.
Tried the resize code posted above for 128x128 and I'm finding that the discriminator flatlines to 0.0000 around epoch 10. Anything I might be missing? I changed the discriminator & generator code as well as the command-line parameters as specified. Testing 128x128 as a size greater than 64x64, toward eventually trying 320x200.
You can try changing the learning rates to favor the discriminator, changing the filter counts, or increasing the minibatch size. Alternatively, you could try switching to the new improved-gan in TensorFlow, which has additional tricks intended to stabilize training. (It's not that hard to rewrite its ImageNet data-processing script to use whatever set of images you have; you just need to edit the hardwired paths and delete some asserts.) In my experience so far, improved-gan works faster and better, but only if you can fit minibatches of at least 32 into your GPU (ideally 64 or 128); the catch is that the codebase seems to be wired to assume minibatches that are powers of 2, as anything else crashes, and minibatches of 2/4/8/16 diverge almost instantly.
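(To make the first suggestion concrete: main.lua already builds a separate optimState table for each network, so giving G and D different learning rates is a one-line-per-net change. The 4x ratio below is purely illustrative, not a recommendation from this thread; which direction to shift it depends on which net is currently winning.)
-- Sketch: independent learning rates for G and D, where optimStateG/optimStateD are defined in main.lua
optimStateG = {learningRate = opt.lr,     beta1 = opt.beta1}
optimStateD = {learningRate = opt.lr / 4, beta1 = opt.beta1}   -- slow D down if it wins too easily; raise it instead if D keeps losing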
I ran into a similar issue with flatlining to zero. Setting ndf to something around ngf/2 or ngf/4 led to stable learning. (That is for 128^2.)
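(For anyone wanting to try the same thing: dcgan.torch reads its hyperparameters from environment variables, so with the modified 128x128 nets in place, an invocation along these lines should set ndf to ngf/4. The data path and the loadSize value here are placeholders, not settings from this thread.)
DATA_ROOT=/path/to/images dataset=folder loadSize=144 fineSize=128 ngf=64 ndf=16 th main.lua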
I also ran into the flatlining issue when trying 128x128, so I set ndf to ngf/4. The resulting images have a lovely crisp resolution, but after 1000 epochs they are very repetitive, with nowhere near as much variation as when using 64x64 and keeping ndf and ngf at 64. See the attached. Trying again with ndf at ngf/2. Will report back.
@rjpeart Have you tried using any other "tricks" like label smoothing or injecting white noise into the input of the discriminator? That also helped stabilise training for me and is a recommended "fix" for problematic networks. Also, how much variation is there in your dataset?
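(A sketch of the label-smoothing trick mentioned there, on top of dcgan.torch's main.lua, which defines real_label = 1 and fake_label = 0. The 0.9 value and the name smoothed_real are the conventional choice, assumed here rather than taken from this thread.)
-- one-sided label smoothing: soften only the real labels the discriminator sees
local smoothed_real = 0.9
-- in fDx (the discriminator update), for the real batch:
label:fill(smoothed_real)     -- instead of label:fill(real_label)
-- leave fGx alone: the generator's target stays at real_label = 1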
@LukasMosser Thanks for your response. I was not aware of those tricks so haven't tried them (still on that steep learning curve), but I will do, thanks for the leads! I'm using ~950 samples in this dataset, which although not huge has given me great results at 64x64px, so I was surprised at the level of repetition at a higher resolution. I guess it's because of a diminished discriminator?
@LukasMosser adding white noise stabilised the learning perfectly. Thanks so much for your advice. For anyone else struggling with this, here's how I defined the discriminator (it's the code provided by @soumith above, but with white noise added at the 5th line down):
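(The code block from this comment did not survive; below is a reconstruction from the description: soumith's 128x128 discriminator above, with dpnn's nn.WhiteNoise inserted after the first LeakyReLU. The zero mean and 0.1 standard deviation are dpnn's defaults, assumed here rather than taken from the original post.)
require 'dpnn'   -- provides nn.WhiteNoise
local netD = nn.Sequential()
-- input is (nc) x 128 x 128
netD:add(SpatialConvolution(nc, ndf, 4, 4, 2, 2, 1, 1))
netD:add(nn.LeakyReLU(0.2, true))
netD:add(nn.WhiteNoise(0, 0.1))   -- additive Gaussian noise, active only at training time
-- state size: (ndf) x 64 x 64
netD:add(SpatialConvolution(ndf, ndf * 2, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 2)):add(nn.LeakyReLU(0.2, true))
-- ... the remaining layers are unchanged from the 128x128 discriminator posted above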
@rjpeart glad I could help! Also interesting that you added the white noise after the first LeakyReLU; I added it before the first convolutional layer and it worked as well, although I believe one can add it to any layer (or all of them) except the last. Here are more tricks: https://github.com/soumith/ganhacks And here is an article on why adding noise works (and how): http://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/
@LukasMosser @rjpeart hey guys! I'm having a hard time adding the WhiteNoise to the discriminator. When I replaced the original discriminator code with @rjpeart's modified code, I get this error when training:
Any suggestions? (Also on that steep learning curve.)
@kubmin you probably didn't import the dpnn torch package. Hope that helps!
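(In other words, install the package once with luarocks install dpnn, and then add this near the top of main.lua, before the discriminator is defined:)
require 'dpnn'   -- makes nn.WhiteNoise and the other dpnn modules available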
@LukasMosser thank you! I totally didn't import the required package. Cheers!
@plugimi I got up to 128, but had the repetition problems mentioned above. I think you could probably work out what the lines in the generator / discriminator look like by following that pattern. However, I seem to recall reading a thread that mentioned 512 resolution would be too computationally intensive to complete. Can't find the thread right now though :/
Could someone please provide the code for the generator / discriminator nets (in main.lua) with dimensions of 256x256? I can't figure it out from the examples - I'm super new to torch :(
Soumith, if you see this, or anyone else who knows the solution, could you verify whether the following is correct for 256x256? I believe I followed the pattern correctly, although I am not certain. I was able to get training to work with these changes, but it took significantly longer, and even after training for much longer the results were still just fuzzy/static. Ultimately, my goal is to create much larger AI-generated images. Thanks in advance if you're able to help with this.
For 256x256 I changed the following training config:
Changes I made to the generator:
-- input is Z, going into a convolution
-- changes by John for 256x256
netG:add(SpatialFullConvolution(nz, ngf * 32, 4, 4))
netG:add(SpatialBatchNormalization(ngf * 32)):add(nn.ReLU(true))
netG:add(SpatialFullConvolution(ngf * 32, ngf * 16, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 16)):add(nn.ReLU(true))
-- / end changes by John for 256x256
netG:add(SpatialFullConvolution(ngf * 16, ngf * 8, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 8)):add(nn.ReLU(true))
-- state size: (ngf*8) x 16 x 16
netG:add(SpatialFullConvolution(ngf * 8, ngf * 4, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 4)):add(nn.ReLU(true))
-- state size: (ngf*4) x 32 x 32
netG:add(SpatialFullConvolution(ngf * 4, ngf * 2, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 2)):add(nn.ReLU(true))
-- state size: (ngf * 2) x 64 x 64
netG:add(SpatialFullConvolution(ngf * 2, ngf, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf)):add(nn.ReLU(true))
-- state size: (ngf) x 128 x 128
netG:add(SpatialFullConvolution(ngf, nc, 4, 4, 2, 2, 1, 1))
netG:add(nn.Tanh())
-- state size: (nc) x 256 x 256
And changes I made to the discriminator:
-- input is (nc) x 256 x 256
netD:add(SpatialConvolution(nc, ndf, 4, 4, 2, 2, 1, 1))
netD:add(nn.LeakyReLU(0.2, true))
-- state size: (ndf) x 128 x 128
netD:add(SpatialConvolution(ndf, ndf * 2, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 2)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*2) x 64 x 64
netD:add(SpatialConvolution(ndf * 2, ndf * 4, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 4)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*4) x 32 x 32
netD:add(SpatialConvolution(ndf * 4, ndf * 8, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 8)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*8) x 16 x 16
netD:add(SpatialConvolution(ndf * 8, ndf * 16, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 16)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*16) x 8 x 8
-- changes by John for 256x256
netD:add(SpatialConvolution(ndf * 16, ndf * 32, 4, 4, 2, 2, 1, 1))
netD:add(SpatialBatchNormalization(ndf * 32)):add(nn.LeakyReLU(0.2, true))
-- state size: (ndf*32) x 4 x 4
netD:add(SpatialConvolution(ndf * 32, 1, 4, 4))
netD:add(nn.Sigmoid())
-- state size: 1 x 1 x 1
-- / end changes by John for 256x256
netD:add(nn.View(1):setNumInputDims(3))
-- state size: 1
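(One cheap way to check whether an architecture like this is wired consistently, separate from whether it trains well, is to push dummy tensors through both nets and look at the output shapes. A sketch, assuming the 256x256 netG/netD above with nz = 100, nc = 3 and a dummy batch of 2:)
local nz, nc, batch = 100, 3, 2
local noise = torch.Tensor(batch, nz, 1, 1):normal(0, 1)
local fake = netG:forward(noise)
print(fake:size())    -- expect batch x 3 x 256 x 256
local out = netD:forward(fake)
print(out:size())     -- expect batch x 1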
if you want large GANs, look at https://github.com/ajbrock/BigGAN-PyTorch. DCGAN is a bit outdated ;-)
Soumith, thank you so much for the reply and info! Much appreciated!
Hi Soumith, |
Hi JohnHamell, how did it go generating at 256x256, with DCGAN or BigGAN? I want to generate pictures at 256x256 and at 32x32. There are a lot of tutorials for 32x32, but fewer for 256x256. Have you tried your altered architecture, and does it work well?
So I and another person were trying out dcgan.torch to see how well it would work on image sets more complicated than faces (kudos on writing an implementation much easier to get up and running than the original dcgan-theano, BTW; we really weren't looking forward to figuring out how to get HDF5 image input working, although some details could use work - like, why is nThreads=1 by default?), and I became concerned that 64x64 images were just too small to convey all the details and would lead to a poorly-trained NN.
Experimenting with the options, it seems that one can get dcgan.torch to work with almost the whole image by setting the full image size to be very similar to the crop size: loadSize=65 fineSize=64. Or one could downscale all the images on disk with a command like ls *.jpg | parallel mogrify -resize 65536@. (I am still trying it out, but dcgan appears to make much faster progress when trained on almost-full images at 65x65 than when trained on 64x64 crops of full-resolution images.)
The full image still winds up being extremely low resolution, though. Reading through main.lua and donkey_folder.lua is a little confusing. It looks as if we're supposed to be able to increase the size of trained images by increasing fineSize and also the two parameters governing the size of the base layer of the generator & discriminator NNs, so we thought that using better images would be as simple as loadSize=256 fineSize=255 ngf=255 ndf=255 - load a decent-resolution image, crop it minimally, and feed it into NNs of the same size.
But that doesn't work. In fact, we can't find a setting of fineSize other than 64 which doesn't immediately crash dcgan.torch, regardless of what we set the other options to. Are we misunderstanding the config options' intent, or is there a bug somewhere?