The use of the loss function. #2
@huangzh13, please check out the latest commit 4522d6772dd1d56b9eb073e4ea23c51562064812, which fixes the memory leak in the "wgan-gp" loss calculation. Also, for WGAN-GP to work, you need to tune the learning rates of the Discriminator and Generator differently. Please let me know how it works out. 👍 Best regards,
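A minimal sketch of what tuning the two learning rates separately looks like in PyTorch. The stand-in modules and the concrete values below are illustrative assumptions, not the figures from the comment above (those did not survive in this thread); the common two-timescale heuristic gives the Discriminator a larger rate than the Generator:

```python
import torch
from torch import nn

# Stand-in modules; replace with the actual MSG-GAN Generator/Discriminator.
generator = nn.Linear(512, 3)
discriminator = nn.Linear(3, 1)

# Two-timescale update rule (TTUR): separate, unequal learning rates.
# These values are illustrative, not the ones originally suggested.
gen_optim = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.99))
dis_optim = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.99))
```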
Thanks for your reply!
@huangzh13, Yes, this is the desired behaviour. You get exactly these kinds of colour blocks at the high resolution, but could you please check the output of the lower resolutions? MSG-GAN indeed has this advantage over other GANs: you can watch the training at the lower resolutions as well, so you get highly informative feedback throughout training. Also, just have some patience, because these colour blocks will very soon convert into your required samples. For your reference, while the highest resolution still shows colour blocks, the 8x8 output at the same time already shows meaningful samples. Basically, the point is that the training progresses from the bottom up, and then synchronizes everywhere. Best regards,
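For inspecting those lower resolutions during training, a sketch like the following can help. It assumes, as in the MSG-GAN architecture, that the generator returns one image tensor per scale (4x4 up to the full resolution); the function name and the latent size of 512 are assumptions for illustration:

```python
import torch
from torchvision.utils import save_image

def dump_multiscale_samples(generator, latent_size=512, n=16):
    """Save one sample grid per output resolution of an MSG-GAN-style
    generator, which returns a list of images (4x4, 8x8, ..., full size)."""
    z = torch.randn(n, latent_size)
    with torch.no_grad():
        multi_scale = generator(z)  # list of [n, 3, res, res] tensors
    for img in multi_scale:
        res = img.shape[-1]
        save_image(img.clamp(-1, 1), f"samples_{res}x{res}.png",
                   nrow=4, normalize=True)
```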
Am I too impatient?
@huangzh13, As I understood it, you wrote that you were not getting results at all; it would have helped to mention that you had already obtained good results using another loss. About this, I have mentioned in the paper that with RaHinge, or with other relativistic versions of the losses, you don't have to tune the learning rate so much, and that's why we used the RaHinge loss. Could you please share your high-resolution results with RaHinge? It would be helpful for others as well. Best regards,
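For reference, a minimal sketch of the standard relativistic average hinge (RaHinge) formulation in PyTorch. This follows the textbook definition; the repo's own loss class may differ in small implementation details:

```python
import torch
import torch.nn.functional as F

def rahinge_dis_loss(real_scores, fake_scores):
    """Relativistic average hinge loss for the discriminator."""
    r_f = real_scores - fake_scores.mean()  # real relative to average fake
    f_r = fake_scores - real_scores.mean()  # fake relative to average real
    return F.relu(1.0 - r_f).mean() + F.relu(1.0 + f_r).mean()

def rahinge_gen_loss(real_scores, fake_scores):
    """Generator side: the same hinge with the roles reversed."""
    r_f = real_scores - fake_scores.mean()
    f_r = fake_scores - real_scores.mean()
    return F.relu(1.0 + r_f).mean() + F.relu(1.0 - f_r).mean()
```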
Hi,
Best Regards
@fumoffu947, Also, please do post your trained results. It will help others. Best regards,
I have updated to the latest code, but do not have access to the results so easily, as I have some restrictions on me. I also have some questions regarding the training time and loss. Thanks for the quick response.
@akanimax, can you comment more about this? What can one expect to see reported as "loss" for both the Generator and Discriminator networks while the higher resolutions still look like solid colour blocks? I'm assuming that there should be some wavering loss in both networks, but I'm experiencing a rapid drop to 0.0. Thanks,
@BlindElephants, Well, firstly please check the new training gif added to the readme; it explains more clearly how the training takes place. BTW, please note that our MSG-GAN uses the relativistic hinge loss, which is indeed a margin-adaptation loss at its heart. So please don't be discouraged by seeing a value of 0 for the discriminator loss. Also, there is, unfortunately, nothing that you can read off from the values of the losses here; they are just an indicator of the state of the two-player game. Hope this helps.
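To make the margin point concrete, here is a tiny numeric illustration (with made-up scores) of why the hinge clamps the reported discriminator loss to exactly 0 once the margin is satisfied:

```python
import torch
import torch.nn.functional as F

# Made-up scores where reals beat fakes by more than the hinge margin of 1.
real_scores = torch.tensor([3.0, 2.5])
fake_scores = torch.tensor([0.5, 0.0])

# Both hinge terms clip to zero, so the logged loss reads exactly 0.0 even
# though the two-player game is still being played.
d_loss = (F.relu(1.0 - (real_scores - fake_scores.mean())).mean()
          + F.relu(1.0 + (fake_scores - real_scores.mean())).mean())
print(d_loss)  # tensor(0.)
```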
@BlindElephants I had the same color blocks for about 20 epochs before any change.
@fumoffu947 Thanks for the reply. I ended up just letting it run for a while, despite seeing loss=0.0 on the discriminator side from the beginning. Here's a time-lapse video of training that I posted on Vimeo: https://vimeo.com/330681428. The source material is the movie Edge of Tomorrow (yes, I know... this great work of sci-fi action...), which was frame-dumped to produce about 38,000 images. I stopped this training roughly where the video ends, so things are still quite abstract and only just starting to form recognizable shapes. But a great test. Thanks @akanimax, this repo is great and super interesting.
That is really interesting. You'd have gotten even better results with a little more training. Also, please feel free to open a PR like @huangzh13 did, if you'd like to share your results (through the readme). Best regards,
I used raw frames dumped with ffmpeg from the original source video; I haven't played with additional preprocessing yet. I'm currently running a follow-up training session that is further along than where I ended this sample, and you're right, things are getting really interesting quickly. Will open a PR when appropriate to share findings.
@BlindElephants, Best regards,
@BlindElephants @akanimax @huangzh13 @fumoffu947 can I get a bit more info on your runs? I'm trying to run this on Colab with a 1x K80 GPU. I have a 10k-image dataset I want to train with, at 128x128. I've set the batch_size to 16 to see if the training moves a little faster (judging by the log output); I don't know if this is right. Should I increase the batch_size? How long does it take to train a model on 2x V100 GPUs, say at 128 or 256 image size? I'm thinking of firing up a Google Cloud instance, as I'm on the free tier, to train my model. Any recommendations on specs? vCPU, RAM, 2x V100, 4x V100? Thanks, and awesome work,
@talvasconcelos I can't really say what kind of hardware you should use. I used a Quadro P5000 16 GB graphics card with a batch size of 8 (because of the huge fluctuation in memory usage before it stabilizes). As you can see in my log, one epoch took about 11 hours. I got really good results after 5 days (when I ran on medical images of size 256x256). It should probably run for longer, which will result in better images. As for the batch size: use as large a batch size as the graphics card allows. Larger batch sizes give better gradients, because of the more varied images, and may speed up the convergence of the model. But the batch size should not be as large as the data set :D. A newer graphics card (than the Quadro P5000) will probably decrease the training time significantly, as they have become faster. But as far as I know, there is no good stopping criterion for a GAN; you will have to inspect the images and decide when they are good enough, so the model might have to train for a while. (If I am wrong about this, please correct me.) Best Regards
@talvasconcelos I've run training sessions with a few different data sets; currently one set is approx. 9,000 images and the other is approx. 60,000 images. I'm training at 1024x1024 resolution, on a machine that has two RTX 2080 Ti GPUs. With that, I can do batch size = 5. For the training set that has about 9,000 images, it takes just over an hour (about 65 minutes) per epoch. Also, probably most importantly, I'm not looking to generate 100% realistic images with this: I'm a working artist and use ML software to generate images/video for aesthetic and conceptual goals, so what I look for in the outcome of a training session may be different from what you look for. I've so far let this run to about epoch 200 with some really interesting results (200 epochs == approx. 200 hours == 8.33333 days). Obviously the training set with 60,000 images takes much longer per epoch, but it is also achieving some interesting visual outcomes on a similar time scale. I haven't fully explored this model yet, so I can't comment further. Make sure that you have
Wow guys, thanks a lot for the input. I've had some runs of GANs: I made a DCGAN in Keras and tried a few alternatives. The problem with my DCGAN is that it gives some "good" results at 64x64 (@BlindElephants I'm also trying to do some art stuff, so no realistic output either). But at 128x128, apart from taking forever, it collapses after a while. I've let BMSG-GAN run with default settings, other than my dataset of 10k images. It's been running for about 2 hours, with batch size 16 and a 256 latent. The second batch was done after 1h35m. Given that Colab stops the notebook after 12h, it's not going to be a pretty process... Might spin up the VM after all, with a couple of powerful GPUs... EDIT: I'm so stupid, I was training on the CPU... forgot to set the runtime to GPU. Now it's taking around 5m per epoch!
So... epoch 114, with hyperparameters: latent = 256, batch_size = 32 (for the first 100 epochs), now running at 48, on a 10k-image dataset. My dataset is not as homogeneous as the faces... @akanimax how long did the flowers test (the second one, with a bigger difference between pictures) run? What were your parameters?
@BlindElephants and @fumoffu947, Thank you so much for the help. Best regards,
@fumoffu947, You can monitor the FID scores of the training models. I am going to include code to monitor the FID during training itself later. Currently, you have to run a post-training script to calculate the FID of all the saved models and then use the one with the lowest FID score.
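A sketch of that post-training sweep, assuming the third-party pytorch-fid package and a hypothetical generate_samples helper that renders fake images from a checkpoint into a directory (both are assumptions, not part of this repo):

```python
import glob
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths  # pip install pytorch-fid

def fid_sweep(checkpoint_glob, real_dir, fake_dir, generate_samples):
    """Score every saved generator checkpoint and return the best one."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    scores = {}
    for ckpt in sorted(glob.glob(checkpoint_glob)):
        generate_samples(ckpt, fake_dir)  # hypothetical helper: ckpt -> images
        scores[ckpt] = calculate_fid_given_paths(
            [real_dir, fake_dir], batch_size=50, device=device, dims=2048)
    best = min(scores, key=scores.get)
    print(f"lowest FID {scores[best]:.2f} at {best}")
    return best
```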
@akanimax I am just worried that it might have already collapsed, and that putting any more training time in won't make much of a difference. The results between epoch 75 and epoch 100 at the higher res are not significantly better. Should I simply give it more epochs? Or would it be better to increase the dataset size? I am using the relativistic-hinge loss, btw. I also tried ProGAN with this dataset and it has a similar issue there as well. Which is quite interesting, because with ProGAN I also tried a very different but much smaller dataset (<5k) that does produce reasonable results in even fewer training epochs.
@Mut1nyJD, @fumoffu947, @BlindElephants, @talvasconcelos I hope this will give you more information about the training dynamics of MSG-GAN. But one thing is for sure: if you are getting good results at the lower resolutions, they always translate to the higher resolutions eventually. Best regards,
@talvasconcelos Use a larger latent size. You might also want to try something with a very small training set (<= 5,000, or even 1,000-2,000) just as a test to see what happens (for your own sake, I mean). If you provide a subset of training images that all conform to a particular type or subject, the model should converge quite quickly. If you observe this by outputting periodic samples, you should be able to get an idea of what to expect when you move to a much larger training set, which will possibly follow similar convergence behaviour, albeit on a longer time scale with much more varied output.
Okay, I am going to post some results soon. Indeed, things got better after waiting longer. I also increased the dataset size by a further 50%, but even after 350 epochs it still struggles. I wonder if that is simply because, unlike most GAN test datasets, it has far less homogeneity.
Hello @akanimax, I'm using your BMSG-GAN repo for a text-to-face task and the mode collapses. You had used ProGAN for this task. I've been stuck on it for a long time and am exhausted. Any advice for this? Thank you.
Hi @akanimax, I am trying your project on my data (>11k images of resolution higher than 512x512)
Hi @akanimax, I started learning about GANs recently and I found this model really cool, great job! I have a question regarding the loss function. I've been following this discussion closely, and you mention that with the RaHinge loss it's expected for the discriminator to reach 0.0 loss early in training. Could you comment a bit on how the generator loss should behave? I'm currently training a conditional version of this GAN for medical image synthesis, and I notice that the discriminator reaches 0.0 but the generator loss increases gradually, as shown below. It must be noted that in this plot I am showing the loss per epoch (averaging the loss over all batches). Despite this behavior in the learning curves, the images look reasonable on a quick visual inspection, so I am not sure whether there is some underlying issue like mode collapse or divergence. Is there a way to tell this from the learning curves? Thanks again!
When I use wgan-gp as a loss function, the training fails. Any explanation?