Hello PeteryuX,
Thanks a lot for sharing your implementation of ESRGAN.
I have been testing some GAN-based super-resolution networks recently. I have a large set of HR/LR training images and would like to train ESRGAN (PSNR pretraining followed by ESRGAN training) using your training code.
I followed your data preparation instructions and converted my 1,825,587 LR/HR sample pairs to *bin.tfrecord. dataset_checker reported no problems and the LR/HR images displayed correctly. I modified a few lines of your code for the hardcoded paths etc. and started PSNR training on an RTX 3090 GPU. However, the printed "loss" is "nan" at every iteration, and even after PSNR training "successfully" finished, loss_D and loss_G in ESRGAN training are also shown as "nan".
In PSNR training:
...
Training [>> ] 20004/600000, loss=nan, lr=2.0e-04 2.0 step/sec
...
In ESRGAN training:
...
Training [>>> ] 40000/285240, loss_G=nan, loss_D=nan, lr_G=1.0e-04, lr_D=1.0e-04 1.4 step/sec
[*] save ckpt file at ./checkpoints/esrgan/ckpt-32
Training [>>>> ] 47877/285240, loss_G=nan, loss_D=nan, lr_G=1.0e-04, lr_D=1.0e-04 1.4 step/sec
...
Do you have any suggestions on this issue?
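One check that might help narrow this down is scanning the decoded samples for non-finite values before they reach the network. The following is only a minimal sketch: `find_non_finite` is a hypothetical helper that is not part of the repository, and it assumes each decoded patch is available as a flat sequence of floats. It would only rule out bad input values, not problems in the model or the GPU setup.

```python
import math


def find_non_finite(samples):
    """Return indices of samples that contain NaN or infinite values.

    `samples` is an iterable of flat sequences of floats, one per image patch.
    (Hypothetical helper for illustration, not part of the training code.)
    """
    bad = []
    for idx, pixels in enumerate(samples):
        if not all(math.isfinite(p) for p in pixels):
            bad.append(idx)
    return bad


# Toy values standing in for decoded LR/HR patches (made up for this sketch).
toy = [
    [0.0, 0.5, 1.0],
    [0.2, float("nan"), 0.9],
    [0.1, float("inf"), 0.3],
]
print(find_non_finite(toy))
```

If such a scan came back empty on the real data, the inputs themselves would be an unlikely source of the "nan" values.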
I attach the PSNR and ESRGAN parameter files here:
psnr.yaml:
batch_size: 64
input_size: 32
gt_size: 128
ch_size: 3
scale: 4
sub_name: 'psnr_pretrain'
pretrain_name: null
network_G:
    nf: 64
    nb: 23
train_dataset:
    path: '/data/EOSC/EOSC_sub_bin.tfrecord'
    num_samples: 1825587
    using_bin: True
    using_flip: True
    using_rot: True
test_dataset:
    EOSC_path: '/data2/EOSC_test'
niter: 600000
lr: !!float 2e-4
lr_steps: [200000, 300000, 400000, 500000]
lr_rate: 0.5
adam_beta1_G: 0.9
adam_beta2_G: 0.99
w_pixel: 1.0
pixel_criterion: l1
save_steps: 20000
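To make the learning-rate settings in psnr.yaml concrete, here is a minimal sketch of how `lr`, `lr_steps` and `lr_rate` could combine. It assumes a piecewise-constant schedule in which the rate is multiplied by `lr_rate` once the step is strictly past each boundary; the training code may handle the boundaries differently, and `multistep_lr` is a hypothetical name used only for illustration.

```python
from bisect import bisect_left


def multistep_lr(step, base_lr, lr_steps, lr_rate):
    """Learning rate at `step` under a piecewise-constant schedule.

    Assumption: one decay by `lr_rate` is applied for every boundary in
    `lr_steps` that `step` has strictly passed.
    """
    n_decays = bisect_left(lr_steps, step)  # boundaries strictly below step
    return base_lr * (lr_rate ** n_decays)


# Values copied from psnr.yaml.
psnr_steps = [200000, 300000, 400000, 500000]
for step in (20004, 250000, 599999):
    print(step, multistep_lr(step, 2e-4, psnr_steps, 0.5))
```

Under this assumption, step 20004 is still before the first boundary, which agrees with the lr=2.0e-04 printed in the PSNR training log.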
esrgan.yaml:
batch_size: 64
input_size: 32
gt_size: 128
ch_size: 3
scale: 4
sub_name: 'esrgan'
pretrain_name: 'psnr_pretrain'
network_G:
    nf: 64
    nb: 23
network_D:
    nf: 64
train_dataset:
    path: '/data/EOSC/EOSC_sub_bin.tfrecord'
    num_samples: 1825587
    using_bin: True
    using_flip: False
    using_rot: False
test_dataset:
    EOSC_path: '/data2/EOSC_test'
niter: 285240
lr_G: !!float 1e-4
lr_D: !!float 1e-4
lr_steps: [60000, 120000, 180000, 240000]
lr_rate: 0.5
adam_beta1_G: 0.9
adam_beta2_G: 0.99
adam_beta1_D: 0.9
adam_beta2_D: 0.99
w_pixel: !!float 1e-2
pixel_criterion: l1
w_feature: 1.0
feature_criterion: l1
w_gan: !!float 5e-3
gan_type: ragan # gan | ragan
save_steps: 20000
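For context on the `niter` values, the arithmetic below shows how many passes over the dataset each run makes with `batch_size: 64` and `num_samples: 1825587`. It is only a sketch, and `epochs_covered` is a hypothetical helper name; the result is roughly 21 passes for PSNR training and roughly 10 for ESRGAN training.

```python
def epochs_covered(niter, batch_size, num_samples):
    """Number of passes over the dataset made by `niter` training steps."""
    return niter * batch_size / num_samples


# Values copied from psnr.yaml and esrgan.yaml.
psnr_epochs = epochs_covered(600000, 64, 1825587)
esrgan_epochs = epochs_covered(285240, 64, 1825587)
print(f"psnr: {psnr_epochs:.2f} epochs, esrgan: {esrgan_epochs:.2f} epochs")
```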
Any help would be much appreciated! Thank you!