Single Image Super-Resolution with WDSR, EDSR and SRGAN

A Keras-based implementation of

Wide Activation for Efficient and Accurate Image Super-Resolution (WDSR), winner of the NTIRE 2018 super-resolution challenge.
Enhanced Deep Residual Networks for Single Image Super-Resolution (EDSR), winner of the NTIRE 2017 super-resolution challenge.
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network (SRGAN).

This project also supports fine-tuning of EDSR models as generators in SRGAN-like networks.

NEWS

Jun 20, 2019

On branch wip-tf2, EDSR, WDSR and SRGAN models have been migrated to Tensorflow 2.0. Jupyter notebooks are provided to demonstrate how to train these models, and pre-trained models are available for SRGAN. A bug in the SRGAN discriminator has been fixed, which significantly improves SRGAN results.

Weight normalization is now available as a layer wrapper in Tensorflow Addons and has been used to implement the WDSR models. Also, DIV2K images no longer need to be downloaded manually; they are downloaded automatically by a DIV2K data provider.

Although branch wip-tf2 is still work in progress, the model implementations are complete and can be trained as described in the corresponding papers with the provided Jupyter notebooks. Model and training code has been dramatically simplified, thanks in part to new features available in Tensorflow 2.0 and Tensorflow Addons.

Environment setup

On a system with a GPU, create a new conda environment with^*)

conda env create -f environment-gpu.yml

On a system without a GPU, create an environment with

conda env create -f environment-cpu.yml

Activate the environment with

source activate sisr

^*) It is assumed that appropriate CUDA and cuDNN versions for the current tensorflow-gpu version are already installed on your system. These libraries are not automatically installed when using environment-gpu.yml.

Getting started

This section uses pre-trained models to super-resolve images with factor x4.

WDSR

Here, the pre-trained WDSR-A model wdsr-a-32-x4 is used. Click on the link to download the model. It is an experimental model, not described in the WDSR paper, that was trained with a pixel-wise loss function (mean absolute error). Assuming that the path to the downloaded model is ~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5, the following command super-resolves the images in directory ./demo with factor x4 and writes the results to directory ./output:

python demo.py -i ./demo -o ./output --model ~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5

The output directory only contains the super-resolved images. Below are figures that additionally compare the super-resolution (SR) results with the corresponding low-resolution (LR) and high-resolution (HR) images and an x4 resize with bicubic interpolation (the code for generating these figures is not included yet).
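What demo.py does internally for each image is not spelled out here. A minimal sketch of a per-image inference flow, assuming a model that takes and returns pixel values in the 0-255 range (as a Keras model with built-in normalization layers would), could look like this. The `resolve` helper and the `nearest_x4` stand-in model are hypothetical and only make the example runnable without trained weights.

```python
import numpy as np


def resolve(model, lr):
    """Super-resolve one H x W x 3 uint8 image with a batch-oriented model.

    Assumption: `model` maps a float32 batch (N, H, W, 3) in the 0-255 range
    to a batch (N, s*H, s*W, 3) in the same range.
    """
    batch = lr[np.newaxis].astype(np.float32)  # add batch dimension
    sr = np.asarray(model(batch))[0]           # run model, drop batch dimension
    sr = np.clip(sr, 0, 255)                   # keep values in the valid pixel range
    return np.round(sr).astype(np.uint8)


def nearest_x4(batch):
    """Hypothetical stand-in for a trained x4 model: nearest-neighbour upscaling."""
    return batch.repeat(4, axis=1).repeat(4, axis=2)


lr = np.random.default_rng(0).integers(0, 256, size=(8, 10, 3), dtype=np.uint8)
sr = resolve(nearest_x4, lr)
print(sr.shape, sr.dtype)  # (32, 40, 3) uint8
```

With a real model, `nearest_x4` would be replaced by the loaded network, for example a Keras model's `predict` method.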

EDSR + SRGAN

A problem with pixel-wise loss functions is that they fail to recover high-frequency details. Super-resolution results are typically overly smooth with lower perceptual quality, especially at scale x4. A perceptual loss as described in the SRGAN paper (a combination of a VGG-based content loss and an adversarial loss) is able to generate more realistic textures with higher perceptual quality but at the cost of lower PSNR values.
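To make the difference concrete: the SRGAN paper combines a content loss with an adversarial loss weighted by 10^-3. The following numpy sketch computes that combination. It assumes the VGG feature maps and the discriminator outputs are already available as arrays, so feature extraction with a pre-trained VGG19 network is left out, and the function names are illustrative rather than taken from this project.

```python
import numpy as np


def content_loss(sr_features, hr_features):
    """Mean squared error between feature maps of the SR and HR images."""
    return float(np.mean((sr_features - hr_features) ** 2))


def adversarial_loss(d_sr, eps=1e-12):
    """Mean of -log D(G(LR)); d_sr holds discriminator probabilities for SR images."""
    return float(np.mean(-np.log(np.clip(d_sr, eps, 1.0))))


def perceptual_loss(sr_features, hr_features, d_sr, adv_weight=1e-3):
    """Content loss plus weighted adversarial loss (weight 1e-3 in the SRGAN paper)."""
    return content_loss(sr_features, hr_features) + adv_weight * adversarial_loss(d_sr)
```

The small adversarial weight keeps the content loss dominant while still pushing the generator towards outputs the discriminator classifies as real.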

An EDSR baseline model that has been fine-tuned as generator in an SRGAN-like network can be downloaded from here. Please note that support for SRGAN training is still work in progress. Assuming that the path to the downloaded model is ~/Downloads/edsr-16-x4-gen-epoch-088.h5, the following command super-resolves the image in directory ./demo/gan with factor x4 and writes the result to directory ./output:

python demo.py -i ./demo/gan -o ./output --model ~/Downloads/edsr-16-x4-gen-epoch-088.h5

The output directory only contains the super-resolved image. The following figure additionally compares the result with that obtained from an EDSR model that has been trained with a pixel-wise loss only (mean squared error). One can clearly see how training with a perceptual loss in a GAN improves recovery of high-frequency content.

Dataset

If you want to train and evaluate models, you need to download the DIV2K dataset and extract the downloaded archives to a directory of your choice (DIV2K in the following example). The resulting directory structure should look like:

DIV2K
  DIV2K_train_HR
  DIV2K_train_LR_bicubic
    X2
    X3
    X4
  DIV2K_train_LR_unknown
    X2
    X3
    X4
  DIV2K_valid_HR
  DIV2K_valid_LR_bicubic
    ...
  DIV2K_valid_LR_unknown
    ...

You only need to download DIV2K archives for those downgrade operators (unknown, bicubic) and super-resolution scales (x2, x3, x4) that you'll actually use for training.
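For orientation, the following sketch maps an image id to the HR file and its LR counterpart in the layout above. It assumes the official DIV2K file naming (for example 0001.png for HR and 0001x4.png for the x4 LR image) and the split into images 1-800 for training and 801-900 for validation; `div2k_pair` and `image_ids` are hypothetical helpers, not part of this project.

```python
from pathlib import Path


def image_ids(subset):
    """DIV2K image ids: 1-800 for training, 801-900 for validation."""
    return range(1, 801) if subset == "train" else range(801, 901)


def div2k_pair(root, subset, downgrade, scale, image_id):
    """Return (HR path, LR path) for one image in the directory layout above.

    subset: "train" or "valid"; downgrade: "bicubic" or "unknown"; scale: 2, 3 or 4.
    """
    root = Path(root)
    hr = root / f"DIV2K_{subset}_HR" / f"{image_id:04d}.png"
    lr = root / f"DIV2K_{subset}_LR_{downgrade}" / f"X{scale}" / f"{image_id:04d}x{scale}.png"
    return hr, lr


hr, lr = div2k_pair("DIV2K", "valid", "bicubic", 4, 801)
print(hr.as_posix())  # DIV2K/DIV2K_valid_HR/0801.png
print(lr.as_posix())  # DIV2K/DIV2K_valid_LR_bicubic/X4/0801x4.png
```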

Before the DIV2K images can be used, they must be converted to numpy arrays and stored in a separate location. Conversion to numpy arrays dramatically reduces image pre-processing times. Conversion can be done with the convert.py script:

python convert.py -i ./DIV2K -o ./DIV2K_BIN numpy

In this example, converted images are written to the DIV2K_BIN directory. By default, the training and evaluation scripts read from this directory; a different location can be set with the --dataset command line option.
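The speed-up comes from decoding each PNG only once. A minimal sketch of the idea, assuming images are already decoded into H x W x 3 uint8 arrays (the exact on-disk format written by convert.py may differ): save each array as a .npy file and later load it, optionally memory-mapped so that taking a random crop does not require reading the whole image. Both helper names are hypothetical.

```python
from pathlib import Path

import numpy as np


def save_as_numpy(image, out_path):
    """Store a decoded H x W x C uint8 image so it never has to be decoded again."""
    np.save(Path(out_path), image)


def load_numpy(path, mmap=True):
    """Load a stored image; memory-mapping reads only the parts actually accessed."""
    return np.load(Path(path), mmap_mode="r" if mmap else None)
```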

Training

Pixel-wise loss

WDSR, EDSR and SRResNet^*) models can be trained with a pixel-wise loss function with train.py. Default for WDSR and EDSR is mean absolute error, for SRResNet it is mean squared error. For example, a WDSR-A baseline model with 8 residual blocks can be trained for scale x2 with

python train.py --dataset ./DIV2K_BIN --outdir ./output --profile wdsr-a-8 --scale 2

The --dataset option sets the location of the DIV2K dataset and the --outdir option the output directory (defaults to ./output). Each training run creates a timestamped sub-directory in the specified output directory which contains saved models, all command line options (default and user-defined) in an args.txt file, as well as TensorBoard logs. The super-resolution factor is set with the --scale option. The downgrade operator is set with the --downgrade option; it defaults to bicubic and can be changed to unknown, bicubic_jpeg_75 or bicubic_jpeg_90 (see also section JPEG compression).
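Both pixel-wise losses compare SR and HR images pixel by pixel. A small numpy sketch of the two definitions (function names chosen for illustration) shows why mean squared error penalizes large deviations more strongly than mean absolute error:

```python
import numpy as np


def mean_absolute_error(hr, sr):
    """L1 pixel-wise loss (default for WDSR and EDSR)."""
    return float(np.mean(np.abs(hr.astype(np.float64) - sr.astype(np.float64))))


def mean_squared_error(hr, sr):
    """L2 pixel-wise loss (default for SRResNet)."""
    return float(np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2))


# Casting to float64 avoids uint8 wrap-around when subtracting pixel values.
hr = np.zeros((2, 2), dtype=np.uint8)
sr = np.array([[0, 2], [2, 4]], dtype=np.uint8)
print(mean_absolute_error(hr, sr))  # 2.0
print(mean_squared_error(hr, sr))   # 6.0
```

The single error of 4 contributes half of the MAE but two thirds of the MSE.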

By default, the model is validated against randomly cropped images from the DIV2K validation set. If you'd rather evaluate the model against full-sized DIV2K validation images after each epoch, set the --benchmark command line option. This, however, slows down training significantly and only makes sense for smaller models. Alternatively, you can evaluate saved models later with evaluate.py as described in section Evaluation.
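Random crops, whether used for validation or as training patches, must cover the same image region in LR and HR space. A sketch of an aligned random crop, assuming LR and HR arrays whose sizes differ exactly by the scale factor. The 48 x 48 LR patch size follows the EDSR paper, and `random_crop` is a hypothetical helper rather than this project's implementation.

```python
import numpy as np


def random_crop(lr, hr, scale, lr_crop_size=48, rng=None):
    """Crop an LR patch and the HR patch covering the same image region.

    Assumes hr has exactly `scale` times the height and width of lr.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = lr.shape[:2]
    y = int(rng.integers(0, h - lr_crop_size + 1))  # top-left corner in LR space
    x = int(rng.integers(0, w - lr_crop_size + 1))
    lr_patch = lr[y:y + lr_crop_size, x:x + lr_crop_size]
    hr_patch = hr[y * scale:(y + lr_crop_size) * scale,
                  x * scale:(x + lr_crop_size) * scale]
    return lr_patch, hr_patch
```

Small crops like these are cheaper to evaluate than full validation images, which is also why PSNR values obtained on them are not directly comparable to full-image results.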

To train models for higher scales (x3 or x4) it is possible to re-use the weights of models pre-trained for a smaller scale (x2). This can be done with the --pretrained-model option. For example,

python train.py --dataset ./DIV2K_BIN --outdir ./output --profile wdsr-a-8 --scale 4 \
    --pretrained-model ./output/20181016-063620/models/epoch-294-psnr-34.5394.h5

trains a WDSR-A baseline model with 8 residual blocks for scale x4 re-using the weights of model epoch-294-psnr-34.5394.h5, a WDSR-A baseline model with the same number of residual blocks trained for scale x2.
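Re-using x2 weights for x4 works because most layers (feature extraction and residual blocks) have identical shapes at both scales, while the upsampling layers differ. A framework-independent sketch of that idea, assuming weights are held in dicts keyed by layer name (all names here are hypothetical): copy every entry whose name and shape match and leave the rest at their initial values. In tf.keras 2.x, `Model.load_weights(path, by_name=True, skip_mismatch=True)` offers similar behaviour for HDF5 files; whether train.py relies on it is not stated here.

```python
import numpy as np


def transfer_weights(source, target):
    """Copy arrays from `source` into `target` where name and shape both match.

    Entries with a different shape (for example scale-specific upsampling layers)
    keep their initial values. Returns the names that were copied.
    """
    copied = []
    for name, value in source.items():
        if name in target and target[name].shape == value.shape:
            target[name] = value.copy()
            copied.append(name)
    return copied


# Hypothetical layer names and shapes, for illustration only.
x2_weights = {"conv_head": np.ones((3, 3, 3, 32)), "upsample": np.ones((3, 3, 32, 128))}
x4_weights = {"conv_head": np.zeros((3, 3, 3, 32)), "upsample": np.zeros((3, 3, 32, 512))}
print(transfer_weights(x2_weights, x4_weights))  # ['conv_head']
```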

For a more detailed overview of available command line options and profiles take a look at train.py or run python train.py -h. Section Pre-trained models also shows the training command for each available pre-trained model.

^*) SRResNet is the super-resolution model used in the SRGAN paper.

Perceptual loss (SRGAN)

Training with a perceptual loss as described in the SRGAN paper requires a model that has been pre-trained with a pixel-wise loss. At the moment, only SRResNet and EDSR models at scale x4 are supported for SRGAN training. For example, SRResNet can be pre-trained with

python train.py --dataset ./DIV2K_BIN --profile sr-resnet

An EDSR baseline model that can be used as generator in an SRGAN-like network can be pre-trained with

python train.py --dataset ./DIV2K_BIN --profile edsr-gen --scale 4 --num-res-blocks 16

Selected models from pre-training can then be used as starting point for SRGAN training. For example,

python train_gan.py --dataset ./DIV2K_BIN --generator sr-resnet --label-noise 0.0 \
    --pretrained-model <path-to-pretrained-model>

starts SRGAN training as described in the SRGAN paper using a VGG54 content loss and SRResNet as generator whereas

python train_gan.py --dataset ./DIV2K_BIN --generator edsr-gen --scale 4 --num-res-blocks 16 \
    --pretrained-model <path-to-pretrained-model>

uses an EDSR baseline model with 16 residual blocks as generator. SRGAN training is still work in progress.

Evaluation

An alternative to the --benchmark training option is to evaluate saved models with evaluate.py and then select the model with the highest PSNR. For example,

python evaluate.py --dataset ./DIV2K_BIN -i ./output/20181016-063620/models -o eval.json

evaluates all models in directory ./output/20181016-063620/models and writes the results to eval.json. This JSON file maps model filenames to PSNR values. At the end of evaluation, evaluate.py also prints the model with the best PSNR to stdout:

Best PSNR = 34.5394 for model ./output/20181016-063620/models/epoch-294-psnr-37.4630.h5

The higher PSNR value in the model filename must not be confused with the value generated by evaluate.py. The PSNR value in the filename was generated during training by validating against smaller, randomly cropped images which tends to yield higher PSNR values.
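PSNR is 10 * log10(MAX^2 / MSE) with MAX = 255 for 8-bit images. The sketch below computes it over all RGB channels, without the border cropping that some benchmarks apply, and picks the best entry from an eval.json-style mapping of model filenames to PSNR values. Both helpers are illustrative and may differ in detail from evaluate.py.

```python
import json
import math

import numpy as np


def psnr(hr, sr, max_value=255.0):
    """Peak signal-to-noise ratio in dB, computed over all pixels and channels."""
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def best_model(eval_json_text):
    """Return (filename, PSNR) of the best entry in a {filename: PSNR} JSON object."""
    results = json.loads(eval_json_text)
    name = max(results, key=results.get)
    return name, results[name]
```

Because PSNR is logarithmic in the MSE, the small differences between crops and full images in the MSE can translate into noticeable differences in dB.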

Pre-trained models

The following list contains available pre-trained models. They were trained with images 1-800 from the DIV2K training set using the specified downgrade operator. Random crops and transformations were made as described in the EDSR paper. Model performance is measured in dB PSNR on the DIV2K validation set (images 801-900, RGB channels, without self-ensemble).

Model	Scale	Residual blocks	Downgrade	Parameters	PSNR	Training command
edsr-16-x2¹⁾	x2	16	bicubic	1.37M	34.64 dB	python train.py --profile edsr-16 --scale 2
edsr-16-x4-gen-pre²⁾	x4	16	bicubic	1.52M	28.89 dB	python train.py --profile edsr-gen --scale 4 --num-res-blocks 16
edsr-16-x4-gen³⁾	x4	16	bicubic	1.52M	-	python train_gan.py --generator edsr-gen --scale 4 --num-res-blocks 16 --pretrained-model edsr-16-x4-gen-pre-psnr-28.8885.h5
wdsr-a-16-x2⁴⁾	x2	16	bicubic	1.19M	34.68 dB	python train.py --profile wdsr-a-16 --scale 2
wdsr-a-32-x2⁵⁾	x2	32	bicubic	3.55M	34.80 dB	python train.py --profile wdsr-a-32 --scale 2 --res-expansion 6
wdsr-a-32-x4⁵⁾	x4	32	bicubic	3.56M	29.17 dB	python train.py --profile wdsr-a-32 --scale 4 --res-expansion 6 --pretrained-model wdsr-a-32-x2-psnr-34.8033.h5
wdsr-b-32-x2⁶⁾	x2	32	bicubic	0.59M	34.63 dB	python train.py --profile wdsr-b-32 --scale 2

¹⁾ EDSR baseline, see also EDSR project page.
²⁾ EDSR baseline pre-trained for usage as generator in an SRGAN-like network.
³⁾ EDSR baseline fine-tuned as generator in an SRGAN-like network.
⁴⁾ WDSR baseline, see also WDSR project page.
⁵⁾ Experimental WDSR-A models trained with an expansion ratio of 6 (default is 4).
⁶⁾ Experimental WDSR-B model.
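The parameter counts of the EDSR baseline models can be checked by hand. The sketch below assumes the standard EDSR baseline layout: a 3x3 head convolution, residual blocks of two 3x3 convolutions without batch normalization, one convolution after the residual blocks, a sub-pixel convolution per x2 upsampling step (or a single one for x3) and a final 3x3 convolution to RGB, all with biases and 64 filters. Mean-shift layers have no trainable parameters and are therefore ignored.

```python
def conv_params(k, c_in, c_out):
    """Trainable parameters of a k x k convolution with bias."""
    return k * k * c_in * c_out + c_out


def edsr_params(scale, num_filters=64, num_res_blocks=16, k=3):
    n = conv_params(k, 3, num_filters)                                  # head conv
    n += num_res_blocks * 2 * conv_params(k, num_filters, num_filters)  # residual blocks
    n += conv_params(k, num_filters, num_filters)                       # conv after blocks
    if scale in (2, 4):
        steps = 1 if scale == 2 else 2                                  # x4 = two x2 steps
        n += steps * conv_params(k, num_filters, num_filters * 4)       # sub-pixel conv, factor 2
    elif scale == 3:
        n += conv_params(k, num_filters, num_filters * 9)               # sub-pixel conv, factor 3
    else:
        raise ValueError(f"unsupported scale: {scale}")
    n += conv_params(k, num_filters, 3)                                 # conv to RGB
    return n


print(edsr_params(2))  # 1369859
print(edsr_params(4))  # 1517571
```

1,369,859 (about 1.37M) and 1,517,571 (about 1.52M) agree with the edsr-16-x2 and edsr-16-x4-gen-pre entries in the table above.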

JPEG compression

There is experimental support for adding JPEG compression artifacts to LR images and training with compressed images. The following commands convert bicubic downscaled DIV2K training and validation images to JPEG images with quality 90:

python convert.py -i ./DIV2K/DIV2K_train_LR_bicubic \
                  -o ./DIV2K/DIV2K_train_LR_bicubic_jpeg_90 \
                  --jpeg-quality 90 jpeg

python convert.py -i ./DIV2K/DIV2K_valid_LR_bicubic \
                  -o ./DIV2K/DIV2K_valid_LR_bicubic_jpeg_90 \
                  --jpeg-quality 90 jpeg

After having converted these JPEG images to numpy arrays, as described in section Dataset, models can be trained with the --downgrade bicubic_jpeg_90 option to additionally learn to recover from JPEG compression artifacts. Two models trained in that manner are available as pre-trained models:

Model	Scale	Residual blocks	Downgrade	Parameters	PSNR	Training command
wdsr-a-32-x2-q90	x2	32	bicubic + JPEG	3.55M	32.12 dB	python train.py --profile wdsr-a-32 --scale 2 --res-expansion 6 --downgrade bicubic_jpeg_90
wdsr-a-32-x4-q90	x4	32	bicubic + JPEG	3.56M	27.63 dB	python train.py --profile wdsr-a-32 --scale 4 --res-expansion 6 --downgrade bicubic_jpeg_90 --pretrained-model wdsr-a-32-x2-q90-psnr-32.1198.h5
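The JPEG degradation itself amounts to an encode/decode round trip. A minimal sketch with Pillow, operating on an in-memory array rather than on files; convert.py may differ in details such as chroma subsampling settings, and `jpeg_degrade` is a hypothetical helper.

```python
import io

import numpy as np
from PIL import Image


def jpeg_degrade(lr, quality=90):
    """Encode an H x W x 3 uint8 image as JPEG and decode it again."""
    buffer = io.BytesIO()
    Image.fromarray(lr).save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return np.asarray(Image.open(buffer).convert("RGB"))
```

The returned array has the same shape and dtype as the input but contains JPEG compression artifacts, which is what a model trained with --downgrade bicubic_jpeg_90 learns to remove.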

Weight normalization

WDSR models are trained with weight normalization. This branch uses a modified Adam optimizer for that purpose. The now outdated branch wip-conv2d-weight-norm instead uses a specialized Conv2DWeightNorm layer and a default Adam optimizer (experimental work inspired by the official WDSR Tensorflow port). The current plan is to replace this layer with a default Conv2D layer and a Tensorflow weight normalization wrapper once the wrapper becomes officially available.
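Weight normalization reparameterizes each weight vector as w = g * v / ||v||, decoupling its length g from its direction v. A numpy sketch for a convolution kernel in Keras layout (kh, kw, c_in, c_out), normalizing per output channel. It illustrates the reparameterization only and is neither the modified Adam optimizer nor the Conv2DWeightNorm layer mentioned above.

```python
import numpy as np


def weight_norm_kernel(v, g):
    """Return w = g * v / ||v||, normalizing each output channel separately.

    v: kernel of shape (kh, kw, c_in, c_out); g: scales of shape (c_out,).
    """
    norm = np.sqrt(np.sum(v ** 2, axis=(0, 1, 2)))  # one norm per output channel
    return v * (g / norm)                           # broadcasts over the last axis
```

After the reparameterization, the norm of each output filter equals the corresponding entry of g, so the optimizer can adjust scale and direction independently.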

Other implementations

WDSR

Official PyTorch implementation
Official Tensorflow implementation

EDSR

Official PyTorch implementation
Official Torch implementation
Tensorflow implementation by Josh Miller.

Limitations

Code in this project requires the Keras Tensorflow backend.
